Sireesha Madabhushi
10/03/2023, 6:52 AM
Perfect Stranger
10/03/2023, 1:11 PM
Sardar Khan
10/03/2023, 7:08 PM
df.write.format("delta").mode("overwrite")\
.option("delta.columnMapping.mode", "name")\
.option("delta.enableIcebergCompatV1", "true")\
.option("delta.universalFormat.enabledFormats", "iceberg")\
.save("<s3a://cof-card-data-iceberg-research-qa/skDummyTest/>")
I am getting the following error in return:
23/10/03 18:54:52 ERROR DeltaLog: Failed to find Iceberg converter class
java.lang.ClassNotFoundException: org.apache.spark.sql.delta.icebergShaded.IcebergConverter
Am I missing a Spark jar of some sort? Has anyone dealt with this before?
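A hedged aside on the error above: the shaded class org.apache.spark.sql.delta.icebergShaded.IcebergConverter ships in the separate delta-iceberg artifact, not in the core Delta jar, so UniForm writes need that package on the classpath as well. A minimal sketch with assumed version numbers (align them with your Spark/Delta build); note too that the Delta docs enable UniForm via table properties (TBLPROPERTIES) rather than DataFrameWriter options, which may also matter here:
# Sketch: start a session with both the core Delta package and the
# delta-iceberg package that contains the shaded IcebergConverter.
# Versions are assumptions -- the core artifact is delta-core_2.12 on
# Delta 2.x and delta-spark_2.12 on Delta 3.x.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.packages",
            "io.delta:delta-spark_2.12:3.0.0,"
            "io.delta:delta-iceberg_2.12:3.0.0")
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)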
Rahul Goswami
10/04/2023, 5:46 AM
quanns
10/04/2023, 7:08 AM
I'm using Spark and delta-lake. For parquet, Spark supports columnar data encryption by integrating a KMS client in the application. Because delta-lake is based on parquet, I thought columnar encryption would work with delta-lake, but it does not. I tried to write data to HDFS in delta format with the same configuration I used for parquet, and the output data is not encrypted (I can read it with pandas).
• Is there any configuration I need to set to apply columnar encryption to delta-lake?
• If delta-lake doesn't support this feature, is there any solution that works the same way as parquet's columnar encryption (keeping all of parquet's advantages, such as pushed-down filters)?
Marius Grama
10/04/2023, 3:06 PM
Does anyone know what the WriteSerializable isolation level in Delta Lake actually means?
https://docs.databricks.com/en/optimizations/isolation-level.html#write-serializable-vs-serializable-isolation-levels
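A hedged gloss from the linked page: WriteSerializable (the default) guarantees the final table state is equivalent to some serial sequence of the write operations, while Serializable additionally requires reads to fit into that serial order. If the stricter behavior is needed, it can be set per table (a sketch; the table name is a placeholder):
# Sketch: pin a table to the stricter Serializable isolation level.
# "my_table" is a placeholder name.
spark.sql(
    "ALTER TABLE my_table "
    "SET TBLPROPERTIES ('delta.isolationLevel' = 'Serializable')"
)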
Vaiva
10/05/2023, 6:06 AM
Pranit Sherkar
10/05/2023, 8:47 AM
Benny Elgazar
10/05/2023, 7:07 PM
In _delta_log I see files from 000000000000000001.json up to 00000000000000n.json, but whenever it reaches checkpoint number 10 I additionally get plenty of parquet files.
example:
2023-10-05 12:42:02 9793 00000000000000000010.checkpoint.0000001044.0000001066.parquet
2023-10-05 12:42:02 9354 00000000000000000010.checkpoint.0000001045.0000001066.parquet
2023-10-05 12:42:02 9428 00000000000000000010.checkpoint.0000001046.0000001066.parquet
2023-10-05 12:42:02 9468 00000000000000000010.checkpoint.0000001047.0000001066.parquet
2023-10-05 12:42:02 9400 00000000000000000010.checkpoint.0000001048.0000001066.parquet
2023-10-05 12:42:02 9543 00000000000000000010.checkpoint.0000001049.0000001066.parquet
2023-10-05 12:42:02 9428 00000000000000000010.checkpoint.0000001050.0000001066.parquet
2023-10-05 12:42:02 9436 00000000000000000010.checkpoint.0000001051.0000001066.parquet
2023-10-05 12:42:02 9531 00000000000000000010.checkpoint.0000001052.0000001066.parquet
2023-10-05 12:42:02 9354 00000000000000000010.checkpoint.0000001053.0000001066.parquet
2023-10-05 12:42:02 15689 00000000000000000010.checkpoint.0000001054.0000001066.parquet
2023-10-05 12:42:02 9400 00000000000000000010.checkpoint.0000001055.0000001066.parquet
2023-10-05 12:42:02 9571 00000000000000000010.checkpoint.0000001056.0000001066.parquet
2023-10-05 12:42:02 9634 00000000000000000010.checkpoint.0000001057.0000001066.parquet
2023-10-05 12:42:02 9468 00000000000000000010.checkpoint.0000001058.0000001066.parquet
2023-10-05 12:42:02 9426 00000000000000000010.checkpoint.0000001059.0000001066.parquet
2023-10-05 12:42:02 4235 00000000000000000010.checkpoint.0000001060.0000001066.parquet
2023-10-05 12:42:02 9644 00000000000000000010.checkpoint.0000001061.0000001066.parquet
2023-10-05 12:42:02 9638 00000000000000000010.checkpoint.0000001062.0000001066.parquet
2023-10-05 12:42:02 9427 00000000000000000010.checkpoint.0000001063.0000001066.parquet
2023-10-05 12:42:02 9743 00000000000000000010.checkpoint.0000001064.0000001066.parquet
2023-10-05 12:42:02 9401 00000000000000000010.checkpoint.0000001065.0000001066.parquet
2023-10-05 12:42:02 9428 00000000000000000010.checkpoint.0000001066.0000001066.parquet
When this happens, I can no longer query the table using Athena; the query never returns.
Has anyone experienced this issue and knows how to solve it?
So far the only solution I have found is to rewrite the whole table again.
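A hedged observation on the listing above: files named <version>.checkpoint.<part>.<totalParts>.parquet are pieces of a single multi-part checkpoint (parts 1044 through 1066 of 1066 here), and some query engines, Athena historically among them, do not handle multi-part checkpoints. In OSS Delta the split is governed by spark.databricks.delta.checkpoint.partSize (maximum actions per checkpoint part); a sketch of steering the writer back toward single-file checkpoints, with an illustrative value:
# Sketch: raise the per-part action limit so checkpoints are written as
# fewer parts (ideally one file). The value is an illustrative assumption;
# when this config is unset, OSS Delta writes single-file checkpoints.
spark.conf.set("spark.databricks.delta.checkpoint.partSize", "10000000")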
Beni
10/05/2023, 8:33 PM
%sql
CREATE OR REPLACE TABLE Testing(
Id INTEGER NOT NULL,
Name STRING NOT NULL)
USING DELTA
LOCATION 'dbfs:/local_disk0/tmp/deltattest'
Henrique Viana
10/05/2023, 8:51 PM
Abhishek Shan
10/05/2023, 8:54 PM
Abhishek Shan
10/05/2023, 9:11 PM
Sahil Shah
10/06/2023, 10:39 AM
Samrose
10/06/2023, 9:36 PM
Christian Daudt
10/06/2023, 10:44 PM
Ben Magee
10/07/2023, 4:46 PM
Samrose
10/07/2023, 9:59 PM
stage 26.0 failed 4 times, most recent failure: Lost task 0.3 in stage 26.0 (TID 34) (10.59.220.113 executor 0): java.lang.NullPointerException
at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:113)
Aishik Saha
10/09/2023, 8:54 AM
I'm unable to open a Delta table whose path contains ^ symbols, using the deltalake Python library.
Table URL: "s3a://my_bucket/bronze/hello^delta^team"
Python Error:
File ".../.venv/lib/python3.9/site-packages/deltalake/table.py", line 247, in __init__
self._table = RawDeltaTable(
OSError: Encountered object with invalid path: Error parsing Path "/bronze/hello^delta^team": Encountered illegal character sequence "^" whilst parsing path segment "hello^delta^team"
Is there a way to override this parsing check so that I can read the files? Thanks for reading.
Gokhan Ozturk
10/09/2023, 1:17 PM
Carly Akerly
10/09/2023, 7:03 PM
Chinhvu1111
10/10/2023, 3:09 AM
Chinhvu1111
10/10/2023, 4:00 AM
Lucas Zago
10/10/2023, 9:26 AM
Jatin Sharma
10/10/2023, 9:35 AM
Gokhan Ozturk
10/10/2023, 12:48 PM
Douglas
10/10/2023, 1:21 PM
Pedro Salgado
10/10/2023, 2:11 PM
I'm trying to set delta.logRetentionDuration in delta-rs, as we usually do through Spark SQL with ALTER TABLE SET TBLPROPERTIES.
I was able to check the Rust code and find a reference to the config and its usage in checkpoint creation, but the cleanup flag is not exposed in Python... source: https://github.com/delta-io/delta-rs/blob/main/rust/src/protocol/checkpoints.rs#L95
Any idea how I can achieve log cleanup in delta-rs similar to Delta Spark?
Thanks!
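A sketch of what appears to be the closest workaround, hedged: in the linked checkpoints.rs, expired-log cleanup runs as part of checkpoint creation and is driven by the table's delta.logRetentionDuration / delta.enableExpiredLogCleanup properties rather than by a Python-level flag, so triggering a checkpoint from Python should also trigger the cleanup. The table path below is a placeholder:
from deltalake import DeltaTable

# Placeholder table path.
dt = DeltaTable("s3a://my-bucket/my-table")

# delta-rs applies expired-log cleanup while writing a checkpoint,
# honoring the table's delta.logRetentionDuration property (see the
# linked checkpoints.rs); there is no separate Python cleanup flag.
dt.create_checkpoint()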
Christina
10/11/2023, 1:42 PM
employees_table = (spark.read
.format("jdbc")
.option("url", "<jdbc-url>")
.option("dbtable", "<table-name>")
.option("user", "<username>")
.option("password", "<password>")
.option("partitionColumn", "<partition-key>")
.option("lowerBound", "<min-value>")
.option("upperBound", "<max-value>")
.option("numPartitions", 8)
.load()
)
display(employees_table.select("age", "salary").groupBy("age").avg("salary"))
Perfect Stranger
10/11/2023, 4:18 PM