I am using polars.df.write_delta() to initially create, and subsequently append to, Delta Tables in Microsoft Fabric OneLake storage, via a Fabric python notebook.
Having had a production process up and running for some time, I notice that the most frequently-updated tables are showing a warning/error in the Fabric lakehouse web interface:
ErrorCode: DeltaTableNotCheckpointed
Message: Delta table 'SomeTableName' has atleast '100' transaction logs, but no checkpoints. For performance reasons, it is recommended to regularly checkpoint the delta table more frequently than every '100' transactions. As a workaround, please use SQL or Spark to retrieve table schema.
I have read about Delta Lake checkpoints in the official Delta Lake protocol specification. My understanding is that the spec does not require writers to create checkpoints; it only permits them to do so if they choose.
In the delta-rs documentation, I found one buried reference stating:

> Checkpoints are by default created based on the `delta.checkpointInterval` config setting.
For one affected table in my environment, this config setting does not appear to be defined. I ran:

```python
deltalake.DeltaTable(my_abfss_path).metadata().configuration
```

and the result was just:

```python
{'delta.parquet.vorder.enabled': 'true'}
```
So this probably explains, in the immediate sense, why I have no checkpoints. What I am not clear on is where the responsibility for defining this config setting lies.
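As a stopgap, I am considering checkpointing manually after my appends. A minimal sketch, assuming the delta-rs Python package (`deltalake`) is available in the notebook; `checkpoint_due` and `maybe_checkpoint` are my own hypothetical helper names, and `100` mirrors the interval Fabric complains about:

```python
def checkpoint_due(version: int, interval: int = 100) -> bool:
    # Commit versions start at 0; treat a checkpoint as "due" once the
    # current version reaches a multiple of the interval.
    return version > 0 and version % interval == 0

def maybe_checkpoint(table_uri: str, interval: int = 100) -> None:
    # deltalake is the delta-rs Python binding; imported inside the
    # function so the pure helper above carries no dependency on it.
    from deltalake import DeltaTable

    dt = DeltaTable(table_uri)
    if checkpoint_due(dt.version(), interval):
        dt.create_checkpoint()
```

This works, but it feels like something the writing layer should be doing for me, which is what prompts the question below.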
## Question
At which layer of abstraction should this setting be set?
- Should it have been set by delta-rs when Polars asked it to create the table initially?
- Should it have been set by Polars as part of the internal implementation of `write_delta()`?
- Should my client code have set it when calling `write_delta()`? If so, how exactly? Would that be via the `delta_write_options` parameter? I can't find anything confirming this anywhere.
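For reference, this is the call shape I had in mind for that last option. It is a sketch based on my reading of the docs, assuming Polars forwards `delta_write_options` as keyword arguments to delta-rs's `write_deltalake()`, which accepts a `configuration` mapping of table properties; `create_table` and `CHECKPOINT_PROPS` are my own names:

```python
# Delta table properties are string -> string; applied at creation time.
CHECKPOINT_PROPS = {"delta.checkpointInterval": "100"}

def create_table(df, abfss_path: str) -> None:
    # df is assumed to be a polars DataFrame. `configuration` is the
    # write_deltalake() keyword I believe delta_write_options forwards
    # to (an assumption based on the delta-rs docs).
    df.write_delta(
        abfss_path,
        mode="error",  # initial creation only; appends use mode="append"
        delta_write_options={"configuration": CHECKPOINT_PROPS},
    )
```

Even if this sets the property, I am unsure whether delta-rs then acts on `delta.checkpointInterval` automatically on each append, or whether the setting is merely recorded in the table metadata.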