228 questions
0
votes
0
answers
52
views
Delta Live Tables are producing different results
I am trying to perform an aggregation on top of a table. I applied the same aggregation in a DLT pipeline and in a PySpark query, but the results are different.
My PySpark query looks like this:
agg_df = filter_df....
0
votes
1
answer
261
views
Databricks DLT pipeline not deleting records via apply_as_deletes clause
I am trying a simple scenario where, as per a GDPR request, I want to delete data from all the tables across my data pipeline. Looking at the DLT documentation, I see the apply_as_deletes clause in apply_changes() ...
0
votes
1
answer
185
views
How to set external location while using DLT and hive metastore
I originally set the Storage location in my DLT as abfss://{container}@{storageaccount}.dfs.core.windows.net/...
But when running the DLT I got the following error:
So I decided to leave the above ...
0
votes
1
answer
114
views
Bug Delta Live Tables - Checkpoint
I've encountered an issue with Delta Live Table in both my Development and Production Workspaces. The data is arriving correctly in my Azure Storage Account; however, the checkpoint is being stored in ...
0
votes
1
answer
73
views
Stream-stream LeftOuter join is not supported without a watermark in the join keys
I have the code below, which fails when I attempt a stream-stream left outer join.
@dlt.view
def vw_ix_f_activity_gold():
return (
spark.readStream
.option("...
0
votes
0
answers
146
views
Improve Latency with Delta Live Tables
Use Case:
I am loading the Bronze layer using an external tool, which automatically creates bronze Delta tables in Databricks. However, after the initial load, I need to manually enable changeDataFeed ...
0
votes
1
answer
98
views
Delta Live Tables - can't update
Objective
I plan to use Delta Live Tables (DLT) to deliver near real-time reporting in Power BI.
Current Setup
I load Bronze Delta tables every 1 minute using Fivetran.
These Bronze tables serve as ...
0
votes
3
answers
397
views
How to define external dependencies in DLT pipeline definitions?
In order to deploy DLT tables I am using yaml files that define Delta Live Tables Pipeline. Here is an example configuration.
resources:
pipelines:
bronze:
name: ${var.stage_name}_bronze
...
0
votes
1
answer
335
views
Databricks Numeric Type comparison (Int vs Double)
I am looking at using Azure Databricks and Delta Live Tables to store and process financial order book data.
This could grow to a very large table over time, with potentially billions of rows and ...
1
vote
1
answer
398
views
Databricks DLT DataFrame - How to use Schemas with Comments
Databricks DLT DataFrame - How to use Schemas
I'm new to Databricks Delta Live Tables and DataFrames, and I'm confused about how to use schemas when reading
from the stream. I'm doing table to table ...
0
votes
1
answer
195
views
Delta Live Table missing data
Got a very simple DLT which runs fine, but the final table "a" is missing data.
I've found that after performing a full refresh live above, if I rerun just the final table, then I get more ...
0
votes
1
answer
134
views
Continuous DLT Pipeline does not perform other tasks on further runs
We have a DLT pipeline running in Continuous Mode as the most-upstream table is a Streaming Table.
In one Materialized View running somewhere in the middle of the whole pipeline, we have a few extra ...
0
votes
1
answer
166
views
Databricks DLT Streaming Schema
I am trying to create a delta live table from a JSON message that has two arrays [NewData, OldData]. I pass my schema in the readStream code and select only NewData.* to get the fields in the NewData ...
0
votes
1
answer
113
views
break DAG lineage in DLT
I have an iterative transformation applied to a DataFrame. It used to take a long time and, having done lots of research online, it appears the issue was due to the DAG growing exponentially. To ...
0
votes
1
answer
470
views
Spark Delta Table dependencies are not resolved
I am creating a Delta table using Python, but when we submit the code, the JAR dependencies are not resolved:
Below is my code:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, ...
0
votes
1
answer
357
views
CreateDate and UpdateDate in Delta Live Table
I'm ingesting a file using DLT with the code below:
@dlt.view(
name="view_name",
comment="comments"
)
def vw_DLT():
return spark.readStream.format("cloudFiles")....
0
votes
1
answer
275
views
Delta Live Tables: How to reduce size of microbatch?
A Delta Live Tables pipeline reads a Delta table on Databricks. Is it possible to limit the size of each microbatch during data transformation?
I am thinking about a solution used by Spark Structured ...
0
votes
1
answer
352
views
Databricks DLT Online Table with VNet Enabled on Blob Storage Gets 403 Issue
I am trying to create an online table in Unity Catalog. However, I get a GET 403 error.
DataPlaneException: Failed to start the DLT service on cluster . Please check the stack trace below or driver ...
1
vote
0
answers
237
views
Hard Deletes in Delta Live Tables (DLT)
How are folks handling hard deletes in their Delta Live Table pipelines? I am working with the source team to see about getting them to update their processes to provide a change log but for right now,...
5
votes
1
answer
1k
views
Handling Incremental Data Loading and SCD Type 2 for joined tables in Delta Live Tables on Databricks
I'm working on a project utilizing Delta Live Tables on Databricks, where I need to create a dimension (Kimball style) with slowly changing dimension type 2. The dimension is the result of a join ...
0
votes
1
answer
913
views
Databricks - How to avoid duplicate records in Delta Tables
We have one use case in our data projects where feeds coming from the source system via live streaming may detect certain issues and resend the same transaction again with a flag that indicates that ...
1
vote
2
answers
556
views
How to add comments to a Delta Table in Scala?
I would like to add comments to columns of an existing Delta table, without having to actually write SQL statements like "ALTER TABLE ALTER COLUMN". Is it possible to do it using only Scala?
0
votes
0
answers
87
views
Databricks: is there a way to forward incoming records over to SQL Server?
We are building a "typical" Medallion Architecture delta lake using Azure Databricks.
We have a business requirement to forward the incoming records over to a SQL Server instance as soon as ...
1
vote
0
answers
117
views
Running Delta Live Tables using Mosaic
I'm just learning to use Mosaic and Delta Live Tables in Databricks. I was following this example https://github.com/databrickslabs/mosaic/tree/main/notebooks/examples/python/OpenStreetMaps and after ...
0
votes
2
answers
930
views
Executing Spark sql in delta live tables
I am new to DLT and trying to get the hang of it. I have written the code below. I have two streaming tables (temp1 and temp2). I am creating two views out of those tables. I am then joining those views ...