I am trying to use pipelines in Databricks to ingest data from an external location into the data lake using Auto Loader, and I am running into the issue below. I have noticed other posts with similar errors, but in those posts (e.g. this one), the error was caused by the destination table already being registered as managed.
In my case, the error seems to be related to the event log table associated with the Auto Loader. More specifically, if I search for the storage path from the error message using the following query, I get a single, automatically created table called event_log_a3c015c9_f373_4aa6_92db_6b56ae0dc948:
SELECT
table_name
FROM system.information_schema.tables
WHERE table_name LIKE '%event%' AND storage_path LIKE '%3775a194-3db0-48a6-8c0e-cce43c26c9e7%'
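For reference, the same lookup can be run from the notebook session to pull a few more columns and see how that event log table is registered; this is just a sketch (the extra columns are standard information_schema fields, and it assumes the session object created in the code further down):

# Sketch: same lookup from Python, also pulling the catalog, schema and table type,
# to see how the event log table is registered and where its managed storage lives.
# (Assumes the `session` object created in the code further below.)
session.sql("""
    SELECT table_catalog, table_schema, table_name, table_type, storage_path
    FROM system.information_schema.tables
    WHERE storage_path LIKE '%3775a194-3db0-48a6-8c0e-cce43c26c9e7%'
""").show(truncate=False)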
I tried re-creating the pipeline, but it didn't help. Any idea how to resolve this?
Error:
AnalysisException: Traceback (most recent call last):
File "/Users/[email protected]/.bundle/Testproject_2/dev/files/src/notebook", cell 4, line 11
2 csv_file_path = "abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/dummy.csv"
3 schema_location = "abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/_schema8/"
4 df = (
5 session.readStream
6 .format("cloudFiles")
7 .option("cloudFiles.format", "csv")
8 .option("header", "true")
9 .option("inferSchema", "true")
10 .option("cloudFiles.schemaLocation", schema_location)
---> 11 .load(csv_file_path)
12 )
AnalysisException: [RequestId=3ef8b745-48dc-4ae1-b2f6-9afaaf442c3b ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://[email protected]/dev-data-domain/__unitystorage/catalogs/cf3123b2-b661-48d9-9baa-a0b0214d5a29/tables/3775a194-3db0-48a6-8c0e-cce43c26c9e7/_dlt_metadata/_autoloader' overlaps with managed storage within 'CheckPathAccess' call. .
Relevant code:
from databricks.connect import DatabricksSession
from pyspark.sql.functions import *

# Create or retrieve a DatabricksSession
session = DatabricksSession.builder.getOrCreate()

csv_file_path = "abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/dummy.csv"
schema_location = "abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/_schema8/"

# Auto Loader read: infer the CSV schema and track it under schema_location
df = (
    session.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .option("cloudFiles.schemaLocation", schema_location)
    .load(csv_file_path)
)

# Single-batch write into a Unity Catalog Delta table, checkpointing in a volume
checkpoint_path = "/Volumes/dev-data-domain/bronze/test/_checkpoint5"
query = (
    df.writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)
    .outputMode("append")
    .trigger(once=True)
    .toTable("`dev-data-domain`.bronze.delta_table_pipeline3")
)
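For what it's worth, the cause from the similar posts mentioned above (the destination table already being registered as managed) can be ruled in or out with something like the sketch below; DESCRIBE TABLE EXTENDED reports the table's Type and Location, assuming the target table already exists in the catalog:

# Sketch: check how the write target is registered (Type: MANAGED vs EXTERNAL)
# and where its storage lives, to compare against the path in the error.
# (Only works if the table already exists in Unity Catalog.)
session.sql(
    "DESCRIBE TABLE EXTENDED `dev-data-domain`.bronze.delta_table_pipeline3"
).show(truncate=False)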