
I am trying to use pipelines in Databricks to ingest data from an external location into the data lake using Auto Loader, and I am running into the error below. I have noticed other posts with similar errors, but in those posts (e.g. this one), the error was related to the destination table already being registered as managed.

In my case, the error appears to be related to the event log table associated with the Auto Loader stream. More specifically, if I look up the storage path from the error message using the following query, I get a single, automatically created table called event_log_a3c015c9_f373_4aa6_92db_6b56ae0dc948:

SELECT 
  table_name
FROM system.information_schema.tables
WHERE table_name LIKE '%event%' AND storage_path LIKE '%3775a194-3db0-48a6-8c0e-cce43c26c9e7%'

I tried re-creating the pipeline, but it didn't help. Any idea how to resolve this?

Error:

AnalysisException: Traceback (most recent call last):
File "/Users/[email protected]/.bundle/Testproject_2/dev/files/src/notebook", cell 4, line 11
      2 csv_file_path = "abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/dummy.csv"
      3 schema_location = "abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/_schema8/"
      4 df = (
      5     session.readStream
      6     .format("cloudFiles")
      7     .option("cloudFiles.format", "csv")
      8     .option("header", "true")
      9     .option("inferSchema", "true")
     10     .option("cloudFiles.schemaLocation", schema_location)
---> 11     .load(csv_file_path)
     12 )

AnalysisException: [RequestId=3ef8b745-48dc-4ae1-b2f6-9afaaf442c3b ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://[email protected]/dev-data-domain/__unitystorage/catalogs/cf3123b2-b661-48d9-9baa-a0b0214d5a29/tables/3775a194-3db0-48a6-8c0e-cce43c26c9e7/_dlt_metadata/_autoloader' overlaps with managed storage within 'CheckPathAccess' call. .

Relevant code:

from databricks.connect import DatabricksSession
from pyspark.sql.functions import *

# Create or retrieve a DatabricksSession
session = DatabricksSession.builder.getOrCreate()


csv_file_path = "abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/dummy.csv"
schema_location = "abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/_schema8/"
df = (
    session.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .option("cloudFiles.schemaLocation", schema_location)
    .load(csv_file_path)
)

checkpoint_path = "/Volumes/dev-data-domain/bronze/test/_checkpoint5"  

query = (
    df.writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)
    .outputMode("append")
    .trigger(once=True)
    .toTable("`dev-data-domain`.bronze.delta_table_pipeline3")
)
Comments:

  • Double check if you are calling Auto Loader from a @dlt.table-annotated function. The error you are getting is exactly what I was seeing when calling it from outside (a minimal sketch of that pattern is shown below). Commented Sep 14 at 18:41
  • Using DLT helped me resolve the issue, but I guess that is because it manages checkpoints and schema locations automatically. That helped me move forward, so thank you, but I still wonder why my example didn't work. I should still be able to read with Auto Loader without using Delta Live Tables. Commented Sep 15 at 9:35
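
For reference, here is a minimal sketch of the pattern the first comment suggests: doing the Auto Loader read inside a function decorated with @dlt.table, so the pipeline manages the schema location and checkpoint itself rather than the explicit _schema8/ and Volume paths used above. This is an illustrative sketch, not code from the question; the decorator and cloudFiles options are standard DLT / Auto Loader APIs, the table name is hypothetical, and the source path is the one from the question.

import dlt

# Runs inside a Delta Live Tables pipeline notebook, where `spark` is provided.
@dlt.table(
    name="delta_table_pipeline3",  # hypothetical target table name
    comment="CSV files ingested with Auto Loader",
)
def delta_table_pipeline3():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("header", "true")
        .option("cloudFiles.inferColumnTypes", "true")
        # No cloudFiles.schemaLocation or checkpointLocation here:
        # the pipeline keeps both under its own managed storage.
        .load("abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/dummy.csv")
    )

The point of the sketch is only that, as the second comment notes, the pipeline rather than the caller decides where schema-evolution data and checkpoints live.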
