8,466 questions
0 votes · 0 answers · 18 views
How to stream from a merged (apply_changes) table into a downstream silver layer as a stream rather than a Materialized View (MV)?
The Architecture: I am implementing a Delta Live Tables (DLT) pipeline following the Medallion architecture.
Landing: Auto Loader ingesting raw files (JSON/CSV).
Bronze Layer: Uses dlt.apply_changes() ...
0 votes · 1 answer · 40 views
Update Python Library in Databricks Cluster
I inherited a custom Python library and a Databricks instance that I haven't had to do much with, but this last week I had to make changes to a function in the codebase.
I thought Databricks was ...
0 votes · 1 answer · 47 views
No module named 'pyspark.sql.metrics' when working with pickle or joblib on Databricks
I read data from Databricks
import pandas as pd
import joblib
query = 'select * from table a'
df = spark.sql(query)
df = df.toPandas()
df.to_pickle('df.pickle')
joblib.dump(df, 'df.joblib')
...
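The error above typically surfaces when the pickled object still references pyspark internals that are missing where the file is later loaded. One commonly suggested direction (an assumption, not stated in the question) is to serialize plain built-in structures instead; a minimal stdlib sketch with hypothetical rows standing in for the `toPandas()` output:

```python
import pickle

# Hypothetical rows standing in for df.toPandas() output; the point is to
# serialize plain built-in structures, which carry no pyspark references.
rows = [{"id": 1, "amount": 9.99}, {"id": 2, "amount": 4.5}]

blob = pickle.dumps(rows)       # no pyspark internals end up in the bytes
restored = pickle.loads(blob)
print(restored == rows)         # True
```

The same bytes can then be unpickled in any environment, with or without pyspark installed.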
0 votes · 0 answers · 60 views
DBT unit tests fail when a struct has too many fields
I am running DBT models on Databricks and I am starting to implement unit tests for them.
I have the following DBT unit test:
unit_tests:
  - name: test_my_model
    model: my_model
    given:
      ...
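dbt unit tests also accept SQL-format fixtures, which is the route usually suggested for complex types such as structs, since each fixture row is just a `SELECT`. A hedged sketch (model and column names are invented; `named_struct` assumes Databricks SQL):

```yaml
unit_tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_source_model')
        format: sql
        rows: |
          select 1 as id, named_struct('a', 1, 'b', 2) as my_struct
    expect:
      format: sql
      rows: |
        select 1 as id, named_struct('a', 1, 'b', 2) as my_struct
```

This avoids spelling every struct field out as dict-format fixture values.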
0 votes · 2 answers · 66 views
How to write parquet file to Databricks Volume?
I'd like to export data from tables within my Databricks Unity Catalog. I'd like to transform each of the tables into a single parquet file which I can download. I thought I could just write a table to a ...
Advice · 1 vote · 1 reply · 45 views
Job Task Conditions - Only Run on 2nd Working Day of the Month
I need some advice. I have a job which runs every day, and I'm looking to have a particular task run on the second working day of the month. I know I can solve this by setting up another job ...
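A task condition like this can be driven by a small helper that computes the second weekday of the month; the sketch below ignores public holidays (a real schedule would need a holiday calendar), so it is a starting point rather than a complete solution:

```python
from datetime import date, timedelta

def second_working_day(year: int, month: int) -> date:
    """Return the second Monday-to-Friday day of the month (holidays ignored)."""
    d = date(year, month, 1)
    working_days = 0
    while True:
        if d.weekday() < 5:  # 0-4 are Mon-Fri
            working_days += 1
            if working_days == 2:
                return d
        d += timedelta(days=1)

# A job task could then compare today's date against this value:
print(second_working_day(2024, 6))  # 2024-06-04 (June 1st is a Saturday)
```

The comparison result can feed an If/else job task condition so the rest of the job still runs daily.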
0 votes · 1 answer · 61 views
Location of spark.scheduler.allocation.file in Databricks workspace
When using Databricks runtime 16.4, I am trying to set spark.scheduler.allocation.file to a location in a workspace.
config("spark.scheduler.allocation.file","file:/Workspace/init/...
0 votes · 1 answer · 57 views
Union Two Datasets Causes Records to Unexpectedly Filter
NOTE: I am running this query on Azure Databricks in a serverless Notebook.
I have two tables with identical schema: foo and bar. They have the same number of columns, with the same names, in the same ...
0 votes · 1 answer · 47 views
How do I find the file size for my Delta tables in Databricks? I want to be able to expand it to multiple tables
I would like to know the total size of a table, as well as the file sizes of the files that comprise it.
Using DESCRIBE DETAIL works (DESCRIBE DETAIL table1), but using the information as a table doesn'...
1 vote · 1 answer · 247 views
Unable to import pyspark.pipelines module
What could be a cause of the following error of my code in a Databricks notebook, and how can we fix the error?
ImportError: cannot import name 'pipelines' from 'pyspark' (/databricks/python/lib/...
0 votes · 0 answers · 35 views
How to link Spark event log stages to PySpark code or query?
I'm analyzing Spark event logs and have already retrieved the SparkListenerStageSubmitted and SparkListenerTaskEnd events to collect metrics such as spill, skew ratio, memory, and CPU usage.
However, ...
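In the event-log JSON, a `SparkListenerStageSubmitted` event carries a `Stage Info` object whose `Details` field holds the call-site stack trace, which is usually the link back to the driver code; field names follow Spark's JsonProtocol and should be verified against your Spark version. A sketch over a hand-made sample line (all sample values are invented):

```python
import json

# Invented sample event-log line; real logs contain one JSON object per line.
line = json.dumps({
    "Event": "SparkListenerStageSubmitted",
    "Stage Info": {
        "Stage ID": 3,
        "Stage Name": "count at NativeMethodAccessorImpl.java:0",
        "Details": "org.apache.spark.sql.Dataset.count(Dataset.scala:3625)\n...",
    },
})

event = json.loads(line)
if event.get("Event") == "SparkListenerStageSubmitted":
    info = event["Stage Info"]
    # The first stack frame usually points at the triggering API call.
    print(info["Stage ID"], "->", info["Details"].splitlines()[0])
```

For SQL queries, stages can additionally be tied to a query via the `spark.sql.execution.id` property on the submitting job, though that linkage is not shown here.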
2 votes · 2 answers · 102 views
Union of tiny dataframes exhausts resource on Databricks
As part of a function I create df1 and df2 and aim to stack them and output the results. But the results do not display within the function, nor if I output the results and display after.
results = ...
1 vote · 1 answer · 90 views
Retrieve schema name created with Databricks Asset Bundles
I've created a schema in DAB with this code in my yml file.
resources:
  schemas:
    my_schema:
      name: my_schema_name
      catalog_name: my_catalog
The schema is created ...
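Other parts of a bundle can usually reference a resource's fields through variable interpolation; a hedged sketch of what that could look like (the job, task, and notebook path are invented, and the `${resources.…}` reference syntax should be checked against the bundles reference for your CLI version):

```yaml
resources:
  schemas:
    my_schema:
      name: my_schema_name
      catalog_name: my_catalog

  jobs:
    my_job:
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ../src/main.ipynb
            base_parameters:
              schema: ${resources.schemas.my_schema.name}
```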
0 votes · 1 answer · 78 views
How can I change my Spark session in Databricks Community Edition?
I want to change my spark session from 'pyspark.sql.connect.dataframe.DataFrame' to 'pyspark.sql.dataframe.DataFrame' so that I can run StringIndexer and VectorAssembler.
If I run it in pyspark.sql....
0 votes · 0 answers · 33 views
Databricks group cluster fails to read CSV (TextFileFormatEdge$.disabled) while personal cluster works
I have a PySpark function that reads a reference CSV file inside a larger ETL pipeline.
On my personal Databricks cluster, this works fine. On the group cluster, it returns an empty dataframe; the same ...
0 votes · 0 answers · 75 views
Can we get access history for an Inference Service in Snowflake?
I’m currently exploring Inference Services in Snowflake and wanted to check if there’s an equivalent to the Event History column in Databricks.
So far, the closest I’ve found in Snowflake are service ...
0 votes · 1 answer · 62 views
Pydantic model inserts None values in Databricks Delta table as string type instead of null type
I have the below pydantic model with 6 columns out of which 2 columns are nullable.
from pydantic import BaseModel
from typing import Optional
class Purchases(BaseModel):
    customer_id: int
...
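With symptoms like this, the usual culprit is stringifying fields somewhere before the insert: `str(None)` produces the literal four-character text "None", while a JSON/SQL-null path keeps a real null. A stdlib sketch of the distinction (the field names are hypothetical):

```python
import json

record = {"customer_id": 1, "coupon_code": None}

# Stringifying a None yields the text "None" -- this would land in a table
# column as a string, not as a SQL NULL.
print(str(record["coupon_code"]))   # None  (a 4-character string)

# Serializing with json keeps it a real null.
print(json.dumps(record))           # {"customer_id": 1, "coupon_code": null}
```

Checking whether the insert path casts the model's values to `str` is a reasonable first step.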
0 votes · 0 answers · 30 views
No longer able to create a Delta Table in ADLS Gen 2. Error: The protocol of your Delta table could not be recovered while Reconstructing version: 0
I have been successfully creating DeltaTables in ADLS Gen2 for a number of years without any issues. Today, I deleted the deltaTable for a table I copied into ADLS Gen 2 with ADF and the associated ...
1 vote · 0 answers · 50 views
Databricks - LOCATION_OVERLAP Error with AutoLoader pipeline ingesting from external location
I am trying to use pipelines in Databricks to ingest data from an external location to the datalake using AutoLoader, and I am facing this issue. I have noticed other posts with similar errors, but in ...
0 votes · 1 answer · 58 views
Databricks: multiple outputs from a cell
Normally I run this code at the top of notebooks to allow printing of multiple outputs from a cell (without having to use print statements).
from IPython.core.interactiveshell import InteractiveShell
...
0 votes · 1 answer · 105 views
DELTA_INSERT_COLUMN_ARITY_MISMATCH error while using pyodbc and Databricks
I'm getting a [DELTA_INSERT_COLUMN_ARITY_MISMATCH] error while trying to insert into Databricks using pyodbc. If I run this query, everything works fine in both Databricks and Python
'INSERT INTO ...
0 votes · 0 answers · 69 views
Unable to Create an Azure SAS Token to Be Used with Databricks to Connect to Azure ADLS Gen 2
I am trying to establish a connection to our Azure Data Lake Gen2 using a SAS Token.
I have created the following SAS token
spark.conf.set("fs.azure.account.auth.type.adlsprexxxxx.dfs.core....
0 votes · 0 answers · 129 views
Which version of a source delta table is currently being processed by Spark Structured Streaming?
I want to know/monitor which version of the delta table is currently being processed, especially when the stream is started with a startingVersion.
My understanding is when that option is chosen, the ...
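One place this information surfaces is the streaming query's `lastProgress`: the Delta source reports its offsets there, and the end-offset payload includes a `reservoirVersion` field holding the table version being read. A sketch over a hand-made progress dict (the values are invented, and the field names should be verified on your runtime):

```python
# Invented lastProgress payload; the shape follows what Delta's streaming
# source reports, but confirm the field names on your Databricks runtime.
progress = {
    "sources": [{
        "description": "DeltaSource[dbfs:/delta/events]",
        "endOffset": {"reservoirVersion": 42, "index": 7},
    }]
}

end = progress["sources"][0]["endOffset"]
print("processing table version:", end["reservoirVersion"])
```

In a live job the same dict would come from `streaming_query.lastProgress` after each micro-batch.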
0 votes · 1 answer · 112 views
How to set proxy to create WorkspaceClient in Databricks using Java SDK
I am working on Azure Databricks Test Automation using Java. There are a number of Jobs and pipelines that are created in Azure Databricks to process data. I want to create WorkspaceClient for them ...
1 vote · 1 answer · 128 views
How do I load joblib file on spark?
I have the following code. It reads a pre-existing file for an ML model. I am trying to run it on Databricks on multiple cases
import numpy as np
import joblib
class WeightedEnsembleRegressor:
"""...
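A frequent pitfall when loading joblib/pickle files on Spark is that the serialized file stores only a reference to the class (module and name), not its code, so the class must be importable under the same module path wherever deserialization happens, including on executors. A stdlib sketch of the mechanism (the class body here is a stand-in, not the asker's real model):

```python
import pickle

class WeightedEnsembleRegressor:
    """Stand-in for the model class; the real class must be importable
    under the same module path wherever the file is deserialized."""
    def __init__(self, weights):
        self.weights = weights

model = WeightedEnsembleRegressor([0.4, 0.6])
blob = pickle.dumps(model)       # stores module + class name, not the code
restored = pickle.loads(blob)
print(restored.weights)
```

On a cluster this usually means packaging the class in an installed wheel or an importable module, rather than defining it only in the notebook that created the file.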