0 votes
0 answers
18 views

The Architecture: I am implementing a Delta Live Tables (DLT) pipeline following the Medallion architecture. Landing: Auto Loader ingesting raw files (JSON/CSV). Bronze Layer: Uses dlt.apply_changes() ...
Jostein Sortland
0 votes
1 answer
40 views

I inherited a custom Python library and a Databricks instance that I haven't had to do much with, but this last week I had to make changes to a function in the codebase. I thought Databricks was ...
user2745258
0 votes
1 answer
47 views

I read data from Databricks import pandas as pd import joblib query = 'select * from table a' df = spark.sql(query) df = df.toPandas() df.to_pickle('df.pickle') joblib.dump(df, 'df.joblib') ...
user1700890 • 7,824
0 votes
0 answers
60 views

I am running DBT models on Databricks and I am starting to implement unit tests for them. I have the following DBT unit test : unit_tests: - name: test_my_model model: my_model given: ...
Nakeuh • 1,931
0 votes
2 answers
66 views

I'd like to export data from tables within my Databricks Unity Catalog. I'd like to transform each of the tables to a single parquet file which I can download. I thought I just write a table to a ...
the_economist
Advice
1 vote
1 reply
45 views

I need some advice. I have a job which runs every day, and I'm looking to have a particular task run on the second working day of the month. I know I can solve this by setting up another job ...
TheDefiant89
0 votes
1 answer
61 views

When using Databricks runtime 16.4, I am trying to set spark.scheduler.allocation.file to a location in a workspace. config("spark.scheduler.allocation.file","file:/Workspace/init/...
Frank • 636
0 votes
1 answer
57 views

NOTE: I am running this query on Azure Databricks in a serverless Notebook. I have two tables with identical schema: foo and bar. They have the same number of columns, with the same names, in the same ...
Adam • 4,236
0 votes
1 answer
47 views

I would like to know the total size of a table, as well as the file sizes of the files that comprise it. Using DESCRIBE DETAIL works (DESCRIBE DETAIL table1), but using the information as a table doesn'...
Climbs_lika_Spyder
1 vote
1 answer
247 views

What could be a cause of the following error of my code in a Databricks notebook, and how can we fix the error? ImportError: cannot import name 'pipelines' from 'pyspark' (/databricks/python/lib/...
nam • 24.2k
0 votes
0 answers
35 views

I'm analyzing Spark event logs and have already retrieved the SparkListenerStageSubmitted and SparkListenerTaskEnd events to collect metrics such as spill, skew ratio, memory, and CPU usage. However, ...
Carol C
2 votes
2 answers
102 views

As part of a function I create df1 and df2 and aim to stack them and output the results. But the results do not display within the function, nor do they display if I output the results and display them afterwards. results = ...
platyfish800
1 vote
1 answer
90 views

I've created a schema in DAB with this code in my yml file. resources: schemas: my_schema: name: my_schema_name catalog_name: my_catalog The schema is created ...
Siggerud
0 votes
1 answer
78 views

I want to change my Spark session so that DataFrames are 'pyspark.sql.dataframe.DataFrame' rather than 'pyspark.sql.connect.dataframe.DataFrame', so that I can run StringIndexer and VectorAssembler. If I run it in pyspark.sql....
Nalini Panwar
0 votes
0 answers
33 views

I have a PySpark function that reads a reference CSV file inside a larger ETL pipeline. On my personal Databricks cluster, this works fine. On the group cluster, it returns an empty dataframe, the same ...
Codie • 1
0 votes
0 answers
75 views

I’m currently exploring Inference Services in Snowflake and wanted to check if there’s an equivalent to the Event History column in Databricks. So far, the closest I’ve found in Snowflake are service ...
Symantic
0 votes
1 answer
62 views

I have the below pydantic model with 6 columns out of which 2 columns are nullable. from pydantic import BaseModel from typing import Optional class Purchases(BaseModel): customer_id: int ...
LearneR • 2,593
0 votes
0 answers
30 views

I have been successfully creating DeltaTables in ADLS Gen2 for a number of years without any issues. Today, I deleted the deltaTable for a table I copied into ADLS Gen 2 with ADF and the associated ...
Patterson • 3,011
1 vote
0 answers
50 views

I am trying to use pipelines in Databricks to ingest data from an external location to the datalake using AutoLoader, and I am facing this issue. I have noticed other posts with similar errors, but in ...
MattSt • 1,203
0 votes
1 answer
58 views

Normally I run this code at the top of notebooks to allow printout of multiple outputs from a cell (without having to use print statements). from IPython.core.interactiveshell import InteractiveShell ...
another_summer
0 votes
1 answer
105 views

I'm getting an error of [DELTA_INSERT_COLUMN_ARITY_MISMATCH] while trying to insert into Databricks using pyodbc. If I run this query, everything works fine in both Databricks and Python: INSERT INTO ...
Jason Wiest
0 votes
0 answers
69 views

I am trying to establish a connection to our Azure Data Lake Gen2 using a SAS Token. I have created the following SAS token spark.conf.set("fs.azure.account.auth.type.adlsprexxxxx.dfs.core....
Patterson • 3,011
0 votes
0 answers
129 views

I want to know/monitor which version of the delta table is currently being processed, especially when the stream is started with a startingVersion. My understanding is when that option is chosen, the ...
Saugat Mukherjee
0 votes
1 answer
112 views

I am working on Azure Databricks Test Automation using Java. There are a number of Jobs and pipelines that are created in Azure Databricks to process data. I want to create a WorkspaceClient for them ...
ashish chauhan
1 vote
1 answer
128 views

I have the following code. It reads a pre-existing file for an ML model. I am trying to run it on Databricks on multiple cases: import numpy as np import joblib class WeightedEnsembleRegressor: "...
user6386155
