5,963 questions
0 votes · 1 answer · 72 views
Why are my data quality validation rules not triggering for null values in my dataset?
I’m working on a data quality workflow where I validate incoming records for null or missing values.
Even when a column clearly contains nulls, my rule doesn’t trigger and the record passes validation....
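A common culprit, offered as a hedged sketch rather than a diagnosis: equality comparisons against None do not catch NaN, so a rule written as value == None never fires. Assuming the validation runs on pandas (the excerpt doesn't say), the difference looks like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({"col": [1.0, None, np.nan]})

# A rule like this silently misses nulls: None becomes NaN in a float
# column, and NaN compares unequal to everything, including None.
print((df["col"] == None).sum())  # 0 -> the rule never triggers

# A null-safe check catches both None and NaN:
print(df["col"].isna().sum())     # 2 -> the rule fires as expected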
1 vote · 1 answer · 62 views
Why aren’t my changes reflected after modifying and reimporting an IBM DataStage job XML export?
I’m trying to programmatically modify IBM DataStage jobs to add a new database connector stage in parallel to an existing Database stage.
Here’s my workflow:
Export a job from DataStage Designer as ...
-4 votes · 1 answer · 64 views
Programmatically modifying IBM DataStage job XML – changes not reflected after reimport [closed]
I’m trying to programmatically add a new database stage in parallel to an existing DataStage job by modifying its exported XML. I export the job from DataStage Designer, modify the XML via a Python ...
0 votes · 0 answers · 137 views
Unable to start worker on Prefect - httpx.ConnectError: all connection attempts failed
I have started the Prefect server on a Remote Desktop machine using
prefect server start --host 0.0.0.0 --port 8080
After this I am able to access the UI from other computers on this network. I create a ...
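A hedged sketch of the usual fix: Prefect workers locate the API through the PREFECT_API_URL setting, so pointing it at the server machine's network address (not localhost) and probing the health endpoint first tends to isolate the ConnectError. The host IP below is a placeholder:

import os
import httpx

# Placeholder address of the Remote Desktop machine running the server.
os.environ["PREFECT_API_URL"] = "http://192.0.2.10:8080/api"

# A 200 here means the API is reachable from the worker's machine;
# a connection error here reproduces the worker's failure in isolation.
resp = httpx.get(os.environ["PREFECT_API_URL"] + "/health")
print(resp.status_code, resp.text)

The same setting can be made persistent with prefect config set PREFECT_API_URL=... before running prefect worker start.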
0 votes · 1 answer · 99 views
Can't connect to Ollama hosted locally from a Python script
I am building an ETL pipeline that uses an LLM to extract some information.
I have Ollama installed locally; I am on a MacBook M4 Max.
I don't understand why I get this error from my worker.
ads-worker-1 | 2025-08-28 15:...
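Since the ads-worker-1 prefix suggests the worker runs in Docker Compose, one hedged possibility: localhost inside the container is the container itself, not the Mac where Ollama listens on port 11434. A minimal sketch against Ollama's standard REST API, with the model name as a placeholder:

import requests

# host.docker.internal resolves to the Mac host from inside a
# Docker Desktop container; localhost would hit the container itself.
OLLAMA_URL = "http://host.docker.internal:11434"

resp = requests.post(
    OLLAMA_URL + "/api/generate",
    json={"model": "llama3", "prompt": "ping", "stream": False},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["response"])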
0 votes · 0 answers · 52 views
Error loading data: 'Engine' object has no attribute 'cursor': chan="stdout": source="task"
I am trying to run a batch process using Apache Airflow. The Extract and Transform stages work fine, but the Load stage is giving an error. Here is my code:
from airflow.decorators import dag, ...
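That error message usually means a SQLAlchemy Engine was handed to code expecting a DBAPI connection, which is what actually has a .cursor() method. Since the DAG code is truncated, here is only a hedged sketch of the two common fixes, with a placeholder connection string:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///example.db")  # placeholder URL
df = pd.DataFrame({"id": [1, 2]})

# Fix 1: pass a live Connection (not the Engine) to pandas.
with engine.begin() as conn:
    df.to_sql("my_table", con=conn, if_exists="append", index=False)

# Fix 2: if the load step genuinely needs a DBAPI cursor, unwrap one.
raw = engine.raw_connection()
try:
    cur = raw.cursor()
    cur.execute("SELECT COUNT(*) FROM my_table")
    print(cur.fetchone())
    raw.commit()
finally:
    raw.close()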
1 vote · 1 answer · 76 views
How to control execution order inside a single SSIS Data Flow Task?
I'm working on a project where I need to extract data from an EDI file and load it into a database. I'm using CozyRoc's “EDI Source” component for this.
The EDI Source produces three data outputs, and ...
0 votes · 0 answers · 75 views
ADO.NET failure to acquire connection in an SQL Server to PostgreSQL ETL (DTS)
I'm creating an ETL in Visual Studio 2022 that migrates data from an MS SQL Server table to a table in a PostgreSQL database. I created the ADO.NET Destination connection (server IP, ...
-1 votes · 1 answer · 94 views
How to Use Filter Activity Output as a Source in Copy Activity in Azure Data Factory Pipeline
I'm fairly new to Azure Data Factory and need help with a pipeline I'm building. My goal is to read data from a CSV file stored in an Amazon S3 bucket, filter out records where the Status column is '...
2 votes · 1 answer · 218 views
Docker-based local Postgres database gives disk space error during data population, although there appears to be plenty of space
I'm at a bit of a loss here.
I'm running a PostgreSQL Database on Docker on my Mac.
df -h shows that my root volume has 236 GB available.
docker system df shows that only a few gigs are being used by ...
0 votes · 1 answer · 175 views
ClickHouse: read a parquet file that contains columns with None or NaN
I observe that I cannot open a Parquet file with ClickHouse if one of its columns contains only None or NaN values.
My goal is to dump my raw files in my data warehouse, without having to define data ...
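A plausible cause, hedged: a column holding only nulls is written with Arrow's "null" logical type, which some readers cannot map to a concrete column type. Writing the file with an explicit schema pins a real type; the column names and the choice of string are assumptions:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({"id": [1, 2], "maybe_empty": [None, None]})

# Without an explicit schema the all-null column gets the Arrow "null"
# type; pinning a concrete type keeps downstream readers happy.
schema = pa.schema([("id", pa.int64()), ("maybe_empty", pa.string())])
pq.write_table(pa.Table.from_pandas(df, schema=schema), "fixed.parquet")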
0 votes · 1 answer · 41 views
Move files based on size in Pentaho
Within Pentaho, how do I move files from a particular local directory that are 1 KB in size to another folder? Ideally, I'd like to move all 1 KB files at once and not go one by one (we're talking ...
0 votes · 1 answer · 37 views
How to do transaction control transformation without the static target file?
I just developed my ETL in Informatica PowerCenter, using a Transaction Control transformation to produce dynamic output files, and it works successfully.
This is the mapping logic
My problem is that ...
1 vote · 2 answers · 326 views
dlthub not creating tables in Oracle Database
I'm having problems running pipelines using dlthub, with an Oracle database as the destination.
import dlt
import requests
import sqlalchemy as sa
pipeline = dlt.pipeline(
    pipeline_name="...
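The snippet is cut off, so what follows is only a hedged sketch of a complete pipeline against Oracle, assuming dlt's sqlalchemy destination with the oracledb driver; every name and the connection string are illustrative, not taken from the question:

import dlt

pipeline = dlt.pipeline(
    pipeline_name="api_to_oracle",
    destination=dlt.destinations.sqlalchemy(
        # Placeholder Oracle URL for the sqlalchemy destination.
        "oracle+oracledb://user:password@db-host:1521/?service_name=ORCLPDB1"
    ),
    dataset_name="raw_data",
)

data = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
print(pipeline.run(data, table_name="demo_rows"))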
0 votes · 0 answers · 59 views
Azure Data Factory / Data Flow, how to extract data from JSON where the ids are the keys?
In an Azure Data Factory Data Flow I am using a REST endpoint as the data source to get JSON data. However, the data arrives in a strange format: it is a dictionary of keys where the key value is ...
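For concreteness, the reshaping being asked for, shown outside ADF as a tiny hypothetical Python illustration, is a pivot from a key-to-record mapping into rows:

# Hypothetical payload shape: each key is the record's id.
payload = {
    "101": {"name": "alpha", "status": "open"},
    "102": {"name": "beta", "status": "closed"},
}

# Target shape: one row per entry, with the key surfaced as an id column.
rows = [{"id": key, **record} for key, record in payload.items()]
print(rows)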
1 vote · 0 answers · 105 views
How to handle evolving Parquet schemas from GCS when loading into BigQuery?
We are designing a data ingestion pipeline where Parquet files are delivered weekly into a GCS bucket.
The bucket structure is:
gs://my-bucket/YYYY/MM/DD/<instance-version>/<instance-id>/&...
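One hedged sketch of a common pattern for this: load each week's files with BigQuery's schema-update options so newly appearing Parquet columns are added to the table instead of failing the job. The bucket path and table id below are placeholders:

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    # Let appends add newly appearing columns to the table schema.
    schema_update_options=[bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION],
)
uri = "gs://my-bucket/2025/01/01/*.parquet"  # placeholder path pattern
client.load_table_from_uri(
    uri, "my-project.my_dataset.my_table", job_config=job_config
).result()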
1 vote · 0 answers · 126 views
following warning: APT_CombinedOperatorController,0: Conversion error while calling the date_from_ustring conversion routine. Data may have been lost
Good morning everyone. I'm currently receiving the following warning in IBM DataStage:
APT_CombinedOperatorController,0: Conversion error while calling the date_from_ustring conversion routine. Data ...
0 votes · 0 answers · 79 views
HubSpot private app extract ETL via Azure Data Factory
I have a private app within HubSpot. Using the following API, it is supposed to return the data that is in the contacts table in the CRM:
https://api.hubapi.com/crm/v3/objects/contacts?Authorization=...
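Worth noting in hedged form: the URL above passes Authorization as a query parameter, but HubSpot private-app tokens are expected in a Bearer header. A minimal sketch, with the token as a placeholder:

import requests

token = "pat-..."  # placeholder private-app token
resp = requests.get(
    "https://api.hubapi.com/crm/v3/objects/contacts",
    # The token belongs in the header, not the query string.
    headers={"Authorization": "Bearer " + token},
    params={"limit": 100},
)
resp.raise_for_status()
print(resp.json()["results"][:1])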
0 votes · 0 answers · 78 views
Failure to authenticate a SharePoint connection with AuthenticationContext
I have the following Python code in my Matillion PythonScript component:
from office365.sharepoint.client_context import ClientContext
from office365.runtime.auth.authentication_context import ...
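The snippet is truncated, so here is only a hedged sketch of the user-credential flow in office365-rest-python-client; note that plain username/password auth commonly fails where the tenant enforces MFA or blocks legacy auth, in which case an app principal (client id/secret) is the usual alternative. All names below are placeholders:

from office365.sharepoint.client_context import ClientContext
from office365.runtime.auth.user_credential import UserCredential

site_url = "https://contoso.sharepoint.com/sites/etl"  # placeholder site
ctx = ClientContext(site_url).with_credentials(
    UserCredential("user@contoso.com", "password")  # placeholder login
)
web = ctx.web.get().execute_query()
print(web.url)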
0 votes · 0 answers · 65 views
Best Practices for Uploading Parquet Files into a Predefined BigQuery Schema (Avoiding Type Mismatches)
I'm uploading large datasets into BigQuery using Parquet files instead of CSVs (due to a 100MB limit). However, when loading a Parquet file into a predefined schema, I encounter errors like:
Field ID ...
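A hedged sketch of one mitigation: cast the Arrow table to a schema mirroring the predefined BigQuery table before writing, so Parquet physical types (int32 vs int64, for example) line up. The file names, column names, and types are assumptions:

import pyarrow as pa
import pyarrow.parquet as pq

table = pq.read_table("input.parquet")  # placeholder input file

# Target types mirroring the predefined BigQuery schema (assumed here).
target = pa.schema(
    [("ID", pa.int64()), ("amount", pa.float64()), ("name", pa.string())]
)
pq.write_table(table.cast(target), "aligned.parquet")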
0 votes · 0 answers · 35 views
Airflow Tasks State Suddenly Clearing up on Webserver UI
I am currently using Airflow to run some SQL tasks on Clickhouse. Airflow is run within Docker containers on a Compute instance in GCP, and the tasks are generally provisioned using the Airflow ...
2 votes · 1 answer · 122 views
Run a single DELETE FROM SQL command based on a RecordSet of multiple rows
I have a CSV file with a single column, id, and 3 rows: 10, 20, 30. What I want to do is simply delete the rows from a table in a database where the table's id col contains any of the three values. ...
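For reference, the set-based statement itself is a single parameterized DELETE ... WHERE id IN (...); a minimal runnable sketch using Python's stdlib sqlite3 purely as a stand-in for the real database (SSIS or any other driver would issue the same SQL):

import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE my_table (id INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?)", [(10,), (20,), (30,), (40,)])

ids = [10, 20, 30]  # the three values read from the CSV
placeholders = ",".join("?" for _ in ids)
conn.execute(f"DELETE FROM my_table WHERE id IN ({placeholders})", ids)
print(conn.execute("SELECT id FROM my_table").fetchall())  # [(40,)]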
-1 votes · 1 answer · 57 views
Leaving message unacknowledged in Benthos job with gcp_pubsub input
How does Benthos handle the acknowledgement of pubsub messages? How can we manage ack/unack based on custom if-else conditions?
Here is the scenario I'm trying to achieve:
I have written a Benthos job ...
0 votes · 1 answer · 52 views
DynamoDb cross-account migration using Glue Job, GSI value cannot be null
When I'm using Glue Job for cross-account migration in DynamoDB, I need to transform the PK and GSI values by adding a fixed prefix like 'XXX'. However, if the gsi0_pk value from the source table is ...
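A hedged sketch of the usual workaround in PySpark (Glue jobs often convert the DynamicFrame to a DataFrame for conditional transforms like this): prefix only when the key is present, and drop the attribute on rows where it stays null, since DynamoDB rejects null GSI key values. The column names come from the question; everything else is illustrative:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a", "k1"), ("b", None)], ["pk", "gsi0_pk"])

# Prefix only non-null keys; F.when(...) without otherwise() leaves
# the rest null so they can be filtered or written without the attribute.
df = df.withColumn(
    "gsi0_pk",
    F.when(F.col("gsi0_pk").isNotNull(), F.concat(F.lit("XXX"), F.col("gsi0_pk"))),
)
df.show()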
0 votes · 0 answers · 67 views
Efficiently Migrating 40TB of BLOB Data from Oracle to a Scalable System
I have a problem: I need to migrate 40TB of data from an Oracle database (as a .dmp dump file) to my company's database. The database contains only one table with two columns: one for the ID and the other ...