5,963 questions
0 votes · 1 answer · 72 views
Why are my data quality validation rules not triggering for null values in my dataset?
I’m working on a data quality workflow where I validate incoming records for null or missing values.
Even when a column clearly contains nulls, my rule doesn’t trigger and the record passes validation....
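A common culprit, offered as a hedged sketch rather than a diagnosis: equality comparisons against None do not catch NaN, so a rule written as value == None never fires. Assuming the validation runs on pandas (the excerpt doesn't say), the difference looks like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({"col": [1.0, None, np.nan]})

# A rule like this silently misses nulls: None becomes NaN in a float
# column, and NaN compares unequal to everything, including None.
print((df["col"] == None).sum())  # 0 -> the rule never triggers

# A null-safe check catches both None and NaN:
print(df["col"].isna().sum())     # 2 -> the rule fires as expected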
1 vote · 1 answer · 62 views
Why aren’t my changes reflected after modifying and reimporting an IBM DataStage job XML export?
I’m trying to programmatically modify IBM DataStage jobs to add a new database connector stage in parallel to an existing Database stage.
Here’s my workflow:
Export a job from DataStage Designer as ...
-4 votes · 1 answer · 64 views
Programmatically modifying IBM DataStage job XML – changes not reflected after reimport [closed]
I’m trying to programmatically add a new database stage in parallel to an existing DataStage job by modifying its exported XML. I export the job from DataStage Designer, modify the XML via a Python ...
0 votes · 0 answers · 137 views
Unable to start worker on Prefect - httpx.ConnectError: all connection attempts failed
I have started the Prefect server on a Remote Desktop machine using
prefect server start --host 0.0.0.0 --port 8080
After this I am able to access the UI from other computers on this network. I create a ...
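A hedged sketch of the usual fix: Prefect workers locate the API through the PREFECT_API_URL setting, so pointing it at the server machine's network address (not localhost) and probing the health endpoint first tends to isolate the ConnectError. The host IP below is a placeholder:

import os
import httpx

# Placeholder address of the Remote Desktop machine running the server.
os.environ["PREFECT_API_URL"] = "http://192.0.2.10:8080/api"

# A 200 here means the API is reachable from the worker's machine;
# a connection error here reproduces the worker's failure in isolation.
resp = httpx.get(os.environ["PREFECT_API_URL"] + "/health")
print(resp.status_code, resp.text)

The same setting can be made persistent with prefect config set PREFECT_API_URL=... before running prefect worker start.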
0 votes · 1 answer · 99 views
Can't connect to Ollama hosted locally from a Python script
I am building an ETL pipeline that uses an LLM to extract some information.
I have Ollama installed locally; I am on a MacBook M4 Max.
I don't understand why I get this error from my worker.
ads-worker-1 | 2025-08-28 15:...
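Since the ads-worker-1 prefix suggests the worker runs in Docker Compose, one hedged possibility: localhost inside the container is the container itself, not the Mac where Ollama listens on port 11434. A minimal sketch against Ollama's standard REST API, with the model name as a placeholder:

import requests

# host.docker.internal resolves to the Mac host from inside a
# Docker Desktop container; localhost would hit the container itself.
OLLAMA_URL = "http://host.docker.internal:11434"

resp = requests.post(
    OLLAMA_URL + "/api/generate",
    json={"model": "llama3", "prompt": "ping", "stream": False},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["response"])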
0 votes · 0 answers · 52 views
Error loading data: 'Engine' object has no attribute 'cursor': chan="stdout": source="task"
I am trying to run a batch process using Apache Airflow. The Extract and Transform stages work fine, but the Load stage is giving an error. Here is my code:
from airflow.decorators import dag, ...
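That error message usually means a SQLAlchemy Engine was handed to code expecting a DBAPI connection, which is what actually has a .cursor() method. Since the DAG code is truncated, here is only a hedged sketch of the two common fixes, with a placeholder connection string:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///example.db")  # placeholder URL
df = pd.DataFrame({"id": [1, 2]})

# Fix 1: pass a live Connection (not the Engine) to pandas.
with engine.begin() as conn:
    df.to_sql("my_table", con=conn, if_exists="append", index=False)

# Fix 2: if the load step genuinely needs a DBAPI cursor, unwrap one.
raw = engine.raw_connection()
try:
    cur = raw.cursor()
    cur.execute("SELECT COUNT(*) FROM my_table")
    print(cur.fetchone())
    raw.commit()
finally:
    raw.close()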
1 vote · 1 answer · 76 views
How to control execution order inside a single SSIS Data Flow Task?
I'm working on a project where I need to extract data from an EDI file and load it into a database. I'm using CozyRoc's “EDI Source” component for this.
The EDI Source produces three data outputs, and ...
0 votes · 0 answers · 75 views
ADO.NET failure to acquire connection in an SQL Server to PostgreSQL ETL (DTS)
I'm creating an ETL in Visual Studio 2022 that migrates data from an MS SQL Server table to a table in a PostgreSQL database. I created the ADO.NET Destination connection (server IP, ...
-1 votes · 1 answer · 94 views
How to Use Filter Activity Output as a Source in Copy Activity in Azure Data Factory Pipeline
I'm fairly new to Azure Data Factory and need help with a pipeline I'm building. My goal is to read data from a CSV file stored in an Amazon S3 bucket, filter out records where the Status column is '...
2 votes · 1 answer · 218 views
Docker-based local Postgres database gives disk space error during data population, although there appears to be plenty of space
I'm at a bit of a loss here.
I'm running a PostgreSQL Database on Docker on my Mac.
df -h shows that my root volume has 236 GB available.
docker system df shows that only a few gigs are being used by ...
0 votes · 1 answer · 175 views
ClickHouse: read a parquet file that contains columns with None or NaN
I observe that I cannot open a Parquet file with ClickHouse if one of its columns contains only None or NaN values.
My goal is to dump my raw files in my data warehouse, without having to define data ...
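A plausible cause, hedged: a column holding only nulls is written with Arrow's "null" logical type, which some readers cannot map to a concrete column type. Writing the file with an explicit schema pins a real type; the column names and the choice of string are assumptions:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({"id": [1, 2], "maybe_empty": [None, None]})

# Without an explicit schema the all-null column gets the Arrow "null"
# type; pinning a concrete type keeps downstream readers happy.
schema = pa.schema([("id", pa.int64()), ("maybe_empty", pa.string())])
pq.write_table(pa.Table.from_pandas(df, schema=schema), "fixed.parquet")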
0 votes · 1 answer · 41 views
Move files based on size in Pentaho
Within Pentaho, how do I move files from a particular local directory that are 1 KB in size to another folder? Ideally, I'd like to move all 1 KB files at once and not go one by one (we're talking ...
0 votes · 1 answer · 37 views
How to do transaction control transformation without the static target file?
I just developed my ETL in Informatica PowerCenter, using a Transaction Control transformation to produce dynamic output files, and it works successfully.
This is the mapping logic
My problem is that ...
1 vote · 2 answers · 326 views
dlthub not creating tables in Oracle Database
I'm having problems running pipelines using dlthub, with an Oracle database as the destination.
import dlt
import requests
import sqlalchemy as sa
pipeline = dlt.pipeline(
    pipeline_name="...
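The snippet is cut off, so what follows is only a hedged sketch of a complete pipeline against Oracle, assuming dlt's sqlalchemy destination with the oracledb driver; every name and the connection string are illustrative, not taken from the question:

import dlt

pipeline = dlt.pipeline(
    pipeline_name="api_to_oracle",
    destination=dlt.destinations.sqlalchemy(
        # Placeholder Oracle URL for the sqlalchemy destination.
        "oracle+oracledb://user:password@db-host:1521/?service_name=ORCLPDB1"
    ),
    dataset_name="raw_data",
)

data = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
print(pipeline.run(data, table_name="demo_rows"))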
0 votes · 0 answers · 59 views
Azure Data Factory / Data Flow, how to extract data from JSON where the ids are the keys?
In an Azure Data Factory Data Flow I am using a REST endpoint as the data source to get JSON data. However, the data arrives in a strange format: it is a dictionary of keys where the key value is ...
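For concreteness, the reshaping being asked for, shown outside ADF as a tiny hypothetical Python illustration, is a pivot from a key-to-record mapping into rows:

# Hypothetical payload shape: each key is the record's id.
payload = {
    "101": {"name": "alpha", "status": "open"},
    "102": {"name": "beta", "status": "closed"},
}

# Target shape: one row per entry, with the key surfaced as an id column.
rows = [{"id": key, **record} for key, record in payload.items()]
print(rows)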
1 vote · 0 answers · 105 views
How to handle evolving Parquet schemas from GCS when loading into BigQuery?
We are designing a data ingestion pipeline where Parquet files are delivered weekly into a GCS bucket.
The bucket structure is:
gs://my-bucket/YYYY/MM/DD/<instance-version>/<instance-id>/&...
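One hedged sketch of a common pattern for this: load each week's files with BigQuery's schema-update options so newly appearing Parquet columns are added to the table instead of failing the job. The bucket path and table id below are placeholders:

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    # Let appends add newly appearing columns to the table schema.
    schema_update_options=[bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION],
)
uri = "gs://my-bucket/2025/01/01/*.parquet"  # placeholder path pattern
client.load_table_from_uri(
    uri, "my-project.my_dataset.my_table", job_config=job_config
).result()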
1 vote · 0 answers · 126 views
following warning: APT_CombinedOperatorController,0: Conversion error while calling the date_from_ustring conversion routine. Data may have been lost
Good morning everyone. I'm currently receiving the following warning in IBM DataStage:
APT_CombinedOperatorController,0: Conversion error while calling the date_from_ustring conversion routine. Data ...
0 votes · 0 answers · 79 views
HubSpot private app extract ETL via Azure Data Factory
I have a private app within HubSpot. Using the following API, it is supposed to return the data that is in the contacts table in the CRM:
https://api.hubapi.com/crm/v3/objects/contacts?Authorization=...
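Worth noting in hedged form: the URL above passes Authorization as a query parameter, but HubSpot private-app tokens are expected in a Bearer header. A minimal sketch, with the token as a placeholder:

import requests

token = "pat-..."  # placeholder private-app token
resp = requests.get(
    "https://api.hubapi.com/crm/v3/objects/contacts",
    # The token belongs in the header, not the query string.
    headers={"Authorization": "Bearer " + token},
    params={"limit": 100},
)
resp.raise_for_status()
print(resp.json()["results"][:1])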
0 votes · 0 answers · 78 views
Failure to authenticate a SharePoint connection with AuthenticationContext
I have the following Python code in my Matillion PythonScript component:
from office365.sharepoint.client_context import ClientContext
from office365.runtime.auth.authentication_context import ...
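The snippet is truncated, so here is only a hedged sketch of the user-credential flow in office365-rest-python-client; note that plain username/password auth commonly fails where the tenant enforces MFA or blocks legacy auth, in which case an app principal (client id/secret) is the usual alternative. All names below are placeholders:

from office365.sharepoint.client_context import ClientContext
from office365.runtime.auth.user_credential import UserCredential

site_url = "https://contoso.sharepoint.com/sites/etl"  # placeholder site
ctx = ClientContext(site_url).with_credentials(
    UserCredential("user@contoso.com", "password")  # placeholder login
)
web = ctx.web.get().execute_query()
print(web.url)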
0 votes · 0 answers · 65 views
Best Practices for Uploading Parquet Files into a Predefined BigQuery Schema (Avoiding Type Mismatches)
I'm uploading large datasets into BigQuery using Parquet files instead of CSVs (due to a 100MB limit). However, when loading a Parquet file into a predefined schema, I encounter errors like:
Field ID ...
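A hedged sketch of one mitigation: cast the Arrow table to a schema mirroring the predefined BigQuery table before writing, so Parquet physical types (int32 vs int64, for example) line up. The file names, column names, and types are assumptions:

import pyarrow as pa
import pyarrow.parquet as pq

table = pq.read_table("input.parquet")  # placeholder input file

# Target types mirroring the predefined BigQuery schema (assumed here).
target = pa.schema(
    [("ID", pa.int64()), ("amount", pa.float64()), ("name", pa.string())]
)
pq.write_table(table.cast(target), "aligned.parquet")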
0 votes · 0 answers · 35 views
Airflow Tasks State Suddenly Clearing up on Webserver UI
I am currently using Airflow to run some SQL tasks on Clickhouse. Airflow is run within Docker containers on a Compute instance in GCP, and the tasks are generally provisioned using the Airflow ...
2 votes · 1 answer · 122 views
Run a single DELETE FROM SQL command based on a RecordSet of multiple rows
I have a CSV file with a single column, id, and 3 rows: 10, 20, 30. What I want to do is simply delete the rows from a table in a database where the table's id col contains any of the three values. ...
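For reference, the set-based statement itself is a single parameterized DELETE ... WHERE id IN (...); a minimal runnable sketch using Python's stdlib sqlite3 purely as a stand-in for the real database (SSIS or any other driver would issue the same SQL):

import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE my_table (id INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?)", [(10,), (20,), (30,), (40,)])

ids = [10, 20, 30]  # the three values read from the CSV
placeholders = ",".join("?" for _ in ids)
conn.execute(f"DELETE FROM my_table WHERE id IN ({placeholders})", ids)
print(conn.execute("SELECT id FROM my_table").fetchall())  # [(40,)]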
-1 votes · 1 answer · 57 views
Leaving message unacknowledged in Benthos job with gcp_pubsub input
How does Benthos handle the acknowledgement of pubsub messages? How can we manage ack/unack based on custom if-else conditions?
Here is the scenario I'm trying to achieve:
I have written a Benthos job ...
0 votes · 1 answer · 52 views
DynamoDb cross-account migration using Glue Job, GSI value cannot be null
When I'm using Glue Job for cross-account migration in DynamoDB, I need to transform the PK and GSI values by adding a fixed prefix like 'XXX'. However, if the gsi0_pk value from the source table is ...
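A hedged sketch of the usual workaround in PySpark (Glue jobs often convert the DynamicFrame to a DataFrame for conditional transforms like this): prefix only when the key is present, and drop the attribute on rows where it stays null, since DynamoDB rejects null GSI key values. The column names come from the question; everything else is illustrative:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a", "k1"), ("b", None)], ["pk", "gsi0_pk"])

# Prefix only non-null keys; F.when(...) without otherwise() leaves
# the rest null so they can be filtered or written without the attribute.
df = df.withColumn(
    "gsi0_pk",
    F.when(F.col("gsi0_pk").isNotNull(), F.concat(F.lit("XXX"), F.col("gsi0_pk"))),
)
df.show()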
0 votes · 0 answers · 67 views
Efficiently Migrating 40TB of BLOB Data from Oracle to a Scalable System
I have a problem: I need to migrate 40TB of data from an Oracle database (as a .dmp dump file) to my company's database. The database contains only one table with two columns: one for the ID and the other ...