0 votes · 1 answer · 72 views
I’m working on a data quality workflow where I validate incoming records for null or missing values. Even when a column clearly contains nulls, my rule doesn’t trigger and the record passes validation....
— Neha Upadhyay
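A common reason a null rule silently passes is that the check only tests for Python `None` while the data actually carries NaN floats or empty strings. This is a guess at the cause, but the stricter predicate is easy to sketch in plain Python (the function names are illustrative, not tied to any particular validation framework):

```python
import math

def is_missing(value) -> bool:
    """Treat None, float NaN, and blank/whitespace-only strings as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and value.strip() == "":
        return True
    return False

def record_is_valid(record: dict, required: list[str]) -> bool:
    """A record passes only if every required column has a real value."""
    return all(not is_missing(record.get(col)) for col in required)
```

Note that `float("nan") == float("nan")` is False, so equality-based null checks never match NaN; `math.isnan` is needed.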
1 vote · 1 answer · 62 views
I’m trying to programmatically modify IBM DataStage jobs to add a new database connector stage in parallel to an existing Database stage. Here’s my workflow: Export a job from DataStage Designer as ...
— techguy11
-4 votes · 1 answer · 64 views
I’m trying to programmatically add a new database stage in parallel to an existing DataStage job by modifying its exported XML. I export the job from DataStage Designer, modify the XML via a Python ...
— DataEngineer03
0 votes · 0 answers · 137 views
I have started Prefect server on Remote Desktop using prefect server start --host 0.0.0.0 --port 8080. After this I am able to access the UI from different computers present on this network. I create a ...
— Anzar
0 votes · 1 answer · 99 views
I am building an ETL that uses an LLM to extract some information. I have Ollama installed locally on a MacBook M4 Max. I don't understand why I get this error from my worker: ads-worker-1 | 2025-08-28 15:...
— Mael Fosso
0 votes · 0 answers · 52 views
I am trying to run a batch process using Apache Airflow. The Extract and Transform stages work fine, but the Load stage is giving an error. Here is my code: from airflow.decorators import dag, ...
— Nwaogu Eziuche
1 vote · 1 answer · 76 views
I'm working on a project where I need to extract data from an EDI file and load it into a database. I'm using Cozyroc’s “EDI Source” component for this. The EDI Source produces three data outputs, and ...
— Darwin Palma
0 votes · 0 answers · 75 views
I'm creating, in Visual Studio 2022, an ETL that migrates data from an MS SQL Server table to another in a PostgreSQL database. I create the ADO.NET Destination connection (server IP, ...
— Honorius
-1 votes · 1 answer · 94 views
I'm fairly new to Azure Data Factory and need help with a pipeline I'm building. My goal is to read data from a CSV file stored in an Amazon S3 bucket, filter out records where the Status column is '...
— Tarun Sahu
2 votes · 1 answer · 218 views
I'm at a bit of a loss here. I'm running a PostgreSQL database on Docker on my Mac. df -h shows that my root volume has 236 GB available. docker system df shows that only a few gigs are being used by ...
— Brandon Rickman
0 votes · 1 answer · 175 views
I observe that I cannot open a Parquet file with ClickHouse if it contains a column holding only None or NaN values. My goal is to dump my raw files in my data warehouse, without having to define data ...
— Adrien Pacifico
0 votes · 1 answer · 41 views
Within Pentaho, how do I move files from a particular local directory that are 1 KB in size to another folder? Ideally, I'd like to move all 1 KB files at once and not go one by one (we're talking ...
— slybitz
0 votes · 1 answer · 37 views
I just developed my ETL using Informatica PowerCenter and used a Transaction Control transformation to produce dynamic output files, and it works successfully. This is the mapping logic. My problem is that ...
— Toqa
1 vote · 2 answers · 326 views
I'm having problems running pipelines using dlthub, with an Oracle database as the destination.
import dlt
import requests
import sqlalchemy as sa
pipeline = dlt.pipeline(pipeline_name="...
— Rafael Nobre
0 votes · 0 answers · 59 views
In an Azure Data Factory Data Flow I am using a REST endpoint as the data source to get JSON data. However, the data arrives in a strange format: it is a dictionary of keys where the key value is ...
— Jack
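Whatever Data Flow expression ultimately handles it, the reshaping the question describes is: turn a dictionary keyed by ID into a flat list of records. A hedged sketch in plain Python (the key-field name is invented for illustration):

```python
def dict_keyed_json_to_rows(payload: dict, key_name: str = "id") -> list[dict]:
    """Flatten {"001": {...}, "002": {...}} into [{"id": "001", ...}, ...]."""
    return [{key_name: key, **fields} for key, fields in payload.items()]
```

Each dictionary key becomes an ordinary column, after which the data is in the row-per-record shape most sinks expect.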
1 vote · 0 answers · 105 views
We are designing a data ingestion pipeline where Parquet files are delivered weekly into a GCS bucket. The bucket structure is: gs://my-bucket/YYYY/MM/DD/<instance-version>/<instance-id>/...
— dadadima
1 vote · 0 answers · 126 views
Good morning everyone. I'm currently receiving the following warning in IBM DataStage: APT_CombinedOperatorController,0: Conversion error while calling the date_from_ustring conversion routine. Data ...
— newtime technology
0 votes · 0 answers · 79 views
I have a private app within HubSpot. Using the following API, it is supposed to return the data that is in the contacts table in the CRM: https://api.hubapi.com/crm/v3/objects/contacts?Authorization=...
— Gav Cheal
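One detail visible in the truncated URL: the token is being passed as an Authorization query parameter. HubSpot private-app tokens are sent as a Bearer token in the Authorization header, not in the query string. A minimal sketch with the standard library (the token value is a placeholder):

```python
import urllib.request

TOKEN = "pat-na1-..."  # placeholder private-app access token

req = urllib.request.Request(
    "https://api.hubapi.com/crm/v3/objects/contacts",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
# urllib.request.urlopen(req) would then return the contacts JSON (not executed here).
```

With the token in the query string instead, the API treats the request as unauthenticated and returns a 401.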
0 votes · 0 answers · 78 views
I have the following Python code in my Matillion PythonScript component:
from office365.sharepoint.client_context import ClientContext
from office365.runtime.auth.authentication_context import ...
— David Makovoz
0 votes · 0 answers · 65 views
I'm uploading large datasets into BigQuery using Parquet files instead of CSVs (due to a 100 MB limit). However, when loading a Parquet file into a predefined schema, I encounter errors like: Field ID ...
— mowen10
0 votes · 0 answers · 35 views
I am currently using Airflow to run some SQL tasks on ClickHouse. Airflow is run within Docker containers on a Compute Engine instance in GCP, and the tasks are generally provisioned using the Airflow ...
— Ebube Okoli
2 votes · 1 answer · 122 views
I have a CSV file with a single column, id, and 3 rows: 10, 20, 30. What I want to do is simply delete the rows from a database table where the table's id column contains any of the three values. ...
— RedAero
-1 votes · 1 answer · 57 views
How does Benthos handle the acknowledgement of Pub/Sub messages? How can we manage ack/nack based on custom if-else conditions? Here is the scenario I'm trying to achieve: I have written a Benthos job ...
— Tarun Kumar
0 votes · 1 answer · 52 views
When I'm using Glue Job for cross-account migration in DynamoDB, I need to transform the PK and GSI values by adding a fixed prefix like 'XXX'. However, if the gsi0_pk value from the source table is ...
— Jie Zhao
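The transformation itself, detached from Glue, can be pinned down: copy each item and prefix only the key attributes that are actually present, so a missing or null gsi0_pk is left alone rather than becoming something like 'XXXNone'. A sketch in plain Python (attribute names follow the question; the prefix is the question's placeholder):

```python
def prefix_keys(item: dict, prefix: str = "XXX", key_names=("pk", "gsi0_pk")) -> dict:
    """Return a copy of item with prefix prepended to each present, non-null key attribute."""
    out = dict(item)
    for name in key_names:
        value = out.get(name)
        if value is not None:  # skip absent or null attributes entirely
            out[name] = f"{prefix}{value}"
    return out
```

In a Glue job this kind of per-item function would be applied as a map over the DynamicFrame/DataFrame records before writing to the target table.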
0 votes · 0 answers · 67 views
I have a problem: I need to migrate 40 TB of data from an Oracle database (in .dmp format) to my company's database. The database contains only one table with two columns: one for the ID and the other ...
— trinh lap
