I would like to use Airflow for some ETL operations where the source data has no (indexed) timestamp columns. The source is a database table to which new event records are appended continuously, with an ever-growing incremental integer as the primary key.

My immediate idea would be to keep track of the primary key of the last row extracted and use it as the starting point for the next run of the extraction task.
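
For context, here is a rough sketch of the kind of thing I have in mind, keeping the high-water mark in an Airflow Variable (the `events` table, its `id`/`payload` columns, the `source_db` connection id and the `events_last_id` Variable key are all placeholders of mine):

```python
import pendulum
from airflow.decorators import dag, task
from airflow.models import Variable
from airflow.providers.postgres.hooks.postgres import PostgresHook


@dag(schedule="@hourly", start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def incremental_extract():

    @task
    def extract_new_rows():
        # Last primary key processed by the previous run; 0 on the very first run.
        last_id = int(Variable.get("events_last_id", default_var=0))

        hook = PostgresHook(postgres_conn_id="source_db")  # assumed connection id
        rows = hook.get_records(
            "SELECT id, payload FROM events WHERE id > %(last_id)s ORDER BY id",
            parameters={"last_id": last_id},
        )

        if rows:
            # Persist the new high-water mark for the next run.
            Variable.set("events_last_id", rows[-1][0])

        # ... load `rows` into the destination here ...
        return len(rows)

    extract_new_rows()


incremental_extract()
```

This seems to work as long as runs execute one at a time, but it is not tied to the data interval at all, which is what makes me doubt it is the intended approach.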

But Airflow's tasks appear to be designed around the logical date ({{ logical_date }}, {{ data_interval_start }} and {{ data_interval_end }}), not an integer offset.
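
To illustrate the mismatch, this is roughly what the interval-based pattern would look like if the table had a timestamp column (the `created_at` column is invented here; my real table has nothing like it):

```python
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

# Hypothetical interval-based extraction, only possible with a timestamp column.
extract = SQLExecuteQueryOperator(
    task_id="extract_by_interval",
    conn_id="source_db",  # assumed connection id
    sql="""
        SELECT *
        FROM events
        WHERE created_at >= '{{ data_interval_start }}'
          AND created_at <  '{{ data_interval_end }}'
    """,
)
```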

How should I go about incrementally extracting rows from a database table when the data has no datetime column to partition by?
