I would like to use Airflow for some ETL operations where the source data does not have (indexed) timestamp columns. The source is a database table to which new event records are appended continuously, with an ever-growing incremental integer as the primary key.
My immediate solution would be to keep track of the primary key of the last row extracted, and use that as the starting point for the next run of the extraction task.
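To make the idea concrete, here is a minimal sketch of that watermark approach outside of Airflow, using an in-memory SQLite table as a stand-in for the source database (the table name `events` and the helper `extract_new_rows` are my own illustrative inventions; in an Airflow DAG the watermark would presumably be persisted in a Variable, an XCom, or a bookkeeping table between runs):

```python
import sqlite3

def extract_new_rows(conn, last_pk):
    """Fetch all rows appended after the given primary-key watermark.

    Returns (rows, new_watermark); the caller persists new_watermark
    somewhere durable so the next run can pick up where this one left off.
    """
    cur = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id", (last_pk,)
    )
    rows = cur.fetchall()
    # If nothing new arrived, keep the old watermark unchanged.
    new_watermark = rows[-1][0] if rows else last_pk
    return rows, new_watermark

# Demo against an in-memory table standing in for the source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)",
                 [("a",), ("b",), ("c",)])

rows, watermark = extract_new_rows(conn, last_pk=0)    # first run: all 3 rows
rows2, watermark2 = extract_new_rows(conn, watermark)  # second run: nothing new
```

This is the pattern I have in mind; my question is how to fit it into Airflow's scheduling model.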
But it appears that Airflow's tasks are designed around the logical date ({{ logical_date }}, {{ data_interval_start }} and {{ data_interval_end }}), not an integer offset.
How could I go about incrementally extracting rows from a database table when the data natively has no datetime aspect to partition by?