I have a dataframe in the following format:
| PID | Date1 | Date2 | Details |
|---|---|---|---|
| 17750A | 03/07/1960 | 06/07/2009 | A1B3 |
| 17758X | 03/07/1960 | 06/07/2009 | A1B3 |
| 06/09/1961 | 11/05/2013 | A2B2 | |
| 28363D | 20/11/1964 | 05/03/2019 | A1A2 |
| 30050A | 30/06/1961 | 18/07/2017 | A1B3 |
| 04/11/1961 | 16/10/2008 | A2B2 |
And an Oracle database with a table as follows:
| ID | DateA | DateB | Notes |
|---|---|---|---|
| 17750A | 03/07/1960 | 06/07/2009 | A1B3 |
| 03/07/1960 | 06/07/2009 | A1B3 | |
| 20964Q | 06/09/1961 | 11/05/2013 | A2B2 |
| 28363D | 20/11/1964 | 04/03/2019 | A1A2 |
| 30/06/1961 | 19/07/2017 | A1B3 | |
| 10832Q | 04/11/1961 | 17/10/2008 | A2B2 |
I need to query the database to return another df containing any record where the ID matches a PID, or where (Date1, Date2) equals (DateA, DateB) - i.e. both dates in a df row match both dates in a table row.
So far, I've managed to achieve the first, but not the second.
pid_list = df['PID'].values
nvars = ','.join(f':{i}' for i in range(len(pid_list)))
sql_query = """
SELECT
gd.ID,
gd.DateA,
gd.DateB,
gd.Notes
FROM table1 as gd
WHERE gd.ID in (%s)
""" % nvars
df_result = pd_read_sql(sql_query, connection, params=pid_list)
How can I expand that to also match on the pair of dates? Is there a way to do this by passing a list of tuples as a param, rather than needing to iterate through pairs of dates? Something like:
sql_query = """
SELECT
gd.ID,
gd.DateA,
gd.DateB,
gd.Notes
FROM table1 as gd
WHERE gd.pid in (%s)
OR (Date1 = DateA AND Date2 = DateB)
""" % nvars, dates
df_result = pd.read_sql(sql_query, connection, params=(pid_list, (Date1, Date2)))
I think the params passed to pd.read_sql() may need to be a dict, but am not sure how to structure this or how to reference the different entries in the SQL query without iterating.