Consider the following input data:
| prod | col1 | col2 |
|---|---|---|
| One | hi | hello |
| One | 18.0 | 19.52 |
| One | 2024-02-12 00:00:00 | 2024-03-07 00:00:00 |
| two | 2024-02-12 00:00:00 | 2024-02-11 00:00:00 |
| two | in-transit | in-stock |
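
For reproducibility, the table above could be built roughly as follows. This is only a sketch: it assumes the columns have `object` dtype and hold plain `str`, `float`, and `pd.Timestamp` values, which the table itself does not confirm.

```python
import pandas as pd

# Assumed reconstruction of the input: object columns mixing str, float and Timestamp
df = pd.DataFrame({
    "prod": ["One", "One", "One", "two", "two"],
    "col1": ["hi", 18.0, pd.Timestamp("2024-02-12"),
             pd.Timestamp("2024-02-12"), "in-transit"],
    "col2": ["hello", 19.52, pd.Timestamp("2024-03-07"),
             pd.Timestamp("2024-02-11"), "in-stock"],
})

# Each cell keeps its own Python type because the column dtype is object
print(df["col1"].map(type))
```
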
I want to find the difference between col1 and col2. Because the data type differs from row to row, I am having difficulty applying pandas functions. Drawing on my SQL knowledge, I tried the code below, but it didn't work.
Logic:
- if str then difference = "not same"
- if datetime then difference = (col2-col1).days
- else difference = col2 - col1
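# Attempt with chained np.where (np.where takes a single condition per call);
# assumes `import numpy as np` and `from datetime import datetime`
df["difference"] = np.where(df["col2"].apply(lambda x: isinstance(x, str)), "not same",
    np.where(df["col2"].apply(lambda x: isinstance(x, datetime)),
             (df["col2"] - df["col1"]).dt.days,
             df["col2"] - df["col1"]))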
This does not produce the expected output: the datetime difference is still a timedelta.
Expected output:
| prod | col1 | col2 | difference |
|---|---|---|---|
| One | hi | hello | not same |
| One | 18.0 | 19.52 | 1.52 |
| One | 2024-02-12 00:00:00 | 2024-03-07 00:00:00 | 25 |
| two | 2024-02-12 00:00:00 | 2024-02-11 00:00:00 | 1 |
| two | in-transit | in-stock | not same |
Please suggest any other approach.
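
One possible approach, sketched under assumptions, is to apply the three rules cell by cell instead of column by column, so that no subtraction is ever attempted on string rows. The helper name `row_difference` is hypothetical, and the DataFrame is an assumed reconstruction of the input (object columns holding `str`, `float`, and `pd.Timestamp`). The sketch follows the stated rule `(col2 - col1).days` literally, which gives 24 and -1 for the two date rows rather than the 25 and 1 shown in the expected table; wrapping the result in `abs()` or adding an offset would be needed if a different convention is intended.

```python
from datetime import datetime

import pandas as pd


def row_difference(a, b):
    """Compare two cells whose types may differ from row to row."""
    # Rule 1: any string involved means the values are only labelled, not subtracted
    if isinstance(a, str) or isinstance(b, str):
        return "not same"
    # Rule 2: pd.Timestamp is a subclass of datetime, so this check covers it
    if isinstance(a, datetime) and isinstance(b, datetime):
        return (b - a).days
    # Rule 3: plain numeric difference
    return b - a


# Assumed reconstruction of the input data
df = pd.DataFrame({
    "prod": ["One", "One", "One", "two", "two"],
    "col1": ["hi", 18.0, pd.Timestamp("2024-02-12"),
             pd.Timestamp("2024-02-12"), "in-transit"],
    "col2": ["hello", 19.52, pd.Timestamp("2024-03-07"),
             pd.Timestamp("2024-02-11"), "in-stock"],
})

# Evaluate the rule pair by pair; the result column has object dtype
df["difference"] = [row_difference(a, b) for a, b in zip(df["col1"], df["col2"])]
print(df)
```

Unlike `np.where`, which evaluates every branch for every row before choosing, the per-row function only performs the operation that matches the types it actually sees. It is slower than a vectorized solution on large frames, but mixed-type object columns cannot be vectorized in any case.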