I have a column of datetimes and I want to get the difference between values in terms of years, months, etc., instead of as timedeltas, which only go up to days. How do I do this in Pandas?
Pandas provides DateOffset for relative deltas, but the docs say "the positional argument form of relativedelta is not supported", and that's the form that calculates a relative delta from two datetimes (as opposed to specifying one).
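To illustrate the distinction, here is a minimal sketch. It assumes python-dateutil is importable (it is installed as a pandas dependency); the variable names are only for illustration. The keyword form of DateOffset specifies a delta to add, while the positional form of dateutil's relativedelta calculates one from two datetimes:

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

start = pd.Timestamp("2019-06-18", tz="UTC")

# Keyword form: *specifies* a relative delta. pandas supports this.
offset = pd.DateOffset(years=1, months=7)
shifted = start + offset

# Positional form: *calculates* the relative delta between two datetimes.
# This is dateutil, not pandas; Timestamps are converted to plain datetimes
# first to stay on the safe side.
delta = relativedelta(shifted.to_pydatetime(), start.to_pydatetime())
print(shifted)
print(delta)
```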
For this example, I'm only dealing with the min and max of the column to get the span, but I eventually want to apply this to the whole column.
min_max = df_most_watched['time'].agg(['min', 'max'])
min 2019-06-18 18:22:05.991000+00:00
max 2021-02-15 18:03:02.893000+00:00
Name: time, dtype: datetime64[ns, UTC]
min_max.diff():
min NaT
max 607 days 23:40:56.902000
Name: time, dtype: timedelta64[ns]
The output should be 1 year, 7 months, 27 days, 23:40:56.902000.
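For reference, that expected value can be reproduced outside pandas. This is a sketch using dateutil's relativedelta directly on the two timestamps, not a pandas-native method; the names low, high and formatted are introduced here only for illustration:

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

low = pd.Timestamp("2019-06-18 18:22:05.991", tz="UTC")
high = pd.Timestamp("2021-02-15 18:03:02.893", tz="UTC")

# Positional form: relativedelta(later, earlier) computes the calendar-aware gap.
rd = relativedelta(high.to_pydatetime(), low.to_pydatetime())

# Naive formatting: pluralization is hard-coded to match the expected string.
formatted = (
    f"{rd.years} year, {rd.months} months, {rd.days} days, "
    f"{rd.hours:02d}:{rd.minutes:02d}:{rd.seconds:02d}.{rd.microseconds:06d}"
)
print(formatted)  # 1 year, 7 months, 27 days, 23:40:56.902000
```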
Attempted
Just to confirm, I tried pd.DateOffset(low, high), where low and high are the min and max timestamps, and got TypeError: `n` argument must be an integer, got <class 'pandas._libs.tslibs.timestamps.Timestamp'>
I tried .pct_change() on a whim, hoping it would have a special case for datetimes, but no dice: it raised TypeError: cannot perform __truediv__ with this index type: DatetimeArray
I checked if .diff() had some sort of setting like relative=True, but no.
Research
In the User Guide, the Time series page doesn't have anything relevant when I Ctrl+F for "relative" and the Time deltas page doesn't mention "relative" at all.
I checked if DateOffset might have any alternate constructors that could take two timestamps, but the docs don't mention any methods starting with from or anything else.
Setup
import pandas as pd

min_max = pd.Series(
    {'min': pd.Timestamp('2019-06-18 18:22:05.991', tz='UTC'),
     'max': pd.Timestamp('2021-02-15 18:03:02.893', tz='UTC')},
    name='time')
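One possible way to extend this to the whole column is sketched below. It is not a built-in pandas feature: relative_diff is a hypothetical helper that loops in Python, pairs each value with the previous one, and returns an object-dtype Series of dateutil relativedelta values, so it will be slow on large columns.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

min_max = pd.Series(
    {'min': pd.Timestamp('2019-06-18 18:22:05.991', tz='UTC'),
     'max': pd.Timestamp('2021-02-15 18:03:02.893', tz='UTC')},
    name='time')


def relative_diff(series):
    """Hypothetical analogue of Series.diff() that yields relativedelta objects."""
    previous = series.shift()  # first element becomes NaT, as with .diff()
    values = [
        relativedelta(cur.to_pydatetime(), prev.to_pydatetime())
        if pd.notna(cur) and pd.notna(prev) else pd.NaT
        for cur, prev in zip(series, previous)
    ]
    # Object dtype, because pandas has no native relative-delta dtype.
    return pd.Series(values, index=series.index, name=series.name, dtype=object)


result = relative_diff(min_max)
print(result)
```

The result keeps the original index, so it can be assigned back as a new column, but arithmetic on it goes through Python objects rather than vectorized pandas operations.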