I would like help with the following:
Summary
- Re-index a data frame
- Merge multiple data frames
- Plot the new data frame based on a time column
More Details
I have a data set (raw_data) that looks like this:
id timestamp key value
1 1576086899000 temperature 70
2 1576086899000 sleep 8
3 1576086899000 heartrate 65
4 1576086876000 temperature 72
5 1576086876000 sleep 7.5
6 1576086876000 heartrate 62
7 1576086866000 temperature 74
8 1576086866000 sleep 7.8
9 1576086866000 heartrate 64
I pivoted it using the following:
import pandas as pd
df = raw_data.pivot(index='timestamp', columns='key', values='value')
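For reference, here is a self-contained version of that pivot built from the sample rows above. It assumes the value column is numeric. pivot_table is shown as an alternative in case the same key is ever recorded twice at one timestamp: pivot raises a ValueError on duplicate (timestamp, key) pairs, while pivot_table aggregates them (the mean is an assumed choice here).

```python
import pandas as pd

# Sample data copied from the table above.
raw_data = pd.DataFrame({
    "id": range(1, 10),
    "timestamp": [1576086899000] * 3 + [1576086876000] * 3 + [1576086866000] * 3,
    "key": ["temperature", "sleep", "heartrate"] * 3,
    "value": [70, 8, 65, 72, 7.5, 62, 74, 7.8, 64],
})

# One row per timestamp, one column per key.
df = raw_data.pivot(index="timestamp", columns="key", values="value")

# pivot() fails if a (timestamp, key) pair occurs more than once;
# pivot_table() aggregates such duplicates instead.
df_safe = raw_data.pivot_table(
    index="timestamp", columns="key", values="value", aggfunc="mean"
)

print(df)
```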
This made the timestamp the index and turned each key into a column holding its corresponding value.
Because not every row contains a value for every key, I created a new data frame for each key and dropped any NaN values:
sleep_df = pd.DataFrame({'date': df.index, 'sleep': df['sleep']}).dropna()
This kept "timestamp" as the index but also created a duplicate column called "date" holding the same timestamp values. I then formatted the "date" column as a year-month-day value with:
sleep_df['date'] = pd.to_datetime(sleep_df['date'], unit='ms').dt.strftime('%Y-%m-%d')
The resulting data frame for each key looks like the following:
timestamp date sleep
1576086899000 2020-04-05 8
1576086876000 2020-04-04 7.5
1576086866000 2020-04-03 7.8
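A sketch of the same per-key step applied to every column at once. The pivoted frame and the key_frame helper are hypothetical, and one sleep reading is left out to show the dropna step. As a suggested change rather than the original approach, the date is stored as a midnight datetime (via normalize, in UTC) instead of a formatted string, so it can still be sorted, grouped, and plotted.

```python
import pandas as pd

# Hypothetical pivoted frame shaped like df; one sleep reading is missing (NaN).
df = pd.DataFrame(
    {
        "heartrate": [64.0, 62.0, 65.0],
        "sleep": [7.8, None, 8.0],
        "temperature": [74.0, 72.0, 70.0],
    },
    index=pd.Index([1576086866000, 1576086876000, 1576086899000], name="timestamp"),
)


def key_frame(pivoted: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return one key's readings without NaNs, keeping 'timestamp' as the index."""
    return (
        pivoted[[key]]
        .dropna()
        # Midnight-normalized datetime (UTC) rather than a string.
        .assign(date=lambda d: pd.to_datetime(d.index, unit="ms").normalize())
        .loc[:, ["date", key]]
    )


# One frame per key, e.g. frames["sleep"], frames["heartrate"], ...
frames = {key: key_frame(df, key) for key in df.columns}
print(frames["sleep"])
```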
My end goal would be to:
- Merge these tables and plot them against the time column. I believe the index should be kept as "timestamp" for this reason, because values recorded at the same timestamp can then be merged.
- In future analysis, I'd love to figure out whether I could merge data based on the date rather than the timestamp, since some data may not have been recorded at exactly the same time. Would I have to make "date" the index instead? I assume I'd have to make sure there was only one entry for each date; otherwise merging tables could get funky.
- I think I figured out the plotting for this data. I set the index of the table to the "date" field, converted all the values to type int, and it plotted against the date just fine. Is there a better way to do this?
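One way to approach both merge questions, sketched with hypothetical sleep_df and heartrate_df frames (made-up timestamps a day apart, with two heart-rate readings on the same day). An outer join on the timestamp index keeps every reading but leaves NaN wherever the timestamps differ. Aggregating each frame to one row per date first gives a unique date index that joins cleanly. The daily helper is hypothetical, the mean is an assumed aggregation, and days are taken in UTC because to_datetime(..., unit='ms') yields UTC-based naive datetimes.

```python
import pandas as pd

# Hypothetical per-key frames indexed by millisecond timestamps.
# Only the first reading of each key shares an exact timestamp.
sleep_df = pd.DataFrame(
    {"sleep": [7.8, 7.5, 8.0]},
    index=pd.Index([1576086866000, 1576173266000, 1576259666000], name="timestamp"),
)
heartrate_df = pd.DataFrame(
    {"heartrate": [64.0, 60.0, 66.0]},
    index=pd.Index([1576086866000, 1576173326000, 1576173386000], name="timestamp"),
)

# 1) Merge on the exact timestamp: an outer join keeps every reading,
#    leaving NaN where the other key has no value at that instant.
by_timestamp = sleep_df.join(heartrate_df, how="outer")


def daily(frame: pd.DataFrame) -> pd.DataFrame:
    """Collapse a ms-timestamp-indexed frame to one row per UTC day (mean of that day)."""
    dates = pd.to_datetime(frame.index, unit="ms").normalize()
    return frame.groupby(dates).mean().rename_axis("date")


# 2) Merge on the date: aggregating first guarantees one row per date on
#    each side, so the join is one-to-one even if a key was recorded twice a day.
by_date = daily(sleep_df).join(daily(heartrate_df), how="outer")

print(by_timestamp)
print(by_date)
```

Because by_date keeps a real DatetimeIndex, by_date.plot() (pandas plotting, which needs matplotlib installed) should draw each column against the date without converting the values to int. If the values were read in as strings, pd.to_numeric converts them while keeping decimals such as 7.5, which an int cast would drop.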
Thanks in advance for the help; SO has been a great learning tool.