I have a dataframe of COVID cases that looks like this (my input data):
data:
date state cases
0 20200625 NY 300
1 20200625 CA 250
2 20200625 TX 200
3 20200625 FL 100
5 20200624 NY 290
6 20200624 CA 240
7 20200624 TX 100
8 20200624 FL 80
...
Note that the "date" column in the above data is a number (not a datetime).
I want to turn it into a time series like this (desired output), with dates as the index and each state's COVID cases as columns:
NY CA TX FL
20200625 300 250 200 100
20200626 290 240 100 80
...
So far I have only managed to create the skeleton of the output, with the following code:
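import pandas as pd

states = ['NY', 'CA', 'TX', 'FL']
days = [20200625, 20200626]
columns = states
positives = pd.DataFrame(columns=columns)
i = 0
# add one empty row per day, storing the day in a temporary "date" column
for day in days:
    positives.loc[i, "date"] = day
    i = i + 1
# move "date" into the index and drop the index name
positives.set_index('date', inplace=True)
positives = positives.rename_axis(None)
print(positives)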
which returns:
NY CA TX FL
20200625.0 NaN NaN NaN NaN
20200626.0 NaN NaN NaN NaN
How can I get, from the "data" dataframe, the value of column "cases" where:
(i) the value in data["state"] equals the column header of "positives", and
(ii) the value in data["date"] equals the row index of "positives"?
data.pivot(index='date', columns='state', values='cases')
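To make the pivot approach concrete, here is a minimal sketch that rebuilds the sample rows from the question and reshapes them. The variable names `positives` and `ts` and the column reordering are my own choices for illustration, not something the question requires. It assumes each (date, state) pair appears only once; as far as I know `pivot` raises a ValueError on duplicates, in which case `pivot_table` with an aggregation function would be the usual alternative.

```python
import pandas as pd

# Sample rows copied from the question; "date" is a plain integer column.
data = pd.DataFrame({
    "date": [20200625] * 4 + [20200624] * 4,
    "state": ["NY", "CA", "TX", "FL"] * 2,
    "cases": [300, 250, 200, 100, 290, 240, 100, 80],
})

# One row per date, one column per state, filled with the "cases" values.
positives = data.pivot(index="date", columns="state", values="cases")

# pivot sorts the state columns alphabetically; restore the order from the question.
positives = positives[["NY", "CA", "TX", "FL"]]

# Drop the "date" / "state" axis labels so the frame prints like the desired output.
positives.index.name = None
positives.columns.name = None
print(positives)

# Optional: convert the integer dates to a real DatetimeIndex.
ts = positives.copy()
ts.index = pd.to_datetime(ts.index.astype(str), format="%Y%m%d")
print(ts)
```

The rows come back sorted by date in ascending order; `positives.sort_index(ascending=False)` should give newest-first if that is preferred.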