
I have a dataframe of COVID cases that looks like this (my input data):

data:

    date      state  cases
0   20200625  NY     300
1   20200625  CA     250
2   20200625  TX     200
3   20200625  FL     100
5   20200624  NY     290
6   20200624  CA     240
7   20200624  TX     100
8   20200624  FL     80
...

Note that the "date" column in the data above is stored as a number (not a datetime).

I want to turn it into a time series like this (desired output), with dates as the index and each state's COVID cases as columns:

          NY     CA     TX     FL
20200625  300    250    200    100
20200626  290    240    100    80
...

So far I have only managed to create the skeleton of the output, with the following code:

import pandas as pd

states = ['NY', 'CA', 'TX', 'FL']
days = [20200625, 20200626]

columns = states
positives = pd.DataFrame(columns = columns)

i = 0
for day in days:
   positives.loc[i, "date"] = day
   i = i +1

positives.set_index('date', inplace=True)
positives= positives.rename_axis(None)
print(positives)

which returns:

             NY   CA   TX   FL
20200625.0  NaN  NaN  NaN  NaN
20200626.0  NaN  NaN  NaN  NaN

How can I get the value of the "cases" column from the "data" dataframe when:

(i) the value in data["state"] equals the column header of "positives", and

(ii) the value in data["date"] equals the row index of "positives"?

Use df.pivot('date', 'state', 'cases'). Commented Jun 27, 2020 at 4:58

2 Answers


You can do:

df = df.set_index(['date', 'state']).unstack().reset_index()

# fix column names
df.columns = df.columns.get_level_values(1)

state               CA     FL     NY     TX
0      20200624  240.0    NaN  290.0    NaN
1      20200625  250.0  100.0  300.0  200.0

After the column fix, the date column has an empty name. To make it the index again, select it by that empty name and then set the index name explicitly:

df = df.set_index("")
df.index.name = "date"
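
As a minimal sketch of the same idea, selecting the "cases" column before unstacking avoids the two-level column header, so no renaming is needed afterwards. The sample frame is an assumption that reproduces only part of the question's data, and the name wide is purely illustrative.

```python
import pandas as pd

# Assumed sample reproducing part of the question's data
data = pd.DataFrame({
    'date': [20200625, 20200625, 20200624, 20200624],
    'state': ['NY', 'CA', 'NY', 'CA'],
    'cases': [300, 250, 290, 240],
})

# Select the 'cases' Series first so unstack() yields single-level columns
wide = data.set_index(['date', 'state'])['cases'].unstack()

# Drop the "date" and "state" labels, as in the question's desired output
wide.index.name = None
wide.columns.name = None
print(wide)
```

The dates end up as the index directly, so the set_index step is not needed in this variant.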

4 Comments

Awesome!! It works perfectly. The only remaining issue is that I get some labels above the column headers: imgur.com/onv6hct (in the image: date, positive [positive was the name of the column with the values that now fill the df], and state above the index!). Any suggestion on how to clean it up?
Thank you... How can I make the date column the new index now? I would have used df.set_index('date', inplace=True), but there is no title for that column anymore.
I don't know if I'm doing something wrong... but this is what I get: imgur.com/IIYEWAs ... I'm looking to have a single header row with the state names (NY...), without "state" at the beginning, and on the same row as "date". Sorry for the multiple questions, I really really really appreciate your help.
Try setting df.columns.name = None

The transformation you are interested in is called a pivot. You can achieve this in pandas as follows:

import pandas as pd

# Reproduce part of the data
data = pd.DataFrame({'date': [20200625, 20200625, 20200624, 20200624],
                     'state': ['NY', 'CA', 'NY', 'CA'],
                     'cases': [300, 250, 290, 240]})
data

#        date state  cases
# 0  20200625    NY    300
# 1  20200625    CA    250
# 2  20200624    NY    290
# 3  20200624    CA    240

# Pivot
data.pivot(index='date', columns='state', values='cases')

# state      CA   NY
# date              
# 20200624  240  290
# 20200625  250  300
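
Since the question mentions that the dates are stored as numbers and that the goal is a time series, one possible follow-up is to parse the pivoted index into real dates. This is a sketch under the assumption that every date is an 8-digit YYYYMMDD integer; the name wide is illustrative.

```python
import pandas as pd

# Assumed sample reproducing part of the question's data
data = pd.DataFrame({
    'date': [20200625, 20200625, 20200624, 20200624],
    'state': ['NY', 'CA', 'NY', 'CA'],
    'cases': [300, 250, 290, 240],
})

wide = data.pivot(index='date', columns='state', values='cases')

# Parse the integer dates (e.g. 20200625) into a DatetimeIndex
wide.index = pd.to_datetime(wide.index.astype(str), format='%Y%m%d')

# Remove the "state" label above the column headers
wide.columns.name = None
print(wide)
```

With a DatetimeIndex in place, date-aware operations such as resampling or slicing by date range become available.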
