
I have a dataframe of Covid-19 deaths by country. Countries are identified in the Country column. Sub-national classification is based on the Province column.

I want to generate a dataframe which sums all columns based on the value in the Country column (except the first 2, which are geographical data). In short, for each date, I want to compress the observations for all provinces of a country such that I get a single number for each country.

Right now, I am able to do that for a single date:

import pandas as pd

url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
raw = pd.read_csv(url)
del raw['Lat']
del raw['Long']
raw.rename({'Country/Region': 'Country', 'Province/State': 'Province'}, axis=1, inplace=True)

raw2 = raw.groupby('Country')['6/29/20'].sum()

How can I achieve this for all dates?

  • If it's a list, you can use raw.Country.unique() Commented Jul 1, 2020 at 4:12

1 Answer


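You can select the date columns by position with iloc and group them by the Country column. Since Lat and Long have already been deleted, only Province and Country precede the dates, so the date columns start at index 2 (starting at 4 would silently skip the first two dates):

raw2 = raw.iloc[:, 2:].groupby(raw['Country']).sum()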
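
If counting column positions feels fragile, a name-based variant may be easier to keep correct. The following is a minimal sketch, not the original answer's method: it uses a small made-up table in place of the downloaded CSV (the province names and death counts are hypothetical), drops the Province column by name, and sums every remaining column per country.

```python
import pandas as pd

# Hypothetical miniature of the deaths table after Lat/Long are removed
# and the columns are renamed; the numbers are invented for illustration.
raw = pd.DataFrame({
    'Province': ['Hubei', 'Beijing', None, 'Quebec'],
    'Country': ['China', 'China', 'Italy', 'Canada'],
    '6/28/20': [10, 1, 5, 2],
    '6/29/20': [12, 1, 6, 3],
})

# Drop the only non-date, non-key column by name, then sum all
# remaining (date) columns within each country.
by_country = raw.drop(columns='Province').groupby('Country').sum()
print(by_country)
```

The result is indexed by Country with one column per date, so a single country's series can be read with by_country.loc['China'].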