0

I know Python is almost made for this kind of task, but I am really struggling to understand how to access specific values in a dataset. I have tried both the pandas and csv modules; it is probably a matter of syntax. Here's the thing: I have a CSV file of the form

Nation, Year, No. of refugees
Afghanistan,2013,6657
Albania,2013,199
Algeria,2013,91
Angola,2013,47
Armenia,2013,156
...
...
Afghanistan,2012,6960
Albania,2012,157
Algeria,2012,67
Angola,2012,43
Armenia,2012,143
...

and so on. What I would like to do is get the total number of refugees per year, i.e. select all the rows with a given year and sum the values in the corresponding "No. of refugees" column. I managed to do this:

import csv

with open('refugees.csv', 'r') as f:
    d_reader = csv.DictReader(f)
    headers = d_reader.fieldnames
    print headers

    #2013
    list2013=[]
    for line in d_reader:
        if (line['Year']=='2013'):
            list2013.append(line['Refugees'])
    list2013=map(int,list2013) #I have str values in my file
    ref13=sum(list2013)

but I am looking for a more elegant (and, above all, iterative) solution. Moreover, if I repeat that procedure for different years, I always get 0: it works for 2013 only, and I am not sure why.

Edit: I tried this as well, without success, but I think this could be totally wrong:

import csv
refugees_dict={}
a=range(2005,2014)
a=map(str, a)
with open('refugees.csv', 'r') as f:
    d_reader = csv.DictReader(f)

    for element in a:
        for line in d_reader:
            if (line['Year']==element):
                print 'hello!'
                temp_list=[]
                temp_list.append(line['Refugees'])
                temp_list=map(int, temp_list)
                refugees_dict[a]=sum(temp_list)

print refugees_dict

The next step of my work will involve further analysis of the dataset, e.g. I will probably need to access data nation-wise instead of year-wise, so I would really appreciate any hint that helps me understand how to manipulate the data. Thanks a lot.

  • "it works for 2013 only, not sure why": because you hardcoded if (line['Year']=='2013'), maybe? Also, were you not able to load this into a pandas data frame? Commented Jun 5, 2017 at 18:13
  • Instead of csv, since you have pandas listed in your tags, use 'import pandas as pd' and then dataframe = pd.read_csv("refugees.csv"), or dataframe = pd.read_csv("refugees.csv", header=None) if the file has no header row. Commented Jun 5, 2017 at 18:17
  • Dmitry, of course I edited the code properly each time. I am a noob, but not completely stupid :D Commented Jun 5, 2017 at 18:32

3 Answers

6

Since you tagged pandas in the question, here's a pandas solution to getting the number of refugees per year.

Let's say my input csv looks like this (note that I've eliminated the extra space before the column names):

Nation,Year,No. of refugees
Afghanistan,2013,6657
Albania,2013,199
Algeria,2013,91
Angola,2013,47
Armenia,2013,156
Afghanistan,2012,6960
Albania,2012,157
Algeria,2012,67
Angola,2012,43
Armenia,2012,143

You can read that into a pandas DataFrame like this:

import pandas as pd
df = pd.read_csv('data.csv')

You can then get the total like this:

df.groupby(['Year'])[['No. of refugees']].sum()

This gives:

      No. of refugees
Year
2012             7370
2013             7150
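
The same pattern extends to the nation-wise totals mentioned in the question. Below is a minimal sketch, assuming the column names from the sample above and an inline copy of the sample rows (io.StringIO stands in for the real file). Selecting the column before calling sum() keeps the text column Nation out of the aggregation.

```python
import io

import pandas as pd

# Inline copy of the sample rows; in practice, pass the CSV path to read_csv.
sample = io.StringIO("""Nation,Year,No. of refugees
Afghanistan,2013,6657
Albania,2013,199
Algeria,2013,91
Angola,2013,47
Armenia,2013,156
Afghanistan,2012,6960
Albania,2012,157
Algeria,2012,67
Angola,2012,43
Armenia,2012,143
""")

df = pd.read_csv(sample)

# Total per year: a Series indexed by Year.
per_year = df.groupby('Year')['No. of refugees'].sum()

# Total per nation across all years: a Series indexed by Nation.
per_nation = df.groupby('Nation')['No. of refugees'].sum()

# Rows for a single year, selected with a boolean mask.
rows_2013 = df[df['Year'] == 2013]

print(per_year.loc[2013])
print(per_nation.loc['Albania'])
```

Note that read_csv parses the Year column as integers here, so the lookups use 2013 rather than '2013'.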

1 Comment

Seems like the simplest solution! I am still familiarizing myself with Python and pandas; seeing your solution, I can tell that my question was quite a noob one. Thanks a lot.
1

Consider:

from collections import defaultdict
by_year = defaultdict(int)  # a dict that has a 0 under every key.

and then

by_year[line['Year']] += int(line['Refugees'])  # inside the loop over d_reader; keys are case-sensitive

Now you can just look at by_year['2013'] and see your sum (same for other years).
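
Put together with csv.DictReader, that could look like the sketch below. It assumes the columns are named Year and Refugees, as in the question's code, and uses an inline sample in place of refugees.csv. Everything is accumulated in a single pass, which matters because a DictReader can be iterated only once per open file: a second loop over the same reader sees no rows, which would explain totals of 0 for every year after the first.

```python
import csv
import io
from collections import defaultdict

# Inline stand-in for the file; the 'Refugees' header is an assumption
# taken from the question's code.
sample = io.StringIO("""Nation,Year,Refugees
Afghanistan,2013,6657
Albania,2013,199
Afghanistan,2012,6960
Albania,2012,157
""")

by_year = defaultdict(int)
by_nation = defaultdict(int)

reader = csv.DictReader(sample)
for line in reader:
    count = int(line['Refugees'])  # csv yields strings, so convert first
    by_year[line['Year']] += count
    by_nation[line['Nation']] += count

# The reader is now exhausted: iterating it again yields nothing.
leftover = list(reader)

print(dict(by_year))
print(dict(by_nation))
```

The keys of by_year are strings ('2013', not 2013), because the csv module does not convert types.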

2 Comments

thanks a lot, seems useful for a nation-wise analysis as well.
you're welcome. btw with this kind of data, you might be better served by a relational database and SQL. any major RDBMS has no trouble importing CSVs.
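
For reference, the SQL route can be tried without a database server through Python's built-in sqlite3 module. This is a rough sketch only: the table name refugees and its column names are hypothetical, and an inline sample stands in for the CSV file.

```python
import csv
import io
import sqlite3

# Inline stand-in for the CSV file (header names are an assumption).
sample = io.StringIO("""Nation,Year,Refugees
Afghanistan,2013,6657
Albania,2013,199
Afghanistan,2012,6960
Albania,2012,157
""")

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
conn.execute(
    'CREATE TABLE refugees (nation TEXT, year INTEGER, num_refugees INTEGER)'
)

reader = csv.reader(sample)
next(reader)  # skip the header row
# INTEGER column affinity converts the numeric strings on insert.
conn.executemany('INSERT INTO refugees VALUES (?, ?, ?)', reader)

totals = conn.execute(
    'SELECT year, SUM(num_refugees) FROM refugees GROUP BY year ORDER BY year'
).fetchall()
conn.close()

print(totals)
```

Swapping year for nation in the SELECT and GROUP BY clauses gives the nation-wise totals.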
1

To sum by year you can try this:

with open('file.csv') as fh:
    next(fh)  # skip the header row, which cannot be converted to int
    f = [line.strip().split(',') for line in fh if line.strip()]

years = {i[1]: 0 for i in f}

for i in f:
    years[i[1]] += int(i[-1])

Now, you have a dictionary that has the sum of all the refugees by year.

To access nation-wise:

nations = {i[0]:0 for i in f}

for i in f:
    nations[i[0]] += int(i[-1])

1 Comment

CSV parsing is hairy. I'd use the csv module in the general case.
