Warning: Python newbie...

I have text that looks like this, which came from a database query:

2000;"SCHOOLS OF MEDICINE";416765.0
2000;"SCHOOLS OF ARTS AND SCIENCES";36000.0
2000;"SCHOOLS OF MEDICINE";2000.0
2000;"SCHOOLS OF MEDICINE";179728.0
2000;"OTHER DOMESTIC HIGHER EDUCATION";244547.0
2000;"SCHOOLS OF MEDICINE";107325.0
2000;"OTHER DOMESTIC HIGHER EDUCATION";61609.0
2000;"SCHOOLS OF MEDICINE";93600.0
2000;"SCHOOLS OF EARTH SCIENCES/NATURAL RESOURCES";64865.0
2000;"SCHOOLS OF MEDICINE";50000.0
...

I'd like to make a chart that shows the average award amount for all years with error bars for each Division.

However, I'm not sure how to get this data into a scipy array to make the chart. I've tried the following:

data = asarray(2000;"SCHOOLS OF MEDICINE";416765.0
    2000;"SCHOOLS OF ARTS AND SCIENCES";36000.0
    2000;"SCHOOLS OF MEDICINE";2000.0
    2000;"SCHOOLS OF MEDICINE";179728.0
    2000;"OTHER DOMESTIC HIGHER EDUCATION";244547.0
    2000;"SCHOOLS OF MEDICINE";107325.0
    2000;"OTHER DOMESTIC HIGHER EDUCATION";61609.0
    2000;"SCHOOLS OF MEDICINE";93600.0
    2000;"SCHOOLS OF EARTH SCIENCES/NATURAL RESOURCES";64865.0
    2000;"SCHOOLS OF MEDICINE";50000.0)

I also tried with data = sp.array(). Both give the following error:

    data = sp.asarray(2000;"SCHOOLS OF MEDICINE";416765.0
                          ^
SyntaxError: invalid syntax

So, it seems to me that the array() and asarray() methods don't like semi-colon delimited data.

Any suggestions on how to do this would be great. If possible, I'd prefer not to save the data to a file first.

Thanks!

1 Answer

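The SyntaxError has nothing to do with the semicolons as a delimiter: the raw text was pasted into the call as if it were Python code, so the interpreter stops at the first ; it can't parse. The data needs to be a string first. Consider using pandas for data like this: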

import pandas as pd
from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"
import matplotlib.pyplot as plt

# Keep the query output as a string (named "raw" so it doesn't shadow the builtin input)
raw = """2000;"SCHOOLS OF MEDICINE";416765.0
2000;"SCHOOLS OF ARTS AND SCIENCES";36000.0
2000;"SCHOOLS OF MEDICINE";2000.0
2000;"SCHOOLS OF MEDICINE";179728.0
2001;"SCHOOLS OF MEDICINE";1234.0
2001;"SCHOOLS OF ARTS AND SCIENCES";100.0
2002;"SCHOOLS OF MEDICINE";9999.0
2002;"SCHOOLS OF MEDICINE";8436.0"""

# Parse the semicolon-delimited text straight from memory, no file needed
df = pd.read_csv(StringIO(raw), sep=';', header=None, names=['year', 'division', 'award'])
print(df)
# Total award per year
yeartotals = df.groupby(['year'])[['award']].sum()
print(yeartotals)
yeartotals.plot()
plt.show()

I'm not exactly sure what you want to plot, but pandas integrates quite nicely with matplotlib for plotting.
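
If the goal is the average award per division with error bars, as the question describes, something like the following might work. It's only a sketch: the function names division_stats and plot_division_stats are made up for illustration, and using the sample standard deviation as the error bar is my assumption, since the question doesn't say which measure of spread is wanted. Divisions with a single row have no defined standard deviation (pandas gives NaN), so the plot falls back to a zero-length bar for those.

```python
from io import StringIO

import pandas as pd

# Sample rows copied from the question; a real query result would have more.
raw = """2000;"SCHOOLS OF MEDICINE";416765.0
2000;"SCHOOLS OF ARTS AND SCIENCES";36000.0
2000;"SCHOOLS OF MEDICINE";2000.0
2000;"SCHOOLS OF MEDICINE";179728.0
2000;"OTHER DOMESTIC HIGHER EDUCATION";244547.0
2000;"SCHOOLS OF MEDICINE";107325.0
2000;"OTHER DOMESTIC HIGHER EDUCATION";61609.0
2000;"SCHOOLS OF MEDICINE";93600.0
2000;"SCHOOLS OF EARTH SCIENCES/NATURAL RESOURCES";64865.0
2000;"SCHOOLS OF MEDICINE";50000.0"""


def division_stats(text):
    """Hypothetical helper: mean, std and row count of awards per division."""
    df = pd.read_csv(StringIO(text), sep=";", header=None,
                     names=["year", "division", "award"])
    # All years are pooled together; only the division is used for grouping.
    return df.groupby("division")["award"].agg(["mean", "std", "count"])


def plot_division_stats(stats):
    """Hypothetical helper: bar chart of the means with std as error bars."""
    import matplotlib.pyplot as plt  # imported here so the stats work without it

    # Assumption: error bar = sample standard deviation; NaN (one row) becomes 0.
    ax = stats["mean"].plot(kind="bar", yerr=stats["std"].fillna(0), capsize=4)
    ax.set_ylabel("Average award")
    plt.tight_layout()
    return ax


stats = division_stats(raw)
print(stats)
```

Calling plot_division_stats(stats) followed by plt.show() should draw the chart. If a plain NumPy array is still needed afterwards, stats.to_numpy() returns one.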
