I am trying to calculate a mean from values imported from a text file. After running this code:

vragenlijst_data = np.genfromtxt('antwoorden.txt', delimiter=',', dtype=None, names=('geslacht', 'leeftijd', 'stelling1', 'doorvraag1', 'stelling2', 'stelling3', 'doorvraag3', 'opmerking'))

I get the following data:

[("['vrouw'", 43, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']")
 ("['vrouw'", 34, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']")
 ("['vrouw'", 32, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']")
 ("['vrouw'", 32, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']")
 ("['vrouw'", 43, " '3'", " 'sport'", " '2'", " '2'", " 'onbeantwoord'", " '']")
 ("['vrouw'", 32, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']")]
<type 'numpy.ndarray'>

Now I want to calculate the mean of the age variable (leeftijd), but I get the following error and have not been able to fix it:

IndexError                                Traceback (most recent call last)
(path to file) in <module>()
10 print (vragenlijst_data)
11
---> 12 mean = np.mean(vragenlijst_data[0:,1])
13
IndexError: too many indices for array 

Does anyone have a solution to this problem? That would be a great help!

  • I tried to calculate the mean with: mean = np.mean(vragenlijst_data[0:,1]) Commented Jun 5, 2018 at 9:49

1 Answer

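You are reading in your data as a one-dimensional array of records (tuples) with mixed types, most of them strings. Because the array has only one dimension, the two-index slice vragenlijst_data[0:,1] raises the IndexError. This layout is also inefficient. I suggest you use a purpose-built library for mixed types, e.g. pandas.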
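A minimal sketch of the pandas route, assuming the lines of antwoorden.txt look like the output shown in the question. The `sample` text below is reconstructed from that output and stands in for the real file, and `skipinitialspace=True` is my choice for dropping the space after each comma:

```python
import io
import pandas as pd

# Stand-in for 'antwoorden.txt', reconstructed from the question's output (an assumption).
sample = (
    "['vrouw', 43, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']\n"
    "['vrouw', 34, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']\n"
    "['vrouw', 32, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']\n"
    "['vrouw', 32, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']\n"
    "['vrouw', 43, '3', 'sport', '2', '2', 'onbeantwoord', '']\n"
    "['vrouw', 32, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']\n"
)

names = ['geslacht', 'leeftijd', 'stelling1', 'doorvraag1',
         'stelling2', 'stelling3', 'doorvraag3', 'opmerking']

# Each column gets its own dtype; 'leeftijd' is parsed as integers.
df = pd.read_csv(io.StringIO(sample), header=None, names=names,
                 skipinitialspace=True)

mean_age = df['leeftijd'].mean()
print(mean_age)
```

With the real file you would pass 'antwoorden.txt' instead of the `io.StringIO` object. The text columns still carry the stray brackets and quotes from the file, so they would need cleaning before use.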

However, you can use either a list comprehension or map with your current set-up:

import numpy as np
from operator import itemgetter

# Sample rows copied from the output in the question
A = np.array([("['vrouw'", 43, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']"),
              ("['vrouw'", 34, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']"),
              ("['vrouw'", 32, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']"),
              ("['vrouw'", 32, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']"),
              ("['vrouw'", 43, " '3'", " 'sport'", " '2'", " '2'", " 'onbeantwoord'", " '']"),
              ("['vrouw'", 32, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']")])

# list comprehension: take element 1 (the age) of each row and convert it to int
res = np.mean([int(i[1]) for i in A])  # 36.0

# functional approach: same result using map and itemgetter
res = np.mean(list(map(int, map(itemgetter(1), A))))  # 36.0
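As a possible alternative, if the array comes from np.genfromtxt with names= as in the question, it is a structured array, and a single field can be selected by name. The sketch below assumes the file's lines look like the question's output. `sample` is a reconstructed stand-in for antwoorden.txt, and the `encoding` argument is my addition:

```python
import io
import numpy as np

# Stand-in for 'antwoorden.txt', reconstructed from the question's output (an assumption).
sample = (
    "['vrouw', 43, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']\n"
    "['vrouw', 34, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']\n"
    "['vrouw', 32, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']\n"
    "['vrouw', 32, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']\n"
    "['vrouw', 43, '3', 'sport', '2', '2', 'onbeantwoord', '']\n"
    "['vrouw', 32, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']\n"
)

names = ('geslacht', 'leeftijd', 'stelling1', 'doorvraag1',
         'stelling2', 'stelling3', 'doorvraag3', 'opmerking')

data = np.genfromtxt(io.StringIO(sample), delimiter=',', dtype=None,
                     names=names, encoding='utf-8')

# data is one-dimensional: one record per row, so index by field name, not by column number.
ages = data['leeftijd']
mean_age = np.mean(ages)
print(mean_age)
```

Here `data['leeftijd']` is an integer view into the records, so np.mean can be applied to it directly without a Python-level loop.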

3 Comments

Thanks very much! Is there any way to do it with 'np.mean' from the numpy library?
@ltK, my solution uses np.mean. If you mean in a vectorised way, then given the way you've read the data, no. You don't have a contiguous memory array to work with.
Thanks! One more question: how would I read the data in order to have a contiguous memory array?
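One way this could be approached, sketched under the assumption that only the numeric age column is needed: ask np.genfromtxt for that column alone with `usecols`, which yields a plain one-dimensional numeric array instead of records. `sample` is a stand-in for antwoorden.txt reconstructed from the question's output, and the `encoding` argument is my addition:

```python
import io
import numpy as np

# Stand-in for 'antwoorden.txt', reconstructed from the question's output (an assumption).
sample = (
    "['vrouw', 43, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']\n"
    "['vrouw', 34, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']\n"
    "['vrouw', 32, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']\n"
    "['vrouw', 32, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']\n"
    "['vrouw', 43, '3', 'sport', '2', '2', 'onbeantwoord', '']\n"
    "['vrouw', 32, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']\n"
)

# Read only column 1 (the age); the default dtype is float.
ages = np.genfromtxt(io.StringIO(sample), delimiter=',', usecols=(1,),
                     encoding='utf-8')

print(ages.flags['C_CONTIGUOUS'])  # whether the array is contiguous in memory
print(ages.mean())
```

The text columns are skipped entirely here, so this only helps for the numeric part of the file.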
