IndexError: too many indices for array for numpy in Python

Question

I am trying to calculate a mean from values imported from a text file. After running this code:

vragenlijst_data= np.genfromtxt('antwoorden.txt', delimiter=',', dtype=None, names=('geslacht', 'leeftijd', 'stelling1', 'doorvraag1', 'stelling2', 'stelling3', 'doorvraag3', 'opmerking'))

I get the following data:

[("['vrouw'", 43, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']")
 ("['vrouw'", 34, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']")
 ("['vrouw'", 32, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']")
 ("['vrouw'", 32, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']")
 ("['vrouw'", 43, " '3'", " 'sport'", " '2'", " '2'", " 'onbeantwoord'", " '']")
 ("['vrouw'", 32, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']")]
<type 'numpy.ndarray'>

Now I want to calculate the mean of the age variable (leeftijd), but I get the following error and have not been able to fix it:

IndexError                                Traceback (most recent call last)
(path to file) in <module>()
10 print (vragenlijst_data)
11
---> 12 mean = np.mean(vragenlijst_data[0:,1])
13
IndexError: too many indices for array

Does anyone have a solution to this problem? That would be a great help!

I tried to calculate the mean with: mean = np.mean(vragenlijst_data[0:,1])
– ItK, commented Jun 5, 2018 at 9:49

jpp · Accepted Answer · 2018-06-05 10:42:25Z


You are reading in your data as an array of tuples of strings. This is inefficient. I suggest you use a purpose-built library for mixed types, e.g. pandas.
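As a rough sketch of the pandas route suggested above: the snippet below assumes each line of antwoorden.txt looks like a printed Python list, which is a guess reconstructed from the output in the question. The `sample_text` string stands in for the real file, and `skipinitialspace=True` is used so that the age column parses as integers.

```python
import io

import pandas as pd

# Assumption: stand-in for antwoorden.txt, reconstructed from the printed output.
sample_text = """['vrouw', 43, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']
['vrouw', 34, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']
['vrouw', 32, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']
['vrouw', 32, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']
['vrouw', 43, '3', 'sport', '2', '2', 'onbeantwoord', '']
['vrouw', 32, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']
"""

columns = ['geslacht', 'leeftijd', 'stelling1', 'doorvraag1',
           'stelling2', 'stelling3', 'doorvraag3', 'opmerking']

# No header row in the file, so supply the column names explicitly.
df = pd.read_csv(io.StringIO(sample_text), header=None, names=columns,
                 skipinitialspace=True)

# 'leeftijd' is parsed as a numeric column, so the mean is a direct call.
mean_age = df['leeftijd'].mean()
print(mean_age)
```

The text columns still carry the stray brackets and quotes from the file, so they would need cleaning before any further use.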

However, you can use either a list comprehension or map with your current set-up:

import numpy as np

A = np.array([("['vrouw'", 43, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']"),
              ("['vrouw'", 34, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']"),
              ("['vrouw'", 32, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']"),
              ("['vrouw'", 32, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']"),
              ("['vrouw'", 43, " '3'", " 'sport'", " '2'", " '2'", " 'onbeantwoord'", " '']"),
              ("['vrouw'", 32, " '2'", " 'onbeantwoord'", " '2'", " '2'", " 'onbeantwoord'", " '']")])

from operator import itemgetter

# list comprehension    
res = np.mean([int(i[1]) for i in A])  # 36.0

# functional approach
res = np.mean(list(map(int, map(itemgetter(1), A))))  # 36.0
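For context on the error itself: when `names` is passed, `np.genfromtxt` returns a one-dimensional structured array, so a two-index expression like `[0:,1]` has one index too many. Fields of a structured array are selected by name instead. The sketch below builds a small structured array by hand; the `survey_dtype` layout and the cleaned-up values are assumptions for illustration, not the exact dtype that `genfromtxt` would infer.

```python
import numpy as np

# Hypothetical, cleaned-up subset of the survey fields; only 'leeftijd' matters here.
survey_dtype = [("geslacht", "U10"), ("leeftijd", "i8"), ("stelling1", "U5")]
rows = [
    ("vrouw", 43, "2"),
    ("vrouw", 34, "2"),
    ("vrouw", 32, "2"),
    ("vrouw", 32, "2"),
    ("vrouw", 43, "3"),
    ("vrouw", 32, "2"),
]
data = np.array(rows, dtype=survey_dtype)

# A structured array of records is 1-D: one axis of rows, with named fields.
print(data.ndim, data.shape)

# Indexing with two positions therefore raises the error from the question.
try:
    data[0:, 1]
except IndexError as exc:
    error_message = str(exc)
    print(error_message)

# Selecting the field by name yields a plain integer array.
ages = data["leeftijd"]
mean_age = np.mean(ages)
print(mean_age)
```

Applied to the question's array, the equivalent call would presumably be `np.mean(vragenlijst_data['leeftijd'])`.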

edited Jun 5, 2018 at 10:42

answered Jun 5, 2018 at 9:58

jpp


3 Comments

ItK Over a year ago

Thanks very much! Is there any way to do it with 'np.mean' from the numpy library?

jpp Over a year ago

@ItK, my solution uses np.mean. If you mean in a vectorised way, then given the way you've read the data, no. You don't have any contiguous memory array to work with.

ItK Over a year ago

Thanks! One more question: how would I read the data in order to have a contiguous memory array?
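One possible way to get a plain numeric array, offered as a sketch and not as the answerer's method: read only the age column with `usecols`, so the text columns never enter the array. The `sample_text` string is an assumed stand-in for antwoorden.txt, reconstructed from the output in the question, and `np.ascontiguousarray` is there only to make the memory layout explicit.

```python
import io

import numpy as np

# Assumption: stand-in for antwoorden.txt, reconstructed from the printed output.
sample_text = """['vrouw', 43, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']
['vrouw', 34, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']
['vrouw', 32, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']
['vrouw', 32, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']
['vrouw', 43, '3', 'sport', '2', '2', 'onbeantwoord', '']
['vrouw', 32, '2', 'onbeantwoord', '2', '2', 'onbeantwoord', '']
"""

# Read only column 1 (the age) as integers; the string columns are skipped.
ages = np.genfromtxt(io.StringIO(sample_text), delimiter=",",
                     usecols=(1,), dtype=int, encoding="utf-8")

# Guarantee a C-contiguous layout (a no-op if the array already is contiguous).
ages = np.ascontiguousarray(ages)

print(ages.mean())
```

With the real file, the first argument would be the filename `'antwoorden.txt'` in place of the `StringIO` object.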
