Masking a Python 2D array with mask values from a list

Question

I have a 2d array. I need to filter the array for rows with values at a particular index. The values are from a list.

Here's an example.

My data:

arr= [[ 1.681, 1.365, 0.105, 0.109, 0.50],
      [ 1.681, 1.365, 0.105, 0.109, 0.51],
      [ 1.681, 1.365, 0.105, 0.109, 0.52],
      [ 1.681, 1.365, 0.105, 0.109, 0.53],
      [ 1.681, 1.365, 0.105, 0.109, 0.54],
      [ 1.681, 1.365, 0.105, 0.109, 0.55],
      [ 1.681, 1.365, 0.105, 0.109, 0.56],
      [ 1.681, 1.365, 0.105, 0.109, 0.57],
      [ 1.681, 1.365, 0.105, 0.109, 0.58],
      [ 1.681, 1.365, 0.105, 0.109, 0.59],
      [ 1.681, 1.365, 0.105, 0.109, 0.60]]

Let's say I want to keep only the rows whose last entry is one of the values in the list [0.5, 0.55, 0.6].

I tried making a mask as follows:

>>> mask= arr['f4'] in [0.5, 0.55, 0.6]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str
>>> mask= arr['f4']==0.5 or arr['f4']==0.55 or arr['f4']==0.6
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str
>>>

As shown, neither attempt works.

Desired output is:

>>> arr_mask
[[1.681, 1.365, 0.105, 0.109, 0.5], [1.681, 1.365, 0.105, 0.109, 0.55], [1.681, 1.365, 0.105, 0.109, 0.6]]

Your feedback is appreciated.

EDIT1: There was a question about 'f4'. That seems to come from the way I read the data from a file into the array.

>>> arr= np.genfromtxt('data.rpt',dtype=None)

>>> arr
array([ ('tag', 1.681, 1.365, 0.105, 0.109, 0.5),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.51),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.52),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.53),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.54),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.55),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.56),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.57),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.58),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.59),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.6)], 
        dtype=[('f0', 'S837'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8')])

EDIT02:

I tried the proposal from jp_data_analysis, but it does not work. Could this be because the array was read from a file?

>>> arr_np = np.array(arr)
>>> search = np.array([0.50, 0.55, 0.60])
>>> arr_np[np.in1d(arr_np[:,-1], search)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: too many indices for array
>>>

[a for a in arr if a[-1] in [0.5, 0.55, 0.6]]

– internet_user, Commented Feb 9, 2018 at 23:48
What is f4???

– whackamadoodle3000, Commented Feb 9, 2018 at 23:53

f5r5e5d · Accepted Answer · 2018-02-10 00:07:15Z

1

This is basically from the np.where docs:

import numpy as np


arr= np.array([[ 1.681, 1.365, 0.105, 0.109, 0.50],
      [ 1.681, 1.365, 0.105, 0.109, 0.51],
      [ 1.681, 1.365, 0.105, 0.109, 0.52],
      [ 1.681, 1.365, 0.105, 0.109, 0.53],
      [ 1.681, 1.365, 0.105, 0.109, 0.54],
      [ 1.681, 1.365, 0.105, 0.109, 0.55],
      [ 1.681, 1.365, 0.105, 0.109, 0.56],
      [ 1.681, 1.365, 0.105, 0.109, 0.57],
      [ 1.681, 1.365, 0.105, 0.109, 0.58],
      [ 1.681, 1.365, 0.105, 0.109, 0.59],
      [ 1.681, 1.365, 0.105, 0.109, 0.60]])


ix = np.isin(arr[:,-1], [0.5,0.55,0.6])  

np.where(ix)
Out[107]: (array([ 0,  5, 10], dtype=int64),)

arr[np.where(ix),:]
Out[108]: 
array([[[ 1.681,  1.365,  0.105,  0.109,  0.5  ],
        [ 1.681,  1.365,  0.105,  0.109,  0.55 ],
        [ 1.681,  1.365,  0.105,  0.109,  0.6  ]]])
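Note that the triple brackets in the output above mean the result has an extra leading axis (shape (1, 3, 5)), because the tuple returned by np.where is used as one index. If a plain 2D result is wanted, the boolean mask can be used directly. A minimal sketch, assuming the same data as above (the names rows_2d and rows_via_where are only illustrative):

```python
import numpy as np

# Same data as in the answer above, built from the last-column values.
last_values = (0.50, 0.51, 0.52, 0.53, 0.54, 0.55,
               0.56, 0.57, 0.58, 0.59, 0.60)
arr = np.array([[1.681, 1.365, 0.105, 0.109, v] for v in last_values])

# Boolean mask with one entry per row.
ix = np.isin(arr[:, -1], [0.5, 0.55, 0.6])

# Indexing with the mask itself keeps the result two-dimensional.
rows_2d = arr[ix]

# Equivalent: take the first (and only) index array out of the np.where tuple.
rows_via_where = arr[np.where(ix)[0]]

print(rows_2d.shape)
```

Both forms select the same three rows; the mask form just skips the np.where step.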

answered Feb 10, 2018 at 0:07

f5r5e5d


jpp · Accepted Answer · 2018-02-10 00:22:57Z

1

For a vectorised approach try numpy:

import numpy as np

arr= [[ 1.681, 1.365, 0.105, 0.109, 0.50],
      [ 1.681, 1.365, 0.105, 0.109, 0.51],
      [ 1.681, 1.365, 0.105, 0.109, 0.52],
      [ 1.681, 1.365, 0.105, 0.109, 0.53],
      [ 1.681, 1.365, 0.105, 0.109, 0.54],
      [ 1.681, 1.365, 0.105, 0.109, 0.55],
      [ 1.681, 1.365, 0.105, 0.109, 0.56],
      [ 1.681, 1.365, 0.105, 0.109, 0.57],
      [ 1.681, 1.365, 0.105, 0.109, 0.58],
      [ 1.681, 1.365, 0.105, 0.109, 0.59],
      [ 1.681, 1.365, 0.105, 0.109, 0.60]]

arr = np.array(arr)
search = np.array([0.50, 0.55, 0.60])

arr[np.in1d(arr[:,-1], search)]

# array([[ 1.681,  1.365,  0.105,  0.109,  0.5  ],
#        [ 1.681,  1.365,  0.105,  0.109,  0.55 ],
#        [ 1.681,  1.365,  0.105,  0.109,  0.6  ]])

I expect this to be more efficient for larger arrays.
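One caveat: np.in1d and np.isin test for exact equality, which is fine when both sides come from the same typed or parsed decimals, but may miss values that were produced by arithmetic. A hedged sketch of a tolerance-based mask using np.isclose follows; the linspace-built column and the default tolerances are assumptions for illustration, not part of the original data.

```python
import numpy as np

# Assumption for illustration: the last column comes from arithmetic
# (np.linspace), so a value such as 0.55 may differ from the literal 0.55
# in its last bits.
last = np.linspace(0.50, 0.60, 11)
fixed = np.tile([1.681, 1.365, 0.105, 0.109], (11, 1))
arr = np.column_stack([fixed, last])

search = np.array([0.50, 0.55, 0.60])

# Compare each row's last value against each search value: shape (11, 3).
close = np.isclose(arr[:, -1][:, None], search)

# A row is kept if its last value is close to any of the search values.
mask = close.any(axis=1)
filtered = arr[mask]

print(filtered)
```

The comparison matrix has one row per data row and one column per search value, so this costs more memory than np.isin for long search lists.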

edited Feb 10, 2018 at 0:22

answered Feb 10, 2018 at 0:08

jpp


2 Comments

Gert Gottschalk Over a year ago

I tried, but it does not work:

>>> arr_np = np.array(arr)
>>> search = np.array([0.50, 0.55, 0.60])
>>> arr_np[np.in1d(arr_np[:,-1], search)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: too many indices for array
>>>

jpp Over a year ago

@GertGottschalk, I've included my full code now. It works on my python 3.6 / numpy 1.11.
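The IndexError in the comment above is consistent with the array shown in EDIT1 of the question: np.genfromtxt with dtype=None returned a 1-D structured array with named fields f0 to f5, so there is no second axis for arr[:, -1] to index. Going by that dtype, the last column is the field 'f5' ('f0' holds the tag, and 'f4' is the 0.109 column). A minimal sketch under those assumptions; the array is built by hand here as a stand-in for reading data.rpt, and the field width 'S8' is shortened for illustration:

```python
import numpy as np

# Hypothetical stand-in for np.genfromtxt('data.rpt', dtype=None):
# a 1-D structured array with the field names shown in the question.
dt = [('f0', 'S8'), ('f1', '<f8'), ('f2', '<f8'),
      ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8')]
last_values = (0.50, 0.51, 0.52, 0.53, 0.54, 0.55,
               0.56, 0.57, 0.58, 0.59, 0.60)
records = [(b'tag', 1.681, 1.365, 0.105, 0.109, v) for v in last_values]
arr = np.array(records, dtype=dt)

# One axis only, which is why arr[:, -1] fails on this kind of array.
print(arr.ndim)

# Select a column by field name, then use the boolean mask on the records.
search = [0.50, 0.55, 0.60]
mask = np.isin(arr['f5'], search)
arr_mask = arr[mask]

# Optional: stack the numeric fields into a plain 2-D float array.
numeric = np.column_stack([arr_mask[name] for name in arr.dtype.names[1:]])
print(numeric)
```

np.isin needs NumPy 1.13 or later; on older versions np.in1d(arr['f5'], search) plays the same role for a 1-D field.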

Matt. Stroh · Accepted Answer · 2018-02-10 00:11:34Z

0

The answers you've got are using numpy, but in case you are not able to use numpy, this could work too.

You can use a list comprehension (like @internet_user said):

masked_data = [ x for x in arr if x[-1] in [0.5, 0.55, 0.6] ]

You can also use filter:

masked_data = list(filter(lambda x: x[-1] in [0.5, 0.55, 0.6], arr))
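If the list of wanted values grows, a set makes each membership test a hash lookup instead of a scan. A small self-contained sketch of the same idea without numpy (the name wanted is only illustrative):

```python
# Plain nested lists, as in the question.
last_values = [0.50, 0.51, 0.52, 0.53, 0.54, 0.55,
               0.56, 0.57, 0.58, 0.59, 0.60]
arr = [[1.681, 1.365, 0.105, 0.109, v] for v in last_values]

# A set gives constant-time membership tests on average.
wanted = {0.5, 0.55, 0.6}

# Keep the rows whose last entry is one of the wanted values.
masked_data = [row for row in arr if row[-1] in wanted]

print(masked_data)
```

As with the list version, this relies on exact float equality, so it suits values that were typed or parsed as the same decimals.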

answered Feb 10, 2018 at 0:11

Matt. Stroh


foladev · Accepted Answer · 2018-02-10 00:17:36Z

0

import numpy as np

arr= np.array([[ 1.681, 1.365, 0.105, 0.109, 0.50],
      [ 1.681, 1.365, 0.105, 0.109, 0.51],
      [ 1.681, 1.365, 0.105, 0.109, 0.52],
      [ 1.681, 1.365, 0.105, 0.109, 0.53],
      [ 1.681, 1.365, 0.105, 0.109, 0.54],
      [ 1.681, 1.365, 0.105, 0.109, 0.55],
      [ 1.681, 1.365, 0.105, 0.109, 0.56],
      [ 1.681, 1.365, 0.105, 0.109, 0.57],    
      [ 1.681, 1.365, 0.105, 0.109, 0.58],
      [ 1.681, 1.365, 0.105, 0.109, 0.59],
      [ 1.681, 1.365, 0.105, 0.109, 0.60]])
mask=[.5,.6,.55]
# keep rows containing any mask value (this checks every column, not only the last)
arr_mask = np.array([x for x in arr if sum(np.isin(x,mask))])

answered Feb 10, 2018 at 0:17

foladev
