
I have a 2D array. I need to keep only the rows whose value at a particular column index appears in a given list of values.

Here's an example.

My data:

arr= [[ 1.681, 1.365, 0.105, 0.109, 0.50],
      [ 1.681, 1.365, 0.105, 0.109, 0.51],
      [ 1.681, 1.365, 0.105, 0.109, 0.52],
      [ 1.681, 1.365, 0.105, 0.109, 0.53],
      [ 1.681, 1.365, 0.105, 0.109, 0.54],
      [ 1.681, 1.365, 0.105, 0.109, 0.55],
      [ 1.681, 1.365, 0.105, 0.109, 0.56],
      [ 1.681, 1.365, 0.105, 0.109, 0.57],
      [ 1.681, 1.365, 0.105, 0.109, 0.58],
      [ 1.681, 1.365, 0.105, 0.109, 0.59],
      [ 1.681, 1.365, 0.105, 0.109, 0.60]] 

Let's say I want to filter for rows where the last entry is from the list 0.5,0.55,0.6.

I tried making a mask as follows:

>>> mask= arr['f4'] in [0.5, 0.55, 0.6]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str
>>> mask= arr['f4']==0.5 or arr['f4']==0.55 or arr['f4']==0.6
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str
>>> 

As shown, neither attempt works.

Desired output is:

>>> arr_mask
[[1.681, 1.365, 0.105, 0.109, 0.5], [1.681, 1.365, 0.105, 0.109, 0.55], [1.681, 1.365, 0.105, 0.109, 0.6]]

Your feedback is appreciated.

EDIT1: There was a question about 'f4'. That seems to come from the way I read the data from a file into the array.

>>> arr= np.genfromtxt('data.rpt',dtype=None)

>>> arr
array([ ('tag', 1.681, 1.365, 0.105, 0.109, 0.5),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.51),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.52),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.53),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.54),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.55),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.56),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.57),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.58),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.59),
        ('tag', 1.681, 1.365, 0.105, 0.109, 0.6)], 
        dtype=[('f0', 'S837'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8')])

EDIT02:

I tried the proposal from jp_data_analysis, but it does not work. Could this be because the array was read from a file?

>>> arr_np = np.array(arr)
>>> search = np.array([0.50, 0.55, 0.60])
>>> arr_np[np.in1d(arr_np[:,-1], search)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: too many indices for array
>>> 
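
The IndexError above is consistent with arr being a 1-D structured array (one record per row, with named fields) rather than a 2-D float array, so arr_np[:,-1] has one index too many. A minimal sketch of filtering such an array by field name follows. It builds a stand-in array by hand instead of reading data.rpt, and it assumes the last column is the field f5, as the dtype in EDIT1 suggests (f0 holds the tag, so f4 would be the 0.109 column).

```python
import numpy as np

# Hypothetical stand-in for the result of np.genfromtxt('data.rpt', dtype=None):
# a 1-D structured array with fields f0..f5 (f0 is the text tag).
dtype = [("f0", "U8"), ("f1", "f8"), ("f2", "f8"),
         ("f3", "f8"), ("f4", "f8"), ("f5", "f8")]
last_values = [0.50, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60]
rows = [("tag", 1.681, 1.365, 0.105, 0.109, v) for v in last_values]
arr = np.array(rows, dtype=dtype)

# The array is 1-D, so column access goes through the field name, not arr[:, -1].
search = [0.5, 0.55, 0.6]
mask = np.isin(arr["f5"], search)

# Boolean indexing keeps the matching records, including their tag field.
arr_mask = arr[mask]
print(arr_mask)
```

The result is still a structured array; arr_mask["f5"] gives the filtered last column on its own.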

2 Comments

  • [a for a in arr if a[-1] in [0.5, 0.55, 0.6]] Commented Feb 9, 2018 at 23:48
  • What is f4??? Commented Feb 9, 2018 at 23:53

4 Answers

1

This is basically taken from the np.where docs:

import numpy as np


arr= np.array([[ 1.681, 1.365, 0.105, 0.109, 0.50],
      [ 1.681, 1.365, 0.105, 0.109, 0.51],
      [ 1.681, 1.365, 0.105, 0.109, 0.52],
      [ 1.681, 1.365, 0.105, 0.109, 0.53],
      [ 1.681, 1.365, 0.105, 0.109, 0.54],
      [ 1.681, 1.365, 0.105, 0.109, 0.55],
      [ 1.681, 1.365, 0.105, 0.109, 0.56],
      [ 1.681, 1.365, 0.105, 0.109, 0.57],
      [ 1.681, 1.365, 0.105, 0.109, 0.58],
      [ 1.681, 1.365, 0.105, 0.109, 0.59],
      [ 1.681, 1.365, 0.105, 0.109, 0.60]])


ix = np.isin(arr[:,-1], [0.5,0.55,0.6])  

np.where(ix)
Out[107]: (array([ 0,  5, 10], dtype=int64),)

arr[np.where(ix),:]
Out[108]: 
array([[[ 1.681,  1.365,  0.105,  0.109,  0.5  ],
        [ 1.681,  1.365,  0.105,  0.109,  0.55 ],
        [ 1.681,  1.365,  0.105,  0.109,  0.6  ]]])
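
Note that the output above has three levels of brackets: indexing with the tuple returned by np.where adds an extra leading axis. A short sketch of getting the plain 2-D result by indexing with the boolean mask directly (the variable names filtered and wrapped are only illustrative):

```python
import numpy as np

last_values = (0.50, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60)
arr = np.array([[1.681, 1.365, 0.105, 0.109, v] for v in last_values])

# Boolean mask over the last column.
ix = np.isin(arr[:, -1], [0.5, 0.55, 0.6])

# Indexing with the mask itself selects whole rows and stays 2-D.
filtered = arr[ix]
print(filtered.shape)   # (3, 5)

# Indexing with the tuple from np.where adds a leading axis of length 1.
wrapped = arr[np.where(ix), :]
print(wrapped.shape)    # (1, 3, 5)
```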

Comments

1

For a vectorised approach try numpy:

import numpy as np

arr= [[ 1.681, 1.365, 0.105, 0.109, 0.50],
      [ 1.681, 1.365, 0.105, 0.109, 0.51],
      [ 1.681, 1.365, 0.105, 0.109, 0.52],
      [ 1.681, 1.365, 0.105, 0.109, 0.53],
      [ 1.681, 1.365, 0.105, 0.109, 0.54],
      [ 1.681, 1.365, 0.105, 0.109, 0.55],
      [ 1.681, 1.365, 0.105, 0.109, 0.56],
      [ 1.681, 1.365, 0.105, 0.109, 0.57],
      [ 1.681, 1.365, 0.105, 0.109, 0.58],
      [ 1.681, 1.365, 0.105, 0.109, 0.59],
      [ 1.681, 1.365, 0.105, 0.109, 0.60]]

arr = np.array(arr)
search = np.array([0.50, 0.55, 0.60])

arr[np.in1d(arr[:,-1], search)]

# array([[ 1.681,  1.365,  0.105,  0.109,  0.5  ],
#        [ 1.681,  1.365,  0.105,  0.109,  0.55 ],
#        [ 1.681,  1.365,  0.105,  0.109,  0.6  ]])

I expect this to be more efficient for larger arrays.
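
One caveat: np.in1d and np.isin compare floats exactly. That works here because the data and the search values come from the same decimal literals, but values produced by arithmetic may differ in the last bits. A minimal sketch of a tolerance-based match, assuming the default tolerances of np.isclose are acceptable for the data:

```python
import numpy as np

# Last column built by arithmetic, so entries may not be bit-identical
# to the literals 0.55 or 0.6.
last_col = 0.5 + 0.01 * np.arange(11)
arr = np.column_stack([np.full((11, 4), [1.681, 1.365, 0.105, 0.109]), last_col])

search = np.array([0.50, 0.55, 0.60])

# Compare every last-column entry against every search value (11 x 3 table),
# then keep a row if it is close to any of the search values.
mask = np.isclose(arr[:, -1][:, None], search).any(axis=1)
result = arr[mask]
print(result)
```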

2 Comments

I tried, but it does not work. After arr_np = np.array(arr) and search = np.array([0.50, 0.55, 0.60]), the call arr_np[np.in1d(arr_np[:,-1], search)] raises IndexError: too many indices for array.
@GertGottschalk, I've included my full code now. It works on my python 3.6 / numpy 1.11.
0

The answers you've got are using numpy, but in case you are not able to use numpy, this could work too.

You can use a list comprehension (like @interent_user said):

masked_data = [ x for x in arr if x[-1] in [0.5, 0.55, 0.6] ]

You can also use filter:

masked_data = list(filter(lambda x: x[-1] in [0.5, 0.55, 0.6], arr))
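
For reference, a self-contained version of the pure-Python approach on a plain list of lists. Using a set for the wanted values is an optional tweak, not part of the answer above; it keeps each membership test cheap when the list of values is long.

```python
# Plain nested list, as in the question (no numpy needed).
last_values = (0.50, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60)
rows = [[1.681, 1.365, 0.105, 0.109, v] for v in last_values]

# Values to look for in the last position of each row.
wanted = {0.5, 0.55, 0.6}

# Keep the rows whose last entry is one of the wanted values.
masked_data = [row for row in rows if row[-1] in wanted]
print(masked_data)
```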

Comments

0
import numpy as np

arr= np.array([[ 1.681, 1.365, 0.105, 0.109, 0.50],
      [ 1.681, 1.365, 0.105, 0.109, 0.51],
      [ 1.681, 1.365, 0.105, 0.109, 0.52],
      [ 1.681, 1.365, 0.105, 0.109, 0.53],
      [ 1.681, 1.365, 0.105, 0.109, 0.54],
      [ 1.681, 1.365, 0.105, 0.109, 0.55],
      [ 1.681, 1.365, 0.105, 0.109, 0.56],
      [ 1.681, 1.365, 0.105, 0.109, 0.57],    
      [ 1.681, 1.365, 0.105, 0.109, 0.58],
      [ 1.681, 1.365, 0.105, 0.109, 0.59],
      [ 1.681, 1.365, 0.105, 0.109, 0.60]])
mask = [.5, .6, .55]
# keep the rows whose last entry is one of the values in mask
arr_mask = np.array([x for x in arr if np.isin(x[-1], mask)])

Comments
