I am trying to filter my ndarray by another list of values I have collected (the values match entries in the array).

My main ndarray looks like this:

[['Name' 'Col1' 'Count']
 ['test' '' '413']
 ['erd' ' ' '60']
 ..., 
 ['Td1' 'f' '904']
 ['Td2' 'K' '953']
 ['Td3' 'r' '111']]

I have another list with various matching names:

names = ['Td1','test','erd']

What I'd Like to Do

I'd like to use the list names as a filter against the ndarray above.

What I've Tried

name_filter = main_ndarray[:,0] == names

This does not work.

What I'd Expect

[['Name' 'Col1' 'Count']
 ['test' '' '413']
 ['erd' ' ' '60']
 ['Td1' 'f' '904']]

3 Answers

Consider using Pandas for this kind of data:

import pandas as pd

data = [['Name', 'Col1', 'Count'],
        ['test', '', '413'],
        ['erd', ' ', '60'],
        ['Td1', 'f', '904'],
        ['Td2', 'K', '953'],
        ['Td3', 'r', '111']]

df = pd.DataFrame(data[1:], columns=data[0])
names = ['Td1','test','erd']
result = df[df.Name.isin(names)]

Results:

>>> df
   Name Col1 Count
0  test        413
1   erd         60
2   Td1    f   904
3   Td2    K   953
4   Td3    r   111
>>> result
   Name Col1 Count
0  test        413
1   erd         60
2   Td1    f   904
>>>
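
If the data already lives in an ndarray, as in the question, the same idea can be applied without retyping it as a list. The following is a minimal sketch, assuming the first row is the header. main_ndarray is the name used in the question; filtered is a hypothetical name introduced here. It converts the filtered frame back to an array with the header row on top.

```python
import numpy as np
import pandas as pd

main_ndarray = np.array([['Name', 'Col1', 'Count'],
                         ['test', '', '413'],
                         ['erd', ' ', '60'],
                         ['Td1', 'f', '904'],
                         ['Td2', 'K', '953'],
                         ['Td3', 'r', '111']])
names = ['Td1', 'test', 'erd']

# The first row supplies the column labels; the remaining rows are the data
df = pd.DataFrame(main_ndarray[1:], columns=main_ndarray[0])
result = df[df['Name'].isin(names)]

# Stack the header back on top of the surviving rows
# ("filtered" is a hypothetical name, not from the original answer)
filtered = np.vstack([main_ndarray[0], result.to_numpy()])
print(filtered)
```

The rows come back in their original order in the array, not in the order of names.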

You can use the filter function too.

import numpy

cats_array = numpy.array(
    [['Name', 'Col1', 'Count'],
     ['test', '', '413'],
     ['erd', ' ', '60'],
     ['Td1', 'f', '904'],
     ['Td2', 'K', '953'],
     ['Td3', 'r', '111']]
)

names = ['Td1', 'test', 'erd']

# On Python 3, filter() returns a lazy iterator, so wrap it in list() to see the rows
list(filter(lambda x: x[0] in names, cats_array))

On Python 2 this gives the following; on Python 3 the dtype is reported as '<U5' rather than '|S5':

[array(['test', '', '413'],
       dtype='|S5'), array(['erd', ' ', '60'],
       dtype='|S5'), array(['Td1', 'f', '904'],
       dtype='|S5')]

2 Comments

So, out of curiosity, what would I do with that array now? It didn't retain the "regular" formatting. It now has this dtype in there, and the values at index 1 are completely separate instead of being part of one contiguous array.
@cat You can hit it with map(lambda a: list(a), filter(lambda x: x[0] in names, cats_array)) to keep a list-like formatting. If you do that, the result will be [['test', '', '413'], ['erd', ' ', '60'], ['Td1', 'f', '904']]. On Python 3, map is lazy too, so wrap the whole expression in list().
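
To keep a regular 2-D ndarray instead of a list of separate row arrays, a boolean mask is one option. The sketch below uses np.isin, which is not part of the original answer, and assumes the first row is a header that should be kept. The question's == comparison fails because the name column and the three-element list do not have matching shapes, whereas np.isin tests each element for membership. mask and filtered are hypothetical names.

```python
import numpy as np

cats_array = np.array([['Name', 'Col1', 'Count'],
                       ['test', '', '413'],
                       ['erd', ' ', '60'],
                       ['Td1', 'f', '904'],
                       ['Td2', 'K', '953'],
                       ['Td3', 'r', '111']])
names = ['Td1', 'test', 'erd']

# One boolean per row: True where the first column holds a wanted name
mask = np.isin(cats_array[:, 0], names)
mask[0] = True  # assumption: row 0 is the header and should be kept

# Boolean indexing returns a single 2-D array, not a list of row arrays
filtered = cats_array[mask]
print(filtered)
```

Because the mask is applied row by row, the result keeps the array's original row order and dtype.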

I would also go with @YXD's Pandas solution, but for the sake of completeness here is a simple solution based on a list comprehension:

data = [['Name', 'Col1', 'Count'],
 ['test', '', '413'],
 ['erd', ' ', '60'],
 ['Td1', 'f', '904'],
 ['Td2', 'K', '953'],
 ['Td3', 'r', '111']]

names = ['Td1', 'test', 'erd']

# select every row of data whose first element is one of the names
res = [l for l in data if l[0] in names]

# put the header row (the first row of data) back at the top
res.insert(0, data[0])

which then gives you the desired output:

[['Name', 'Col1', 'Count'],
 ['test', '', '413'],
 ['erd', ' ', '60'],
 ['Td1', 'f', '904']]
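
For a long list of names, the same comprehension can look names up in a set, which avoids rescanning the list for every row. This is a minimal sketch of that variation, not part of the original answer; wanted, header, and rows are hypothetical names, and it assumes the first row of data is the header.

```python
data = [['Name', 'Col1', 'Count'],
        ['test', '', '413'],
        ['erd', ' ', '60'],
        ['Td1', 'f', '904'],
        ['Td2', 'K', '953'],
        ['Td3', 'r', '111']]

names = ['Td1', 'test', 'erd']

# A set gives constant-time membership checks on average
wanted = set(names)

# Split off the header so it is kept regardless of the filter
header, rows = data[0], data[1:]
res = [header] + [row for row in rows if row[0] in wanted]
print(res)
```

Building the result as header plus filtered rows also removes the separate insert step.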
