6

I have a numpy array A of 100 strings; they are sentences of different lengths. The elements are plain Python strings, NOT numpy strings:

>>> type(A[0])
<type 'str'>

I want to find the positions of the strings in A that contain a certain pattern, such as 'zzz'.

I tried

np.core.defchararray.find(A, 'zzz')

but it gives an error:

TypeError: string operation on non-string array

I assume I will need to convert each str in A to a numpy string?

Edit:

I want to find the index at which 'zzz' appears in A.

2
  • What do you want to do when you find them? Split them? Get the index? Commented Jun 2, 2016 at 18:46
  • Why not just use [s.find(pattern) for s in A]? Then you will have the index of the first occurrence of that pattern in each string (-1 if the pattern is not found). Commented Jun 2, 2016 at 18:48

3 Answers

16

No need to be fancy with this; you can get the list of indices with a list comprehension and the in operator:

>>> import numpy as np
>>> lst = ["aaa","aazzz","zzz"]
>>> n = np.array(lst)
>>> [i for i,item in enumerate(n) if "zzz" in item]
[1, 2]

Note that here the elements of the array are actually numpy strings, but the in operator will work for regular strings too, so it's moot.
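To make that last point concrete, here is a minimal sketch that assumes the array in the question is an object array holding plain str elements. The sample data is made up for illustration; the same comprehension returns the matching indices, and str.find gives the position of the pattern inside each string.

```python
import numpy as np

# Hypothetical sample data: an object array of plain Python str elements,
# which is presumably what the question's array A looks like.
A = np.array(["aaa", "aazzz", "zzz"], dtype=object)

# Indices of the elements that contain the pattern.
matches = [i for i, s in enumerate(A) if "zzz" in s]
print(matches)    # [1, 2]

# Position of the first occurrence inside each string (-1 if absent).
positions = [s.find("zzz") for s in A]
print(positions)  # [-1, 2, 0]
```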


5

The issue here is the nature of your array of strings.

If I make the array like:

In [362]: x=np.array(['one','two','three'])

In [363]: x
Out[363]: 
array(['one', 'two', 'three'], 
      dtype='<U5')

In [364]: type(x[0])
Out[364]: numpy.str_

The elements are a special kind of string, implicitly padded to 5 characters (the length of the longest, 'three'). The np.char methods work on this kind of array:

In [365]: np.char.find(x,'one')
Out[365]: array([ 0, -1, -1])

But if I make an object array that contains strings, it produces your error:

In [366]: y=np.array(['one','two','three'],dtype=object)

In [367]: y
Out[367]: array(['one', 'two', 'three'], dtype=object)

In [368]: type(y[0])
Out[368]: str

In [369]: np.char.find(y,'one')
...
/usr/lib/python3/dist-packages/numpy/core/defchararray.py in find(a, sub, start, end)
...
TypeError: string operation on non-string array

More often than not, an object array has to be treated as a list:

In [370]: y
Out[370]: array(['one', 'two', 'three'], dtype=object)

In [371]: [i.find('one') for i in y]
Out[371]: [0, -1, -1]

In [372]: np.array([i.find('one') for i in y])
Out[372]: array([ 0, -1, -1])

The np.char methods are convenient, but they aren't faster. They still have to iterate through the array applying regular string operations to each element.


2

You can convert the array to a numpy string dtype first and then call find:

np.core.defchararray.find(A.astype(str), 'zzz')
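As a rough sketch of how this conversion could be used end to end, the snippet below assumes A is an object array of str (the sample data is hypothetical). It uses np.char.find, the same function reached through a shorter public name, and np.flatnonzero to turn the per-element positions into the indices of the matching strings.

```python
import numpy as np

# Hypothetical sample data standing in for the question's array A.
A = np.array(["aaa", "aazzz", "zzz"], dtype=object)

# Convert the object array to a fixed-width numpy string dtype.
A_str = A.astype(str)

# Position of the pattern in each element (-1 where it is absent).
pos = np.char.find(A_str, "zzz")
print(pos)  # [-1  2  0]

# Indices of the elements that contain the pattern.
idx = np.flatnonzero(pos >= 0)
print(idx)  # [1 2]
```

Note that astype(str) makes a copy padded to the longest string, so for long sentences the list comprehension may be the lighter option.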
