
How can one read/write pandas DataFrames (NumPy arrays) of strings in Cython?

It works just fine when I work with integers or floats:

# Cython file numpy_.pyx
cimport cython
cimport numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef fill(np.int64_t[:, ::1] arr):
    # Overwrite the top-left 2x2 corner in place
    arr[0, 0] = 10
    arr[0, 1] = 11
    arr[1, 0] = 13
    arr[1, 1] = 14

# Python code
import numpy as np
from numpy_ import fill
a = np.array([[0,1,2],[3,4,5]], dtype=np.int64)
print(a)
fill(a)
print(a)

gives

>>> a = np.array([[0,1,2],[3,4,5]], dtype=np.int64)
>>> print(a)
[[0 1 2]
 [3 4 5]]
>>> fill(a)
>>> print(a)
[[10 11  2]
 [13 14  5]]

Also, the following code

# Python code
import numpy as np, pandas as pd
from numpy_ import fill
a = np.array([[0,1,2],[3,4,5]], dtype=np.int64)
df = pd.DataFrame(a)
print(df)
fill(df.values)
print(df)

gives

>>> a = np.array([[0,1,2],[3,4,5]], dtype=np.int64)
>>> df = pd.DataFrame(a)
>>> print(df)
   0  1  2
0  0  1  2
1  3  4  5
>>> fill(df.values)
>>> print(df)
    0   1  2
0  10  11  2
1  13  14  5

However, I am having a hard time figuring out how to do the same thing when the input is an array of strings. For example, how can I read or modify a NumPy array or a pandas DataFrame such as:

a2 = np.array([['000','111','222'],['333','444','555']], dtype='U3')
df2 = pd.DataFrame(a2)

and, let us say, the goal is to change through Cython

'000' -> 'AAA'; '111' -> 'BBB'; '222' -> 'CCC'; '333' -> 'DDD'
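
To make the goal concrete, here is a minimal pure-Python sketch of the result I am after. The names `mapping` and `fill_str_py` are hypothetical and only illustrate the intended behaviour; this is not the Cython version I am asking about.

```python
import numpy as np

# Hypothetical lookup table for the replacements listed above.
mapping = {'000': 'AAA', '111': 'BBB', '222': 'CCC', '333': 'DDD'}

def fill_str_py(arr):
    """Replace mapped strings in a 2-D NumPy string array, in place."""
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            # Elements are np.str_ (a str subclass), so dict lookup works;
            # unmapped values such as '444' are left unchanged.
            arr[i, j] = mapping.get(arr[i, j], arr[i, j])

a2 = np.array([['000', '111', '222'], ['333', '444', '555']], dtype='U3')
fill_str_py(a2)
print(a2)
```

After the call, `a2` should hold 'AAA', 'BBB', 'CCC' in the first row and 'DDD', '444', '555' in the second.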

I have read the following NumPy documentation page and the following Cython documentation page, but I still cannot figure out what to do.

Thank you very much for your help!

  • pandas does not use the numpy string dtypes. It makes those series object dtype. Look at df2.dtypes. Commented Aug 5, 2019 at 17:36
  • @hpaulj So, the declaration of a corresponding function should be cpdef fill_str(np.object_t[:,::1] arr)? Why does type(df2.at[0,0]) then give <class 'str'> (i.e. not 'object')? Commented Aug 5, 2019 at 17:42
  • str is an object. A dataframe designed to hold object can hold any subclass of object including str Commented Aug 5, 2019 at 17:53
  • @DavidW Thank you! If you know what I should read to understand what I need to do to solve my problem, please, let me know! Commented Aug 5, 2019 at 18:03
  • Here's a couple of (maybe) useful links for NumPy arrays of strings: stackoverflow.com/questions/42543485/… and stackoverflow.com/questions/28774096/…. This doesn't necessarily help you with pandas too much, except that you can force pandas to have a fixed-length string datatype by specifying it in dtype. It also doesn't help with Unicode. I don't really have much advice beyond what's in this comment... Commented Aug 5, 2019 at 19:59
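
One possible direction for the fixed-width NumPy case discussed in the comments, sketched in plain Python: NumPy stores each 'U3' element as three 4-byte code points, so a C-contiguous array can be reinterpreted as unsigned 32-bit integers. The helper name `fill_codes` is hypothetical, and the idea that a Cython function could accept such a view through a typed memoryview like `np.uint32_t[:, :, ::1]` is an assumption, not something confirmed in this thread. It does not apply directly to an object-dtype DataFrame, which would first need converting to a fixed-width array (and that copies the data).

```python
import numpy as np

a2 = np.array([['000', '111', '222'], ['333', '444', '555']], dtype='U3')

# Reinterpret the same memory as code points: shape (2, 3) of 'U3'
# becomes shape (2, 3, 3) of uint32. This is a view, not a copy.
codes = a2.view(np.uint32).reshape(a2.shape + (3,))

def fill_codes(codes):
    """Map '000'..'333' to 'AAA'..'DDD' by editing code points in place."""
    for i in range(codes.shape[0]):
        for j in range(codes.shape[1]):
            first = int(codes[i, j, 0])
            same = codes[i, j, 1] == first and codes[i, j, 2] == first
            # Only strings made of one repeated digit between '0' and '3'
            # are rewritten; everything else is left untouched.
            if same and ord('0') <= first <= ord('3'):
                codes[i, j, :] = first - ord('0') + ord('A')

fill_codes(codes)
print(a2)  # a2 shares memory with codes, so it reflects the edits
```

The loops only index integers, which is the kind of access a typed memoryview handles well; whether this is worth the loss of readability compared with working on Python `str` objects is a judgment call.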
