I am using pyodbc to query a large SQL Server database. Because SQL Server cannot store native NumPy arrays, the data is stored as strings of space-separated numbers representing the arrays. When I load this data into a DataFrame, it looks as follows:

Id       Array1                             Array2
1        -21.722315 11.017685 -23.340452    2754.642 481.94247 21.728323
...
149001   1.611342 1.526262 -35.415166       6252.124 61.51516 852.15167

However, I then want to perform operations on Array1 and Array2, so I need to convert them to actual NumPy arrays. My current approach is to apply np.fromstring to every row of each column:

df['Array1'] = df['Array1'].apply(lambda x: np.fromstring(x, dtype=np.float32, sep=' '))
df['Array2'] = df['Array2'].apply(lambda x: np.fromstring(x, dtype=np.float32, sep=' '))
# Elapsed Time: 9.524s

Result:

Id       Array1                               Array2
1        [-21.722315, 11.017685, -23.340452]  [2754.642, 481.94247, 21.728323]
...
149001   [1.611342, 1.526262, -35.415166]     [6252.124, 61.51516, 852.15167]

While this code works, I don't believe it is efficient or scalable. Are there more computationally efficient ways of converting a large amount of data into NumPy arrays?

  • Have you checked whether this can be done on the server side during the call to the database? I believe PostgreSQL handles arrays, for instance. Just in case... Commented Sep 25, 2022 at 19:16
  • You could store the data as a byte array in a varbinary column. Commented Sep 25, 2022 at 20:50

1 Answer

Using tobytes() to get the underlying bytestring and frombuffer() to convert back will be much more efficient. One detail is that the array dimensions are lost after tobytes(), so you may need to reshape the array after frombuffer().
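
As a rough illustration of the round trip, here is a minimal sketch. It assumes the arrays are float32 and that the bytes would be stored in a binary column such as varbinary; the variable names and the sample values (taken from the first row of the question) are only for demonstration, and the database read/write itself is not shown.

```python
import numpy as np

# Illustrative data: the first Array1 row from the question, as float32.
original = np.array([-21.722315, 11.017685, -23.340452], dtype=np.float32)

# Serialize: these raw bytes are what would be written to the binary column.
blob = original.tobytes()

# Deserialize: dtype must match the one used when writing,
# because frombuffer() defaults to float64.
restored = np.frombuffer(blob, dtype=np.float32)

# For multi-dimensional data the shape is not stored in the bytes,
# so it has to be kept elsewhere and reapplied with reshape().
matrix = np.arange(6, dtype=np.float32).reshape(2, 3)
matrix_restored = np.frombuffer(matrix.tobytes(), dtype=np.float32).reshape(2, 3)
```

Note that an array created by frombuffer() from a bytes object is read-only; call .copy() on it if you need to modify the values in place.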

1 Comment

Just had to specify dtype=np.float32 and all worked perfectly. 16x performance increase with this method.
