
I am creating a Python script that uses pandas to read a file containing multiple row values.

Once read, I need to build an array of these values and then assign it to a DataFrame row.

The code I have used is:

import numpy as np
import pandas as pd

new_df = pd.DataFrame()

for f in data:
    ## Reading the file in pandas, which is in Excel format
    file_df = pd.read_excel(f)
    filename = file_df['Unnamed: 1'][2]

    ## Skipping the header rows to get to the required reading values
    column_names = ['start_time', 'xxx_value']
    data_df = pd.read_excel(f, names=column_names, skiprows=25)

    ## Collecting the reading values into a numpy array
    array = np.array([])
    for i in data_df.iterrows():
        array = np.append(array, i[1][1])

    temp_df = pd.DataFrame()
    temp_df['xxx_value'] = [array]
    temp_df['Filename'] = filename
    temp_df['sub_id'] = temp_df['Filename'].str.split('_', 1).str[1].str.strip()
    temp_df['sen_site'] = temp_df['Filename'].str.split('_', 1).str[0].str.strip()
    temp_df['sampling_interval'] = 15
    temp_df['start_time'] = data_df['start_time'][2]

    new_df = new_df.append(temp_df)

new_df.index = new_df.index + 1
new_df = new_df.sort_index()
new_df.index.name = 'record_id'

## Dropping Filename, as it does not need to be loaded into postgresql
new_df = new_df.drop("Filename", 1)

## Rearranging to the postgresql format
column_new_df = new_df.columns.tolist()
column_new_df.insert(4, column_new_df.pop(column_new_df.index('xxx_value')))
new_df = new_df.reindex(columns=column_new_df)

print(new_df)

This code is not working when I try to insert the array data into PostgreSQL.

It gives me an error stating:

ProgrammingError: (psycopg2.ProgrammingError) can't adapt type 'numpy.ndarray'

    Hi, I had the same problem and came across your question while looking for solutions. Thus, I think it would be valuable to future others to solve it. Could you please fix the code sample that you provided (indentation is wrong after for f in data:)? Also, which line is throwing the error? Commented Nov 16, 2016 at 10:09

5 Answers


In my opinion, the most effective way would be to make psycopg2 always aware of np.ndarray(s). One could do that by registering an adapter:

import numpy as np
from psycopg2.extensions import register_adapter, AsIs

def addapt_numpy_array(numpy_array):
    return AsIs(tuple(numpy_array))

register_adapter(np.ndarray, addapt_numpy_array)

To help working with numpy in general, my default addon to scripts/libraries dependent on psycopg2 is:

import numpy as np
from psycopg2.extensions import register_adapter, AsIs

def addapt_numpy_float64(numpy_float64):
    return AsIs(numpy_float64)

def addapt_numpy_int64(numpy_int64):
    return AsIs(numpy_int64)

def addapt_numpy_float32(numpy_float32):
    return AsIs(numpy_float32)

def addapt_numpy_int32(numpy_int32):
    return AsIs(numpy_int32)

def addapt_numpy_array(numpy_array):
    return AsIs(tuple(numpy_array))

register_adapter(np.float64, addapt_numpy_float64)
register_adapter(np.int64, addapt_numpy_int64)
register_adapter(np.float32, addapt_numpy_float32)
register_adapter(np.int32, addapt_numpy_int32)
register_adapter(np.ndarray, addapt_numpy_array)

otherwise there would be some issues even with numerical types.

I got the adapter trick from this other stackoverflow entry.


3 Comments

I'm trying your solution, but getting error... SyntaxError: syntax error at or near "[" Any idea to solve it?
This error seems to be related to some wrongly closed bracket or parenthesis before whenever the error happens. As far as I can tell, the code block here properly closes all open parenthesis, so the error should come from some other place in the code? I would need to see more context to actually pinpoint where the problem comes from...
I had to do it like this:

```
def addapt_numpy_array(numpy_array):
    if len(numpy_array) == 0:
        return AsIs("null")
    return AsIs("ARRAY" + np.array2string(numpy_array, separator=","))
```

I am not sure where the problem is, as I can't see in your code the part where you insert the data into Postgres.

My guess, though, is that you are giving Postgres a numpy array: psycopg2 can't handle numpy data types, but it should be fairly easy to convert them to native Python types that work with psycopg2 (e.g. by using the .tolist() method); it is difficult to give more precise information without seeing that code.
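For example (a minimal sketch with made-up values), .tolist() turns numpy scalars into native Python types that psycopg2 can adapt:

```python
import numpy as np

arr = np.array([1.5, 2.5, 3.5])
native = arr.tolist()  # plain Python list of plain Python floats

print(type(arr[0]))     # <class 'numpy.float64'>
print(type(native[0]))  # <class 'float'>
```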



Convert each numpy array element to its equivalent list using apply and tolist first, and then you should be able to write the data to Postgres:

df['column_name'] = df['column_name'].apply(lambda x: x.tolist())
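A runnable version of the one-liner above, with a hypothetical column of arrays:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'xxx_value': [np.array([1.0, 2.0]), np.array([3.0, 4.0])]})

# Each cell goes from numpy.ndarray to a plain Python list of floats,
# which psycopg2/SQLAlchemy can then map to a PostgreSQL array.
df['xxx_value'] = df['xxx_value'].apply(lambda x: x.tolist())

print(df['xxx_value'].iloc[0])  # [1.0, 2.0]
```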

1 Comment

Attention: this was not enough for me; the to_sql function still inserted the column as text. I needed to state the types explicitly via to_sql('tablename', engine, dtype={'column_name': postgresql.ARRAY(sqlalchemy.types.TEXT)}).
import psycopg2
from pgvector.psycopg2 import register_vector

db_connection = psycopg2.connect(dsn)
register_vector(db_connection)

You need to call register_vector on the connection so that psycopg2 can work with numpy arrays. This is also needed if you're using the pgvector extension for PostgreSQL.



We can address the issue by extracting one element at a time. Assuming a dataframe temp_df with a sub_id column of type numpy.int64, we can extract a native Python value using iloc and item(), as in temp_df.iloc[0]['sub_id'].item(), and push that to the database.
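A minimal sketch of that extraction, with made-up data:

```python
import numpy as np
import pandas as pd

temp_df = pd.DataFrame({'sub_id': np.array([42], dtype=np.int64)})

# item() converts the numpy.int64 scalar to a native Python int,
# which psycopg2 can adapt without a custom adapter.
value = temp_df.iloc[0]['sub_id'].item()

print(type(value))  # <class 'int'>
```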

