Python Pandas to_sql, how to create a table with a primary key?

Question

I would like to create a MySQL table with a primary key using pandas' to_sql function (a primary key is usually good to have in a MySQL table), like so:

group_export.to_sql(con = db, name = config.table_group_export, if_exists = 'replace', flavor = 'mysql', index = False)

But this creates a table without a primary key (or any index at all).

The documentation mentions the parameter 'index_label' which combined with the 'index' parameter could be used to create an index but doesn't mention any option for primary keys.

Documentation

@unutbu I think index=True just ensures the index is written to the table and that it is an index in SQL, but not yet a primary key.
– joris, Commented Jun 16, 2015 at 13:02
Yes, index just uses the row number as an index, which is not what I want.
– patapouf_ai, Commented Jun 16, 2015 at 13:18
For now, there is no support for specifying primary keys (it's on the feature wishlist). A possible workaround is to first create the table, and then use the 'append' option in to_sql. To create the table, pd.io.sql.get_schema could be helpful for generating the schema (which can then be adapted and executed to create the table).
– joris, Commented Jun 16, 2015 at 13:24
Thanks @joris, you're right: index=True makes an index, but not a primary key.
– unutbu, Commented Jun 16, 2015 at 13:36
@joris, trying to append to an existing table gives the error "NOT NULL constraint failed" for 'id INT PRIMARY KEY NOT NULL'. How do I fill in the PRIMARY KEY?
– Alex Martian, Commented Jun 19, 2016 at 7:30
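
The workaround joris describes (create the table first, then append) can be sketched as follows. This is only an illustration: it assumes an in-memory SQLite database rather than MySQL, and the frame contents and column names (id, name) are made up for the example. On MySQL the connection and the CREATE TABLE statement would differ.

```python
import sqlite3
import pandas as pd

# Hypothetical data standing in for the question's group_export frame
group_export = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

con = sqlite3.connect(":memory:")  # assumption: SQLite, not MySQL

# 1. Create the table yourself, declaring the primary key explicitly
con.execute("CREATE TABLE group_export (id INTEGER PRIMARY KEY, name TEXT)")

# 2. Let pandas fill it; 'append' keeps the existing table definition
group_export.to_sql("group_export", con, if_exists="append", index=False)

# PRAGMA table_info rows are (cid, name, type, notnull, default, pk)
pk_columns = [row[1] for row in con.execute("PRAGMA table_info(group_export)") if row[5]]
row_count = con.execute("SELECT COUNT(*) FROM group_export").fetchone()[0]
print(pk_columns, row_count)
```

Note that with this approach the DataFrame has to supply a value for every column the table declares NOT NULL without a default, which is probably what the "NOT NULL constraint failed" error above is about.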

tomp · Accepted Answer · 2016-12-19 11:32:00Z

83

Simply add the primary key after uploading the table with pandas.

# The table name must be a string; the flavor argument was removed from pandas
group_export.to_sql(con=engine, name='example_table', if_exists='replace',
                    index=False)

with engine.connect() as con:
    con.execute('ALTER TABLE `example_table` ADD PRIMARY KEY (`ID_column`);')

edited Dec 19, 2016 at 11:32

answered Nov 23, 2016 at 17:30

tomp


3 Comments

Yohan Obadia Over a year ago

SQLite does not support this.

Ibrahimsha Over a year ago

@tomp - Thank you so much. If needed, add FIRST to move the new column to the front: ALTER TABLE `{table_name}` CHANGE `index` id int(4) NOT NULL auto_increment FIRST;

binford Over a year ago

Since SQLAlchemy 2.0 you need to wrap the SQL string in sqlalchemy.text, otherwise you get an ObjectNotExecutableError. Example: con.execute(text('ALTER TABLE example_table ADD PRIMARY KEY (ID_column)')). See also the sqlalchemy docs.
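
For SQLite, where ALTER TABLE ... ADD PRIMARY KEY is not available (as noted in the first comment), one possible substitute is to upload into a staging table and copy the rows into a table that declares the key. The sketch below is not tomp's method; it assumes an in-memory SQLite database, and the staging table name and the value column are hypothetical.

```python
import sqlite3
import pandas as pd

# Hypothetical frame; ID_column matches the column name used in the answer
df = pd.DataFrame({"ID_column": [10, 20, 30], "value": ["x", "y", "z"]})
con = sqlite3.connect(":memory:")

# Upload with pandas into a staging table (no primary key yet)
df.to_sql("example_table_staging", con, if_exists="replace", index=False)

# Create the real table with the key, copy the rows over, drop the staging table
con.executescript(
    """
    CREATE TABLE example_table (ID_column INTEGER PRIMARY KEY, value TEXT);
    INSERT INTO example_table (ID_column, value)
        SELECT ID_column, value FROM example_table_staging;
    DROP TABLE example_table_staging;
    """
)

# PRAGMA table_info rows are (cid, name, type, notnull, default, pk)
pk_columns = [row[1] for row in con.execute("PRAGMA table_info(example_table)") if row[5]]
tables = [row[0] for row in con.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
row_count = con.execute("SELECT COUNT(*) FROM example_table").fetchone()[0]
```

The column types in the CREATE TABLE statement have to be written by hand here, so this only makes sense when the frame's layout is known in advance.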

krvkir · Accepted Answer · 2015-06-25 08:29:44Z

37

Disclaimer: this answer is more experimental than practical, but maybe worth mentioning.

I found that the class pandas.io.sql.SQLTable has a named argument keys, and if you assign it the name of a field, that field becomes the primary key.

Unfortunately you can't just pass this argument through the DataFrame.to_sql() function. To use it you should:

Create a pandas.io.sql.SQLDatabase instance:

import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine('postgresql:///somedb')
pandas_sql = pd.io.sql.pandasSQL_builder(engine, schema=None, flavor=None)

Define a function analogous to pandas.io.sql.SQLDatabase.to_sql() but with an additional **kwargs argument, which is passed to the pandas.io.sql.SQLTable object created inside it (I've just copied the original to_sql() method and added **kwargs):

def to_sql_k(self, frame, name, if_exists='fail', index=True,
           index_label=None, schema=None, chunksize=None, dtype=None, **kwargs):
    if dtype is not None:
        from sqlalchemy.types import to_instance, TypeEngine
        for col, my_type in dtype.items():
            if not isinstance(to_instance(my_type), TypeEngine):
                raise ValueError('The type of %s is not a SQLAlchemy '
                                 'type ' % col)

    table = pd.io.sql.SQLTable(name, self, frame=frame, index=index,
                     if_exists=if_exists, index_label=index_label,
                     schema=schema, dtype=dtype, **kwargs)
    table.create()
    table.insert(chunksize)

Call this function with your SQLDatabase instance and the dataframe you want to save:

to_sql_k(pandas_sql, df2save, 'tmp',
        index=True, index_label='id', keys='id', if_exists='replace')

And we get something like

CREATE TABLE public.tmp
(
  id bigint NOT NULL DEFAULT nextval('tmp_id_seq'::regclass),
...
)

in the database.

PS: You can of course monkey-patch DataFrame, io.SQLDatabase and io.to_sql() to make this workaround more convenient to use.

answered Jun 25, 2015 at 8:29

krvkir


4 Comments

patapouf_ai Over a year ago

Nice. Thank you. In the end however I found it simpler to just make the table before and append to it.

patapouf_ai Over a year ago

I was also hoping that the index_label option of to_sql would help.

danio Over a year ago

Great answer, unfortunately it doesn't work with MySQL if the key column is a text type because pandas doesn't seem to have a way to specify the key length. It gives error 1170, "BLOB/TEXT column used in key specification without a key length"

Elsa Over a year ago

@krvkir Hi, I would like to add a primary key in the to_sql function, but when I wrote keys='id' in that function, it says TypeError: to_sql() got an unexpected keyword argument 'keys'. Do you know the reason why? I use Python 3.6.6, Postgres 10.3, SQLAlchemy 1.2.15 and pandas 0.23.4. I would really appreciate any advice.

yellowdolphin · Accepted Answer · 2021-09-27 17:19:48Z

22

As of pandas 0.15, at least for some flavors, you can use the dtype argument to define a primary key column. You can even activate AUTOINCREMENT this way. For sqlite3, this would look like so:

import sqlite3
import pandas as pd

df = pd.DataFrame({'MyID': [1, 2, 3], 'Data': [3, 2, 6]})
with sqlite3.connect('foo.db') as con:
    df.to_sql('df', con=con, dtype={'MyID': 'INTEGER PRIMARY KEY AUTOINCREMENT'})

answered Sep 27, 2021 at 17:19

yellowdolphin


3 Comments

Dan Steingart Over a year ago

OP should promote this as the chosen answer as it works quite well.

davidavr Over a year ago

@DanSteingart OP asked about MySQL but the string value for dtype is only supported for SQLite. In general, the dtype value needs to be a SQLAlchemy type (such as Integer()) so I think it's just an accident that for SQLite the string is passed through and can be used to set a primary key. In other SQL backends, this generally won't work.

Nick ODell Over a year ago

This doesn't work for me - I get the error: ValueError: The type of col is not a SQLAlchemy type.
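
To check whether the dtype string actually produced a primary key, the table definition can be read back from SQLite. The sketch below assumes an in-memory database and a raw sqlite3 connection; as the comments above point out, a SQLAlchemy engine expects SQLAlchemy types in dtype and rejects the plain string with the ValueError quoted above.

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({"MyID": [1, 2, 3], "Data": [3, 2, 6]})
con = sqlite3.connect(":memory:")  # assumption: raw sqlite3 connection, not an engine

# With a sqlite3 connection, the dtype string is used as the column definition
df.to_sql("df", con=con, index=False,
          dtype={"MyID": "INTEGER PRIMARY KEY AUTOINCREMENT"})

# Read back the CREATE TABLE statement that pandas issued
create_stmt = con.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'df'"
).fetchone()[0]

# PRAGMA table_info rows are (cid, name, type, notnull, default, pk)
pk_columns = [row[1] for row in con.execute("PRAGMA table_info(df)") if row[5]]
print(create_stmt)
print(pk_columns)
```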

S.Doe_Dude · Accepted Answer · 2021-02-14 17:07:00Z

1

with engine.connect() as con:
    con.execute('ALTER TABLE for_import_ml ADD PRIMARY KEY ("ID");')

for_import_ml is a table name in the database.

Adding a slight variation to tomp's answer (I would comment but don't have enough reputation points).

I am using PGAdmin with Postgres (on Heroku) to check and it works.

answered Feb 14, 2021 at 17:07

S.Doe_Dude


howMuchCheeseIsTooMuchCheese · Accepted Answer · 2016-02-14 21:04:08Z

0

automap_base from sqlalchemy.ext.automap (tableNamesDict is a dict with only the Pandas tables):

from sqlalchemy import MetaData
from sqlalchemy.ext.automap import automap_base

metadata = MetaData()
metadata.reflect(db.engine, only=tableNamesDict.values())
Base = automap_base(metadata=metadata)
Base.prepare()

Which would have worked perfectly, except for one problem, automap requires the tables to have a primary key. Ok, no problem, I'm sure Pandas to_sql has a way to indicate the primary key... nope. This is where it gets a little hacky:

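for table_name, frame in dfs.items():
    # Treat every column whose name contains "id" as part of the primary key
    cols = [str(col) for col in frame.columns if 'id' in str(col).lower()]
    # Build a CREATE TABLE statement that declares those columns as the key
    schema = pd.io.sql.get_schema(frame, table_name, con=db.engine, keys=cols)
    # Replace the table pandas created earlier with the keyed version
    db.engine.execute('DROP TABLE ' + table_name + ';')
    db.engine.execute(schema)
    frame.to_sql(table_name, con=db.engine, index=False, if_exists='append')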

I iterate through the dict of DataFrames, get a list of the columns to use for the primary key (i.e. those containing id), use get_schema to create the empty tables, then append the DataFrame to the table.

Now that you have the models, you can explicitly name and use them (i.e. User = Base.classes.user) with session.query or create a dict of all the classes with something like this:

alchemyClassDict = {}
for t in Base.classes.keys():
    alchemyClassDict[t] = Base.classes[t]

And query with:

res = db.session.query(alchemyClassDict['user']).first()

answered Feb 14, 2016 at 21:04

howMuchCheeseIsTooMuchCheese


1 Comment

danio Over a year ago

pd.io.sql.get_schema is not in the public interface, so it is not good to rely on it. Also, the code will only work if the dataframe doesn't have an index. Otherwise you have to use something like schema = pd.io.sql.get_schema(df.reset_index(), table_name, con=db.engine, keys=cols)
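
A small self-contained sketch of the get_schema route, including the reset_index() step from the comment above. It assumes an in-memory SQLite database and a hypothetical users frame, and, as the comment warns, get_schema may change between pandas versions.

```python
import sqlite3
import pandas as pd

# Hypothetical frame whose key lives in the index
frame = pd.DataFrame({"user_id": [1, 2], "name": ["ann", "bob"]}).set_index("user_id")
con = sqlite3.connect(":memory:")  # assumption: SQLite for illustration

# get_schema ignores the index, so turn it into a regular column first
flat = frame.reset_index()

# Generate a CREATE TABLE statement with a primary key constraint on user_id
schema = pd.io.sql.get_schema(flat, "users", keys=["user_id"], con=con)
con.execute(schema)

# Append the rows to the table that now exists with its key
flat.to_sql("users", con, if_exists="append", index=False)

# PRAGMA table_info rows are (cid, name, type, notnull, default, pk)
pk_columns = [row[1] for row in con.execute("PRAGMA table_info(users)") if row[5]]
row_count = con.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(schema)
```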
