63

I would like to create a MySQL table with a primary key using pandas' to_sql function (a MySQL table should usually have one), like so:

group_export.to_sql(con = db, name = config.table_group_export, if_exists = 'replace', flavor = 'mysql', index = False)

However, this creates a table without a primary key (or any index at all).

The documentation mentions the parameter 'index_label', which combined with the 'index' parameter could be used to create an index, but it doesn't mention any option for primary keys.

Documentation

6
  • 4
    @unutbu I think the index=True just ensures the index is written to the table and that it is an index in sql, and not yet a primary key Commented Jun 16, 2015 at 13:02
  • 1
    Yes, index just uses the row number as an index which is not what I want. Commented Jun 16, 2015 at 13:18
  • 3
    For now, there is no support for specifying primary keys (it's on the feature wishlist). A possible workaround is to first create the table and then use the 'append' option in to_sql. To create the table, pd.io.sql.get_schema can help generate the schema (which can then be adapted and executed to create the table). Commented Jun 16, 2015 at 13:24
  • Thanks @joris, you're right index=True makes an index, but not a primary key. Commented Jun 16, 2015 at 13:36
  • @joris, trying to add to existing table gives error: "NOT NULL constraint failed" for 'id INT PRIMARY KEY NOT NULL'. How to fill-in PRIMARY KEY? Commented Jun 19, 2016 at 7:30
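The workaround described in the comments above (create the table with its primary key first, then append) can be sketched roughly as follows. This is a minimal sketch, not the commenters' code: it assumes an in-memory SQLite database instead of MySQL so that it runs without a server, and the table name and the `id`/`name` columns are made up for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical data; the column names are assumptions for this sketch.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

con = sqlite3.connect(":memory:")

# 1. Create the table yourself, including the primary key.
con.execute("CREATE TABLE group_export (id INTEGER PRIMARY KEY, name TEXT)")

# 2. Append the DataFrame instead of letting pandas (re)create the table.
df.to_sql("group_export", con=con, if_exists="append", index=False)

# Inspect the result: the sixth field of PRAGMA table_info is non-zero
# for columns that belong to the primary key.
pk_columns = [row[1] for row in con.execute("PRAGMA table_info(group_export)") if row[5]]
row_count = con.execute("SELECT COUNT(*) FROM group_export").fetchone()[0]
print(pk_columns, row_count)
```

With `if_exists='append'` pandas leaves the existing table definition alone, so the DataFrame columns have to match the columns declared in the CREATE TABLE statement.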

5 Answers 5

83

Simply add the primary key after uploading the table with pandas.

group_export.to_sql(con=engine, name='example_table', if_exists='replace',
                    flavor='mysql', index=False)

with engine.connect() as con:
    con.execute('ALTER TABLE `example_table` ADD PRIMARY KEY (`ID_column`);')
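For SQLAlchemy 2.0 and later, raw SQL strings have to be wrapped in `sqlalchemy.text`. The sketch below shows one way this might look. The helper names `build_add_primary_key_sql` and `add_primary_key` are hypothetical, the engine is assumed to point at a MySQL database, and the identifier check is an extra precaution added here because table and column names cannot be passed as bound parameters.

```python
import re

# Conservative pattern for identifiers we are willing to interpolate.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_add_primary_key_sql(table_name, column_name):
    """Return a MySQL ALTER TABLE statement that adds a primary key."""
    for identifier in (table_name, column_name):
        if not _IDENTIFIER.fullmatch(identifier):
            raise ValueError(f"unsafe identifier: {identifier!r}")
    return f"ALTER TABLE `{table_name}` ADD PRIMARY KEY (`{column_name}`)"


def add_primary_key(engine, table_name, column_name):
    """Run the statement on a SQLAlchemy engine (assumed to target MySQL)."""
    # Imported lazily so the statement builder works without SQLAlchemy.
    from sqlalchemy import text

    # engine.begin() opens a transaction and commits it on success.
    with engine.begin() as con:
        con.execute(text(build_add_primary_key_sql(table_name, column_name)))


print(build_add_primary_key_sql("example_table", "ID_column"))
```

After `to_sql` has written the table, calling `add_primary_key(engine, 'example_table', 'ID_column')` would then add the key in a separate step.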

3 Comments

SQLite does not support this.
@tomp - Thank you so much. Add FIRST to bring the index to the front if needed: ALTER TABLE `{table_name}` CHANGE `index` `id` int(4) NOT NULL auto_increment FIRST;
Since SQLAlchemy 2.0 you need to wrap the SQL string in sqlalchemy.text, otherwise you get an ObjectNotExecutableError. Example: con.execute(text('ALTER TABLE example_table ADD PRIMARY KEY (ID_column)')). See also the sqlalchemy docs.
37

Disclaimer: this answer is more experimental than practical, but maybe worth mentioning.

I found that the class pandas.io.sql.SQLTable has a named argument keys, and if you assign it the name of a field, that field becomes the primary key.

Unfortunately, you can't simply pass this argument through the DataFrame.to_sql() function. To use it you should:

  1. create a pandas.io.sql.SQLDatabase instance

    engine = sa.create_engine('postgresql:///somedb')
    pandas_sql = pd.io.sql.pandasSQL_builder(engine, schema=None, flavor=None)
    
  2. define a function analogous to pandas.io.sql.SQLDatabase.to_sql() but with an additional **kwargs argument, which is passed to the pandas.io.sql.SQLTable object created inside it (I've just copied the original to_sql() method and added **kwargs):

    def to_sql_k(self, frame, name, if_exists='fail', index=True,
               index_label=None, schema=None, chunksize=None, dtype=None, **kwargs):
        if dtype is not None:
            from sqlalchemy.types import to_instance, TypeEngine
            for col, my_type in dtype.items():
                if not isinstance(to_instance(my_type), TypeEngine):
                    raise ValueError('The type of %s is not a SQLAlchemy '
                                     'type ' % col)
    
        table = pd.io.sql.SQLTable(name, self, frame=frame, index=index,
                         if_exists=if_exists, index_label=index_label,
                         schema=schema, dtype=dtype, **kwargs)
        table.create()
        table.insert(chunksize)
    
  3. call this function with your SQLDatabase instance and the dataframe you want to save

    to_sql_k(pandas_sql, df2save, 'tmp',
            index=True, index_label='id', keys='id', if_exists='replace')
    

And we get something like

CREATE TABLE public.tmp
(
  id bigint NOT NULL DEFAULT nextval('tmp_id_seq'::regclass),
...
)

in the database.

PS You can of course monkey-patch DataFrame, io.sql.SQLDatabase and the io.sql.to_sql() function to use this workaround more conveniently.

4 Comments

Nice. Thank you. In the end however I found it simpler to just make the table before and append to it.
I was also hoping that the index_label option of to_sql would help.
Great answer, unfortunately it doesn't work with MySQL if the key column is a text type because pandas doesn't seem to have a way to specify the key length. It gives error 1170, "BLOB/TEXT column used in key specification without a key length"
@krvkir Hi, I would like to add a primary key through the to_sql function, but when I pass keys='id' to it, I get TypeError: to_sql() got an unexpected keyword argument 'keys'. Do you know why? I use Python 3.6.6, Postgres 10.3, SQLAlchemy 1.2.15 and pandas 0.23.4. I would really appreciate any advice.
22

As of pandas 0.15, at least for some flavors, you can use the dtype argument to define a primary key column. You can even activate AUTOINCREMENT this way. For sqlite3, this would look like so:

import sqlite3
import pandas as pd

df = pd.DataFrame({'MyID': [1, 2, 3], 'Data': [3, 2, 6]})
with sqlite3.connect('foo.db') as con:
    df.to_sql('df', con=con, dtype={'MyID': 'INTEGER PRIMARY KEY AUTOINCREMENT'})
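To confirm that the column really ended up as the primary key, the table definition can be inspected afterwards. The following is a small sketch under a few assumptions: it uses an in-memory database rather than foo.db, passes index=False so only the two data columns are written, and relies on a plain sqlite3 connection (not a SQLAlchemy engine), since the string dtype is only passed through in that case.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({'MyID': [1, 2, 3], 'Data': [3, 2, 6]})

# In-memory database so the sketch leaves no file behind (an assumption;
# the answer above writes to foo.db).
con = sqlite3.connect(':memory:')
df.to_sql('df', con=con, index=False,
          dtype={'MyID': 'INTEGER PRIMARY KEY AUTOINCREMENT'})

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk);
# pk is non-zero for primary key columns.
pk_columns = [row[1] for row in con.execute('PRAGMA table_info(df)') if row[5]]
print(pk_columns)
```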

3 Comments

OP should promote this as the chosen answer as it works quite well.
@DanSteingart OP asked about MySQL but the string value for dtype is only supported for SQLite. In general, the dtype value needs to be a SQLAlchemy type (such as Integer()) so I think it's just an accident that for SQLite the string is passed through and can be used to set a primary key. In other SQL backends, this generally won't work.
This doesn't work for me - I get the error: ValueError: The type of col is not a SQLAlchemy type.
1
with engine.connect() as con:
    con.execute('ALTER TABLE for_import_ml ADD PRIMARY KEY ("ID");')

for_import_ml is a table name in the database.

Adding a slight variation to tomp's answer (I would comment but don't have enough reputation points).

I am using PGAdmin with Postgres (on Heroku) to check and it works.

Comments

0

automap_base from sqlalchemy.ext.automap (tableNamesDict is a dict with only the Pandas tables):

metadata = MetaData()
metadata.reflect(db.engine, only=tableNamesDict.values())
Base = automap_base(metadata=metadata)
Base.prepare()

This would have worked perfectly except for one problem: automap requires the tables to have a primary key. OK, no problem, I'm sure pandas' to_sql has a way to indicate the primary key... nope. This is where it gets a little hacky:

for df in dfs.keys():
    cols = dfs[df].columns
    cols = [str(col) for col in cols if 'id' in col.lower()]
    schema = pd.io.sql.get_schema(dfs[df],df, con=db.engine, keys=cols)
    db.engine.execute('DROP TABLE ' + df + ';')
    db.engine.execute(schema)
    dfs[df].to_sql(df,con=db.engine, index=False, if_exists='append')

I iterate through the dict of DataFrames, get a list of the columns to use for the primary key (i.e. those containing id), use get_schema to create the empty tables, then append the DataFrame to the table.
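The same get_schema-then-append idea can be tried in isolation with a sketch like the one below. It is not the code from this answer: it assumes a single hypothetical DataFrame with a `user_id` column, a plain in-memory sqlite3 connection instead of `db.engine`, and it relies on `pd.io.sql.get_schema`, which is not part of the documented public pandas API and may change between versions.

```python
import sqlite3

import pandas as pd

# Hypothetical frame; the table and column names are assumptions.
df = pd.DataFrame({"user_id": [10, 20], "name": ["ann", "bob"]})
table_name = "users"

con = sqlite3.connect(":memory:")

# Pick the key columns the same way as above: any column containing "id".
key_cols = [str(col) for col in df.columns if "id" in col.lower()]

# Generate a CREATE TABLE statement that includes a PRIMARY KEY constraint.
schema = pd.io.sql.get_schema(df, table_name, keys=key_cols, con=con)
con.execute(schema)

# Fill the pre-created table without letting pandas recreate it.
df.to_sql(table_name, con=con, index=False, if_exists="append")

pk_columns = [row[1] for row in con.execute(f"PRAGMA table_info({table_name})") if row[5]]
print(schema)
print(pk_columns)
```

Matching on the substring "id" is a heuristic: it would also pick up a column such as "width", so an explicit list of key columns is safer in practice.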

Now that you have the models, you can explicitly name and use them (i.e. User = Base.classes.user) with session.query or create a dict of all the classes with something like this:

alchemyClassDict = {}
for t in Base.classes.keys():
    alchemyClassDict[t] = Base.classes[t]

And query with:

res = db.session.query(alchemyClassDict['user']).first()

1 Comment

pd.io.sql.get_schema is not part of the public interface, so it is not good to rely on it. Also, the code only works if the DataFrame doesn't have an index; otherwise you have to use something like schema = pd.io.sql.get_schema(df.reset_index(), table_name, con=db.engine, keys=cols)
