I'm doing big batch inserts into an SQLite3 database and I'm trying to get a sense for what sort of performance I should be expecting versus what I'm actually seeing.
My table looks like this:
cursor.execute(
    """CREATE TABLE tweets(
        tweet_hash TEXT PRIMARY KEY ON CONFLICT REPLACE,
        tweet_id INTEGER,
        tweet_text TEXT)"""
)
and my inserts look like this:
cursor.executemany("INSERT INTO tweets VALUES (?, ?, ?)", to_write)
where to_write is a list of tuples.
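For concreteness, a sketch of what to_write might look like; the question doesn't show its contents, so the hashes, ids, and texts here are made up:

to_write = [
    ("a1b2c3", 1, "first tweet"),   # (tweet_hash, tweet_id, tweet_text)
    ("d4e5f6", 2, "second tweet"),
]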
Currently, with about 12 million rows in the database, inserting 50,000 rows takes around 16 minutes (roughly 50 rows per second), running on a 2008 MacBook.
Does this sound reasonable, or is there something gross happening?
Comments:

- executemany is supposed to take advantage of preparing the query and streaming in the per-row columns already. execute() and executemany() only allow for one statement.
- for i in range(0, len(my_list), 100): executemany(qry, my_list[i:i+100]), at a guess.
- Does executemany use transactions? If not, sqlite will internally wrap each and every insert statement with an implicit transaction, which can cause a huge performance hit on bulk inserts. See these for a little more info: sqlite.org/faq.html#q19 or stackoverflow.com/questions/3852068/sqlite-insert-very-slow
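Following the transaction and chunking suggestions in the comments, here is a minimal sketch of what the bulk insert could look like; the database filename, the chunk size, and the placeholder rows are all assumptions, not taken from the question. Python's sqlite3 connection works as a context manager that commits on success and rolls back on error:

import sqlite3

conn = sqlite3.connect("tweets.db")  # assumed filename
cursor = conn.cursor()
cursor.execute(
    """CREATE TABLE IF NOT EXISTS tweets(
        tweet_hash TEXT PRIMARY KEY ON CONFLICT REPLACE,
        tweet_id INTEGER,
        tweet_text TEXT)"""
)

# Placeholder rows standing in for the real to_write list.
to_write = [("hash-%d" % i, i, "tweet %d" % i) for i in range(50000)]

chunk_size = 10000  # arbitrary; tune for your data
with conn:  # one explicit transaction for the whole batch
    for i in range(0, len(to_write), chunk_size):
        cursor.executemany(
            "INSERT INTO tweets VALUES (?, ?, ?)",
            to_write[i:i + chunk_size],
        )
conn.close()

With a single commit at the end, sqlite only has to sync to disk once for the whole batch instead of once per statement, which is the cost the linked FAQ entry describes.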