I have a SQLite DB file containing a table with two columns, 'a' and 'b', and about 11 million rows.
When I load the table into a pandas.DataFrame and apply a simple filter like
df = df[abs(df['a'] - df['b']) > 0.0001]
the processing takes less than 500 ms.
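For reference, this is roughly how I load and time the pandas side ("data.db" and "t" stand in for my actual file and table names):

import sqlite3
import time

import pandas as pd

# Load the whole table into a DataFrame once
con = sqlite3.connect("data.db")
df = pd.read_sql_query("SELECT a, b FROM t", con)
con.close()

# Time only the boolean-indexing filter
start = time.perf_counter()
filtered = df[abs(df["a"] - df["b"]) > 0.0001]
print(f"{time.perf_counter() - start:.3f} s, {len(filtered)} rows")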
However, when I query the DB directly in the sqlite3 shell, like this:
SELECT a, b
FROM table
WHERE abs(a - b) > 0.0001;
the process takes about 3 s. In my actual work I need a more complex query, which should have even larger overhead. Moreover, I need to change the filtering condition interactively, so I have to run many queries before arriving at the final table.
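To illustrate the interactive part, the workflow amounts to re-filtering the DataFrame loaded above with a different condition each time, e.g.:

# Re-filter the same in-memory DataFrame with a new threshold each time
for eps in (0.0001, 0.001, 0.01):
    subset = df[abs(df["a"] - df["b"]) > eps]
    print(eps, len(subset))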
I know that a pandas DataFrame lives in memory while the table sits on disk. Is there a simple way to load the table into memory and filter the rows as fast as boolean indexing in pandas?
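The only idea I have so far is copying the whole file into an in-memory SQLite database with Python's backup API, roughly like this (just a sketch; "data.db" and "t" are placeholders again):

import sqlite3

# Copy the on-disk DB into an in-memory one (backup API, Python 3.7+)
disk = sqlite3.connect("data.db")
mem = sqlite3.connect(":memory:")
disk.backup(mem)
disk.close()

rows = mem.execute("SELECT a, b FROM t WHERE abs(a - b) > 0.0001").fetchall()

But I have not verified whether this actually closes the gap with pandas.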