I am trying to create an sqlite database with pandas.

I am able to save the data with:

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
engine = create_engine(path, echo=False)
df_flows.to_sql('flows', engine, if_exists='append', index=False, index_label='First')

and I can read it back with

df = pd.read_sql("SELECT * FROM flows WHERE First>1504101810 AND First<1504105409", engine)                                                                                                                        

The data is on disk but I think the indexing is not working properly as:

In [22]: from sqlalchemy.engine import reflection                                                                                                                                                                  

In [23]: insp = reflection.Inspector.from_engine(engine)                                                                                                                                                           

In [24]: insp.get_indexes('flows')                                                                                                                                                                                 
Out[24]: []   

Now I have two questions:

1) Why does the column First not appear in the output of insp.get_indexes('flows')?

2) How can I add one or more indexes to the database that I have created?

EDIT:

This is the structure of the data frame

In [25]: df_flows.dtypes                                                                                                                                                                                    
Out[25]:                                                                                                                                                                                                    
Protocol        object                                                                                                                                                                                      
Src             object                                                                                                                                                                                      
SrcPort        float64                                                                                                                                                                                      
Dst             object                                                                                                                                                                                      
DstPort        float64                                                                                                                                                                                      
Group ID         int64                                                                                                                                                                                      
Port            object                                                                                                                                                                                      
VPort            int64                                                                                                                                                                                      
IP TOS          object                                                                                                                                                                                      
VLAN ID        float64                                                                                                                                                                                      
VLAN Pri       float64                                                                                                                                                                                      
MPLS Exp       float64                                                                                                                                                                                      
Application     object                                                                                                                                                                                      
Packets          int64                                                                                                                                                                                      
Messages         int64                                                                                                                                                                                      
Bytes            int64                                                                                                                                                                                      
First            int64                                                                                                                                                                                      
Last             int64                                                                                                                                                                                      
SrcSubnet       object                                                                                                                                                                                      
DstSubnet       object                                                                                                                                                                                      
dtype: object              

1 Answer


You don't show the structure of your dataframe, so it is difficult to answer your question. However, given your inputs, I can make some inferences.

When you save your dataframe to SQL, you set index=False. This means that the dataframe's index is not saved as a column in the database. You then pass index_label, which has no effect given that you set index to False. That parameter is only used if you want to write the index column under a different name.

index : boolean, default True Write DataFrame index as a column.

index_label : string or sequence, default None Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.

A database index would have to be created through the database, not pandas.
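As a rough sketch of what "through the database" can look like for SQLite, the snippet below issues a CREATE INDEX statement with Python's built-in sqlite3 module and then lists the indexes on the table. The in-memory database, the cut-down flows table, the sample rows and the index name ix_flows_first are all assumptions made for illustration; in practice you would connect to your own database file, where the table already exists.

```python
import sqlite3

# Hypothetical stand-in for the real database file; ":memory:" keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")

# Cut-down stand-in for the table that to_sql() would have written (assumed columns).
conn.execute('CREATE TABLE flows ("Src" TEXT, "First" INTEGER, "Bytes" INTEGER)')
conn.executemany(
    "INSERT INTO flows VALUES (?, ?, ?)",
    [
        ("10.0.0.1", 1504101800, 120),
        ("10.0.0.2", 1504101900, 64),
        ("10.0.0.1", 1504105500, 900),
    ],
)

# Create the database index explicitly; IF NOT EXISTS makes the statement safe to re-run.
conn.execute('CREATE INDEX IF NOT EXISTS ix_flows_first ON flows ("First")')
conn.commit()

# Column 1 of each PRAGMA index_list row is the index name.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('flows')")]
print(index_names)
```

The index name is arbitrary; it only needs to be unique within the database.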


9 Comments

so you are saying that if index_label is not None then index must be set to True and it is automatically replaced with the specified column name?
No. I am saying index_label is ignored if index=False. If index=True (the default value), it will write the index as a column using its name. You can use another name, however, using the index_label parameter.
ok but what if I want to add indices later or if I want to index by multiple columns?
My aim is to have a database where I can quickly search on the variables First and Src (the first one is numeric and the second one is a string)
The index in a pandas dataframe has nothing to do with the database index. You would need to build the index yourself using your database package. The index parameter is merely asking if you want to write the dataframe index as columns in the database (this wouldn't be needed, for example, if the index is just the ordered range, e.g. 0, 1, 2, ... n).
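For the multi-column case raised in the comments, one possible approach in SQLite is a single composite index covering both columns. The sketch below uses Python's built-in sqlite3 module; the in-memory database, the two-column flows table and the index name ix_flows_src_first are assumptions for illustration only. Putting Src first suits queries that filter on an exact Src value and a range of First; a query that filters on First alone would be better served by a separate index on that column.

```python
import sqlite3

# Hypothetical in-memory database standing in for the real file.
composite_conn = sqlite3.connect(":memory:")

# Only the two columns of interest are modelled here (assumed types).
composite_conn.execute('CREATE TABLE flows ("Src" TEXT, "First" INTEGER)')

# One index spanning both columns; column order matters for which queries can use it.
composite_conn.execute(
    'CREATE INDEX IF NOT EXISTS ix_flows_src_first ON flows ("Src", "First")'
)
composite_conn.commit()

# Column 2 of each PRAGMA index_info row is the indexed column name, in index order.
indexed_columns = [
    row[2] for row in composite_conn.execute("PRAGMA index_info('ix_flows_src_first')")
]
print(indexed_columns)
```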
