1

Im new to python (3) and would like to now the following:

I'm trying to collect data via pandas from a website and would like to store the results into a mysql database like:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql://python:"+'pw'+"@localhost/test?charset=utf8")

url = r'http://www.boerse-frankfurt.de/devisen'
dfs = pd.read_html(url,header=0,index_col=0,encoding="UTF-8")
devisen = dfs[9] #Select the right table
devisen.to_sql(name='table_fx', con=engine, if_exists='append', index=False)

I'm receiving the following error:

.... _mysql.connection.query(self, query) sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1054, "Unknown column '\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tBezeichnung\n\t\t\t\t\t\t\t\n\t\t\t\t' in 'field list'") [SQL: 'INSERT INTO tbl_fx (\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tBezeichnung\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tzum Vortag\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tLetzter Stand\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tTageshoch\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tTagestief\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t52-Wochenhoch\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t52-Wochentief\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tDatum\n\t\t\t\t\t\t\t\n\t\t\t\t, \nAktionen\t\t\t\t) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)'] [parameters: (('VAE Dirham', '-0,5421%', 45321.0, 45512.0, 45306.0, 46080.0, 38550.0, '20.02.2018 14:29:00', None), ('Armenischer Dram', '-0,0403%', 5965339.0, 5970149.0, 5961011.0, 6043443.0, 5108265.0, '20.02.2018 01:12:00', None), ....

How can sqlalchemy INSERT respective data into table_fx? Problem is the header with the multiple \n and \t.

The mysql table hase the following structur:

( name varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL, bezeichnung varchar(150) COLLATE utf8_unicode_ci DEFAULT NULL, diff_vortag varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL, last double DEFAULT NULL, day_high double DEFAULT NULL, day_low double DEFAULT NULL, 52_week_high double DEFAULT NULL, 52_week_low double DEFAULT NULL, date_time varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL, unnamed varchar(200) COLLATE utf8_unicode_ci DEFAULT NULL )

Any help is higly welcome.

Thank you very much in advance

Andreas

3
  • your dataframe columns seem to contain many tabs in front of and after their real names; thus not matching the db columns Commented Feb 20, 2018 at 14:05
  • the problem isn't in the data, but the column name. Mysql limits to 64 characters for column names. You need to cleans your data. Commented Feb 20, 2018 at 14:23
  • Thank you very much for the comments. Any idea how to clean the header and replace all \t and \n ? Sorry, I'm still learning. Thank you very much in advance - Andreas Commented Feb 20, 2018 at 15:11

1 Answer 1

1

This should do it. If you convert to a dataframe you can rename columns first. The "dfs" entity you were creating was actually a list of dataframe entities.

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql://python:"+'pw'+"@localhost/test?charset=utf8")

url = r'http://www.boerse-frankfurt.de/devisen'
dfs = pd.read_html(url,header=0,index_col=0,encoding="UTF-8")

devisen = dfs[9].dropna(axis=0, thresh=4) # Select right table and make a DF

devisen.columns = devisen.columns.str.strip() # Strip extraneous characters

devisen.to_sql(name='table_fx', con=engine, if_exists='append', index=False)
Sign up to request clarification or add additional context in comments.

1 Comment

Great! That is what I'm looking for. Thank you very much for all your help.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.