
Context: I have a table in a MySQL database with the following format. Each row is one day of stock price and volume data:

Ticker,Date/Time,Open,High,Low,Close,Volume
AAA,7/15/2010,19.581,20.347,18.429,18.698,174100
AAA,7/16/2010,19.002,19.002,17.855,17.855,109200
BBB,7/19/2010,19.002,19.002,17.777,17.777,104900
BBB,7/19/2010,19.002,19.002,17.777,17.777,104900
CCC,7/19/2010,19.002,19.002,17.777,17.777,104900
... about 100,000 rows in total

This table was created by importing data from multiple *.txt files with the same columns and format. Each *.txt file is named after the ticker in the Ticker column, i.e., importing AAA.txt gives me the two rows of AAA data.

All these *.txt files are generated automatically by a system that retrieves stock prices in my country. Every day, after the stock market closes, each .txt file gets one new row with that day's data.

Question: how can I load each day's new rows from the txt files into the database? I do not want to reload all the data from the .txt files into the MySQL table every day, because that takes a lot of time; I only want to load the new rows.

How should I write the code to do this update?

  • A cron job to run the script daily, combined with: when you open a file, instead of reading it from the beginning, you could offset the file pointer from the end of the file and read only the last line? Commented Apr 26, 2017 at 17:46
  • That approach is not very robust, because the new data is not always just the last line. If I do not update the table daily (sometimes I update every 3 days, sometimes every 5), I would have to change the code each time to load the correct new rows. Is there a way to ignore the existing rows and only append the new ones? Commented Apr 26, 2017 at 17:53
  • Assuming the txt files are only ever appended to, what about saving the end-of-file pointer location somewhere, so the script can pick up where it left off? Commented Apr 26, 2017 at 22:55
  • Before talking about how to eliminate rows before the load: what software/command are you using to load the data into MySQL? I'm thinking that a bulk load into an empty staging copy, followed by a join to eliminate/update, might help. Row-by-row constraint checking and index maintenance may be what is taking most of your time, depending on how you are loading. Bulk operations could solve your problem. Commented Jul 21, 2017 at 5:12

1 Answer


(1) Create/use an empty stage table, with no primary key, constraints, or indexes:

     create table db.temporary_stage (
        ... same columns as your original table, but no constraints, keys, or indexes ...
     );
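With the columns from the question, that stage table could look like the sketch below; the column names and types here are assumptions, so adapt them to your data:

     create table db.temporary_stage (
         ticker     varchar(10),
         trade_date varchar(10),   -- raw text like '7/15/2010'; convert with STR_TO_DATE(trade_date, '%c/%e/%Y') where needed
         `open`     decimal(10,3),
         high       decimal(10,3),
         low        decimal(10,3),
         `close`    decimal(10,3),
         volume     bigint
     );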

(2) Bulk-load the file into the stage table; this should be really fast, since there are no keys or indexes to maintain:

      LOAD DATA INFILE 'data.txt' INTO TABLE db.temporary_stage
      FIELDS TERMINATED BY ','
      IGNORE 1 LINES;   -- the sample data is comma-separated; drop IGNORE if your files have no header row

(3) Join on id (or on (Ticker, Date/Time) if there is no surrogate key), then use a hash function to eliminate all rows that haven't changed. The following can be improved, but all in all, bulk loads against a database are a lot faster than row-by-row changes when you have lots of rows, mostly because of how the database moves data around internally: it can do its upkeep much more efficiently all at once than a little at a time. Note that MySQL has no UPDATE ... FROM; the multi-table UPDATE syntax below is the MySQL equivalent.

   UPDATE mytable
   JOIN temporary_stage ON mytable.id = temporary_stage.id
   SET mytable.... = temporary_stage....,
       mytable.precomputed_hash = MD5(CONCAT( .... ))
   WHERE mytable.precomputed_hash != MD5(CONCAT( .... ));
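The UPDATE above only covers rows that changed; since the question is about appending each day's new rows, a companion INSERT is also needed. A minimal sketch, assuming (ticker, trade_date) identifies a row and using the column names assumed in the stage-table sketch from step (1):

   -- insert stage rows that do not yet exist in the main table (anti-join)
   INSERT INTO mytable (ticker, trade_date, `open`, high, low, `close`, volume)
   SELECT s.ticker, s.trade_date, s.`open`, s.high, s.low, s.`close`, s.volume
   FROM temporary_stage AS s
   LEFT JOIN mytable AS m
          ON m.ticker = s.ticker AND m.trade_date = s.trade_date
   WHERE m.ticker IS NULL;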

# clean up

DELETE FROM temporary_stage;