
I've got an MS Access table (SearchAdsAccountLevel) which needs to be updated frequently from a python script. I've set up the pyodbc connection and now I would like to UPDATE/INSERT rows from my pandas df to the MS Access table based on whether the Date_ AND CampaignId fields match with the df data.

Looking at previous examples I've built the UPDATE statement, which uses iterrows to iterate through all rows of the df and executes the SQL below for each:

    connection_string = (
            r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};"
            r"DBQ=c:\AccessDatabases\Database2.accdb;"
    )
    cnxn = pyodbc.connect(connection_string, autocommit=True)
    crsr = cnxn.cursor()

    for index, row in df.iterrows():
            crsr.execute("UPDATE SearchAdsAccountLevel SET [OrgId]=?, [CampaignName]=?, [CampaignStatus]=?, [Storefront]=?, [AppName]=?, [AppId]=?, [TotalBudgetAmount]=?, [TotalBudgetCurrency]=?, [DailyBudgetAmount]=?, [DailyBudgetCurrency]=?, [Impressions]=?, [Taps]=?, [Conversions]=?, [ConversionsNewDownloads]=?, [ConversionsRedownloads]=?, [Ttr]=?, [LocalSpendAmount]=?, [LocalSpendCurrency]=?, [ConversionRate]=?, [Week_]=?, [Month_]=?, [Year_]=?, [Quarter]=?, [FinancialYear]=?, [RowUpdatedTime]=? WHERE [Date_]=? AND [CampaignId]=?",
                        row['OrgId'],
                        row['CampaignName'],
                        row['CampaignStatus'],
                        row['Storefront'],
                        row['AppName'],
                        row['AppId'],
                        row['TotalBudgetAmount'],
                        row['TotalBudgetCurrency'],
                        row['DailyBudgetAmount'],
                        row['DailyBudgetCurrency'],
                        row['Impressions'],
                        row['Taps'],
                        row['Conversions'],
                        row['ConversionsNewDownloads'],
                        row['ConversionsRedownloads'],
                        row['Ttr'],
                        row['LocalSpendAmount'],
                        row['LocalSpendCurrency'],
                        row['ConversionRate'],
                        row['Week_'],
                        row['Month_'],
                        row['Year_'],
                        row['Quarter'],
                        row['FinancialYear'],
                        row['RowUpdatedTime'],
                        row['Date_'],
                        row['CampaignId'])
    crsr.commit()

I would like to iterate through each row within my df (around 3000) and if the ['Date_'] AND ['CampaignId'] match I UPDATE all other fields. Otherwise I want to INSERT the whole df row in my Access Table (create new row). What's the most efficient and effective way to achieve this?

  • There are many ways to approach your question. 1- Don't execute queries one by one; use a parametrized query. 2- Use yields in Python ... Commented May 2, 2019 at 15:32
  • @Ilmari - Done. Thanks for the heads-up. Commented Jun 4, 2019 at 19:00

1 Answer


Consider DataFrame.values and pass the resulting list into an executemany call, making sure to order the columns to match the placeholders of the UPDATE query:

cols = ['OrgId', 'CampaignName', 'CampaignStatus', 'Storefront',
        'AppName', 'AppId', 'TotalBudgetAmount', 'TotalBudgetCurrency',
        'DailyBudgetAmount', 'DailyBudgetCurrency', 'Impressions',
        'Taps', 'Conversions', 'ConversionsNewDownloads', 'ConversionsRedownloads',
        'Ttr', 'LocalSpendAmount', 'LocalSpendCurrency', 'ConversionRate',
        'Week_', 'Month_', 'Year_', 'Quarter', 'FinancialYear',
        'RowUpdatedTime', 'Date_', 'CampaignId']

sql = '''UPDATE SearchAdsAccountLevel 
            SET [OrgId]=?, [CampaignName]=?, [CampaignStatus]=?, [Storefront]=?, 
                [AppName]=?, [AppId]=?, [TotalBudgetAmount]=?, 
                [TotalBudgetCurrency]=?, [DailyBudgetAmount]=?, 
                [DailyBudgetCurrency]=?, [Impressions]=?, [Taps]=?, [Conversions]=?, 
                [ConversionsNewDownloads]=?, [ConversionsRedownloads]=?, [Ttr]=?, 
                [LocalSpendAmount]=?, [LocalSpendCurrency]=?, [ConversionRate]=?,
                [Week_]=?, [Month_]=?, [Year_]=?, [Quarter]=?, [FinancialYear]=?, 
                [RowUpdatedTime]=? 
          WHERE [Date_]=? AND [CampaignId]=?'''

crsr.executemany(sql, df[cols].values.tolist())   
cnxn.commit()
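To see why the column order matters: executemany binds each inner list positionally to the `?` placeholders, so the key columns (Date_, CampaignId) must come last to line up with the WHERE clause. A minimal sketch with a hypothetical two-row frame and a shortened column list:

```python
import pandas as pd

# hypothetical frame with columns in arbitrary order
df = pd.DataFrame({'CampaignId': [1, 2],
                   'Date_': ['2019-05-01', '2019-05-02'],
                   'Taps': [10, 20]})

# reorder to match the placeholders of a query like:
#   UPDATE ... SET [Taps]=? WHERE [Date_]=? AND [CampaignId]=?
cols = ['Taps', 'Date_', 'CampaignId']
params = df[cols].values.tolist()
# each inner list is one parameter row: [Taps, Date_, CampaignId]
```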

For the insert, use a temp (staging) table with the exact structure of the final table, which you can create once with a make-table query: SELECT TOP 1 * INTO temp FROM final. This temp table is cleaned out and reloaded with all data frame rows on each run. A final query then migrates only the new rows from temp into final using NOT EXISTS, NOT IN, or LEFT JOIN ... IS NULL. You can run this query anytime and never worry about duplicates on the Date_ and CampaignId columns.

# CLEAN OUT TEMP
sql = '''DELETE FROM SearchAdsAccountLevel_Temp'''
crsr.execute(sql)
cnxn.commit()

# APPEND TO TEMP
sql = '''INSERT INTO SearchAdsAccountLevel_Temp (OrgId, CampaignName, CampaignStatus, Storefront,
                                AppName, AppId, TotalBudgetAmount, TotalBudgetCurrency,
                                DailyBudgetAmount, DailyBudgetCurrency, Impressions,
                                Taps, Conversions, ConversionsNewDownloads, ConversionsRedownloads,
                                Ttr, LocalSpendAmount, LocalSpendCurrency, ConversionRate,
                                Week_, Month_, Year_, Quarter, FinancialYear,
                                RowUpdatedTime, Date_, CampaignId)    
         VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, 
                 ?, ?, ?, ?, ?, ?, ?, ?, ?, 
                 ?, ?, ?, ?, ?, ?, ?, ?, ?);'''

crsr.executemany(sql, df[cols].values.tolist())   
cnxn.commit()

# MIGRATE TO FINAL
sql = '''INSERT INTO SearchAdsAccountLevel 
         SELECT t.* 
         FROM SearchAdsAccountLevel_Temp t
         LEFT JOIN SearchAdsAccountLevel f
            ON t.Date_ = f.Date_ AND t.CampaignId = f.CampaignId
         WHERE f.OrgId IS NULL'''
crsr.execute(sql)
cnxn.commit()
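The three steps above (UPDATE matches in place, stage everything, migrate only unmatched rows) don't depend on Access specifically. A self-contained sketch of the same pattern against an in-memory SQLite database, with made-up table and column names for illustration:

```python
import sqlite3

cnxn = sqlite3.connect(":memory:")
crsr = cnxn.cursor()

# minimal stand-ins for the final and staging tables
crsr.execute("CREATE TABLE final (Date_ TEXT, CampaignId INTEGER, Taps INTEGER)")
crsr.execute("CREATE TABLE staging (Date_ TEXT, CampaignId INTEGER, Taps INTEGER)")
crsr.execute("INSERT INTO final VALUES ('2019-05-01', 1, 10)")

# incoming rows: the first matches final (update), the second is new (insert)
rows = [('2019-05-01', 1, 99), ('2019-05-02', 2, 20)]

# 1) UPDATE existing rows in place (reorder params to match the placeholders)
crsr.executemany("UPDATE final SET Taps=? WHERE Date_=? AND CampaignId=?",
                 [(t, d, c) for d, c, t in rows])

# 2) clean out and reload the staging table
crsr.execute("DELETE FROM staging")
crsr.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)

# 3) migrate only rows that final does not already have
crsr.execute("""INSERT INTO final
                SELECT t.* FROM staging t
                LEFT JOIN final f
                  ON t.Date_ = f.Date_ AND t.CampaignId = f.CampaignId
                WHERE f.CampaignId IS NULL""")
cnxn.commit()

result = sorted(crsr.execute("SELECT * FROM final").fetchall())
```

Running this leaves `final` with the matched row updated and the new row appended, with no duplicates per (Date_, CampaignId).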

11 Comments

how can I also integrate an insert query to create rows which are not currently present in my table but are in df?
That is a different question than this as it involves a different query and data handling. Usually, staging temp tables are used to avoid append duplicates with NOT IN, NOT EXISTS, LEFT JOIN/IS NULL clauses against final tables in an INSERT...SELECT call.
+1 for the df[cols].values hint. To be fair, the "insert" aspect was part of the question (although buried near the bottom), and that is covered by this answer via a comment to the dup.
One or more of your data frame column types do not match database table types. Try importing a select query of few rows with read_sql and compare dtypes with your current df.
You need to explicitly name the columns in last append query's INSERT and SELECT clauses. For latter, do not use the abbreviated *.
