I have a .csv database of names that is updated every month. I wrote code that works out which names have left the database and which have entered it, and generates two .csv files at the end: a list of the names that were inserted (inserted.csv) and a list of the names that were removed (removed.csv).
In the removed list, I need to add a column with the exact date on which the code was run to generate that list.
An example of what I would like, with a hypothetical situation for context: on 01/10 I run the code, which returns the list of names removed from the database plus the date on which I ran it.
On a later day, I find that the database has been updated and more names have been removed, so I run my code again.
This is what I hope to get:
The list of removed names with the date of the previous run, plus the newly removed names with the date of the most recent run.
I want to keep a record of each date on which I ran the code, without overwriting past dates.
I've already tried the code below, but to no avail.
My code:
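import pandas as pd
# skiprows=5 skips the first 5 lines of the file
base = pd.read_csv('cadastro_de_empregadores.csv', sep=';', encoding='latin-1', skiprows=5)
copia_base = pd.read_csv('copia_base.csv', sep=';', encoding='latin-1')
base.dropna(axis=1, how='all', inplace=True)  # drop columns that are entirely empty
# rows of copia_base whose CNPJ/CPF does not appear in base
inseridos = copia_base[~copia_base['CNPJ/CPF'].isin(base['CNPJ/CPF'])]
# rows of base whose CNPJ/CPF does not appear in copia_base
retirados = base[~base['CNPJ/CPF'].isin(copia_base['CNPJ/CPF'])]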
inserted.csv list, or is there some other logic? Also, just to clarify, the removed.csv list should contain the historical data of all names that were removed from the database, and the dates they were removed, right? In essence, you want to append the newly excluded names + the program's execution date to an already existing .csv of excluded names?
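If that reading is right (append the newly removed names plus the run date to an existing removed.csv), one possible approach is to stamp the rows with the date and write them with pandas in append mode, emitting the header only when the file does not exist yet. The sketch below is a minimal illustration, not a definitive implementation: the function name append_removed and the column name data_retirada are hypothetical, and the separator and encoding are assumed to match the files in the question.

```python
from datetime import date
from pathlib import Path

import pandas as pd


def append_removed(retirados, path="removed.csv", run_date=None):
    """Append this run's removed rows to `path`, stamped with the run date.

    `append_removed` and the `data_retirada` column are hypothetical names
    chosen for this sketch; they do not come from the original code.
    """
    run_date = run_date or date.today()

    # Work on a copy so the caller's dataframe is left untouched.
    stamped = retirados.copy()
    stamped["data_retirada"] = run_date.strftime("%d/%m/%Y")

    # Write the header only when the file is new or empty, so that
    # later runs add rows below the existing ones instead of overwriting.
    target = Path(path)
    write_header = not target.exists() or target.stat().st_size == 0
    stamped.to_csv(
        target,
        mode="a",             # append instead of overwrite
        header=write_header,
        index=False,
        sep=";",              # assumed to match the source files
        encoding="latin-1",   # assumed to match the source files
    )
    return stamped
```

Calling append_removed(retirados) at the end of each run would then keep earlier rows and their dates in removed.csv. Two caveats: running it twice against the same inputs appends the same names twice, and it simply stamps whichever dataframe actually holds the removed rows, which depends on which of the two files is the older snapshot.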