Python For Loop + pandas append

Question

I am trying to read files in a loop and append them all into one dataset. My code seems to read the data fine, but the loop is not appending the data to a dataframe. Instead, the result contains only one of the imported datasets (the final_access_hr dataframe).

What is wrong with my loop? Why aren't my looped files being appended? My dataframe access_HR_attestaion has 77 records, but I expect 2639 records because I am reading in 3 files.

for file in files_path:
    mainframe_access_HR = pd.read_pickle(file)
    mainframe_access_HR = mainframe_access_HR.astype(str)
    
    if mainframe_access_HR.shape[0]: 
        
        application = mainframe_access_HR['Owner'].unique()[0]
    

        filtered_attestation_data = attestation_data[attestation_data['cleaned_MAL_CODE']==application]


        final_access_hr = pd.DataFrame()
        column_list = pd.DataFrame(['HRACF2']) 
        for column in range(len(column_list)):
            mainframe_access_HR_new = mainframe_access_HR.copy()

            #Drop rows containing NAN for column c_ACF2_ID for new merge
            mainframe_access_HR_new.dropna(subset=[column_list.iloc[column,0]], inplace = True)
        
            #Creating a new column for merge
            mainframe_access_HR_new['ID'] = mainframe_access_HR_new[column_list.iloc[column,0]]
            
            #case folding
            mainframe_access_HR_new['ID'] = mainframe_access_HR_new['ID'].str.strip().str.upper()
        
            #Merge data
            merged_data = pd.merge(filtered_attestation_data, mainframe_access_HR_new, how='right', left_on=['a','b'], right_on =['a','b'])

        
            #Concatenating all data together
            final_access_hr = final_access_hr.append(merged_data)

        #Remove duplicates
        access_HR_attestaion = final_access_hr.drop_duplicates()

append is unfortunately deprecated, but that just means one should collect the parts into a list and then concat them all at the end.
– ramslök, Commented Jun 26, 2022 at 17:28
@creanion thanks! Didn't know it was deprecated. Any chance you could help with the concat list part?
– Jonnyboi, Commented Jun 26, 2022 at 17:39
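
The list-then-concat pattern suggested in the comment above could look roughly like this. It is a minimal sketch, not the asker's actual pipeline: the helper name combine_frames and the small sample frames are hypothetical stand-ins for the per-file merged_data results.

```python
import pandas as pd


def combine_frames(frames):
    """Concatenate a list of DataFrames once and drop duplicate rows."""
    if not frames:
        # Nothing was collected, so return an empty frame instead of failing.
        return pd.DataFrame()
    combined = pd.concat(frames, ignore_index=True)
    return combined.drop_duplicates().reset_index(drop=True)


# Hypothetical stand-ins for the merged_data built for each file.
sample_inputs = [("APP1", ["A1", "A2"]), ("APP2", ["B1"]), ("APP3", ["C1", "C1"])]

parts = []  # created once, before the loop over files
for owner, ids in sample_inputs:
    merged_data = pd.DataFrame({"Owner": owner, "ID": ids})
    parts.append(merged_data)  # collect, do not concatenate yet

result = combine_frames(parts)
print(len(result))
```

Concatenating once at the end also avoids re-copying the growing frame on every iteration, which is what a repeated append does.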

Ravindra S · Accepted Answer · 2022-06-26 17:27:56Z

I think the bug is that you are initializing final_access_hr for every file you read, so it gets reset on each iteration of the files_path loop.

Can you move the following line out of the files_path loop:

final_access_hr = pd.DataFrame()

and comment if it solves your problem?
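
The effect of where the accumulator is created can be seen in a small sketch. The in-memory frames below are hypothetical stand-ins for the three pickled files, and the sketch uses a list plus pd.concat rather than DataFrame.append, since append was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical stand-ins for the data read from the three files.
files = [
    pd.DataFrame({"ID": ["A1", "A2"]}),
    pd.DataFrame({"ID": ["B1"]}),
    pd.DataFrame({"ID": ["C1", "C2", "C3"]}),
]

# Accumulator created inside the loop: it is recreated on every
# iteration, so only the last file's rows are left afterwards.
for frame in files:
    parts_reset = []
    parts_reset.append(frame)
reset_result = pd.concat(parts_reset, ignore_index=True)

# Accumulator created once, before the loop: rows from every file survive.
parts_kept = []
for frame in files:
    parts_kept.append(frame)
kept_result = pd.concat(parts_kept, ignore_index=True)

print(len(reset_result), len(kept_result))
```

In the question's code, final_access_hr plays the role of the accumulator, so it needs to be created once, before the loop over files_path starts.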


1 Comment

Jonnyboi Over a year ago

I have put it before the files_path loop, and now I'm getting NameError: name 'final_access_hr' is not defined
