I am not very experienced with coding, but I am writing a customtkinter application in which a user can load a specific type of HTML file that contains diagnostic addresses and various information attributed to each address. The script parses the file and returns the selected addresses and information as a dictionary for further use.
The code works, but the HTML files range from ~10,000 to ~70,000 lines in length, and the larger ones take over a minute to read. I know there are inefficiencies in my code, so I am trying to find ways to reduce the waiting time while the script runs. I suspect my biggest bottlenecks are the repeated creation of dataframes and the nested for loop afterwards. I have considered creating only one dataframe and iterating through it, but I am unsure how much difference that would make.
How can I write this in a way that improves the runtime?
Here is the function:
# Clear list of previous selections
fv.info_values_to_add.clear()
# Read user's info selections
filter_info_selections()
# Open and read protocol file
with open(file_name, 'r') as file:
    contents = file.read()
# Global variable to be used in export functions
global Length_Of_Info_1
Length_Of_Info_1 = len(fv.info_values_to_add)
# Create a soup object from the protocol
parsed_protocol = BeautifulSoup(contents, "html.parser")
for address, address_value in fv.protocol_values_1.items():
    # String to be used to find the correct section of the html
    string_address = "ECU: " + address
    try:
        # Find the header for the parsed address
        table = parsed_protocol.find('p', string=re.compile(string_address))
        # Select the correct table for the information
        data_table = table.find_all_next('table')
        # Create a dataframe from the table
        data_frame = pd.read_html(io.StringIO(str(data_table)))[1]
        # Clean data frame columns and values
        df_clean = data_frame.drop(columns=2, axis=1)
        # Save selected data to variables to be used
        sw_version = df_clean.iloc[1, 1]
        hw_part_number = df_clean.iloc[2, 1]
        hw_version = df_clean.iloc[3, 1]
        vehicle_vin = df_clean.iloc[20, 1]
        fazit_id = df_clean.iloc[21, 1]
        coding = df_clean.iloc[7, 1]
        vw_part_number = df_clean.iloc[0, 1]
        # List to store variables to be added to the fv.protocol_values_1 dictionary
        temp_list = []
        # Iterate through the info list and add the selected variables
        for key in fv.info_values_to_add:
            if key == "Software Version":
                temp_list.append(sw_version)
            elif key == "Hardware part number":
                temp_list.append(hw_part_number)
            elif key == "Hardware Version":
                temp_list.append(hw_version)
            elif key == "Fazit ID":
                temp_list.append(fazit_id)
            elif key == "VIN Number":
                temp_list.append(vehicle_vin)
            elif key == "Coding":
                temp_list.append(coding)
            elif key == "VW part number":
                temp_list.append(vw_part_number)
            else:
                pass
        # Add values to the address in the dictionary
        fv.protocol_values_1[address] = temp_list
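
One possible direction, sketched under assumptions rather than measured: `find_all_next('table')` collects every table after the header, and `str(data_table)` plus `pd.read_html` then re-parse all of them just to use the second one, which is probably where most of the time goes. The sketch below asks BeautifulSoup for only the first two tables with `limit=2`, reads the second column straight from the soup instead of building a DataFrame, and replaces the if/elif chain with a dictionary lookup. The names `ROW_FOR_LABEL` and `extract_info` are hypothetical, and the sketch assumes the target table has no header row and no nested tables, so that `<tr>` positions match the `iloc` row numbers in the function above.

```python
import re

from bs4 import BeautifulSoup

# Row positions copied from the iloc calls in the original function.
ROW_FOR_LABEL = {
    "VW part number": 0,
    "Software Version": 1,
    "Hardware part number": 2,
    "Hardware Version": 3,
    "Coding": 7,
    "VIN Number": 20,
    "Fazit ID": 21,
}


def extract_info(soup, address, wanted_labels):
    """Return the requested values for one ECU address, or None if not found."""
    header = soup.find("p", string=re.compile("ECU: " + re.escape(address)))
    if header is None:
        return None
    # limit=2 stops the search after the second table instead of
    # collecting every table in the rest of the document.
    tables = header.find_all_next("table", limit=2)
    if len(tables) < 2:
        return None
    # Second cell of each row, read directly from the soup (no DataFrame).
    values = []
    for row in tables[1].find_all("tr"):
        cells = row.find_all(["td", "th"])
        values.append(cells[1].get_text(strip=True) if len(cells) > 1 else None)
    # Keep the order of the user's selections; unknown labels are skipped,
    # mirroring the `else: pass` branch of the original loop.
    result = []
    for label in wanted_labels:
        index = ROW_FOR_LABEL.get(label)
        if index is not None:
            result.append(values[index] if index < len(values) else None)
    return result
```

Calling `extract_info(parsed_protocol, address, fv.info_values_to_add)` inside the address loop would return the list that is currently built in `temp_list`. If lxml is installed, `BeautifulSoup(contents, "lxml")` is usually faster than `"html.parser"` as well.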