When loading a CSV file in pandas, I encountered the error message below:
DtypeWarning: Columns have mixed types. Specify dtype option on import
or set low_memory=False
Reading online, I found a few solutions.
The first is to set low_memory=False, but I understand that this is not good practice and doesn't really resolve the problem.
The second solution is to set a data type for each column (or at least for each column with mixed data types):
pd.read_csv(csv_path_name, dtype={'first_column': 'str', 'second_column': 'str'})
Again, from what I read, this is not the ideal solution for a big dataset.
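For reference, this is a minimal sketch of how I tried the dtype option on a tiny example. The sample data and the io.StringIO stand-in for csv_path_name are made up for illustration; only the column names come from the call above.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for my real file.
# first_column mixes numbers and text; second_column has an empty cell.
csv_text = "first_column,second_column\n1,a\nx,2\n3,\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"first_column": "str", "second_column": "str"},
)

# Every non-empty value is read as a string; the empty cell is still
# treated as missing rather than as an empty string.
print(df)
print(df.dtypes)
```

With this, the non-empty values in both columns come back as strings, while the empty cell still shows up as a missing value.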
The third solution is to create a converter function. To my understanding, this might be the most appropriate solution. I found code that works for me, but I am trying to better understand what exactly this function is doing:
def convert_dtype(x):
    if not x:
        return ''
    try:
        return str(x)
    except:
        return ''
df = pd.read_csv(csv_path_name, converters={'first_col':convert_dtype, 'second_col':convert_dtype, etc.... } )
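To make the question concrete, here is a small self-contained sketch of how I have been poking at the converter by hand. The sample values and the in-memory CSV are hypothetical, and my assumption (which I would like confirmed) is that pandas passes each raw cell to the converter as a string.

```python
import io

import pandas as pd


def convert_dtype(x):
    # Falsy input (e.g. an empty cell, which arrives as '') becomes ''
    if not x:
        return ''
    try:
        # Otherwise return the value as a string
        return str(x)
    except:
        # If str() somehow fails, fall back to ''
        return ''


# Calling the function directly on a few sample inputs
direct_results = [convert_dtype(v) for v in ['', 'abc', '12', None, 0, 3.5]]
print(direct_results)

# Hypothetical in-memory CSV standing in for my real file
csv_text = "first_col,second_col\n1,a\nx,2\n3,\n"
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={'first_col': convert_dtype, 'second_col': convert_dtype},
)
print(df)
```

Called directly, the function turns every falsy input (including None and 0) into an empty string and everything else into its string form, which is the part I would like to understand better.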
Can someone please explain the function code to me?
Thanks