I have to compare two data sources to see whether the same record matches across all rows. One data source comes from an Excel file, the other from a SQL table. I tried using DataFrame.equals() like I have in the past.
However, I'm running into pesky datatype issues. Even though the data looks the same, the datatypes make excel_df.loc[excel_df['ID'] == 1].equals(sql_df.loc[sql_df['ID'] == 1]) return False. Here are example dtypes from pd.read_excel():
COLUMN ID int64
ANOTHER Id float64
SOME Date datetime64[ns]
Another Date datetime64[ns]
The same columns from pd.read_sql():
COLUMN ID float64
ANOTHER Id float64
SOME Date object
Another Date object
I could try using the converters argument of pd.read_excel() to match SQL, or cast each column with df['Column_Name'] = df['Column_Name'].astype(dtype_here), but I am dealing with a lot of columns. Is there an easier way to compare values across all columns?
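To illustrate, here is a minimal sketch of the problem and of the bulk cast I'm hoping to avoid writing column by column (the frames and column name are made up, not my real data):

```python
import pandas as pd

# Same values, different dtypes -- equals() returns False.
excel_df = pd.DataFrame({"ID": pd.array([1, 2], dtype="int64")})
sql_df = pd.DataFrame({"ID": pd.array([1.0, 2.0], dtype="float64")})
print(excel_df.equals(sql_df))  # False: int64 vs float64

# One workaround: cast every column of one frame to the other's
# dtypes in a single call, instead of one astype() per column.
excel_cast = excel_df.astype(sql_df.dtypes.to_dict())
print(excel_cast.equals(sql_df))  # True once the dtypes agree
```

This only helps when every cast is lossless, which is why I'd rather fix the dtypes at read time.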
Checking pd.read_sql(), there is no converters equivalent, but I'm looking for something like:
df = pd.read_sql("Select * From Foo", con, dtypes={"Column_name": str,
                                                   "Column_name2": int})