In Python I am trying to create and write to the table TBL in the database DB in Databricks, but I get an exception: "A schema mismatch detected when writing to the Delta table." My code is as follows, where df is a pandas DataFrame.
from pyspark.sql import SparkSession
DB = "database_name"
TMP_TBL = "temporary_table"
TBL = "table_name"
sesh = SparkSession.builder.getOrCreate()
df_spark = sesh.createDataFrame(df)
df_spark.createOrReplaceTempView(TMP_TBL)
create_db_query = f"""
CREATE DATABASE IF NOT EXISTS {DB}
COMMENT "This is a database"
LOCATION "/tmp/{DB}"
"""
create_table_query = f"""
CREATE TABLE IF NOT EXISTS {DB}.{TBL}
USING DELTA
TBLPROPERTIES ('delta.autoOptimize.optimizeWrite' = 'true', 'delta.autoOptimize.autoCompact' = 'true')
COMMENT "This is a table"
LOCATION "/tmp/{DB}/{TBL}"
"""
insert_query = f"""
INSERT INTO TABLE {DB}.{TBL} select * from {TMP_TBL}
"""
sesh.sql(create_db_query)
sesh.sql(create_table_query)
sesh.sql(insert_query)
The code fails at the last line, the insert_query. When I check, the database and table have been created, but the table is of course empty. So the problem is that TMP_TBL and TBL have different schemas: how and where do I define the schema so they match?



Declare the schema explicitly in the CREATE TABLE statement, listing each column with a type that matches the Spark DataFrame's schema. Without a column list, the Delta table is created with an empty schema, so the subsequent INSERT fails with a schema mismatch:

CREATE TABLE IF NOT EXISTS db.tbl (column1 type1, column2 type2, ...) USING DELTA ...
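If the DataFrame has many columns, the column list can be generated from its dtypes rather than typed by hand. A minimal sketch in plain Python, where the `_DTYPE_TO_SQL` mapping and the `ddl_columns` helper are illustrative, not part of pandas or PySpark; extend the mapping for the dtypes your data actually uses:

```python
# Map common pandas dtype names to Spark SQL column types.
# Illustrative only; extend for the dtypes present in your DataFrame.
_DTYPE_TO_SQL = {
    "int64": "BIGINT",
    "int32": "INT",
    "float64": "DOUBLE",
    "bool": "BOOLEAN",
    "object": "STRING",
    "datetime64[ns]": "TIMESTAMP",
}

def ddl_columns(dtypes):
    """Build the column list for CREATE TABLE from {column: dtype} pairs.

    In practice, pass dict(df.dtypes.astype(str)) from the pandas DataFrame.
    """
    return ", ".join(f"{name} {_DTYPE_TO_SQL[str(dt)]}"
                     for name, dt in dtypes.items())

# Example: generate the DDL for a hypothetical two-column frame.
DB, TBL = "database_name", "table_name"
cols = ddl_columns({"id": "int64", "label": "object"})
create_table_query = f"""
CREATE TABLE IF NOT EXISTS {DB}.{TBL} ({cols})
USING DELTA
COMMENT "This is a table"
LOCATION "/tmp/{DB}/{TBL}"
"""
print(cols)  # id BIGINT, label STRING
```

With the column list in place, the INSERT should succeed as long as the temp view's columns appear in the same order; alternatively, name the columns explicitly in the INSERT (e.g. `INSERT INTO {DB}.{TBL} (id, label) SELECT ...`).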