
In Python I am trying to create and write to the table TBL in the database DB in Databricks, but I get an exception: "A schema mismatch detected when writing to the Delta table." My code is as follows, where df is a pandas DataFrame.

from pyspark.sql import SparkSession

DB = "database_name"
TMP_TBL = "temporary_table"
TBL = "table_name"

sesh = SparkSession.builder.getOrCreate()
df_spark = sesh.createDataFrame(df)
df_spark.createOrReplaceTempView(TMP_TBL)

create_db_query = f"""
    CREATE DATABASE IF NOT EXISTS {DB}
    COMMENT "This is a database"
    LOCATION "/tmp/{DB}"
    """

create_table_query = f"""
    CREATE TABLE IF NOT EXISTS {DB}.{TBL}
    USING DELTA
    TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true, delta.autoOptimize.autoCompact = true)
    COMMENT "This is a table"
    LOCATION "/tmp/{DB}/{TBL}";
    """

insert_query = f"""
    INSERT INTO TABLE {DB}.{TBL} select * from {TMP_TBL}
    """

sesh.sql(create_db_query)
sesh.sql(create_table_query)
sesh.sql(insert_query)

The code fails at the last line, the insert_query. When I check, the database and table have been created, but the table is of course empty. So the problem seems to be that TMP_TBL and TBL have different schemas. How and where do I define the schema so they match?

  • Shouldn't you pass the table schema while creating the empty table? CREATE TABLE db.tbl (column1 type1, column2 type2, ...), as in the sketch below.
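A minimal sketch of what the comment suggests, reusing sesh, DB, and TBL from the question; the columns (id, value) are hypothetical placeholders standing in for the real schema of df_spark:

# Placeholder column definitions -- replace with the actual columns of df_spark.
# Declaring them up front pins the table schema, so a mismatched INSERT fails
# with a clear error instead of a schema-mismatch surprise later.
create_table_query = f"""
    CREATE TABLE IF NOT EXISTS {DB}.{TBL} (
        id STRING,
        value INT
    )
    USING DELTA
    COMMENT "This is a table"
    LOCATION "/tmp/{DB}/{TBL}"
    """
sesh.sql(create_table_query)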

1 Answer


If the schema of your table differs from the schema of the DataFrame you are inserting, you will get this error. Make sure the two schemas are the same before performing the insert, and also try this approach:

I reproduced the same thing in my environment with the following code.

ddl_query = """CREATE TABLE if not exists test123.emp_file 
                   USING DELTA
                   LOCATION 'dbfs:/user/dem1231'
                   """
spark.sql(ddl_query)

insert_query = f"""
    INSERT INTO TABLE test123.emp_file select * from temp_table
    """


Alternatively, try this approach to insert the data into the table.

I have a DataFrame like this, with a predefined schema:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Sample data
data = [
    ("vamsi", "1", "M", 2000),
    ("saideep", "2", "M", 3000),
    ("rakesh", "3", "M", 4000),
]

# Explicit schema, so Spark does not have to infer the column types
schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("id", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", IntegerType(), True),
])

df = spark.createDataFrame(data=data, schema=schema)

Then use the write command with append mode to insert it directly into the table:

df.write.mode("append").format("delta").saveAsTable("DB.TBL")
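If the table already exists with a slightly different schema, Delta can also evolve the table schema on append via the mergeSchema option; a minimal sketch (this widens the table schema, so use it deliberately):

# mergeSchema adds any new DataFrame columns to the table schema
# instead of failing on the mismatch; existing columns must still
# have compatible types.
df.write.mode("append") \
    .option("mergeSchema", "true") \
    .format("delta") \
    .saveAsTable("DB.TBL")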
