In my PySpark code I perform more than 10 join operations with multiple groupBy operations in between. To avoid a large DAG and repeated re-computation, I decided to save the intermediate DataFrame as a table. So I created a database and started saving my DataFrames in it.
After performing 5 joins and some groupBy operations, I saved the table with the commands below, and everything up to this point ran successfully.
spark.sql("DROP TABLE IF EXISTS half_yearly_data")
half_yearly_data.write.saveAsTable("half_yearly_data")
half_yearly_data = spark.read.table('half_yearly_data')
Later, after performing the remaining joins and groupBys, I run the following statements, which give me an error:
spark.sql("DROP TABLE IF EXISTS db.half_yearly_data")
half_yearly_data.write.saveAsTable("db.half_yearly_data") # Error pointing here
half_yearly_data = spark.read.table('db.half_yearly_data')
The error points to the 2nd line and says: The schema of your Delta table has changed in an incompatible way since your DataFrame or DeltaTable object was created. Please redefine your DataFrame or DeltaTable object.
I have not defined my table as a Delta table, yet the error refers to one. I then tried the following commands:
spark.sql("DROP TABLE IF EXISTS db.half_yearly_data")
half_yearly_data.write.mode("overwrite").option("overwriteSchema","true").saveAsTable("db.half_yearly_data") # Error pointing here
half_yearly_data = spark.read.table('db.half_yearly_data')
Still the same error. I understand that when I convert my DataFrame into a table the 2nd time, there are new columns and other schema changes relative to the 1st creation. But I am dropping the table before creating it again, so I am wondering what I can do here.
Since the error was pointing to the 2nd line, I checked whether the table had been dropped from the database using the command below, and the table no longer exists in the database.
spark.sql("show tables in db").show()
I also tried saving the data under a different table name and the same error pops up, even though the new table does not exist either.
The built-in AI-generated suggestions in Databricks notebooks point to Delta tables, but I am not using a Delta table here. How can I overwrite or re-create my table the 2nd time?