I am trying to insert data from HBase into Teradata using PySpark. The data is read into a Spark DataFrame, and the insert succeeds when I limit the DataFrame to 3000–5000 rows, like this:

df = df.limit(3000)

However, when I try to insert without the limit and instead set the JDBC batchsize option to 1000, I get this error:

java.sql.BatchUpdateException: [Teradata JDBC Driver] [TeraJDBC 17.10.00.27] 
[Error 1338] [SQLState HY000] A failure occurred while executing a PreparedStatement batch request. 
The parameter set was not executed and should be resubmitted individually using the PreparedStatement executeUpdate method.

Complete code snippet:

df = df.toDF(*renamed_columns)
df = df.withColumnRenamed("data_ROW", "ROW")
df.show(1, False)
df = df.limit(5)

teraDataIp = configFile["teraDataDevIP"]
teraDataBaseName = configFile["teraDataDbName"]
jdbc_url = "jdbc:teradata://{}/DATABASE={},tmode=ANSI,charSet=UTF8,SSLMODE=DISABLE".format(teraDataIp, teraDataBaseName)

# teradata_table = configFile["giskITDTableName"]

if "1" in hbaseTableStr:
    teradata_table = "____"
elif "2" in hbaseTableStr:
    teradata_table = "____"
elif "3" in hbaseTableStr:
    teradata_table = "____"
elif "4" in hbaseTableStr:
    teradata_table = "____"

print("Teradata Table Name is: ", teradata_table)
df = df.repartition(10)

df.write.format("jdbc") \
    .mode("append") \
    .option("driver", "com.teradata.jdbc.TeraDriver") \
    .option("url", jdbc_url) \
    .option("user", ____) \
    .option("password", ____) \
    .option("dbtable", teradata_table) \
    .option("batchsize", 1000).save()

Execution command:

spark-submit --conf "spark.driver.extraClassPath=[REDACTED]/hbase/lib/*" \
--jars [REDACTED]/hbase_connectors/hbase-spark-protocol-shaded.jar,\
[REDACTED]/lib/terajdbc4.jar,\
[REDACTED]/tdgssconfig.jar \
[REDACTED]/sparkHbase_v2.py

Full error stack trace:

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)

Caused by: com.teradata.jdbc.jdbc_4.util.JDBCException
[Teradata Database] [TeraJDBC 17.10.00.27] [Error 1338] [SQLState HY000]
A failure occurred while executing a PreparedStatement batch request.
Details of the failure can be found in the exception chain which is accessible with getNextException().
    at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeBatchUpdateException(ErrorFactory.java:198)
    at com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.batchUpdateRowCount(StatementReceiveState.java:1406)
    at com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.batchUpdateRowCount(StatementReceiveState.java:1389)
    at com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.batchUpdateRowCount(StatementReceiveState.java:1371)

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 1 (TID 1) failed for unknown reasons
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)

Caused by: java.sql.BatchUpdateException: Batch entry 0 insert into table_name values (?, ?, ?) was aborted.
Call getNextException() to see other errors in the batch.
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.run(DataSourceRDD.scala:166)

What I tried:

  • Inserting smaller datasets works fine
  • Checked Teradata DB permissions — no issues
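
One way to surface the nested errors is to catch the py4j exception on the driver and walk both the cause chain and the SQLException chain. This is only a sketch: it assumes the SQLException chain survives deserialization to the driver (not guaranteed for executor-side failures), and it condenses the same write options shown above.

from py4j.protocol import Py4JError, Py4JJavaError

try:
    # Same write as in the snippet above, options condensed for brevity
    df.write.format("jdbc").options(
        driver="com.teradata.jdbc.TeraDriver", url=jdbc_url,
        user=____, password=____, dbtable=teradata_table,
        batchsize="1000").mode("append").save()
except Py4JJavaError as e:
    exc = e.java_exception
    while exc is not None:
        print(exc.getClass().getName(), "->", exc.getMessage())
        try:
            # SQLExceptions chain the underlying failure via getNextException()
            nested = exc.getNextException()
            while nested is not None:
                print("  nested:", nested.getMessage())
                nested = nested.getNextException()
        except Py4JError:
            pass  # not a SQLException, so there is no getNextException()
        exc = exc.getCause()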

Question: Why does the insert work for small limits but fail when using batchsize for larger data, and how can I properly insert large DataFrames from PySpark to Teradata without hitting this BatchUpdateException?

Comments:

  • This exception says one or more of the rows in the batch was not inserted successfully. The reason or reasons for that would be in the exception chain. Try setting .option("flatten","on") to see the nested exceptions, then edit this question or ask another. Commented Aug 11 at 16:06
  • Just an observation: TeraJDBC 17.10.00.27 is quite old at this point. That's probably not the issue here, but you should consider upgrading. Commented Aug 11 at 16:08
  • Hi, I have tried .option("flatten","on") as well, but I get the same error with it too. Commented Aug 12 at 13:58
  • Once you include flatten, the actual error should appear further down in the stack trace. Commented Aug 13 at 16:02
