I am trying to insert data from HBase into Teradata using PySpark. The data is read into a Spark DataFrame, and the insert succeeds when I limit the DataFrame to 3000–5000 rows like this:
df = df.limit(3000)
However, when I try to insert without the limit and instead set the JDBC batchsize option to 1000, I get this error:
java.sql.BatchUpdateException: [Teradata JDBC Driver] [TeraJDBC 17.10.00.27]
[Error 1338] [SQLState HY000] A failure occurred while executing a PreparedStatement batch request.
The parameter set was not executed and should be resubmitted individually using the PreparedStatement executeUpdate method.
Details:
Complete code snippet:
df = df.toDF(*renamed_columns)
df = df.withColumnRenamed("data_ROW", "ROW")
df.show(1, False)
df = df.limit(5)  # limit used while testing; removed when inserting the full data set

teraDataIp = configFile["teraDataDevIP"]
teraDataBaseName = configFile["teraDataDbName"]
jdbc_url = "jdbc:teradata://{}/DATABASE={},tmode=ANSI,charSet=UTF8,SSLMODE=DISABLE".format(teraDataIp, teraDataBaseName)

# Pick the target Teradata table from the HBase table name
# teradata_table = configFile["giskITDTableName"]
if "1" in hbaseTableStr:
    teradata_table = "____"
elif "2" in hbaseTableStr:
    teradata_table = "____"
elif "3" in hbaseTableStr:
    teradata_table = "____"
elif "4" in hbaseTableStr:
    teradata_table = "____"
print("Teradata Table Name is: ", teradata_table)

df = df.repartition(10)
df.write.format("jdbc") \
    .mode("append") \
    .option("driver", "com.teradata.jdbc.TeraDriver") \
    .option("url", jdbc_url) \
    .option("user", ____) \
    .option("password", ____) \
    .option("dbtable", teradata_table) \
    .option("batchsize", 1000) \
    .save()
Execution command:
spark-submit --conf "spark.driver.extraClassPath=[REDACTED]/hbase/lib/*" \
--jars [REDACTED]/hbase_connectors/hbase-spark-protocol-shaded.jar,\
[REDACTED]/lib/terajdbc4.jar,\
[REDACTED]/tdgssconfig.jar \
[REDACTED]/sparkHbase_v2.py
Full error stack trace:
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
Caused by: com.teradata.jdbc.jdbc_4.util.JDBCException
[Teradata Database] [TeraJDBC 17.10.00.27] [Error 1338] [SQLState HY000]
A failure occurred while executing a PreparedStatement batch request.
Details of the failure can be found in the exception chain which is accessible with getNextException().
at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeBatchUpdateException(ErrorFactory.java:198)
at com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.batchUpdateRowCount(StatementReceiveState.java:1406)
at com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.batchUpdateRowCount(StatementReceiveState.java:1389)
at com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.batchUpdateRowCount(StatementReceiveState.java:1371)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 1 (TID 1) failed for unknown reasons
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
Caused by: java.sql.BatchUpdateException: Batch entry 0 insert into table_name values (?, ?, ?) was aborted.
Call getNextException() to see other errors in the batch.
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.run(DataSourceRDD.scala:166)
What I tried:
- Inserting smaller datasets works fine
- Checked Teradata DB permissions (no issues found)
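The error text itself says the parameter set "should be resubmitted individually using the PreparedStatement executeUpdate method". A rough Spark-side equivalent (untested sketch; assumes Spark's JDBC writer flushes a batch every batchsize rows, so batchsize 1 executes one row at a time) is the same write with batchsize set to 1. It is slow, but the first bad row should then fail with its own error instead of a generic batch failure:

# Diagnostic only: same write as in the snippet above, with batchsize=1 so each
# row is its own batch (td_user/td_password again stand in for the credentials)
df.write.format("jdbc") \
    .mode("append") \
    .option("driver", "com.teradata.jdbc.TeraDriver") \
    .option("url", jdbc_url) \
    .option("user", td_user) \
    .option("password", td_password) \
    .option("dbtable", teradata_table) \
    .option("batchsize", 1) \
    .save()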
Question: Why does the insert work for small limits but fail when using batchsize for larger data, and how can I properly insert large DataFrames from PySpark to Teradata without hitting this BatchUpdateException?
.option("flatten","on")to see the nested exceptions, then edit this question or ask another.