
While reading the Datastax docs for supported syntax of Spark SQL, I noticed you can use INSERT statements like you would normally do:

INSERT INTO hello (someId,name) VALUES (1,"hello")

Testing this out in a Spark 2.0 (Python) environment with a connection to a MySQL database throws the error:

File "/home/yawn/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 73, in deco
pyspark.sql.utils.ParseException: 
u'\nmismatched input \'someId\' expecting {\'(\', \'SELECT\', \'FROM\', \'VALUES\', \'TABLE\', \'INSERT\', \'MAP\', \'REDUCE\'}(line 1, pos 19)\n\n== SQL ==\nINSERT INTO hello (someId,name) VALUES (1,"hello")\n-------------------^^^\n'

However if I remove the explicit column definition, it works as expected:

INSERT INTO hello VALUES (1,"hello")

Am I missing something?

  • As I know, Spark SQL is based on Hive SQL syntax, and the Hive Language Manual DML says: "Values must be provided for every column in the table. The standard SQL syntax that allows the user to insert values into only some columns is not yet supported. To mimic the standard SQL, nulls can be provided for columns the user does not wish to assign a value to." So it probably does not make sense to provide columns from the Spark SQL point of view. Commented Oct 23, 2016 at 18:45
  • @VladoDemcak Well, it makes sense to me from the readability point of view, whether or not it is necessary to provide a value for every column. Anyway, does this mean that the Datastax docs misplaced that particular information? Commented Oct 25, 2016 at 12:30
  • Probably the Datastax docs misplaced it; the Databricks documentation says only this is possible. Commented Oct 27, 2016 at 9:55
  • @VladoDemcak Thank you Commented Oct 27, 2016 at 10:12
  • I have the same problem, I wanna do "INSERT INTO travelTable (ClientID,SendID,SubscriberKey,EmailAddress,SubscriberID,ListID,EventType,BounceCategory,SMTPCode,BounceReason,BatchID,TriggeredSendExternalKey,EventDateTimestamp,EventDate) VALUES ('7247942','536075','000060008489','[email protected]','53911595','318','Bounce','Soft bounce','450','Mailbox Full','386','None','2019-02-25 06:21:09','2019-02-25')" Commented Sep 25, 2019 at 16:20
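The contrast discussed above can be tried out without a Spark cluster. The sketch below uses SQLite purely as a stand-in to show the two INSERT shapes from the question (table and column names are taken from the question); note that standard SQL engines like SQLite and MySQL accept the column-list form, while Spark 2.0's parser rejects it and only accepts the second form.

```python
# Both INSERT shapes from the question, run against SQLite as a stand-in.
# Spark 2.0 rejects the first (column-list) form; standard SQL engines accept both.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hello (someId INTEGER, name TEXT)")

# Standard SQL: explicit column list -- this is the form Spark 2.0 rejects.
conn.execute("INSERT INTO hello (someId, name) VALUES (1, 'hello')")

# Spark-compatible form: no column list, with a value for every column.
conn.execute("INSERT INTO hello VALUES (2, 'hello again')")

print(conn.execute("SELECT * FROM hello ORDER BY someId").fetchall())
# -> [(1, 'hello'), (2, 'hello again')]
```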

1 Answer


Spark supports Hive syntax, so if you want to insert a row you can do it as follows:

insert into hello select t.* from (select 1, 'hello') t;
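In Spark you would presumably submit this statement through `spark.sql(...)` against a registered table. The derived-table pattern itself can be sketched with SQLite as a stand-in (Spark is not needed to see the shape; the table name `hello` comes from the question):

```python
# The answer's Hive-style pattern: insert by selecting from an inline
# derived table instead of using a VALUES list with a column specification.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hello (someId INTEGER, name TEXT)")

conn.execute("INSERT INTO hello SELECT t.* FROM (SELECT 1, 'hello') t")

print(conn.execute("SELECT * FROM hello").fetchall())  # -> [(1, 'hello')]
```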

4 Comments

Thank you for your reply. It seems too verbose for a simple insert statement, but it is definitely a way of doing so.
What about the case when you need to insert data into some columns, not all of them? For example: a table has three columns col0, col1, and col2, and I need to insert values into col0 and col2. How can I do that?
I can't see how your solution is better than the solution already provided in the question (omitting the column names)
If the Spark data source supports a custom schema (implements SchemaRelationProvider) and allows omitting some of the columns, you can create a separate table mapping with only the columns you want to update and use inserts on that table.
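For the partial-column question above, the Hive manual quoted in the comments suggests padding the unassigned columns with NULL. A minimal sketch of that pattern, again using SQLite only as a stand-in for the SQL shape (the col0/col1/col2 names come from the comment):

```python
# Hive-style partial insert: since a column list is not accepted, supply a
# value for every column and use NULL for the ones you don't want to set
# (here col1).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (col0 INTEGER, col1 TEXT, col2 TEXT)")

conn.execute("INSERT INTO demo SELECT t.* FROM (SELECT 1, NULL, 'x') t")

print(conn.execute("SELECT * FROM demo").fetchall())  # -> [(1, None, 'x')]
```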
