
While reading the Datastax docs for supported syntax of Spark SQL, I noticed you can use INSERT statements like you would normally do:

INSERT INTO hello (someId,name) VALUES (1,"hello")

Testing this out in a Spark 2.0 (Python) environment with a connection to a MySQL database throws the error:

File "/home/yawn/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 73, in deco
pyspark.sql.utils.ParseException: 
u'\nmismatched input \'someId\' expecting {\'(\', \'SELECT\', \'FROM\', \'VALUES\', \'TABLE\', \'INSERT\', \'MAP\', \'REDUCE\'}(line 1, pos 19)\n\n== SQL ==\nINSERT INTO hello (someId,name) VALUES (1,"hello")\n-------------------^^^\n'

However if I remove the explicit column definition, it works as expected:

INSERT INTO hello VALUES (1,"hello")

Am I missing something?

  • As I know, Spark SQL is based on Hive SQL syntax, and the Hive Language Manual DML says: "Values must be provided for every column in the table. The standard SQL syntax that allows the user to insert values into only some columns is not yet supported. To mimic the standard SQL, nulls can be provided for columns the user does not wish to assign a value to." So it probably does not make sense to provide columns from the Spark SQL point of view. Commented Oct 23, 2016 at 18:45
  • @VladoDemcak Well, it makes sense to me from the readability point of view, whether or not it is necessary to provide a value for every column. Anyway, does this mean that the Datastax docs misplaced that particular information? Commented Oct 25, 2016 at 12:30
  • Probably the Datastax docs misplaced it; the Databricks documentation says only this is possible. Commented Oct 27, 2016 at 9:55
  • @VladoDemcak Thank you Commented Oct 27, 2016 at 10:12
  • I have the same problem, I wanna do "INSERT INTO travelTable (ClientID,SendID,SubscriberKey,EmailAddress,SubscriberID,ListID,EventType,BounceCategory,SMTPCode,BounceReason,BatchID,TriggeredSendExternalKey,EventDateTimestamp,EventDate) VALUES ('7247942','536075','000060008489','[email protected]','53911595','318','Bounce','Soft bounce','450','Mailbox Full','386','None','2019-02-25 06:21:09','2019-02-25')" Commented Sep 25, 2019 at 16:20
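The contrast discussed above can be tried out without a Spark cluster. The sketch below uses SQLite purely as a stand-in to show the two INSERT shapes from the question (table and column names are taken from the question); note that standard SQL engines like SQLite and MySQL accept the column-list form, while Spark 2.0's parser rejects it and only accepts the second form.

```python
# Both INSERT shapes from the question, run against SQLite as a stand-in.
# Spark 2.0 rejects the first (column-list) form; standard SQL engines accept both.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hello (someId INTEGER, name TEXT)")

# Standard SQL: explicit column list -- this is the form Spark 2.0 rejects.
conn.execute("INSERT INTO hello (someId, name) VALUES (1, 'hello')")

# Spark-compatible form: no column list, with a value for every column.
conn.execute("INSERT INTO hello VALUES (2, 'hello again')")

print(conn.execute("SELECT * FROM hello ORDER BY someId").fetchall())
# -> [(1, 'hello'), (2, 'hello again')]
```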

1 Answer


Spark supports Hive syntax, so if you want to insert a row you can do it as follows:

insert into hello select t.* from (select 1, 'hello') t;
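In Spark you would presumably submit this statement through `spark.sql(...)` against a registered table. The derived-table pattern itself can be sketched with SQLite as a stand-in (Spark is not needed to see the shape; the table name `hello` comes from the question):

```python
# The answer's Hive-style pattern: insert by selecting from an inline
# derived table instead of using a VALUES list with a column specification.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hello (someId INTEGER, name TEXT)")

conn.execute("INSERT INTO hello SELECT t.* FROM (SELECT 1, 'hello') t")

print(conn.execute("SELECT * FROM hello").fetchall())  # -> [(1, 'hello')]
```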

4 Comments

Thank you for your reply. It seems too verbose for a simple insert statement, but it is definitely a way of doing so.
What about the case when you need to insert data into some columns, not all of them? For example: a table has three columns col0, col1, and col2, and I need to insert values into col0 and col2. How can I do that?
I can't see how your solution is better than the solution already provided in the question (omitting the column names)
If the Spark data source supports a custom schema (implements SchemaRelationProvider) and allows omitting some of the columns, you can create a separate table mapping with only the columns you want to update and use inserts on that table.
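For the partial-column question above, the Hive manual quoted in the comments suggests padding the unassigned columns with NULL. A minimal sketch of that pattern, again using SQLite only as a stand-in for the SQL shape (the col0/col1/col2 names come from the comment):

```python
# Hive-style partial insert: since a column list is not accepted, supply a
# value for every column and use NULL for the ones you don't want to set
# (here col1).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (col0 INTEGER, col1 TEXT, col2 TEXT)")

conn.execute("INSERT INTO demo SELECT t.* FROM (SELECT 1, NULL, 'x') t")

print(conn.execute("SELECT * FROM demo").fetchall())  # -> [(1, None, 'x')]
```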
