Based on the question "Spark - load CSV file as DataFrame?":

Is it possible to specify options using SQL to set the delimiter, null character, and quote?

val df = spark.sql("SELECT * FROM csv.`csv/file/path/in/hdfs`")

I know it can be done using spark.read.format("csv").option("delimiter", "|"), but ideally I wouldn't have to.
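For comparison, here is a minimal sketch of the DataFrameReader route mentioned above, setting all three options the question asks about (delimiter, null marker, quote). It assumes a recent Spark (3.x) with the built-in csv source. It is written as a spark-shell or script-style snippet and reads a hypothetical temporary sample file that uses NULL as the null marker and a single quote as the quote character.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{Row, SparkSession}

// Local session for the sketch; in spark-shell this returns the existing session
val spark = SparkSession.builder().master("local[*]").appName("csv-reader-options").getOrCreate()

// Hypothetical pipe-delimited sample: NULL marks a missing value, ' is the quote character
val path = Files.createTempFile("sample", ".csv")
Files.write(path, "id|name\n1|'a|b'\n2|NULL\n".getBytes("UTF-8"))

val df = spark.read
  .format("csv")
  .option("header", "true")    // first line holds the column names
  .option("delimiter", "|")    // "sep" is accepted as an alias
  .option("nullValue", "NULL") // this string is read back as null
  .option("quote", "'")        // lets a quoted field contain the delimiter: 'a|b'
  .load(path.toUri.toString)

// Without inferSchema every column is read as a string
val rows: Array[Row] = df.orderBy("id").collect()
rows.foreach(println)
```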

Updated Information

It seems that I have to pass the path using back-ticks.

When I attempt to pass OPTIONS, I get this error:

== SQL ==
SELECT * FROM 
csv.`csv/file/path/in/hdfs` OPTIONS (delimiter , "|" )
-----------------------------------^^^

Error in query:
mismatched input '(' expecting {<EOF>, ',', 'WHERE', 'GROUP', 'ORDER', 
'HAVING', 'LIMIT', 'JOIN', 'CROSS', 'INNER', 'LEFT', 'RIGHT', 'FULL', 
'NATURAL', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 
'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'ANTI'}
  • How about SELECT * FROM csv.csv/file/path/in/hdfs OPTIONS (delimiter , "|" ) Commented Dec 2, 2017 at 5:32

1 Answer

Although it is not a one-line solution, the following might work for you:

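// Register the CSV file as a table; the option must be spelled "delimiter", a misspelled key is silently ignored
spark.sql("""CREATE TABLE some_table USING com.databricks.spark.csv OPTIONS (path "csv/file/path/in/hdfs", delimiter "|")""")
val df = spark.sql("SELECT * FROM some_table")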

Of course, you can skip the second step (loading into a DataFrame) if you only want to run SQL directly against some_table.
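If the goal is to stay entirely in SQL, a variant of the same idea is a temporary view, which carries the delimiter, null and quote options together in one OPTIONS clause and is scoped to the current session instead of being registered as a table. This is a sketch assuming Spark 3.x and the built-in csv source; the view name `pipe_csv` and the temporary sample file are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{Row, SparkSession}

// Local session for the sketch; in spark-shell this returns the existing session
val spark = SparkSession.builder().master("local[*]").appName("csv-sql-options").getOrCreate()

// Hypothetical pipe-delimited sample: NULL marks a missing value, ' is the quote character
val path = Files.createTempFile("sample", ".csv")
Files.write(path, "id|name\n1|'a|b'\n2|NULL\n".getBytes("UTF-8"))

// Session-scoped view: all reader options are passed through the OPTIONS clause
spark.sql(
  s"""CREATE OR REPLACE TEMPORARY VIEW pipe_csv
     USING csv
     OPTIONS (path "${path.toUri}", header "true", delimiter "|", nullValue "NULL", quote "'")""")

// Plain SQL from here on; columns are strings because no schema is inferred
val rows: Array[Row] = spark.sql("SELECT id, name FROM pipe_csv ORDER BY id").collect()
rows.foreach(println)
```

The view disappears when the session ends, so nothing needs to be dropped afterwards.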
