Spark DataFrame - Read pipe delimited file using SQL?

Question

Based on Spark - load CSV file as DataFrame?

Is it possible to specify options using SQL to set the delimiter, null character, and quote?

val df = spark.sql("SELECT * FROM csv.`csv/file/path/in/hdfs`")

I know it can be done using spark.read.format("csv").option("delimiter", "|"), but ideally I wouldn't have to.

Updated Information

It seems that I have to pass the path using back-ticks.

When I attempt to pass OPTIONS, the parser rejects the query:

== SQL ==
SELECT * FROM 
csv.`csv/file/path/in/hdfs` OPTIONS (delimiter , "|" )
-----------------------------------^^^

Error in query:
mismatched input '(' expecting {<EOF>, ',', 'WHERE', 'GROUP', 'ORDER', 
'HAVING', 'LIMIT', 'JOIN', 'CROSS', 'INNER', 'LEFT', 'RIGHT', 'FULL', 
'NATURAL', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 
'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'ANTI'}

How about SELECT * FROM csv.csv/file/path/in/hdfs OPTIONS (delimiter , "|" )?
– philantrovert, Commented Dec 2, 2017 at 5:32

vatsal mevada · Accepted Answer · 2017-12-03 18:37:22Z

2

Although it is not a one-line solution, the following might work for you:

// Register the file as a table; the CSV reader settings go in the OPTIONS clause.
spark.sql("CREATE TABLE some_table USING com.databricks.spark.csv OPTIONS (path \"csv/file/path/in/hdfs\", delimiter \"|\")");
val df = spark.sql("SELECT * FROM some_table");

Of course, you can skip the second step of loading into a DataFrame if you only want to run SQL directly against some_table.
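
If you would rather not register a permanent table, a temporary view may be a lighter variant of the same idea. The following is only a sketch, assuming Spark 2.x or later with the built-in csv source. The view name pipe_view, the local master setting, and the quote and nullValue values are illustrative assumptions, not something taken from the question:

```scala
import org.apache.spark.sql.SparkSession

object PipeDelimitedView {
  // DDL for a session-scoped view over the file.
  // Assumption: the option values below are placeholders; adjust them to your data.
  val ddl: String =
    """CREATE OR REPLACE TEMPORARY VIEW pipe_view
      |USING csv
      |OPTIONS (
      |  path 'csv/file/path/in/hdfs',
      |  delimiter '|',
      |  quote '"',
      |  nullValue 'NULL'
      |)""".stripMargin

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation; on a cluster, drop .master(...).
    val spark = SparkSession.builder()
      .appName("pipe-delimited-view")
      .master("local[*]")
      .getOrCreate()

    // Create the view, then query it with plain SQL.
    spark.sql(ddl)
    val df = spark.sql("SELECT * FROM pipe_view")
    df.show()

    spark.stop()
  }
}
```

The view disappears when the session ends, so nothing is left behind in the metastore. The delimiter, quote, and nullValue keys are the same ones accepted by spark.read.format("csv").option(...).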
