
I have this type of file where each line is a JSON object except for the first few words (see attached image). I want to parse this type of file using Spark and Scala. I have tried sqlContext.read.json("path to json file"), but it gives me an error (corrupt data) because each line as a whole is not a JSON object. How do I parse this file into a Spark SQL DataFrame?
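For reference, a minimal sketch of the call that was attempted (the path is a placeholder); it fails because every line starts with non-JSON text:

// Sketch of the attempted direct read; the path is a placeholder
val df = sqlContext.read.json("path/to/file")  // lines are rejected as corrupt because of the non-JSON prefix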

  • If you have invalid JSON, you can't parse it using any tool. Commented Mar 3, 2017 at 8:58
  • Is this invalid JSON? Commented Mar 3, 2017 at 9:00
  • Well, given that you have non-JSON data before the actual JSON, then yes, it's not valid in Spark's eyes. You need to extract that data separately. Commented Mar 3, 2017 at 9:03
  • Is there any way in Spark to extract that data separately? Commented Mar 3, 2017 at 9:07
  • @AkhilChoudhari do these "first few words" have the same length in all rows? Commented Mar 3, 2017 at 9:15

1 Answer


Try this:

// Read the file as plain text, one record per line
val rawRdd = sc.textFile("path-to-the-file")

// Drop the non-JSON prefix: 32 is the number of leading characters to ignore on each line
val jsonRdd = rawRdd.map(_.substring(32))

// Parse the remaining JSON strings into a DataFrame
val df = spark.read.json(jsonRdd)
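If the prefix does not have the same length on every line (the point raised in the comments above), a slight variation of the same idea is to drop everything before the first '{' instead of a fixed number of characters. This is only a sketch and assumes every line contains a single JSON object starting at its first '{':

// Strip everything before the first '{' on each line (sketch; assumes the JSON object starts there)
val jsonRdd = rawRdd.map { line =>
  val start = line.indexOf('{')
  if (start >= 0) line.substring(start) else line
}

val df = spark.read.json(jsonRdd)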

5 Comments

  • The last command gave me the error shown below: at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
  • It would be easier if you could provide some example data to test with.
  • What version of Spark do you use?
  • When I give the whole 20 MB file to spark.read.json(), it doesn't work, but it works for half of the file. Why?
  • It gives me this error: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
