
I am trying to create an RDD[Row]:

val RowRDD = sc.textFile("InputFile.csv").map(x => x.split(" ")).map(p => Row(p(1), p(2)))

InputFile.csv is:

spark 5 1
hadoop 7 1
flink 10 1

However, when I run my application, the error says

java.lang.ArrayIndexOutOfBoundsException: 1

InputFile.csv obviously has 3 rows, so why is there an error?

  • Could there be an empty line that causes the error? Also keep in mind that Spark has built-in facilities that let you read CSVs with ease (including the possibility of specifying a custom column delimiter, which seems relevant in your case). More here: spark.apache.org/docs/2.1.1/… and the options for the CSV reader are here: spark.apache.org/docs/2.1.1/api/scala/…
  • @stefanobaghino Yes, an empty line was the problem. I took care of the empty lines and it now succeeds.
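
To spell out the diagnosis from the comments: in Scala, "".split(" ") returns Array(""), so p(1) throws ArrayIndexOutOfBoundsException: 1 as soon as an empty line is read. A minimal sketch of one way to guard against this, assuming the usual spark-shell SparkContext sc:

import org.apache.spark.sql.Row

// Drop lines that do not have all three fields (e.g. trailing empty lines)
// before indexing into the split result.
val RowRDD = sc.textFile("InputFile.csv")
  .map(_.split(" "))
  .filter(_.length >= 3)
  .map(p => Row(p(1), p(2)))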

1 Answer


I have attached a screenshot of my own attempt at reading your file in spark-shell; as you can see, there is no problem running that particular line. It is very possible that you left out some other lines of your full code. One mistake I often make is referencing a command-line argument and then forgetting to pass any arguments on the command line. This can probably be diagnosed if you paste the entire code; the one line given above is entirely correct.

[spark-shell screenshot]
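
As the comment above suggests, Spark's built-in CSV reader can also handle a custom delimiter, which should sidestep the blank-line issue as well. A sketch assuming Spark 2.x with a SparkSession named spark (available by default in spark-shell; _c0 through _c2 are the column names Spark assigns when there is no header):

val df = spark.read
  .option("sep", " ")            // space-delimited fields
  .option("inferSchema", "true") // infer numeric types for the last two columns
  .csv("InputFile.csv")

// The second and third columns correspond to p(1) and p(2) above.
df.select("_c1", "_c2").show()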
