I am using Spark2.3 with Scala and trying to load multiple csv files from a directory, I am getting an issue that it load files but miss some columns from them
I have following sample files
test1.csv
Col1,Col2,Col3,Col4,Col5
aaa,2,3,4,5
aaa,2,3,4,5
aaa,2,3,4,5
aaa,2,3,4,5
aaa,2,3,4,5
aaa,2,3,4,5
aaa,2,3,4,5
aaa,2,3,4,5
aaa,2,3,4,5
test2.csv
Col1,Col2,Col3,Col4
aaa,2,3,4
aaa,2,3,4
aaa,2,3,4
aaa,2,3,4
aaa,2,3,4
aaa,2,3,4
aaa,2,3,4
aaa,2,3,4
aaa,2,3,4
test3.csv
Col1,Col2,Col3,Col4,Col6
aaa,2,3,4,6
aaa,2,3,4,6
aaa,2,3,4,6
aaa,2,3,4,6
aaa,2,3,4,6
aaa,2,3,4,6
aaa,2,3,4,6
aaa,2,3,4,6
aaa,2,3,4,6
test4.csv
Col1,Col2,Col5,Col4,Col3
aaa,2,5,4,3
aaa,2,5,4,3
aaa,2,5,4,3
aaa,2,5,4,3
aaa,2,5,4,3
aaa,2,5,4,3
aaa,2,5,4,3
aaa,2,5,4,3
aaa,2,5,4,3
What i want to do is load all these files into a dataframe with all the columns in 4 files but when i try to load files with following code
val dft = spark.read.format("csv").option("header", "true").load("path/to/directory/*.csv")
It loads csv but misses some columns from csv.
here is the output of dft.show()
+----+----+----+----+----+
|Col1|Col2|Col3|Col4|Col6|
+----+----+----+----+----+
| aaa| 2| 3| 4| 6|
| aaa| 2| 3| 4| 6|
| aaa| 2| 3| 4| 6|
| aaa| 2| 3| 4| 6|
| aaa| 2| 3| 4| 6|
| aaa| 2| 3| 4| 6|
| aaa| 2| 3| 4| 6|
| aaa| 2| 3| 4| 6|
| aaa| 2| 3| 4| 6|
| aaa| 2| 5| 4| 3|
| aaa| 2| 5| 4| 3|
| aaa| 2| 5| 4| 3|
| aaa| 2| 5| 4| 3|
| aaa| 2| 5| 4| 3|
| aaa| 2| 5| 4| 3|
| aaa| 2| 5| 4| 3|
| aaa| 2| 5| 4| 3|
| aaa| 2| 5| 4| 3|
| aaa| 2| 3| 4| 5|
| aaa| 2| 3| 4| 5|
+----+----+----+----+----+
I want it to be like this
+----+----+----+----+----+----+
|Col1|Col2|Col3|Col4|Col5|Col6|
+----+----+----+----+----+----+
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
| aaa| 2| 3| 4| 5| 6|
+----+----+----+----+----+----+
Please guide me what is wrong with my code? or is there any other efficient way to do it?
Thanks