
I am trying to convert a conversation into a DataFrame in Spark using Scala. The person and their message are separated by a tab, and each message is on its own line.

The text file is like following:

alpha   hello,beta! how are you?
beta    I am fine alpha.How about you?
alpha   I am also doing fine...
alpha   Actually, beta, I am bit busy nowadays and sorry I hadn't call U

and I need the DataFrame to look like the following:

------------------------------------
|Person  |  Message
------------------------------------
|1       |  hello,beta! how are you?
|2       |  I am fine alpha.How about you?
|1       |  I am also doing fine...
|1       |  Actually, beta, I am bit busy nowadays and sorry I hadn't call 
-------------------------------------
  • 1
    Can you share your code? Commented Jul 8, 2019 at 15:43
  • I am actually a beginner in Scala and have only made a little progress with it. I am learning complex map functions right now, like the ones this problem needs. ``` val text=sc.textFile("hdfs://localhost:9000/Conversation").map(x=>x.split("\n")) val text2=text.foreach(x=>x.map(y=>y.split(" "))) ``` Commented Jul 8, 2019 at 16:22
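
As a side note on the snippet in the comment above, a minimal corrected sketch of that RDD-based attempt (assuming the same HDFS path) could look like this: sc.textFile already yields one record per line, so splitting on "\n" is unnecessary, and the separator is a tab rather than a single space.

val lines = sc.textFile("hdfs://localhost:9000/Conversation")
// split each line into (person, message) on the first tab
val pairs = lines.map { line =>
  val parts = line.split("\t", 2)
  (parts(0), parts(1))
}
pairs.take(4).foreach(println)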

2 Answers

1

First, I created a text file with your provided data and put it in an HDFS location at temp/data.txt.

data.txt:

alpha   hello,beta! how are you?
beta    I am fine alpha.How about you?
alpha   I am also doing fine...
alpha   Actually, beta, I am bit busy nowadays and sorry I hadn't call U

I then created a case class, read in the file, and processed it into a DataFrame:

case class PersonMessage(Person: String, Message: String)

// split each line on the tab and map it into the case class
val df = sc.textFile("temp/data.txt").map { x =>
  val splits = x.split("\t")
  PersonMessage(splits(0), splits(1))
}.toDF("Person", "Message")
df.show
+------+--------------------+
|Person|             Message|
+------+--------------------+
| alpha|hello,beta! how a...|
|  beta|I am fine alpha.H...|
| alpha|I am also doing f...|
| alpha|Actually, beta, I...|
+------+--------------------+
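
If you also need the numeric speaker IDs shown in the question's expected output, one possible follow-up (assuming alpha should map to 1 and beta to 2, which the answer itself does not state) is to replace the names with when expressions, for example:

import org.apache.spark.sql.functions.{col, when}

// assumed mapping taken from the expected output: alpha -> 1, beta -> 2
val withIds = df.withColumn("Person",
  when(col("Person") === "alpha", 1)
    .when(col("Person") === "beta", 2))
withIds.show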


0

You can read the text file and parse each line yourself:

Example:

import org.apache.spark.sql.Dataset
import sparkSession.implicits._ // provides the Encoder for (String, String)

val result: Dataset[(String, String)] = sparkSession.read.textFile("filePath").flatMap { line =>
  val str = line.split("\t")
  if (str.length == 2) {
    Some((str(0), str(1)))
  } else {
    // ignore malformed lines
    None
  }
}
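
If you also want a DataFrame with the column names from the question, one possible follow-up is to rename the tuple columns with toDF, for example:

// rename the tuple columns to match the expected schema
val df = result.toDF("Person", "Message")
df.show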

