
I am trying to convert a conversation into a DataFrame in Spark using Scala. The person and their message are separated by a tab, and each message is on its own line.

The text file is like following:

alpha   hello,beta! how are you?
beta    I am fine alpha.How about you?
alpha   I am also doing fine...
alpha   Actually, beta, I am bit busy nowadays and sorry I hadn't call U

and I need the DataFrame to look like the following:

------------------------------------
|Person  |  Message
------------------------------------
|1       |  hello,beta! how are you?
|2       |  I am fine alpha.How about you?
|1       |  I am also doing fine...
|1       |  Actually, beta, I am bit busy nowadays and sorry I hadn't call 
-------------------------------------
  • 1
    Can you share your code? Commented Jul 8, 2019 at 15:43
  • I am actually a beginner in Scala and have only made a little progress with it. I am learning complex map functions right now, like the ones this problem needs. ``` val text=sc.textFile("hdfs://localhost:9000/Conversation").map(x=>x.split("\n")) val text2=text.foreach(x=>x.map(y=>y.split(" "))) ``` Commented Jul 8, 2019 at 16:22
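
As a side note on the snippet in the comment above, a minimal corrected sketch of that RDD-based attempt (assuming the same HDFS path) could look like this: sc.textFile already yields one record per line, so splitting on "\n" is unnecessary, and the separator is a tab rather than a single space.

val lines = sc.textFile("hdfs://localhost:9000/Conversation")
// split each line into (person, message) on the first tab
val pairs = lines.map { line =>
  val parts = line.split("\t", 2)
  (parts(0), parts(1))
}
pairs.take(4).foreach(println)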

2 Answers

1

First, I created a text file with your provided data and put it in an HDFS location at temp/data.txt.

data.txt:

alpha   hello,beta! how are you?
beta    I am fine alpha.How about you?
alpha   I am also doing fine...
alpha   Actually, beta, I am bit busy nowadays and sorry I hadn't call U

I then created a case class, read in the file, and processed it into a DataFrame:

case class PersonMessage(Person: String, Message: String)

// split each line on the tab and map it into the case class
val df = sc.textFile("temp/data.txt").map { x =>
  val splits = x.split("\t")
  PersonMessage(splits(0), splits(1))
}.toDF("Person", "Message")
df.show
+------+--------------------+
|Person|             Message|
+------+--------------------+
| alpha|hello,beta! how a...|
|  beta|I am fine alpha.H...|
| alpha|I am also doing f...|
| alpha|Actually, beta, I...|
+------+--------------------+
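
If you also need the numeric speaker IDs shown in the question's expected output, one possible follow-up (assuming alpha should map to 1 and beta to 2, which the answer itself does not state) is to replace the names with when expressions, for example:

import org.apache.spark.sql.functions.{col, when}

// assumed mapping taken from the expected output: alpha -> 1, beta -> 2
val withIds = df.withColumn("Person",
  when(col("Person") === "alpha", 1)
    .when(col("Person") === "beta", 2))
withIds.show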


0

You can read the text file and parse each line yourself:

Example:

import org.apache.spark.sql.Dataset
import sparkSession.implicits._ // provides the Encoder for (String, String)

val result: Dataset[(String, String)] = sparkSession.read.textFile("filePath").flatMap { line =>
  val str = line.split("\t")
  if (str.length == 2) {
    Some((str(0), str(1)))
  } else {
    // ignore malformed lines
    None
  }
}
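
If you also want a DataFrame with the column names from the question, one possible follow-up is to rename the tuple columns with toDF, for example:

// rename the tuple columns to match the expected schema
val df = result.toDF("Person", "Message")
df.show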

