2

I'm experimenting with Spark and Spark SQL and I need to concatenate a value at the beginning of a string field that I retrieve as output from a select (with a join) like the following:

val result = sim.as('s)   
    .join(
        event.as('e),
        Inner,
        Option("s.codeA".attr === "e.codeA".attr))   
    .select("1"+"s.codeA".attr, "e.name".attr)  

Let's say my tables contain:

sim:

codeA,codeB
0001,abcd
0002,efgh

events:

codeA,name
0001,freddie
0002,mercury

And I would want as output:

10001,freddie
10002,mercury

In SQL or HiveQL I know I have the concat function available, but it seems Spark SQL doesn't support this feature. Can somebody suggest me a workaround for my issue?

Thank you.

Note: I'm using Language Integrated Queries but I could use just a "standard" Spark SQL query, in case of eventual solution.

2
  • Possible duplicate of Concatenate columns in apache spark dataframe Commented May 29, 2016 at 19:51
  • I think you don't need to do much of work like using map and creating function.just use this "val result = sim.as("s").join(events, sim("codeA")===events("codeA"),"inner") .select("s.codeA","name").withColumn("codeA",concat(lit("1"),lit(""),col("codeA"))).show " Commented Oct 7, 2019 at 5:33

2 Answers 2

0

The output you add in the end does not seem to be part of your selection, or your SQL logic, if I understand correctly. Why don't you proceed by formatting the output stream as a further step ?

val results = sqlContext.sql("SELECT s.codeA, e.code FROM foobar")
results.map(t => "1" + t(0), t(1)).collect()
Sign up to request clarification or add additional context in comments.

1 Comment

Hi @huitseeker, thanks for your answer. Your solution might be actually an option, since I have to format the output as tab separated values, indeed. However, the issue is still valide, since I also need to perform another join with the "concatenated-value", i.e. something like: Option("s.codeA".attr === 1+"r.codeA".attr) where r has codeA values like 10001, 10002, ...
0

It's relatively easy to implement new Expression types directly in your project. Here's what I'm using:

case class Concat(children: Expression*) extends Expression {
  override type EvaluatedType = String

  override def foldable: Boolean = children.forall(_.foldable)
  def nullable: Boolean = children.exists(_.nullable)
  def dataType: DataType = StringType

  def eval(input: Row = null): EvaluatedType = {
    children.map(_.eval(input)).mkString
  }
}

val result = sim.as('s)
    .join(
        event.as('e),
        Inner,
        Option("s.codeA".attr === "e.codeA".attr))
    .select(Concat("1", "s.codeA".attr), "e.name".attr)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.