I am new to Scala. I have a DataFrame with these fields:

ID:string, Time:timestamp, Items:array(struct(name:string,ranking:long))

I want to convert the Items array in each row to a hash map, with the name as the key. I am not sure how to do this.

2 Answers

This can be done using a UDF:

import spark.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Row

// Sample data (Time simplified to a plain string here):
val df = Seq(
  ("id1", "t1", Array(("n1", 4L), ("n2", 5L))),
  ("id2", "t2", Array(("n3", 6L), ("n4", 7L)))
).toDF("ID", "Time", "Items")

// Create UDF converting array of (String, Long) structs to Map[String, Long]
val arrayToMap = udf[Map[String, Long], Seq[Row]] {
  array => array.map { case Row(key: String, value: Long) => (key, value) }.toMap
}

// apply UDF
val result = df.withColumn("Items", arrayToMap($"Items"))

result.show(false)
// +---+----+---------------------+
// |ID |Time|Items                |
// +---+----+---------------------+
// |id1|t1  |Map(n1 -> 4, n2 -> 5)|
// |id2|t2  |Map(n3 -> 6, n4 -> 7)|
// +---+----+---------------------+

I can't see a way to do this without a UDF (i.e., using only Spark's built-in functions).
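
As a quick check against the result above, individual rankings can be read back out of the map column with getItem (the key "n1" is just taken from the sample data); absent keys come back as null:

result.select($"ID", $"Items".getItem("n1").as("n1_ranking")).show()
// id1 -> 4; id2 -> null ("n1" only appears in the first row)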


Since Spark 2.4.0, you can use map_from_entries:

import spark.implicits._
import org.apache.spark.sql.functions._

val df = Seq(
  Array(("n1", 4L), ("n2", 5L)),
  Array(("n3", 6L), ("n4", 7L))
).toDF("Items")

df.select(map_from_entries($"Items")).show

/*
+-----------------------+
|map_from_entries(Items)|
+-----------------------+
|     [n1 -> 4, n2 -> 5]|
|     [n3 -> 6, n4 -> 7]|
+-----------------------+
*/
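
The same function slots into the full ID/Time/Items schema from the question via withColumn, and element_at (also added in 2.4.0) then looks up entries by key. A minimal sketch, reusing the sample rows from the first answer:

val full = Seq(
  ("id1", "t1", Array(("n1", 4L), ("n2", 5L))),
  ("id2", "t2", Array(("n3", 6L), ("n4", 7L)))
).toDF("ID", "Time", "Items")

// Replace the array-of-structs column with a map column in place
val converted = full.withColumn("Items", map_from_entries($"Items"))

// Look up a single ranking by name; absent keys yield null
converted.select($"ID", element_at($"Items", "n1")).show()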
