2

I evaluate Spark 4 try_variant_get method handling variant type data. First I make sql statements examples.

CREATE TABLE family (
  id INT,
  data VARIANT
);


INSERT INTO family (id, data)
VALUES
(1, PARSE_JSON('{"name":"Alice","age":30}')),
(2, PARSE_JSON('[1,2,3,4,5]')),
(3, PARSE_JSON('42'));

When SQL is executed, no errors are brought. Then Below codes are the select command using try_variant_get method

SELECT
  id,
  try_variant_get(data, '$.name', 'STRING') AS name,
  try_variant_get(data, '$.age', 'INT') AS age
FROM
  family
WHERE 
  try_variant_get(data, '$.name', 'STRING') IS NOT NULL;

SQL output is successfully returned. Then I transform these SQL statements into java api codes.

SparkSession spark = SparkSession.builder().master("local[*]").appName("VariantExample").getOrCreate();

StructType schema = new StructType()
       .add("id", DataTypes.IntegerType)
       .add("data", DataTypes.VariantType);

Dataset<Row> df = spark.createDataFrame(
       Arrays.asList(
            RowFactory.create(1, "{\"name\":\"Alice\",\"age\":30}"),
            RowFactory.create(2, "[1,2,3,4,5]"),
            RowFactory.create(3, "42")
       ),
       schema
);

 Dataset<Row> df_sel = df.select(
            col("id"),
            try_variant_get(col("data"), "$.name", "String").alias("name"),
            try_variant_get(col("data"), "$.age", "Integer").alias("age")
        ).where("name IS NOT NULL");

df_sel.printSchema();
df_sel.show();

But these java codes throw the following exceptions.

root
 |-- id: integer (nullable = true)
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)

Exception in thread "main" java.lang.ClassCastException: class java.lang.String cannot be cast to class org.apache.spark.unsafe.types.VariantVal (java.lang.String is in module java.base of loader 'bootstrap'; org.apache.spark.unsafe.types.VariantVal is in unnamed module of loader 'app')
        at org.apache.spark.sql.catalyst.expressions.variant.VariantGet.nullSafeEval(variantExpressions.scala:282)
        at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:692)
        at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:159)
        at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:89)
        at org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation$$anonfun$apply$48.$anonfun$applyOrElse$83(Optimizer.scala:2231)
        at scala.collection.immutable.List.map(List.scala:247)
        at scala.collection.immutable.List.map(List.scala:79).....

The "String" parameter of try_variant_get method has some problems. But I have no idea what is wrong with these java codes. Kindly inform me how to fix these errors.

4
  • When you run createDataFrame() part, have you verified the data is inserted as expected, like when you inserted manually? Does select still work? If so, then we can isolate the problem only to the df.select() part Commented Dec 28, 2024 at 4:48
  • Thanks for your reply. createDataFrame() part works without error. df.show() codes bring successful results. I agree the problem is related with df.select() part. Especially try_variant_get() method. Commented Dec 28, 2024 at 5:07
  • I have the same issue but with java.lang.Integer. I use RowFactory.create() with a single parameter passing various Java objects (List for an array, Map for an object, String, Integer, Double, etc) and create a DataFrame whose schema is a single Variant column. It is successful and the resulting DataFrame can nicely be displayed with show(), but attempting to do more advanced processing leads to a similar error. What is the proper way to create variant values in Java (not just json types, also dates, binary, durations, all the atomic types documented in Spark SQL)? Commented Aug 25 at 10:03
  • According to the stack, it happens here: github.com/apache/spark/blob/… Commented Aug 25 at 10:18

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.