Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException

Question

I have the following dataframe:

dataframe1
+-----------------------+
|ID                     |
+-----------------------+
|[10,80,60,]            |
|[20,40,]               |
+-----------------------+

And another dataframe:

dataframe2
+------------------+----------------+
|ID_2              |   name         |
+------------------+----------------+
|40                | XYZZ           |
|200               | vbb            |
+------------------+----------------+

I want the following output:

+------------------+----------------+
|ID_2              |   name         |
+------------------+----------------+
|40                | XYZZ           |
+------------------+----------------+

I'm using the following code to select the rows of the second dataframe whose ID_2 appears in ID:

for (java.util.Iterator<Row> iter = dataframe1.toLocalIterator(); iter.hasNext();) {
        String item = (iter.next()).get(0).toString();
        dataframe2.registerTempTable("data2");
        Dataset<Row> res = sparkSession.sql("select * from data2 where ID_2 IN ("+item+")");
        res.show();
}

But I get the following exception:

Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: 
mismatched input 'from' expecting <EOF>(line 1, pos 9)

 == SQL ==
select * from data2 where ID_2 IN ([10,80,60,])
 ---------^^^

at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at factory.Geofencing_Alert.check(Geofencing_Alert.java:84)
at factory.Geofencing_Alert.main(Geofencing_Alert.java:158)

How can I fix this?

Instead of ...get(0).toString() try ...get(0).mkString(",").
– mazaneicha, Commented Aug 15, 2020 at 12:40
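
To make the comment above concrete: the query fails because toString() yields the text [10,80,60,], and the brackets and trailing comma are most likely what the parser rejects inside IN (...). Below is a minimal plain-Java sketch of cleaning that string before splicing it into the query. The class and method names (InListExample, toInList, buildQuery) are made up for illustration, and it assumes the IDs are plain integers; it is not meant for arbitrary or untrusted values.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class InListExample {

    // Turn a bracketed string such as "[10,80,60,]" into "10,80,60".
    // Assumption: the elements are simple integers without embedded commas.
    static String toInList(String raw) {
        // Drop one leading '[' and one trailing ']' if present.
        String inner = raw.trim().replaceAll("^\\[|\\]$", "");
        return Arrays.stream(inner.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty()) // skips the empty piece left by a trailing comma
                .collect(Collectors.joining(","));
    }

    // Build the query text from the question with a cleaned IN list.
    static String buildQuery(String raw) {
        return "select * from data2 where ID_2 IN (" + toInList(raw) + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("[10,80,60,]"));
        System.out.println(buildQuery("[20,40,]"));
    }
}
```

For the second row this builds select * from data2 where ID_2 IN (20,40). An empty list would still produce IN (), which is not valid SQL, so that case needs its own check. This also still runs one query per row of dataframe1, which is why a join is usually the better fit.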

Daeho Ro · Accepted Answer · 2020-08-15 14:03:23Z

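Rather than building a SQL string for each row, simply use the explode function: it turns every element of the ID array into its own row (so ID must be an array column), and you can then join on ID_2.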

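import org.apache.spark.sql.functions.explode
import spark.implicits._  // enables the $"col" syntax; assumes a SparkSession named spark

// df1 and df2 stand for dataframe1 and dataframe2 from the question
df1.withColumn("ID", explode($"ID"))      // one row per array element
  .join(df2, $"ID" === $"ID_2", "inner")  // keep only IDs present in df2
  .drop("ID")
  .show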

+----+----+
|ID_2|name|
+----+----+
|  40|xyzz|
+----+----+
