
I can't believe I am asking this, but: how do you escape a SQL query string in Spark SQL using Scala?

I have tried everything and searched everywhere. I thought the Apache Commons library would do it, but no luck:

import org.apache.commons.lang.StringEscapeUtils

var sql = StringEscapeUtils.escapeSql("'Ulmus_minor_'Toledo'")

df.filter("topic = '" + sql + "'")
  .map(_.getValuesMap[Any](List("hits", "date")))
  .collect()
  .foreach(println)

fails with the following parse error:

topic = '''Ulmus_minor_''Toledo'''
                                  ^
  at scala.sys.package$.error(package.scala:27)
  at org.apache.spark.sql.catalyst.SqlParser.parseExpression(SqlParser.scala:45)
  at org.apache.spark.sql.DataFrame.filter(DataFrame.scala:651)
  ...
  at org.apache.spark.repl.Main$.main(Main.scala:31)
  at org.apache.spark.repl.Main.main(Main.scala)
  ...
  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Help would be great.

j

2 Answers


It may be surprising, but:

var sql = "'Ulmus_minor_'Toledo'"
df.filter(s"""topic = "$sql"""")

works just fine: Spark SQL also accepts double-quoted string literals, and this value contains single quotes but no double quotes, so nothing needs escaping. It would be much cleaner, though, to use the column DSL:

df.filter($"topic" <=> sql)
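To make both variants concrete, here is a rough, self-contained sketch using the Spark 2.x SparkSession API (the SparkSession setup, sample rows, and column names are hypothetical; the $ and <=> column syntax needs the implicits import, and <=> is null-safe equality, so === would also work here):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the real table.
val df = Seq(
  ("'Ulmus_minor_'Toledo'", 42, "2015-06-01"),
  ("Quercus_robur", 7, "2015-06-02")
).toDF("topic", "hits", "date")

val sql = "'Ulmus_minor_'Toledo'"

// Double-quoted SQL string literal: the embedded single quotes need no escaping.
df.filter(s"""topic = "$sql"""").show()

// Column DSL: no SQL string is parsed at all, so there is nothing to escape.
df.filter($"topic" <=> sql).show()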

10 Comments

Thanks dude. Pretty sure I have come across that syntax but not sure where.
It could be df.filter("topic = \"" + sql + "\"") as well.
Does this escape double quotes also?
The DSL syntax does, for sure. With the first option I am not sure, but I doubt it.
Triple quotes are the safest approach in general, unless you want to match something that itself contains double quotes, like foo"""bar. In a case like that you can usually solve the problem with string interpolation: val q = """"""" (a triple-quoted string whose content is a single "), and then s"""foo"$q"bar""" produces foo"""bar. See the sketch below.
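A minimal REPL sketch of that interpolation trick (the names are illustrative):

val q = """""""                 // triple-quoted string containing exactly one " character
val needle = s"""foo"$q"bar"""  // one literal " on each side of $q
println(needle)                 // prints: foo"""bar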

The title of the question asks about escaping strings in Spark SQL generally, so there may be a benefit to providing an answer that works for any string, regardless of how it is used in an expression:

def sqlEscape(s: String): String =
  org.apache.spark.sql.catalyst.expressions.Literal(s).sql

scala> sqlEscape("'Ulmus_minor_'Toledo' and \"om\"")
res0: String = '\'Ulmus_minor_\'Toledo\' and "om"'
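To show how the helper composes, a small sketch assuming the df and topic column from the question; note that Literal(s).sql returns a complete SQL literal, quotes and backslash escapes included, so none are added by hand:

val topic = "'Ulmus_minor_'Toledo'"

// sqlEscape returns a fully quoted and escaped literal,
// e.g. '\'Ulmus_minor_\'Toledo\''
df.filter(s"topic = ${sqlEscape(topic)}")
  .select("hits", "date")
  .collect()
  .foreach(println)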

1 Comment

Is there an equivalent for PySpark? PySpark has a lit(string) function too, but I can't figure out how to get an SQL-escaped string out of it (or if that's even possible).
