
I am passing the following as a query (the .dbtable option) to PySpark, running in a Jupyter notebook on AWS EMR.

num = [1234,5678]

newquery = "(SELECT * FROM db.table WHERE col = 1234) as new_table"
newquery = "(SELECT * FROM db.table WHERE col = {num}) as new_table"
newquery = "(SELECT * FROM db.table WHERE col IN %(num)s) as new_table"
newquery = "(SELECT * FROM db.table WHERE col IN :(num)) as new_table"

The first "newquery" will return results. The rest fail.

What is the correct way to write this query so the list is used in the IN clause?

1 Answer


You can try using an f-string to build the query in PySpark:

num = [1234, 5678]

# Join the list into a comma-separated string, e.g. "1234, 5678"
num_str = str(num)[1:-1]

newquery = f"(SELECT * FROM db.table WHERE col IN ({num_str})) AS new_table"

# Since the string is meant for the JDBC dbtable option, pass newquery there
# rather than to spark.sql: the "(...) AS new_table" form is a JDBC subquery,
# not a standalone SQL statement. See the sketch below.
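For completeness, here is a minimal sketch of passing the string to the JDBC reader; the URL, driver, and credentials below are placeholders, not values from the question:

# Minimal sketch, assuming a MySQL source; url/driver/user/password are placeholders
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://your-host:3306/db")  # placeholder connection URL
    .option("driver", "com.mysql.cj.jdbc.Driver")     # placeholder driver class
    .option("dbtable", newquery)                      # the "(...) AS new_table" subquery
    .option("user", "your_user")                      # placeholder credentials
    .option("password", "your_password")
    .load()
)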

Also note that str(num)[1:-1] is safe on string inputs too: if your list contains strings like ['1234', '5678'], the single quotes are preserved, so the resulting IN clause is still valid.
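A quick check of that behaviour in plain Python (no Spark needed):

num = [1234, 5678]
print(str(num)[1:-1])    # 1234, 5678     -> IN (1234, 5678)

strs = ['1234', '5678']
print(str(strs)[1:-1])   # '1234', '5678' -> IN ('1234', '5678')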

Also, I assume new_table is being used as the alias of a subquery, which is what the dbtable option expects when you pass a query instead of a table name.

