
I am passing the following as a query (the .dbtable option) to PySpark, running in a Jupyter notebook on AWS EMR.

num = [1234,5678]

newquery = "(SELECT * FROM db.table WHERE col = 1234) as new_table"
newquery = "(SELECT * FROM db.table WHERE col = {num}) as new_table"
newquery = "(SELECT * FROM db.table WHERE col IN %(num)s) as new_table"
newquery = "(SELECT * FROM db.table WHERE col IN :(num)) as new_table"

The first "newquery" will return results. The rest fail.

What is the correct way to write this query so the list is used in the IN clause?

1 Answer


You can try using an f-string to build the query in PySpark:

num = [1234, 5678]

# Join the list into a comma-separated string, e.g. "1234, 5678"
num_str = str(num)[1:-1]

newquery = f"(SELECT * FROM db.table WHERE col IN ({num_str})) AS new_table"

# Since the string is meant for the JDBC dbtable option, pass newquery there
# rather than to spark.sql: the "(...) AS new_table" form is a JDBC subquery,
# not a standalone SQL statement. See the sketch below.
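For completeness, here is a minimal sketch of passing the string to the JDBC reader; the URL, driver, and credentials below are placeholders, not values from the question:

# Minimal sketch, assuming a MySQL source; url/driver/user/password are placeholders
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://your-host:3306/db")  # placeholder connection URL
    .option("driver", "com.mysql.cj.jdbc.Driver")     # placeholder driver class
    .option("dbtable", newquery)                      # the "(...) AS new_table" subquery
    .option("user", "your_user")                      # placeholder credentials
    .option("password", "your_password")
    .load()
)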

Also note that str(num)[1:-1] is safe on string inputs too: if your list contains strings like ['1234', '5678'], the single quotes are preserved, so the resulting IN clause is still valid.
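A quick check of that behaviour in plain Python (no Spark needed):

num = [1234, 5678]
print(str(num)[1:-1])    # 1234, 5678     -> IN (1234, 5678)

strs = ['1234', '5678']
print(str(strs)[1:-1])   # '1234', '5678' -> IN ('1234', '5678')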

Also, I assume new_table is being used as the alias of a subquery, which is what the dbtable option expects when you pass a query instead of a table name.

