I am passing the following as a query (.dbtable) to PySpark, running in a Jupyter notebook on AWS EMR.
num = [1234,5678]
newquery = "(SELECT * FROM db.table WHERE col = 1234) as new_table"
newquery = "(SELECT * FROM db.table WHERE col = {num}) as new_table"
newquery = "(SELECT * FROM db.table WHERE col IN %(num)s) as new_table"
newquery = "(SELECT * FROM db.table WHERE col IN :(num)) as new_table"
The first "newquery" will return results. The rest fail.
What is the correct way to pass the list num into the query?
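You can try using an f-string to build the query before passing it to PySpark. Note that your second attempt has no f prefix, so {num} reaches the database as literal text, and col = compares against a single value, so a list needs IN (...) instead.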
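A minimal sketch of the f-string approach, assuming num is the list from the question. The name in_list is only introduced here for illustration; str(num)[1:-1] strips the surrounding square brackets from the list's string form.

```python
num = [1234, 5678]

# str(num) gives "[1234, 5678]"; slicing off the first and last
# character leaves "1234, 5678", which fits inside IN (...).
in_list = str(num)[1:-1]

# The f prefix is what makes Python substitute {in_list}.
newquery = f"(SELECT * FROM db.table WHERE col IN ({in_list})) as new_table"

print(newquery)
# (SELECT * FROM db.table WHERE col IN (1234, 5678)) as new_table
```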
Also note that the expression str(num)[1:-1] works on string inputs too: if your list holds strings like ['1234', '5678'], str() keeps the quotes around each value, so the resulting IN clause contains quoted literals.

Also, I hope you are using new_table as part of a subquery.
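To illustrate the string case, here is a rough sketch. The helper build_in_query and the placeholder jdbc_url are hypothetical names, not part of the question, and the commented-out read call assumes a JDBC source, which is where the dbtable option is normally used.

```python
def build_in_query(values):
    """Hypothetical helper: wrap a list of values in an aliased subquery."""
    if not values:
        # An empty IN () list is rejected by most SQL dialects.
        raise ValueError("values must not be empty")
    # Relies on Python's list repr, so string values keep their quotes.
    # Values that themselves contain quote characters are not escaped.
    in_list = str(list(values))[1:-1]
    return f"(SELECT * FROM db.table WHERE col IN ({in_list})) as new_table"


newquery = build_in_query(['1234', '5678'])
print(newquery)
# (SELECT * FROM db.table WHERE col IN ('1234', '5678')) as new_table

# Assumed usage with an existing SparkSession named spark and a
# placeholder jdbc_url; the alias "as new_table" makes the
# parenthesised SELECT usable where a table name is expected.
# df = (spark.read.format("jdbc")
#       .option("url", jdbc_url)
#       .option("dbtable", newquery)
#       .load())
```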