How to dynamically build a SELECT list from an API payload using PyPika


I have a JSON API payload containing a tablename and a columnlist. How do I build a SELECT query from it using pypika?

So far I have been able to use a plain string columnlist, but I am not able to do advanced querying using functions, analytics, etc.

from pypika import Table, Query, functions as fn

def generate_sql(tablename, collist):
    table = Table(tablename)
    # Each column ends up as a plain string, so anything that is not a simple
    # column name (e.g. 'fn.Sum(revenue)') gets passed through literally.
    columns = [str(table) + '.' + each for each in collist]
    q = Query.from_(table).select(*columns)
    return q.get_sql(quote_char=None)

tablename = 'customers'
collist = ['id', 'fname', 'fn.Sum(revenue)']
print (generate_sql(tablename, collist)) #1


# Building the same query by hand gives the desired result:
table = Table(tablename)
q = Query.from_(table).select(table.id, table.fname, fn.Sum(table.revenue))
print(q.get_sql(quote_char=None)) #2

#1 outputs

SELECT "customers".id,"customers".fname,"customers".fn.Sum(revenue) FROM customers

#2 outputs correctly

SELECT id,fname,SUM(revenue) FROM customers

1 Answer

Answer by Rodrigo Rodrigues:

You should not be trying to assemble the query as a string yourself; that defeats the whole purpose of pypika.

Since in your case the table name and the columns arrive as text in a JSON object, you can use * to unpack the values from collist and use the obj[key] syntax to look up a table attribute by name from a string.

q = Query.from_(table).select(*(table[col] for col in collist))
# SELECT id,fname,fn.Sum(revenue) FROM customers

Hmm... that doesn't quite work for the fn.Sum(revenue). The goal is to get SUM(revenue).

Things can get much more complicated from this point on. If you are only sending column names that you know belong to that table, the above solution is enough (see the sketch below).
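
For that simple case, a minimal sketch of the question's generate_sql rewritten with this lookup (reusing the tablename and collist from the question, minus the fn.Sum entry) could look like:

from pypika import Table, Query

def generate_sql(tablename, collist):
    # Let pypika build Field objects by name instead of concatenating strings.
    table = Table(tablename)
    q = Query.from_(table).select(*(table[col] for col in collist))
    return q.get_sql(quote_char=None)

print(generate_sql('customers', ['id', 'fname']))
# SELECT id,fname FROM customers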

But if you have complex SQL expressions, referencing SQL functions or even different tables, I suggest you rethink the decision to send them as strings inside JSON. You might end up writing something as complex as pypika itself, like a custom parser or whatever. In that case your better option is to change the format of your JSON response object.
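
As a rough sketch of what that could look like (the payload structure below is just one possible shape, not something defined in the question), each column entry could carry an optional aggregate name that is looked up on pypika's functions module:

from pypika import Table, Query, functions as fn

# Hypothetical structured payload: expressions are described as data,
# not embedded as SQL fragments inside strings.
payload = {
    "tablename": "customers",
    "columns": [
        {"name": "id"},
        {"name": "fname"},
        {"name": "revenue", "agg": "Sum"},
    ],
}

table = Table(payload["tablename"])

def to_term(col):
    field = table[col["name"]]
    # Only aggregate names that actually exist in pypika.functions will resolve.
    return getattr(fn, col["agg"])(field) if "agg" in col else field

q = Query.from_(table).select(*(to_term(c) for c in payload["columns"]))
print(q.get_sql(quote_char=None))
# SELECT id,fname,SUM(revenue) FROM customers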

If, on the other hand, you know you only need to support a very limited set of capabilities, parsing the expressions yourself could be feasible. For example, you can assume the following constraints:

  • all column names refer to only one table, no joins or aliases
  • all functions will be prefixed by fn.
  • no fancy stuff like window functions, distinct, count(*)...

Then you can do something like:

from pypika import Table, Query, functions as fn
import re

tablename = 'customers'
collist = ['id', 'fname', 'fn.Sum(revenue / 2)', 'revenue % fn.Count(id)']

def parsed(cols):
    # Leave anything prefixed with 'fn.' untouched and prefix every other
    # identifier with 'table.', so the expression can be evaluated against
    # the pypika objects defined below.
    pattern = r'(?:\bfn\.[a-zA-Z]\w*)|([a-zA-Z]\w*)'
    subst = lambda m: f"{'' if m.group().startswith('fn.') else 'table.'}{m.group()}"
    yield from (re.sub(pattern, subst, col) for col in cols)

table = Table(tablename)
env = dict(table=table, fn=fn)  # names the rewritten expressions may reference
q = Query.from_(table).select(*(eval(col, env) for col in parsed(collist)))
print(q.get_sql(quote_char=None))

Output:

SELECT id,fname,SUM(revenue/2),MOD(revenue,COUNT(id)) FROM customers