I am trying to rename tables and columns in a SQL query string using Python. I have a mapping of old names to new names stored in a Python dictionary. The SQL query can be complex, with multiple joins, nested structures, and table/column aliases. The column names in the query might not be prefixed with a table name or alias, especially in the case of nested subqueries.
Here is an example SQL query:
SELECT p.name, p.age, j.title
FROM person AS p
LEFT JOIN job AS j ON p.job_id = j.job_id
WHERE p.age > 30
And here is an example mapping:
mapping = {
"person": "individual",
"name": "full_name",
"age": "years_old",
"job": "occupation",
"title": "role",
"job_id": "identifier"
}
I want to transform the SQL query based on this mapping. I have tried using the sqlglot library in Python to parse the SQL query into an abstract syntax tree (AST), traverse the AST, and replace the names.
Here is the code I have tried so far:
from sqlglot import parse
from sqlglot.expressions import Identifier, Column, Table
def transform(node, mapping):
if isinstance(node, (Identifier, Column, Table)) and node.args['this'] in mapping:
return type(node)(this=mapping[node.args['this']], **{arg: val for arg, val in node.args.items() if arg != 'this'})
return node
expressions = parse(sql)
new_expressions = [expression.transform(lambda node: transform(node, mapping)) for expression in expressions]
new_sql = "".join(expression.sql() for expression in new_expressions)
print(new_sql)
This script works for simple cases. However, I encountered difficulties when
- the same column name is present in multiple tables, and each has a different mapping based on the table it belongs to. Example:
sql = '''SELECT p.name, p.age, j.title
FROM person AS p
LEFT JOIN job AS j ON p.job_id = j.job_id
WHERE p.age > 30'''
job_id is available in both,
- for person
- job_id --> occupation_id
- for job
- job_id --> identifier
- in nested sub-queries, when the same alias is used for different table.