DuckDB pandas pass DataFrame by name easily with IntelliSense support

48 views Asked by At

In DuckDB, we have to directly use the DataFrame variable name as a string in the SQL syntax (as shown here):

import pandas as pd
import duckdb

mydf = pd.DataFrame({'a' : [1, 2, 3]})
print(duckdb.query("SELECT SUM(a) FROM mydf").to_df())

But in VS Code the IntelliSense will not recognise the reference to mydf in the second statement, and it will mark mydf as unreferenced. And if mydf is renamed with IntelliSense, of course the second statement will break. Is there any way to make IntelliSense work?

1

There are 1 answers

0
user1537366 On

It is possible to use the debug f-string syntax to extract the variable name. This works well with IntelliSense because it already recognises the f-string syntax.

import pandas as pd
import duckdb

def main():
    mydf = pd.DataFrame({'a' : [1, 2, 3]})
    print(duckdb.query("SELECT SUM(a) FROM " + df_name(f"{mydf.size=}")).to_df())

def df_name(dbg_str):
    """gets the DataFrame variable name from the debug str minus the ".size"
    attribute before the equals sign
    
    Example:

    ```
    >>> df = pd.DataFrame({"a": [1, 2, 3]})
    >>> df_name(f"{df.size=}")
    "df"
    ```
    """
    return dbg_str[:dbg_str.index(".size=")]

if __name__ == "__main__":
    main()