PandasQueryEngine from llama-index is unable to execute code with the following error: invalid syntax (<unknown>, line 0)

I have the following code and am trying to use a local llama2-chat-13B model. The generated Pandas instructions look reasonable, but executing the final output fails with an error.

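import logging
import sys
from IPython.display import Markdown, display

import pandas as pd
# ServiceContext was used below but never imported
from llama_index import ServiceContext
from llama_index.query_engine import PandasQueryEngine

# Load the CSV and preview the first rows
df = pd.read_csv('./data/test.csv')
df.head()

# Use a local LLM and a local embedding model instead of OpenAI
service_context = ServiceContext.from_defaults(llm="local", embed_model="local")

# verbose=True prints the generated Pandas instructions and their output
query_engine = PandasQueryEngine(df=df, verbose=True, service_context=service_context)

response = query_engine.query("What is the size of the dataframe")
display(Markdown(f"<b>{response}</b>"))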

Here is the output:

> Pandas Instructions:

Sure, I'd be happy to help! Based on the input query "What is the size of the dataframe?", we can create an executable Python code using Pandas as follows:

import pandas as pd

df_size = len(df)

This code will give us the size of the dataframe, which is the number of rows it contains. The len() function returns the length of a list or an array, and in this case, it returns the number of rows in the dataframe.

Note that we don't need to use quotes around the variable name df because it is already defined as a pandas DataFrame object. Also, the eval() function is not necessary here since we are only executing a simple Python expression.

> Pandas Output: There was an error running the output as Python code. Error message: unexpected indent (<unknown>, line 1)

Traceback (most recent call last):
  File "/opt/conda/envs/llm/lib/python3.11/site-packages/llama_index/query_engine/pandas_query_engine.py", line 60, in default_output_processor
    tree = ast.parse(output)
           ^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/llm/lib/python3.11/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<unknown>", line 1
    Sure, I'd be happy to help! Based on the input query "What is the size of the dataframe?", we can create an executable Python code using Pandas as follows:
IndentationError: unexpected indent
llama_print_timings:        load time =    3552.20 ms
llama_print_timings:      sample time =      95.95 ms /   165 runs   (    0.58 ms per token,  1719.63 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =   24428.80 ms /   165 runs   (  148.05 ms per token,     6.75 tokens per second)
llama_print_timings:       total time =   24965.81 ms
There was an error running the output as Python code. Error message: unexpected indent (<unknown>, line 1)

Could this be because I am not using OpenAI? Any leads on resolving this are appreciated. Are there any alternatives to PandasQueryEngine which can be used with any model of my choice to analyze a dataframe using natural language?

I tried the above code and was expecting it to print the df_size as Pandas Output.

There is 1 answer

Nurullah gümüş

I have a workaround for this. Instead of PandasQueryEngine, which throws a lot of errors, you can use Pandas AI, or you can import your CSV data into a SQLite database and use a SQL query engine.
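
For the SQLite route, the import step can be done with the standard library alone. The following is only a minimal sketch under stated assumptions: the sample CSV text, the table name "data", and the helper load_csv_into_sqlite are hypothetical stand-ins for ./data/test.csv and whatever names you choose. The final query is the SQL counterpart of "What is the size of the dataframe?".

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for the contents of ./data/test.csv.
CSV_TEXT = """name,age
Alice,30
Bob,25
Carol,41
"""


def load_csv_into_sqlite(csv_file, conn, table="data"):
    """Create `table` from an open CSV file object; return the number of rows inserted.

    Hypothetical helper for illustration. Columns are created without declared
    types, so every value is stored as the text read from the CSV.
    """
    reader = csv.reader(csv_file)
    header = next(reader)  # first line holds the column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# An in-memory database keeps the sketch self-contained; pass a file path
# to sqlite3.connect() to persist the data instead.
conn = sqlite3.connect(":memory:")
inserted = load_csv_into_sqlite(io.StringIO(CSV_TEXT), conn)

# SQL counterpart of asking for the number of rows in the dataframe.
row_count = conn.execute('SELECT COUNT(*) FROM "data"').fetchone()[0]
print(row_count)
```

With a real file, you would pass open('./data/test.csv', newline='') instead of the io.StringIO object. Once the table exists, a text-to-SQL engine only has to produce a single SQL statement rather than free-form Python, which may be easier for a smaller local model to get right.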