How to use a large table from an Exasol database for statistical analysis in Python?


I have a table with 36 million rows and need to run different statistical analyses on it (e.g. hypothesis tests, distribution analysis, etc.).

Since I get a memory error when I use the export_to_pandas method, I need to read the data either chunkwise or, which is my preferred option, into a Dask dataframe.

However, after several tries I did not manage to import the table from the Exasol database into a Dask dataframe.

What could the code look like?
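For the Dask route, what I have in mind is to page through the table with LIMIT/OFFSET and wrap each page in dask.delayed, roughly like the sketch below (the connection details are placeholders, and the unique ordering column id is an assumption about the schema):

import dask
import dask.dataframe as dd
import pyexasol

DSN = "localhost:8563"
AUTH = dict(user="your_user", password="your_password", schema="your_schema")
TABLE = "your_table"
CHUNK_ROWS = 1_000_000

def fetch_chunk(offset, limit):
    # Each Dask task opens its own connection, so partitions can be
    # materialised independently (and in parallel)
    conn = pyexasol.connect(dsn=DSN, **AUTH)
    try:
        # ORDER BY a unique key ("id" is an assumed column) so that
        # LIMIT/OFFSET paging yields stable, non-overlapping slices
        return conn.export_to_pandas(
            f"SELECT * FROM {TABLE} ORDER BY id LIMIT {limit} OFFSET {offset}"
        )
    finally:
        conn.close()

# The total row count determines how many partitions are needed
conn = pyexasol.connect(dsn=DSN, **AUTH)
n_rows = conn.execute(f"SELECT COUNT(*) FROM {TABLE}").fetchval()
conn.close()

parts = [
    dask.delayed(fetch_chunk)(offset, CHUNK_ROWS)
    for offset in range(0, n_rows, CHUNK_ROWS)
]
ddf = dd.from_delayed(parts)  # meta is inferred from the first partition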

Even the chunkwise code did not work:

import pyexasol

# Connect to Exasol
connection = pyexasol.connect(dsn="localhost:8563", user="your_user", password="your_password", schema="your_schema")

# Define your SQL query
sql_query = "SELECT * FROM your_table"

# Set the chunk size for fetching data
chunk_size = 1000

# Execute the query; pyexasol has no DB-API cursor, execute() returns
# an ExaStatement that supports chunked fetching via fetchmany()
statement = connection.execute(sql_query)

while True:
    # Fetch the next chunk of rows
    data_chunk = statement.fetchmany(chunk_size)
    
    # Stop once no more rows are available
    if not data_chunk:
        break
    
    # Process the chunk (replace this with your actual data processing logic)
    for row in data_chunk:
        print(row)

# Close the connection
connection.close()
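For context, the per-chunk processing step would compute streaming statistics rather than print rows, e.g. a running mean and variance via Welford's online algorithm (a sketch; the first selected column is assumed to be numeric):

class RunningStats:
    # Welford's online algorithm: numerically stable streaming mean/variance
    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.count - 1) if self.count > 1 else float("nan")

stats = RunningStats()

# Inside the fetch loop, instead of print(row) (column 0 assumed numeric):
#     for row in data_chunk:
#         stats.update(float(row[0]))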

1 Answer

Alexander Lipatiev

The EXPORT to a local file itself will hardly fail with an out-of-memory error. I therefore presume the error is thrown during the conversion of the CSV stream to a Pandas dataframe: https://github.com/exasol/pyexasol/blob/master/pyexasol/callback.py#L42

One may therefore try to tune that step via Pandas-specific parameters (low_memory, chunksize, etc.), passing them through **kwargs to export_to_pandas, like:

C.export_to_pandas('users', low_memory=False)
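In the same spirit, any pandas.read_csv keyword can be forwarded this way; for example, specifying narrower dtypes often shrinks the dataframe substantially (the column names and dtypes below are hypothetical; note that Exasol reports unquoted identifiers in uppercase):

import pyexasol

C = pyexasol.connect(dsn="localhost:8563", user="your_user",
                     password="your_password", schema="your_schema")

# **kwargs pass straight through to pandas.read_csv in pyexasol's callback;
# "USER_ID" and "SCORE" are hypothetical column names
df = C.export_to_pandas(
    'users',
    low_memory=False,
    dtype={"USER_ID": "int32", "SCORE": "float32"},
)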

One more option: try a parallel export and combine the resulting dataframes.
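pyexasol's actual parallel export runs several HTTP-transport workers in separate processes (see the examples in the pyexasol repository). As a simpler, sequential illustration of the "export slices and combine" idea, one could split on a numeric key ("id" is hypothetical here) and concatenate:

import pandas as pd
import pyexasol

C = pyexasol.connect(dsn="localhost:8563", user="your_user",
                     password="your_password", schema="your_schema")

# Split the table on a numeric key ("id" is hypothetical) so each
# export_to_pandas call only has to parse one slice of the CSV stream
lo, hi = C.execute("SELECT MIN(id), MAX(id) FROM users").fetchone()
step = 1_000_000

frames = [
    C.export_to_pandas(
        f"SELECT * FROM users WHERE id >= {start} AND id < {start + step}"
    )
    for start in range(int(lo), int(hi) + 1, step)
]

df = pd.concat(frames, ignore_index=True)
C.close()

The final concat still needs enough RAM for the combined frame, but each read_csv call only has to parse one slice, which is where the memory spike originated.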