I wanted to compare memory consumption for same dataset. I read same SQL query with pandas and polars from an Oracle DB. Memory usage results are almost same. and execution time is 2 times faster than polars. I expect polars will be more memory efficient.
Is there anyone who can explain this? And any suggestion to reduce memory usage size for same dataset?
result(polars) and data(pandas) shapes:
and lastly memory usages:
One of the big advantages of Polars is query optimisation
If you're loading all data into memory with
read_database
, and only doing that, then there will be no differenceOn the other hand, if you make the dataframe you read in lazy (
DataFrame.lazy
), then perform some other operations, and then collect the results (LazyFrame.collect
), then that's where you'll see the Polars shineNote: usually you'll want to read the data in lazily directly (e.g.
scan_parquet
instead ofread_parquet
) but forread_database
there is noscan_
equivalent