Renaming the columns in Vaex

1k views Asked by At

I tried to read a csv file of 4GB initially with pandas pd.read_csv but my system is running out of memory (I guess) and the kernel is restarting or the system hangs. So, I tried using vaex library to convert csv to HDF5 and do operations(aggregations,group by)on that. For that I've used:

df = vaex.from_csv('Wager-Win_April-Jul.csv',column_names = None, convert=True, chunk_size=5000000)

and

df = vaex.from_csv('Wager-Win_April-Jul.csv',header = None, convert=True, chunk_size=5000000)

But still I'm getting my first record in csv file as the header(column names to be precise)and I'm unable to change the column names. I tried finding function to change the names but didn't come across any. Pls help me on that. Thanks :)

The column names 1559104, 10289, 991... is actually the first record in the csv and somehow vaex is taking the first row as my column names which I want to avoid enter image description here

1

There are 1 answers

0
Joco On

vaex.from_csv is a wrapper around pandas.read_csv with few extra options for the conversion.

So reading the pandas documentation, header='infer' (which is the default) if you want the csv reader to automatically infer the column names. Otherwise the 1st row of the file is used as the header. Alternatively you can pass the column names manually via the names kwarg. Same holds true for both vaex and pandas.

I would read the pandas.read_csv documentation to better understand all the options. Then you can use those options with vaex and the convert and chunk_size arguments.