Simple Table Operation Has Very Large Compilation Time with MLJ

Question

Asked by Jack N on 09 June 2022 at 14:21 · 72 views

I am trying to use MLJ on a DataFrame (30,000 rows x 8,000 columns), but every table operation seems to take a huge amount of time to compile, even though it is fast to run once compiled.

The example code below generates a 5 x 5000 DataFrame and gets stuck on the unpack line (line 3). When I run the same code for a 5 x 5 DataFrame, line 3 outputs “2.872309 seconds (9.09 M allocations: 565.673 MiB, 6.47% gc time, 99.84% compilation time)”.

This is a crazy amount of compilation time for a seemingly simple task, and I would like to know how I can reduce it. Thank you, Jack

using MLJ

using DataFrames

[line 1] @time arr = [[rand(1:10) for i in 1:5] for i in 1:5000];

output: 0.053668 seconds (200.76 k allocations: 11.360 MiB, 22.16% gc time, 99.16% compilation time)

[line 2] @time df = DataFrames.DataFrame(arr, :auto)

output: 0.267325 seconds (733.43 k allocations: 40.071 MiB, 4.29% gc time, 98.67% compilation time)

[line 3] @time y, X = unpack(df, ==(:x1));

output: does not finish running

Original Q&A

There are 2 answers

**Nils Gudat** · Answer 1 · 2022-06-09T14:53:13+00:00

It's not unexpected that the Julia compiler struggles with very wide DataFrames, which have (potentially) heterogeneous column types. That said, I'm not sure why this has to be a problem for this operation. I've checked with the MLJ maintainers, who can hopefully chime in.

In the meantime you can simply do

y, X = df.x1, select!(df, Not(:x1))

which is instantaneous. (Note that select! will drop x1 from your underlying data; if you want to leave df unchanged and get a copy instead, use select.)
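As a minimal sketch of the non-mutating variant mentioned above, the following rebuilds the 5 x 5000 table from the question and splits it with select rather than select!. It assumes only DataFrames is loaded and that the auto-generated column names are x1, x2, and so on, as in the question. It is an illustration of the workaround, not a statement about what unpack does internally.

```julia
using DataFrames

# Stand-in for the wide table in the question: 5 rows, 5000 columns (x1 ... x5000).
arr = [[rand(1:10) for _ in 1:5] for _ in 1:5000]
df = DataFrame(arr, :auto)

# Target column. `df.x1` returns the vector stored in df without copying it;
# use `df[:, :x1]` instead if an independent copy is wanted.
y = df.x1

# Feature table. `select` (no bang) builds a new DataFrame and leaves df untouched.
X = select(df, Not(:x1))

# df still has all 5000 columns; X has the remaining 4999.
println(size(df), " ", size(X))
```

Because select copies the remaining columns by default, this uses more memory than select!, which may matter at the 30,000 x 8,000 scale described in the question.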

**RikH** · Answer 2 · 2022-06-20T10:10:22+00:00

Please don't cross-post a problem on multiple websites without linking.

The question has been answered at the Julia forum: https://discourse.julialang.org/t/simple-table-operation-has-very-large-compilation-time-with-mlj/82503/2. It was caused by a bug which is fixed in MLJBase 0.20.5.
