How much memory is needed for an XGBoost model?


Background: The training set has 100M rows and about 50 columns, and I have already cast each column to the smallest dtype that fits. Still, the DataFrame takes roughly 8-10 GB once loaded.
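For reference, a minimal sketch of how the downcasting and the in-memory size check might look (the downcast loop is illustrative; only the 'train_latest' path comes from the actual setup):

import pandas as pd

df = pd.read_parquet('train_latest', engine='pyarrow')

# Downcast numeric columns to the smallest dtype that can hold their values.
for col in df.select_dtypes(include='integer').columns:
    df[col] = pd.to_numeric(df[col], downcast='integer')
for col in df.select_dtypes(include='float').columns:
    df[col] = pd.to_numeric(df[col], downcast='float')

# Report the actual in-memory footprint (deep=True also counts object columns).
print(f"{df.memory_usage(deep=True).sum() / 1e9:.2f} GB in memory")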

I run training on AWS EC2 instances (one with 36 CPUs + 72 GB RAM, the other with 16 CPUs + 128 GB RAM).

Problems:

1. Loading the data into a Pandas DataFrame and training xgboost with the default config quickly blows up memory.
2. I also tried a Dask DataFrame with a distributed client enabled, using dask.xgboost; it runs a bit longer, but I get worker-failed warnings and progress stalls (a sketch of an explicit client setup with memory limits follows this list).
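For problem 2, one thing worth checking (a hedged sketch; the worker counts and limits below are illustrative, not a recommendation for these exact instances) is whether the distributed client is started with explicit per-worker memory limits, so workers spill to disk or pause before the OS kills them:

from dask.distributed import Client

# Example for the 16-CPU / 128 GB box: 8 local workers, each capped at 14 GB,
# leaving headroom for the scheduler and the OS.
client = Client(n_workers=8, threads_per_worker=2, memory_limit='14GB')
print(client)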

So, is there a way for me to estimate how much RAM I need to make sure it is enough?
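A rough back-of-envelope starting point (not an exact rule; the 3x multiplier below is an assumption meant to cover the original DataFrame, XGBoost's internal DMatrix copy, and training workspace):

# Raw size of the feature matrix at a given element width.
rows, cols, bytes_per_value = 100_000_000, 50, 4
raw_gb = rows * cols * bytes_per_value / 1e9
print(f"raw matrix: {raw_gb:.0f} GB")  # ~20 GB at float32, ~10 GB at 2-byte dtypes

# Assumed safety factor: budget several times the raw size for training.
assumed_multiplier = 3
print(f"suggested budget: {raw_gb * assumed_multiplier:.0f} GB or more")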

Here is some code:

import pandas as pd
import dask.dataframe as ddf
import dask_ml.xgboost as dxgb

# Load the full parquet file into pandas, then convert to a Dask DataFrame.
train = pd.read_parquet('train_latest', engine='pyarrow')
train = ddf.from_pandas(train, npartitions=72)

X, y = train[feats], train[label]
X_train, y_train, X_test, y_test = make_train_test(X, y)  # customized function to divide train/test

model = dxgb.XGBClassifier(n_estimators=1000,
                           verbosity=1,
                           n_jobs=-1,
                           max_depth=10,
                           learning_rate=0.1)
model.fit(X_train, y_train)
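One variation worth sketching (an assumption on my part, not something tested on this dataset) is to let Dask read the parquet file itself, so the full table never has to be materialized in a single pandas DataFrame before being repartitioned:

import dask.dataframe as ddf

# Lazily read the parquet file; partitions are loaded per worker as needed,
# instead of building one 8-10 GB pandas DataFrame up front.
train = ddf.read_parquet('train_latest', engine='pyarrow')

The rest of the pipeline (make_train_test, dxgb.XGBClassifier, fit) would stay as in the snippet above.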

There are 0 answers