redirect GridSearchCV (or any other Sklearn object) output to file

Question

620 views Asked by nogmos At 30 December 2024 at 22:47

I want to be able to save GridSearchCV output to file while running.

GridSearchCV(XGBClassifier(), tuned_parameters, cv=cv, n_jobs=-1, verbose=10)

This is an example of the output:

    Fitting 1 folds for each of 200 candidates, totalling 200 fits
    [Parallel(n_jobs=-1)]: Using backend with 4 concurrent workers.
    [CV] colsample_bytree=0.7, learning_rate=0.05, max_depth=4, n_estimators=300, subsample=0.7  
    [CV] colsample_bytree=0.7, learning_rate=0.05, max_depth=4, n_estimators=300, subsample=0.7, score=0.645, total= 6.3min
    [Parallel(n_jobs=-1)]: Done   1 tasks      | elapsed:  6.3min

I managed to save the first line and the Parallel lines, but no matter what I tried, I couldn't save the lines that start with [CV]. I want to save those lines so that if the program fails, I can at least see part of the results.

I tried the solutions from here

sys.stdout = open('file', 'w')

and:

from contextlib import redirect_stdout

with open('help.txt', 'w') as f:
    with redirect_stdout(f):
        print('it now prints to `help.txt`')

This solution (which also refers to this solution) didn't work either:

class Tee(object):
    def __init__(self, *files):
        self.files = files
    def write(self, obj):
        for f in self.files:
            f.write(obj)
            f.flush() # If you want the output to be visible immediately
    def flush(self):
        for f in self.files:
            f.flush()

I also tried this monkey-patch, as its author called it, but it also saved only the "Parallel" lines.

(Just to emphasize: the code above is only a glimpse of the proposed solutions. When I tried them, I used all of the relevant code.)

Is there a way to save ALL output?

There is 1 answer

**Kota Mori** · Answer 1 · 2020-10-22T13:24:16+00:00

I don't know whether this can be done with the sys library or similar tools. Instead, I suggest redirecting stdout and stderr at the shell level, as follows.

Suppose you have a script like this:

test.py

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
params = {"C": [0.001, 0.01, 0.1, 1, 2, 3]}
grid = GridSearchCV(model, params, n_jobs=-1, verbose=10)
X = np.random.randn(100, 10)
y = np.random.randint(0, 2, 100)

grid.fit(X, y)

Then run it with:

python test.py > logfile.txt 2>&1

Then you will have both "Parallel" and "CV" lines in logfile.txt:

Fitting 5 folds for each of 6 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=-1)]: Done   1 tasks      | elapsed:    1.6s
[Parallel(n_jobs=-1)]: Done  11 out of  30 | elapsed:    1.7s remaining:    2.9s
[Parallel(n_jobs=-1)]: Done  15 out of  30 | elapsed:    1.7s remaining:    1.7s
[Parallel(n_jobs=-1)]: Done  19 out of  30 | elapsed:    1.7s remaining:    1.0s
[Parallel(n_jobs=-1)]: Done  23 out of  30 | elapsed:    1.7s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done  27 out of  30 | elapsed:    1.7s remaining:    0.2s
[Parallel(n_jobs=-1)]: Done  30 out of  30 | elapsed:    1.7s finished
[CV] C=0.001 .........................................................
[CV] ............................. C=0.001, score=0.500, total=   0.0s
[CV] C=0.1 ...........................................................
[CV] ............................... C=0.1, score=0.450, total=   0.0s
[CV] C=0.1 ...........................................................
[CV] ............................... C=0.1, score=0.550, total=   0.0s
[CV] C=1 .............................................................
[CV] ................................. C=1, score=0.550, total=   0.0s
[CV] C=1 .............................................................
[CV] ................................. C=1, score=0.500, total=   0.0s
[CV] C=2 .............................................................
...

Details

The "[CV]" lines are produced by a print statement (Source), so they are written to stdout.

The "Parallel" lines are produced by loggers (Source) and are written to stderr.
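
One likely reason the in-process attempts in the question miss the "[CV]" lines: with n_jobs=-1 the fits run in separate worker processes (the log above shows LokyBackend), and replacing the sys.stdout object in the parent does not change where a child process writes. The sketch below illustrates this with a plain subprocess standing in for a worker. The function name is hypothetical and this is not scikit-learn's own code.

```python
import contextlib
import io
import subprocess
import sys


def capture_with_redirect_stdout():
    """Return what contextlib.redirect_stdout captures in the parent process."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        # Written through the parent's sys.stdout object, so it is captured.
        print("from parent")
        # The child has its own sys.stdout bound to the inherited file
        # descriptor 1, so this line bypasses `buf` entirely.
        subprocess.run(
            [sys.executable, "-c", "print('from child')"],
            check=True,
        )
    return buf.getvalue()
```

Only the parent's line ends up in the buffer, while the child's line goes to wherever the process's real stdout points, which is why a shell-level redirect still catches it.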

> logfile.txt 2>&1 is shell syntax that redirects both stdout and stderr to the same file (Related question). As a result, both kinds of messages end up in logfile.txt.
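
If the redirection has to happen inside Python rather than at the shell, a rough equivalent is to redirect the OS-level file descriptors 1 and 2, which child processes inherit. The following is a minimal sketch under that assumption for POSIX systems. redirect_fds is a hypothetical helper, not part of scikit-learn or joblib, and workers that were already started before the redirection keep their old descriptors.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def redirect_fds(path):
    """Temporarily send OS-level stdout (fd 1) and stderr (fd 2) to `path`."""
    # Flush Python-level buffers so earlier output is not written to the log.
    sys.stdout.flush()
    sys.stderr.flush()
    # Keep copies of the original descriptors so they can be restored.
    saved_out, saved_err = os.dup(1), os.dup(2)
    log_fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.dup2(log_fd, 1)
        os.dup2(log_fd, 2)
        yield
    finally:
        # Flush anything buffered while redirected, then restore.
        sys.stdout.flush()
        sys.stderr.flush()
        os.dup2(saved_out, 1)
        os.dup2(saved_err, 2)
        for fd in (saved_out, saved_err, log_fd):
            os.close(fd)
```

In the script above it would wrap the fit call, for example with redirect_fds("logfile.txt"): followed by grid.fit(X, y) in the body.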
