I am working on a basic text classification problem. I want to use a stacking classifier, along with some fine-tuning of the parameters of my base classifiers, to get high-accuracy results.
My dataset has 8000 rows and 2 columns (text and class). The piece of code below seems to be stuck, and as a beginner I am not well versed enough in the field to spot the problem.
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import NuSVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, log_loss, classification_report, confusion_matrix
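# NOTE: the snippet as posted never defines X_train / X_test; the sketch below is
# a hypothetical setup (the file name 'data.csv' and the column names are
# assumptions) so the rest of the code can run. The raw text has to be
# vectorized (e.g. TF-IDF) before it is fed to the classifiers.
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.read_csv('data.csv')  # assumed file with 'text' and 'class' columns
X_text, y = df['text'], df['class']
X_train_text, X_test_text, y_train, y_test = train_test_split(
    X_text, y, test_size=0.2, random_state=42, stratify=y)
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_text)  # fit the vocabulary on the training split only
X_test = vectorizer.transform(X_test_text)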
# Define parameter grids for classifiers
param_grid_nusvc = {
'nu': [0.1, 0.3, 0.5, 0.7, 0.9],
'kernel': ['linear', 'rbf'],
}
param_grid_logreg = {
'C': [0.1, 1, 10],
'penalty': ['l1', 'l2'],
}
# Perform grid search for the base classifiers
nusvc_grid_search = GridSearchCV(NuSVC(probability=True), param_grid_nusvc, cv=2, scoring='accuracy')
# liblinear supports both the 'l1' and 'l2' penalties in the grid; the default lbfgs solver would fail on 'l1'
logreg_grid_search = GridSearchCV(LogisticRegression(solver='liblinear', max_iter=1000), param_grid_logreg, cv=2, scoring='accuracy')
nusvc_grid_search.fit(X_train, y_train)
logreg_grid_search.fit(X_train, y_train)
# Get best parameters
best_params_nusvc = nusvc_grid_search.best_params_
best_params_logreg = logreg_grid_search.best_params_
# Set up base classifiers with best parameters
best_nusvc = NuSVC(probability=True, **best_params_nusvc)
best_logreg = LogisticRegression(solver='liblinear', max_iter=1000, **best_params_logreg)
# Setting up stacking classifier
sc = StackingClassifier(
estimators=[
('NuSVC', best_nusvc),
('LDA', LinearDiscriminantAnalysis())
],
final_estimator=best_logreg
)
sc.fit(X_train, y_train)
# Evaluate the stacked classifier on the held-out test set
print('****Results****')
test_predictions = sc.predict(X_test)
acc = accuracy_score(y_test, test_predictions)
print("Accuracy: {:.4%}".format(acc))
test_predictions_proba = sc.predict_proba(X_test)
ll = log_loss(y_test, test_predictions_proba)
print("Log Loss: {}".format(ll))
# Print classification report (optional)
print('\nClassification Report:')
print(classification_report(y_test, test_predictions))
# Print confusion matrix (optional)
print('\nConfusion Matrix:')
print(confusion_matrix(y_test, test_predictions))
Some of the changes above were made based on advice from ChatGPT on how to fine-tune using grid search. The code seems to be stuck (it has been running for about 20 minutes). Without the grid search it ran in around 2-3 minutes.
Your NuSVC grid has 5×2 parameter combinations, each fitted over 2 folds, so the search alone performs about 20 fits and should take roughly 20× as long as a single fit. You can set verbose=4 in the searches to better track what's happening, and consider parallelizing (n_jobs=-1, for example).
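For example (a minimal sketch reusing the grids from the question; the verbose level and n_jobs value are just illustrative), the two searches could be configured like this to log progress for each fit and spread the work across all CPU cores:
nusvc_grid_search = GridSearchCV(
    NuSVC(probability=True), param_grid_nusvc,
    cv=2, scoring='accuracy',
    n_jobs=-1,   # run the candidate/fold fits in parallel on all cores
    verbose=4)   # print each candidate's parameters, fit time and score
logreg_grid_search = GridSearchCV(
    LogisticRegression(solver='liblinear', max_iter=1000), param_grid_logreg,
    cv=2, scoring='accuracy',
    n_jobs=-1, verbose=4)
With the progress output you can tell whether the search is genuinely stuck or just slowly working through the 20 NuSVC fits (probability=True makes each fit noticeably more expensive).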