Input has 1 feature, expected 127, when predicting with a machine learning model


I have created an ML model to test password strength. The data set has shape (669639, 127), and when I pass a password to the model to test it, I get a ValueError saying the password has 1 feature but 127 were expected. So I tried to reshape the password to (1, 127), but it says it can't reshape an array of size 1 into (1, 127). Any help would be highly appreciated, thank you.

`#!/usr/bin/env python

# coding: utf-8

# In[2]:

import seaborn as sns
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

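# In[3]:

# Raw string keeps the backslashes in the Windows path literal;
# on_bad_lines='skip' replaces error_bad_lines=False (pandas >= 1.3)
data = pd.read_csv(r'D:\Datasets\strength.csv', sep=',', on_bad_lines='skip')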

# In[4]:

data

# In[5]:

data.isna().sum()

# In[6]:

data.dropna(inplace = True)

# In[7]:

data.isna().sum()

# In[8]:

sns.countplot(x=data['strength'])

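# In[9]:

data['strength'].value_counts()

# In[10]:

password_tuple = np.array(data)

# In[11]:

# random.shuffle swaps rows through views and can duplicate rows of a 2-D
# NumPy array; np.random.shuffle shuffles along the first axis in place
np.random.shuffle(password_tuple)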

# In[12]:

X = [labels[0] for labels in password_tuple]
y = [labels[1] for labels in password_tuple]

# In[13]:

def char_tokenizer(text):
    # Split a password into a list of its individual characters
    return list(text)

# In[14]:

from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(tokenizer = char_tokenizer)
X = tfidf.fit_transform(X)

# In[15]:

X.shape

# In[16]:

from sklearn.linear_model import LogisticRegression
logReg = LogisticRegression(penalty='l2',multi_class='ovr')
logReg.fit(X,y)

# In[17]:

print(logReg.score(X,y))

# In[18]:

import joblib
joblib.dump(logReg,'LogisticRegression_model.joblib')

# In[17]:

from sklearn.naive_bayes import BernoulliNB
bnb = BernoulliNB()
bnb.fit(X,y)

# In[18]:

print(bnb.score(X,y))

# In[19]:

import joblib
joblib.dump(bnb,'NaiveBayes_model.joblib')

# In[20]:

from sklearn.tree import DecisionTreeClassifier
dtc = DecisionTreeClassifier()
dtc.fit(X,y)

# In[21]:

print(dtc.score(X,y))

# In[22]:

import joblib
# Persist the fitted decision tree
joblib.dump(dtc,'DecisionTree_model.joblib')

# In[23]:

from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(n_estimators=100, max_depth=50, criterion='entropy')
rfc.fit(X,y)

# In[24]:

print(rfc.score(X,y))

# In[25]:

import joblib
joblib.dump(rfc,'RandomForest_model.joblib')

# In[35]:

array =['dbswjwiqmd']
ar2 = np.array([array])
ar2.reshape(1,127)
logReg.predict(ar2)

`

There is 1 answer

boozy On

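I assume that your training dataset has 127 columns, which are the extracted features. To test a password, you need to apply the same feature extraction to it: from a single string/password you have to generate a vector with 127 columns. Reshaping cannot do that, because the raw password is one value, not 127. In your code the features come from the fitted `tfidf` vectorizer, so the password has to go through `tfidf.transform` before it is passed to `predict`.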
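
As a rough sketch of that idea: the toy passwords and labels below are made up and only stand in for strength.csv, and `model` is a plain `LogisticRegression` used for illustration. The `char_tokenizer` and `tfidf` names mirror the question. The point is only that the new password goes through the already fitted vectorizer, which yields one row with the same number of columns as the training matrix.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def char_tokenizer(text):
    # Split a password into single characters, as in the question
    return list(text)


# Hypothetical toy data standing in for the (password, strength) CSV
passwords = ["abc123", "password", "qwerty",
             "Tr0ub4dor&3xY!", "c0rrect-H0rse-9", "Z#8kL!2pQw$7"]
strengths = [0, 0, 0, 1, 1, 1]

# Fit the vectorizer once, on the training passwords only
tfidf = TfidfVectorizer(tokenizer=char_tokenizer)
X_train = tfidf.fit_transform(passwords)

model = LogisticRegression()
model.fit(X_train, strengths)

# Transform (do not re-fit, do not reshape) the password to be tested
new_password = "dbswjwiqmd"
X_new = tfidf.transform([new_password])  # one row, same columns as X_train

prediction = model.predict(X_new)
print(X_train.shape[1], X_new.shape, prediction)
```

If prediction happens in a different script than training, the fitted vectorizer has to be saved and loaded along with the model (for example with `joblib.dump`), otherwise there is no way to rebuild the same columns.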