I have created an ML model to test password strength. The dataset has shape (669639, 127), and when I pass a single password to the model to test it, I get a ValueError saying the input has 1 feature but 127 were expected. I then tried to reshape the password to (1, 127), but that fails because an array of size 1 cannot be reshaped into (1, 127). Any help would be highly appreciated, thank you. `#!/usr/bin/env python
# coding: utf-8
# In[2]:
import seaborn as sns
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
# In[3]:
# Raw string keeps the backslashes in the Windows path literal
data = pd.read_csv(r'D:\Datasets\strength.csv', sep=',', error_bad_lines=False)
# In[4]:
data
# In[5]:
data.isna().sum()
# In[6]:
data.dropna(inplace=True)
# In[7]:
data.isna().sum()
# In[8]:
sns.countplot(x=data['strength'])
# In[9]:
data['strength'].value_counts()
# In[10]:
password_tuple = np.array(data)
# In[11]:
# np.random.shuffle permutes the rows in place; random.shuffle can duplicate rows of a 2-D array
np.random.shuffle(password_tuple)
# In[12]:
X = [labels[0] for labels in password_tuple]
y = [labels[1] for labels in password_tuple]
# In[13]:
def char_tokenizer(text):
    # Split a password into its individual characters
    return list(text)
# In[14]:
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(tokenizer = char_tokenizer)
X = tfidf.fit_transform(X)
# In[15]:
X.shape
# In[16]:
from sklearn.linear_model import LogisticRegression
logReg = LogisticRegression(penalty='l2',multi_class='ovr')
logReg.fit(X,y)
# In[17]:
print(logReg.score(X,y))
# In[18]:
import joblib
joblib.dump(logReg,'LogisticRegression_model.joblib')
# In[17]:
from sklearn.naive_bayes import BernoulliNB
bnb = BernoulliNB()
bnb.fit(X,y)
# In[18]:
print(bnb.score(X,y))
# In[19]:
import joblib
joblib.dump(bnb,'NaiveBayes_model.joblib')
# In[20]:
from sklearn.tree import DecisionTreeClassifier
dtc = DecisionTreeClassifier()
dtc.fit(X,y)
# In[21]:
print(dtc.score(X,y))
# In[22]:
import joblib
# Save the decision tree itself (the original line dumped logReg here)
joblib.dump(dtc,'DecisionTree_model.joblib')
# In[23]:
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(n_estimators=100, max_depth=50, criterion='entropy')
rfc.fit(X,y)
# In[24]:
print(rfc.score(X,y))
# In[25]:
import joblib
joblib.dump(rfc,'RandomForest_model.joblib')
# In[35]:
array =['dbswjwiqmd']
ar2 = np.array([array])
ar2.reshape(1,127)
logReg.predict(ar2)
`
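I assume the 127 columns of your training data are the features produced by the feature extraction step (the `TfidfVectorizer` in your code). To test a password, you need to apply the same feature extraction to it: reshaping cannot create the missing values, so from a single string you have to generate a vector with 127 columns using the vectorizer that was fitted on the training data.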
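
A minimal sketch of that idea, assuming the features come from a character-level `TfidfVectorizer` as in the question. The tiny `passwords`/`strengths` lists are made-up illustration data, not the real dataset, and `model` stands in for `logReg`. The point is to call `transform` on the already fitted vectorizer rather than `reshape` on the raw string:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def char_tokenizer(text):
    # One token per character, same idea as the tokenizer in the question
    return list(text)


# Hypothetical toy data, only for illustration
passwords = ["abc123", "password", "qwerty",
             "Tr0ub4dor&3x!", "G7#kLp9$zQw2", "xY9!mN4@bV6^"]
strengths = [0, 0, 0, 1, 1, 1]

# Fit the vectorizer once, on the training passwords
tfidf = TfidfVectorizer(tokenizer=char_tokenizer)
X_train = tfidf.fit_transform(passwords)

model = LogisticRegression()
model.fit(X_train, strengths)

# New password: use transform (not fit_transform, and not reshape)
new_password = ["dbswjwiqmd"]
X_new = tfidf.transform(new_password)

# X_new has one row and the same number of columns as X_train
print(X_new.shape, X_train.shape)

prediction = model.predict(X_new)
print(prediction)
```

In your notebook this would presumably be `logReg.predict(tfidf.transform(['dbswjwiqmd']))`. If you load the model from the joblib file in another script, the fitted `tfidf` has to be saved and loaded as well, because the model only understands vectors produced by that exact vectorizer.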