Divide dataframe into two sets according to a column


I have a DataFrame df. I selected some of its columns and want to divide them into xtrain and xtest according to a column called Service, so that rows with 1 and 0 go into xtrain and rows with NaN go into xtest.

Service
1
0
0
1
NaN
NaN

xtrain = df.loc[df['Service'].notnull(), ['Age','Fare','GSize','Deck','Class','Profession_title']]

EDITED

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    # Target: the known Service values (rows where Service is not NaN)
    ytrain = df['Service'].dropna()
    # Features for the rows whose Service is missing
    xtest = df.loc[df['Service'].isnull(), ['Age','Fare','GSize','Deck','Class','Profession_title']]
    logistic = LogisticRegression()
    logistic.fit(xtrain, ytrain)
    logistic.predict(xtest)

I get this error from logistic.predict(xtest):

X has 220 features per sample; expecting 307

There is 1 answer

jezrael (best answer)

I think you need isnull:

xtest = df.loc[df['Service'].isnull(), ['Age','Fare','GSize','Deck','Class','Profession_title']]

Another solution is to invert the boolean mask with ~:

mask = df['Service'].notnull()
xtrain = df.loc[mask, ['Age','Fare','GSize','Deck','Class','Profession_title']]
xtest = df.loc[~mask, ['Age','Fare','GSize','Deck','Class','Profession_title']]
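As a minimal sketch of the mask approach, assuming a small made-up frame with only two feature columns (the values are illustrative, not the asker's data), the same mask can also select the target, so that xtrain and ytrain are guaranteed to share an index:

```python
import numpy as np
import pandas as pd

# Illustrative data only; the real df has more columns and rows.
df = pd.DataFrame({'Service': [1, 0, np.nan, 1, np.nan],
                   'Age': [22, 38, 26, 35, 54],
                   'Fare': [7.25, 71.28, 7.92, 53.1, 51.86]})

features = ['Age', 'Fare']

# True where Service is known (0 or 1), False where it is NaN.
mask = df['Service'].notnull()

xtrain = df.loc[mask, features]
ytrain = df.loc[mask, 'Service']
xtest = df.loc[~mask, features]

# The two subsets partition the rows and share the same columns.
print(len(xtrain), len(xtest))
print(list(xtrain.columns) == list(xtest.columns))
```

Because both subsets are cut from the same column list, they always have the same number of features at this stage.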

EDIT:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({'Service':[1,0,np.nan,np.nan],
                   'Age':[4,5,6,5],
                   'Fare':[7,8,9,5],
                   'GSize':[1,3,5,7],
                   'Deck':[5,3,6,2],
                   'Class':[7,4,3,0],
                   'Profession_title':[6,7,4,6]})

print (df)
   Age  Class  Deck  Fare  GSize  Profession_title  Service
0    4      7     5     7      1                 6      1.0
1    5      4     3     8      3                 7      0.0
2    6      3     6     9      5                 4      NaN
3    5      0     2     5      7                 6      NaN

# Rows with a known Service value form the training set
ytrain = df['Service'].dropna()
xtrain = df.loc[df['Service'].notnull(), ['Age','Fare','GSize','Deck','Class','Profession_title']]
# Rows where Service is NaN are the ones to predict
xtest = df.loc[df['Service'].isnull(), ['Age','Fare','GSize','Deck','Class','Profession_title']]

logistic = LogisticRegression()
logistic.fit(xtrain, ytrain)
print (logistic.predict(xtest))
[ 0.  0.]
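
The sample above does not reproduce the original error (220 features versus 307), because both subsets have the same six columns. One common cause, which is only an assumption about the asker's pipeline, is one-hot encoding xtrain and xtest separately (for example with pd.get_dummies), so that each ends up with a different set of dummy columns. A minimal sketch of aligning the test columns to the training columns with reindex, using made-up Deck values:

```python
import pandas as pd

# Hypothetical categorical feature; the values are made up for illustration.
xtrain = pd.DataFrame({'Age': [4, 5, 6], 'Deck': ['A', 'B', 'C']})
xtest = pd.DataFrame({'Age': [5, 7], 'Deck': ['B', 'D']})

# Encoding each subset separately yields different dummy columns.
xtrain_enc = pd.get_dummies(xtrain, columns=['Deck'])
xtest_enc = pd.get_dummies(xtest, columns=['Deck'])
print(xtrain_enc.shape[1], xtest_enc.shape[1])

# Align the test columns to the training columns: categories missing from
# the test set are filled with 0, and categories unseen in training are dropped.
xtest_aligned = xtest_enc.reindex(columns=xtrain_enc.columns, fill_value=0)
print(list(xtest_aligned.columns) == list(xtrain_enc.columns))
```

If this is the cause, encoding the whole df first and only then splitting it with the mask should avoid the mismatch altogether.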