I have a classification problem with two data sets of 200 and 50 points, respectively. Of these, 40 data points are held out as a test set. I have chosen kNN as the classifier, using five nearest neighbors.
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
n_neighbors = 5  # number of neighbors for kNN
std = 5  # cluster standard deviation
h = .1  # step size in the mesh
# generate data
X0, y0 = make_blobs(n_samples=200, centers=2, n_features=2, cluster_std=std, random_state=42)
X1, y1 = make_blobs(n_samples=50, centers=2, n_features=2, cluster_std=std, random_state=42)
# split into training and test set (20% of each data set goes to the test set)
X0_train, X0_test, y0_train, y0_test = train_test_split(X0, y0, test_size=0.2, random_state=42)
X1_train, X1_test, y1_train, y1_test = train_test_split(X1, y1, test_size=0.2, random_state=42)
I have to enrich the data so that the training data for class 1 is copied 16 times, giving class 1 the same training size as class 0.
How can I copy the training data sixteen times? I am not sure what exactly copying means here.
Could anyone share a few lines of code to illustrate this?
I guess you are talking about the class imbalance problem. To overcome it, you need to resample the data (either up-sampling or down-sampling). See if the following technique helps: https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a-html/chawla2002.html
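If "copying" simply means up-sampling by replication, a minimal sketch could look like the one below. It is one possible reading of the task, not necessarily what the exercise intends. The helper `replicate` and the `*_demo` arrays are made up for illustration; the demo arrays only mimic the shapes that the 80/20 split in the question produces (160 and 40 training rows), so in practice you would pass `X0_train`, `X1_train` and `y1_train` instead.

```python
import numpy as np

def replicate(X, y, n_copies):
    """Up-sample by stacking n_copies identical copies of the rows of X and labels y."""
    X_up = np.tile(X, (n_copies, 1))  # repeat rows: (n, d) -> (n * n_copies, d)
    y_up = np.tile(y, n_copies)       # repeat labels in the same order as the rows
    return X_up, y_up

# Hypothetical stand-in arrays with the training shapes from the question's split.
rng = np.random.default_rng(42)
X0_train_demo = rng.normal(size=(160, 2))
X1_train_demo = rng.normal(size=(40, 2))
y1_train_demo = rng.integers(0, 2, size=40)

# Number of copies needed to match the larger training set.
n_copies = len(X0_train_demo) // len(X1_train_demo)
X1_up, y1_up = replicate(X1_train_demo, y1_train_demo, n_copies)
print(n_copies, X1_up.shape, y1_up.shape)
```

With these sizes the factor comes out as 4 rather than 16; a factor of 16 would apply if the smaller training set were 16 times smaller than the larger one, so compute it from your actual arrays instead of hard-coding it. Replicate only the training data and leave the test set untouched. Also keep in mind that with kNN, exact duplicates can fill several of the five neighbor slots with the same point.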