Splitting data into a test set and a training set


Which operator in RapidMiner can I use to take an out-of-bag sample as my training set and use the remaining data as my test set?


There are 2 answers

Andrew Chisholm On

The Split Data operator is one option. It divides your data into two or more example sets, partitioned the way you specify, and you can then use each set however you like, for example one for training and one for testing. An alternative that combines the training and testing steps in a single operator is Split Validation.
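To make the idea concrete outside of RapidMiner, here is a minimal Python sketch of a ratio-based split. It is not how Split Data is implemented; the function name `split_data`, the `train_ratio` parameter, and the fixed `seed` are all assumptions made for illustration.

```python
import random


def split_data(rows, train_ratio=0.7, seed=42):
    """Shuffle rows and split them into (train, test) by train_ratio.

    Hypothetical helper for illustration only; not a RapidMiner API.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


train, test = split_data(range(10), train_ratio=0.7)
print(len(train), len(test))
```

Every row ends up in exactly one of the two sets, which is the property you need for an honest test set.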

Joseph Magara On

Use the X-Validation operator.

Connect your data set to the input of the X-Validation operator, then connect the operator's output to the output of the process.

Next, open the X-Validation operator by double-clicking it or by clicking the small blue double-window icon in its bottom right corner.

Once inside the operator, place whatever model you want to build on the training side (I used a decision tree in this case). On the testing side, connect the Apply Model operator to the Performance operator, then connect the Performance operator to the output.

Then press Play to run the process. It should work.
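If it helps to see what the training side, Apply Model and Performance are doing on each round, here is a rough Python sketch of k-fold cross-validation. It is an illustration under assumptions, not RapidMiner's implementation: the helper names are hypothetical, the "model" is a stand-in that just predicts the most common training label, and the folds are contiguous with no shuffling or stratification.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over n rows."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def train_majority(labels):
    """Stand-in 'model': remembers the most common training label."""
    return Counter(labels).most_common(1)[0][0]


def cross_validate(labels, k=5):
    """Return the mean accuracy over k folds."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        # Training side: fit the model on the training rows only.
        model = train_majority([labels[i] for i in train_idx])
        # Testing side: apply the model to the held-out rows.
        predictions = [model for _ in test_idx]
        # Performance: fraction of held-out rows predicted correctly.
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


labels = ["yes"] * 8 + ["no"] * 2
print(cross_validate(labels, k=5))
```

Each row is used for testing exactly once and for training in all the other rounds, so the averaged accuracy is based on the whole data set rather than a single split.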