Daru Ruby Gem - How do I transform a categorical variable into a binary one

I have the following Daru DataFrame with a categorical variable called search_term:

home,search_term,bought
0,php,1
0,java,1
1,php,1
...

I want to convert it to a Daru DataFrame with one binary column per category, something like:

home,php,java,bought
0,1,0,1
0,0,1,1
1,1,0,1
...

I can't find a way to achieve this. I know it's possible with Python's pandas, but I want to use Ruby with the Daru gem.

Thanks.

There is 1 answer

Answer by kojix2:

According to a blog post by Yoshoku, the author of the Rumale machine learning library, you can build a binary column with map:

train_df['IsFemale'] = train_df['Sex'].map { |v| v == 'female' ? 1 : 0 }
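
Applied to the question's data, the same map idiom can be repeated once per distinct search_term value. The following is a minimal sketch in plain Ruby, with a hash of arrays standing in for the Daru DataFrame so that it runs without any gems; the variable names (columns, indicators, encoded) are illustrative and not part of Daru.

```ruby
# Plain-Ruby stand-in for the question's data frame: column name => values.
# (Arrays are used instead of Daru::Vector so the sketch runs without gems.)
columns = {
  'home'        => [0, 0, 1],
  'search_term' => %w[php java php],
  'bought'      => [1, 1, 1]
}

# One 0/1 indicator column per distinct category, in order of first appearance.
terms = columns['search_term']
indicators = terms.uniq.to_h do |term|
  [term, terms.map { |v| v == term ? 1 : 0 }]
end

# Reassemble in the order home, <indicator columns>, bought.
encoded = { 'home' => columns['home'] }
          .merge(indicators)
          .merge('bought' => columns['bought'])

encoded.each { |name, values| puts "#{name}: #{values.inspect}" }
# home: [0, 0, 1]
# php: [1, 0, 1]
# java: [0, 1, 0]
# bought: [1, 1, 1]
```

With Daru, the per-category step would presumably follow the pattern above, assigning df[term] = df['search_term'].map { |v| v == term ? 1 : 0 } for each distinct term; that variant is an assumption and has not been run here.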

Rumale's LabelEncoder is also useful for categorical variables; it maps each distinct value to an integer label:

require 'rumale'
encoder = Rumale::Preprocessing::LabelEncoder.new
labels = Numo::Int32[1, 8, 8, 15, 0]
encoded_labels = encoder.fit_transform(labels)
# Numo::Int32#shape=[5]
# [1, 2, 2, 3, 0]
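
To make the mapping in that output concrete, here is a small plain-Ruby sketch that reproduces it by numbering the sorted distinct values. It is only an illustration of the idea, assuming labels are assigned in sorted order; it is not Rumale's implementation, and the names classes and index_of are made up for this example.

```ruby
# Illustrative only: reproduces the mapping shown above, not Rumale's internals.
labels = [1, 8, 8, 15, 0]

# Each distinct value gets its position in the sorted list of distinct values.
classes = labels.uniq.sort               # [0, 1, 8, 15]
index_of = classes.each_with_index.to_h  # {0=>0, 1=>1, 8=>2, 15=>3}
encoded_labels = labels.map { |v| index_of[v] }

p encoded_labels
# [1, 2, 2, 3, 0]
```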

Rumale::Preprocessing::OneHotEncoder produces one binary column per category:

encoder = Rumale::Preprocessing::OneHotEncoder.new
labels = Numo::Int32[0, 0, 2, 3, 2, 1]
one_hot_vectors = encoder.fit_transform(labels)
# > pp one_hot_vectors
# Numo::DFloat#shape=[6,4]
# [[1, 0, 0, 0],
#  [1, 0, 0, 0],
#  [0, 0, 1, 0],
#  [0, 0, 0, 1],
#  [0, 0, 1, 0],
#  [0, 1, 0, 0]]

Note, however, that converting between Daru::Vector and Numo::NArray requires going through to_a:

encoder = Rumale::Preprocessing::LabelEncoder.new
train_df['Embarked'] = encoder.fit_transform(train_df['Embarked'].to_a).to_a