Julia DataFrames concatenate multiple columns by a space

In DataFrames, I have 4 columns of type String. I want to concatenate all of their values with a space.

Currently, I'm doing this:

transform(df, All() => ((a,b,c,d) -> a .* " " .* b .* " " .* c .* " " .* d) => :combined_col)

Is there a more concise way of doing this without using .* multiple times? Maybe using the join function?

P.S. I'm using this inside a @chain block, so I want to keep the same style of syntax rather than use indexing.

UPDATE: This works, but I have no idea why. Can someone explain?

transform(df, All() => ByRow((all...) -> join(all, " ")) => :combined)

There is 1 answer

Bogumił Kamiński (best answer)

Let me explain transform(df, All() => ByRow((all...) -> join(all, " ")) => :combined):

  1. You need ByRow to apply the function row-wise to your data frame.
  2. The join function accepts an iterator as its first argument, so all must be an iterator (in your example, it is a tuple).
  3. The All() source passes the selected columns as consecutive positional arguments to the function. Therefore you need all... to turn consecutive positional arguments into a tuple.
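
The three points above can be tried out in plain Julia, without DataFrames. This is only a sketch: the function names joinrow and argtype and the sample vectors are made up for illustration, and the broadcast call at the end merely mimics the result ByRow would produce for two columns.

```julia
# `vals...` in the argument list collects all positional arguments into one tuple.
joinrow(vals...) = join(vals, " ")

# Called the way the function would be called for a single row of four String columns.
row_result = joinrow("a", "b", "c", "d")

# Helper that shows what the collected arguments look like inside the function.
argtype(vals...) = typeof(vals)

# Two stand-in columns; broadcasting applies joinrow element by element (row by row).
col1 = ["a", "b"]
col2 = ["x", "y"]
combined = joinrow.(col1, col2)
```

Here argtype("a", "b") reports a tuple type, which is why join can iterate over the collected arguments.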

Instead of all... you could write:

transform(df, AsTable(All()) => ByRow(x -> join(x, " ")) => :combined)

The difference is that AsTable(All()) passes the selected columns as a single positional argument to the function (in the form of a named tuple). Therefore you already have an iterable to pass to join (since a named tuple is iterable and yields its values).
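
To see why no splatting is needed in this variant, here is a minimal plain-Julia sketch of what the function receives for one row. The field names first, second, and third are hypothetical stand-ins for column names.

```julia
# One row presented as a named tuple: a single argument holding every column value.
row = (first = "a", second = "b", third = "c")

# Iterating a named tuple yields its values, so join can consume it directly.
combined = join(row, " ")
```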

Going back to your original question of how to use .* to get the result, the answer is:

transform(df, All() => ((x...) -> foldl((p, q) -> p .* " " .* q, x)) => :combined)

Note that you do not need ByRow in this case as .* already does broadcasting. You would need it if you used * instead of .*:

transform(df, All() => ByRow((x...) -> foldl((p, q) -> p * " " * q, x)) => :combined)
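
The difference between the two foldl variants can be checked in plain Julia. This is a sketch under assumptions: cols is a made-up tuple of vectors standing in for three String columns, and the comprehension over zip only imitates row-wise application, it is not how ByRow is implemented.

```julia
# Stand-ins for three String columns.
cols = (["a", "b"], ["x", "y"], ["1", "2"])

# Whole-column version: `.*` broadcasts, so each fold step combines two vectors.
whole = foldl((p, q) -> p .* " " .* q, cols)

# Row-wise version: `*` concatenates single strings, so it is applied one row at a time.
rowwise = [foldl((p, q) -> p * " " * q, r) for r in zip(cols...)]
```

Both approaches produce the same vector of strings; they differ only in whether the fold runs once over whole columns or once per row.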