Julia DataFrames convert all columns from Int to String


Any idea why this is not working?

transform(df, All() .=> string; renamecols=false)

Isn't it supposed to apply the string function to every column and thereby convert them? It works when I add ByRow, but I expected an operation like this to act on entire columns rather than on each row.

Answer by Bogumił Kamiński (best answer)

What you describe works as expected. string takes a whole vector and converts it to a single string (the vector as a whole, not its individual elements). To work on the elements of the vector, use ByRow, as you noted, or use broadcasting:

julia> df = DataFrame(x=1:2, y=3:4, z=5:6)
2×3 DataFrame
 Row │ x      y      z
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      3      5
   2 │     2      4      6

julia> transform(df, All() .=> string; renamecols=false)
2×3 DataFrame
 Row │ x       y       z
     │ String  String  String
─────┼────────────────────────
   1 │ [1, 2]  [3, 4]  [5, 6]
   2 │ [1, 2]  [3, 4]  [5, 6]

julia> transform(df, All() .=> ByRow(string); renamecols=false)
2×3 DataFrame
 Row │ x       y       z
     │ String  String  String
─────┼────────────────────────
   1 │ 1       3       5
   2 │ 2       4       6

julia> string.(df) # broadcasting version
2×3 DataFrame
 Row │ x       y       z
     │ String  String  String
─────┼────────────────────────
   1 │ 1       3       5
   2 │ 2       4       6
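As a quick check of the difference, the following sketch rebuilds the same df and compares the three calls shown above. The variable names (by_row, broadcasted, whole) are made up for illustration, and it assumes a DataFrames.jl version recent enough to support broadcasting All() with .=>, as the examples above already do.

```julia
using DataFrames

df = DataFrame(x=1:2, y=3:4, z=5:6)

# Element-wise conversion: every Int becomes its own String.
by_row = transform(df, All() .=> ByRow(string); renamecols=false)
broadcasted = string.(df)

# Whole-column conversion: each column is turned into one String,
# which transform then repeats to keep the number of rows unchanged.
whole = transform(df, All() .=> string; renamecols=false)

println(eltype.(eachcol(by_row)))  # element type of each converted column
println(by_row == broadcasted)     # do ByRow and broadcasting agree?
println(whole.x)                   # the repeated whole-vector string
```

Both element-wise variants should give identical data frames, while the column x of whole holds the same text in every row.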

The reason All() .=> string still produces a full column is that transform requires the result to have the same number of rows as the source data frame. Therefore the single resulting string is repeated in every row. Note that with combine you would get a single row:

julia> combine(df, All() .=> string; renamecols=false)
1×3 DataFrame
 Row │ x       y       z
     │ String  String  String
─────┼────────────────────────
   1 │ [1, 2]  [3, 4]  [5, 6]

To highlight the issue, see how string operates on a vector without DataFrames.jl:

julia> string([1, 2])
"[1, 2]"
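For comparison, here is a minimal sketch in plain Julia (no packages) that puts the whole-vector call next to its element-wise counterparts. The variable names are hypothetical and chosen only for this illustration.

```julia
v = [1, 2]

# Called on the vector itself: a single String describing the whole vector.
whole = string(v)

# Broadcast over the elements: a Vector{String} with one entry per element.
elementwise = string.(v)

# map applies the function element by element as well.
mapped = map(string, v)

println(whole isa String)
println(elementwise)
println(mapped == elementwise)
```

This is the same distinction that ByRow(string) and string.(df) make at the data frame level.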