Split character field on first space without dropping fields in R


I want to split the field "Fare_class" on the first space without dropping any fields. I know a similar question exists, but when I tried that approach, it dropped all fields except for "Fare_class".

Travel_class    Fare_class          Avios_awarded
First           Flexible F          300% of miles flown
First           Lowest A            250% of miles flown
Business        Flexible J, C, D    250% of miles flown
Business        Lowest R, I         150% of miles flown

Below is the table I'd like to create, splitting "Fare_class" on the first space into two new fields, "Fare" and "Booking".

Travel_class    Fare_class          Fare        Booking    Avios_awarded
First           Flexible F          Flexible    F          300% of miles flown
First           Lowest A            Lowest      A          250% of miles flown
Business        Flexible J, C, D    Flexible    J,C,D      250% of miles flown
Business        Lowest R, I         Lowest      R,I        150% of miles flown

There are 3 answers

Otto Kässi On
library(stringr)

Fare_class <- c('Flexible F',
 'Lowest A',
 'Flexible J, C, D',
 'Lowest R, I')

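# str_split() takes the pattern as its second argument (it has no `sep` argument)
parts <- str_split(Fare_class, ' ', n = 2)

fare    <- sapply(parts, '[[', 1)  # text before the first space
booking <- sapply(parts, '[[', 2)  # text after the first space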

str_split splits each string into at most n = 2 pieces, so only the first space acts as a delimiter. Its output is a list of 2-element character vectors, and sapply(..., '[[', i) returns the i-th (first or second) element of each list element.
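
The vectors produced this way still have to be attached to the data frame to get the table in the question. A minimal base R sketch of that step, assuming the data is held in a data frame with character columns (the name `flights` and the reconstruction of the data are hypothetical), and using sub() in place of str_split:

```r
# Hypothetical reconstruction of the question's table (not the asker's object)
flights <- data.frame(
  Travel_class  = c("First", "First", "Business", "Business"),
  Fare_class    = c("Flexible F", "Lowest A", "Flexible J, C, D", "Lowest R, I"),
  Avios_awarded = c("300% of miles flown", "250% of miles flown",
                    "250% of miles flown", "150% of miles flown"),
  stringsAsFactors = FALSE
)

# Delete the first space and everything after it: keeps the leading word
flights$Fare <- sub(" .*$", "", flights$Fare_class)

# Delete everything up to and including the first space: keeps the remainder
flights$Booking <- sub("^[^ ]* ", "", flights$Fare_class)

# Reorder so the new columns sit next to Fare_class; no column is dropped
flights <- flights[, c("Travel_class", "Fare_class", "Fare",
                       "Booking", "Avios_awarded")]
```

Because the new columns are assigned onto the existing data frame, all original fields are kept.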

Santosh M. On

Alternative 1:

library(stringr)
str_split_fixed(Fare_class, " ", 2)

#     [,1]        [,2]     
#[1,] "Flexible"  "F"      
#[2,] "Lowest"    "A"      
#[3,] "Flexible"  "J, C, D"
#[4,] "Lowest"    "R, I" 

Alternative 2:

library(reshape2)
colsplit(Fare_class," ",c("Fare", "Booking"))

#      Fare  Booking
#1 Flexible        F
#2   Lowest        A
#3 Flexible  J, C, D
#4   Lowest     R, I
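
Both alternatives return a standalone matrix or data frame, so the result still needs to be joined back to the original data. A short sketch of one way to do that with str_split_fixed, assuming a data frame with character columns (the name `flights` and the reconstructed data are hypothetical):

```r
library(stringr)

# Hypothetical reconstruction of the question's table
flights <- data.frame(
  Travel_class  = c("First", "First", "Business", "Business"),
  Fare_class    = c("Flexible F", "Lowest A", "Flexible J, C, D", "Lowest R, I"),
  Avios_awarded = c("300% of miles flown", "250% of miles flown",
                    "250% of miles flown", "150% of miles flown"),
  stringsAsFactors = FALSE
)

# Character matrix with one row per input string and exactly 2 columns
parts <- str_split_fixed(flights$Fare_class, " ", 2)

# Assign each matrix column to a new data frame column
flights$Fare    <- parts[, 1]
flights$Booking <- parts[, 2]
```

The existing columns are untouched; the two new columns are appended at the end.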
acylam On

Here's a solution with separate from tidyr to split the column by regex:

library(tidyr)

separate(df, Fare_class, c("Fare", "Booking"), sep = "\\b\\s\\b", remove = FALSE)

or use extract for more complex patterns to split by capture groups:

extract(df, Fare_class, c("Fare", "Booking"), regex = "(^\\p{L}+\\b)\\s(.+$)", remove = FALSE)

Result:

  Travel_class       Fare_class     Fare Booking        Avios_awarded
1        First       Flexible F Flexible       F  300% of miles flown
2        First         Lowest A   Lowest       A  250% of miles flown
3     Business Flexible J, C, D Flexible J, C, D  250% of miles flown
4     Business      Lowest R, I   Lowest    R, I  150% of miles flown

Note:

If you don't want to keep the original column Fare_class, drop the remove = FALSE argument from separate or extract (the default is remove = TRUE).
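
The "\\b\\s\\b" pattern splits only once on this data because every later space follows a comma, so there is no word boundary in front of it. If that cannot be relied on, one possible variant is to split on a plain space and pass extra = "merge", which makes separate stop splitting once it has filled the requested number of columns. A sketch, using a hypothetical character-column copy of the data named `flights`:

```r
library(tidyr)

# Hypothetical reconstruction of the question's table
flights <- data.frame(
  Travel_class  = c("First", "First", "Business", "Business"),
  Fare_class    = c("Flexible F", "Lowest A", "Flexible J, C, D", "Lowest R, I"),
  Avios_awarded = c("300% of miles flown", "250% of miles flown",
                    "250% of miles flown", "150% of miles flown"),
  stringsAsFactors = FALSE
)

# extra = "merge" keeps everything after the first split in the last column;
# remove = FALSE keeps the original Fare_class column as well
result <- separate(flights, Fare_class, into = c("Fare", "Booking"),
                   sep = " ", extra = "merge", remove = FALSE)
```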

Data:

df = structure(list(Travel_class = structure(c(2L, 2L, 1L, 1L), .Label = c("Business", 
"First"), class = "factor"), Fare_class = structure(c(1L, 3L, 
2L, 4L), .Label = c("Flexible F", "Flexible J, C, D", "Lowest A", 
"Lowest R, I"), class = "factor"), Avios_awarded = structure(c(4L, 
1L, 3L, 2L), .Label = c(" 250% of miles flown", "150% of miles flown", 
"250% of miles flown", "300% of miles flown"), class = "factor")), .Names = c("Travel_class", 
"Fare_class", "Avios_awarded"), class = "data.frame", row.names = c(NA, 
-4L))