Reading tables with strings containing the separator in R

I have a text file with data I want to read, but one of the columns is a messy "code" that can contain the same character used as the separator. Take the following data as an example:

number:string
1:abc?][
2:def:{+

When split on ":", the second data line yields three fields, but the header has only two column names. Is there a strategy for reading this dataset?

There are 2 answers

Answer by dimitris_ps

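Good old regular expressions should help you with this.

Read the text file, keeping each line as a single field:

# sep = "\n" reads one field per line; quote = "" and comment.char = "" keep quotes and "#" in the codes literal
df <- read.table("pathToFile/fileName.txt", header = TRUE, sep = "\n", quote = "", comment.char = "")

The data.frame will have a single column, so we need to split it based on a pattern.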

Create the columns:

# number: the digits before the first ":"; string: everything after the first ":"
df$number <- sub("^([0-9]+):.*", "\\1", df[, 1])
df$string <- sub("^[0-9]+:(.*)", "\\1", df[, 1])

df <- df[, c("number", "string")]
View(df)
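
To try the same regular-expression split without a file on disk, here is a minimal sketch. The `lines` vector is a hypothetical stand-in for the file contents, and working from `readLines()`-style input rather than `read.table()` is an assumption made here so that no quote or comment handling gets in the way.

```r
# Hypothetical in-memory copy of the file; readLines() on the real file
# would return a character vector of the same shape.
lines <- c("number:string", "1:abc?][", "2:def:{+")

# The header contains no stray separators, so a plain split is safe there.
header <- strsplit(lines[1], ":", fixed = TRUE)[[1]]
body <- lines[-1]

# Everything before the first ":" is the number,
# everything after the first ":" is the string (later ":" are kept).
df <- data.frame(
  number = as.integer(sub("^([0-9]+):.*$", "\\1", body)),
  string = sub("^[0-9]+:(.*)$", "\\1", body),
  stringsAsFactors = FALSE
)
names(df) <- header
print(df)
```

This sketch assumes the first column is always a run of digits; a line that does not match the pattern would be returned unchanged by `sub()` and then become `NA` in `number`.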

Answer by Spacedman

Read the file line by line, split each line into at most two parts at the first ":", and bind the parts into a data frame. The column names are lost along the way, but they are easy to put back. You need the stringr and readr packages:

> do.call(rbind.data.frame,stringr::str_split(readr::read_lines("seps.csv",skip=1),":",2))
  c..1....2.. c..abc.......def.....
1           1                abc?][
2           2                def:{+

Here with stringr and readr attached for readability, with the names fixed:

> library(stringr)
> library(readr)
> d = do.call(rbind.data.frame,str_split(read_lines("seps.csv",skip=1),":",2))
> names(d) = str_split(read_lines("seps.csv",n_max=1),":",2)[[1]]
> d
  number string
1      1 abc?][
2      2 def:{+
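
If installing packages is not an option, the same "split at the first separator only" idea can be sketched in base R. Note that `split_first` is a hypothetical helper written for this illustration, not part of any package, and the `lines` vector stands in for what `readLines("seps.csv")` would return.

```r
# Hypothetical helper: split each string at the FIRST occurrence of `sep` only.
split_first <- function(x, sep = ":") {
  pos <- as.integer(regexpr(sep, x, fixed = TRUE))  # -1 when sep is absent
  found <- pos > 0
  left <- ifelse(found, substr(x, 1, pos - 1), x)
  right <- ifelse(found, substr(x, pos + nchar(sep), nchar(x)), NA_character_)
  cbind(left, right)
}

# Stand-in for readLines("seps.csv")
lines <- c("number:string", "1:abc?][", "2:def:{+")

# Data rows: two columns, with any later ":" kept inside the second one
d <- data.frame(split_first(lines[-1]), stringsAsFactors = FALSE)

# Column names come from the header line, split the same way
names(d) <- unname(split_first(lines[1])[1, ])
d$number <- as.integer(d$number)
print(d)
```

Lines without any separator end up with `NA` in the second column in this sketch, which makes malformed rows easy to spot afterwards.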