Read a text file with one line and split it to multiple rows based on a delimiter


I want to import a .txt file into an R data frame using the "readr" library or functions such as read.delim, read.table, or read.csv. The .txt file consists of a single line that contains all the data.

This single line should be split into separate rows during import. Rows are delimited by a single space, and the values within a row are delimited by a TAB. Whatever I try, I cannot split the line into multiple rows on the space delimiter; the file is always imported as one row. Is there a way to import the file this way in R?

My attempts only produced data frames with a single row, with every value in its own column.

A, B, and C are column names. 3, 2, and 1 should be the first row and 4, 5, and 6 should be the second.

Example Data: "A"   "B"   "C" "3"   "2"   "1" "4"   "5"   "6"

There are 3 answers

1
Gregor Thomas On
# I use `text = ...` for illustration purposes;
# you can use `read.table(file = "your/file/path.txt")` instead.
data = read.table(text = '"A"   "B"   "C" "3"   "2"   "1" "4"   "5"   "6"')
data =
  data[-(1:3)] |>
  unlist() |>
  as.numeric() |>
  matrix(ncol = 3, byrow = TRUE) |>
  as.data.frame() |>
  setNames(data[1:3])
data
#   A B C
# 1 3 2 1
# 2 4 5 6
1
George Savva On

It's not tidyverse, but this should work. I read the whole file as a single text string with readChar, use gsub to replace each space that has no adjacent tab with a newline, and then pass that string to read.table. Note that the `_` placeholder in the last step requires R 4.2 or later.

fileName <- 'testinput.txt'
readChar(fileName, file.info(fileName)$size)  |>
  gsub(pattern = "([A-Z0-9\"]) ([A-Z0-9\"])" ,replacement = "\\1\n\\2") |>
  read.table(text=_,header=TRUE)

  A B C
1 3 2 1
2 4 5 6

There is probably a tidyverse way to perform these same steps.
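For illustration, here is a minimal sketch of what a readr version of the same steps might look like. It assumes readr 2.0 or later (for the `I()` literal-data wrapper) and that no value contains a space itself. The temporary file and the names `path`, `raw`, `txt`, and `df` are made up for the example and are not from the question.

```r
library(readr)

# Hypothetical sample file reproducing the layout from the question:
# tabs between values, a single space between rows.
path <- tempfile(fileext = ".txt")
writeLines("\"A\"\t\"B\"\t\"C\" \"3\"\t\"2\"\t\"1\" \"4\"\t\"5\"\t\"6\"", path)

# Read the whole file as one string.
raw <- read_file(path)

# Drop the trailing newline, then turn every space into a row break.
txt <- gsub(" ", "\n", trimws(raw), fixed = TRUE)

# I() tells readr to treat the string as literal data, not a file path.
df <- read_tsv(I(txt), show_col_types = FALSE)
df
```

The result is a tibble rather than a plain data frame; wrap it in as.data.frame() if that matters. If any value can contain a space, the blanket replacement is too greedy and a more targeted pattern like the one in gsub above would be needed.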

0
M-- On

Here's a solution in base:

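fileName <- 'C:\\test.txt'

# Read the whole file, split it on spaces to get the rows, then rejoin with newlines
txt <- readChar(fileName, file.info(fileName)$size)
rows <- strsplit(txt, ' ', fixed = TRUE)[[1]]
read.table(text = paste0(rows, collapse = "\n"),
           sep = "\t", header = TRUE)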
#   A B C
# 1 3 2 1
# 2 4 5 6