I have a data text file (17 columns) that I want to read into R. I'm using the read.table() function:
read.table(file="data.txt", header = TRUE, sep = "\t", quote = "",comment.char="")
The problem is that some of the rows span multiple lines (example below):
10 Macron serait-il plus pro-salafiste que Hamon?!
t.co/g29oOgqih1
#Presidentielle2017 FALSE 0 NA 2017-03-02 13:45:08 FALSE NA 837297724378726400 NA <a href="https://about.twitter.com/products/tweetdeck" rel="nofollow">TweetDeck</a> Trader496 0 FALSE FALSE NA NA
Is there any way to read each of these records as a single row, or do I have to use fill = TRUE?
Data File: https://pastebin.com/b90VHvSt
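To see why fill = TRUE on its own does not solve this, here is a minimal base-R sketch. It assumes a small hypothetical 4-column, tab-separated toy file (not the real data.txt) in which one record is split across two physical lines. read.table() pads short lines with blank fields; it does not join them back together.

```r
# Hypothetical 4-column toy file (header properly tab-separated) in which
# record 2 is split across two physical lines.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "id\ttext\tflag\tscore",
  "1\thello\tTRUE\t5",
  "2\tline one",
  "line two\tFALSE\t3",
  "3\tbye\tFALSE\t7"
), tmp)

# Without fill = TRUE, read.table() stops with an error because some lines
# have fewer than 4 fields.
filled <- read.table(tmp, header = TRUE, sep = "\t", quote = "",
                     comment.char = "", fill = TRUE)

# fill = TRUE pads the short lines instead of joining them, so the split
# record becomes two padded rows: 4 rows instead of 3, with "line two"
# landing in the id column.
nrow(filled)
filled$id
```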
The readr::melt_*() or meltr::melt_*() functions are useful for misformatted data (in readr 2.0 and later the melt functions are deprecated in favor of the meltr package). This can be a very tedious task, so I'll demonstrate some of the functionality and workflow without completely cleaning this data. The file looks tab-separated, so we'll start with melt_tsv():
This reads in the data one token at a time, with information on each token's location and data type. For starters, it looks like the first two column names are separated by spaces instead of a tab, so they were read in as one token. We can fix this, then merge the corrected headers back into the rest of the data.
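The original reprex code is not reproduced here, so the following is only a sketch of the melt-and-fix-header step. It assumes the meltr and dplyr packages are installed and uses a hypothetical 4-column toy file whose first two header names are joined by a space, mimicking the problem described above. Each token's column name is looked up by its position.

```r
# Requires the meltr and dplyr packages.
library(meltr)
library(dplyr)

# Hypothetical 4-column toy file standing in for data.txt: the first two
# header names are separated by a space instead of a tab, and record 2 is
# split across two physical lines.
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "id text\tflag\tscore",
  "1\thello\tTRUE\t5",
  "2\tline one",
  "line two\tFALSE\t3",
  "3\tbye\tFALSE\t7"
), tmp)

# One row per token, with row, col, data_type and value columns.
melted <- melt_tsv(tmp)

# The header is physical line 1; split the space-joined token into two names.
header_names <- melted %>%
  filter(row == 1) %>%
  arrange(col) %>%
  pull(value) %>%
  strsplit(" ", fixed = TRUE) %>%
  unlist()

header_names

# Everything after the header is data; attach each token's column name
# by its position on the line.
body <- melted %>%
  filter(row > 1) %>%
  mutate(col_name = header_names[col])
```

For the real file, the same idea applies with data.txt in place of the toy file and 17 expected names.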
I also added a count variable giving the number of tokens (columns) in each row. Each row should have 17 columns, so we can use this count to keep only the "good" rows and pivot them into a wide data frame.
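A minimal sketch of the count, filter and pivot step, assuming dplyr and tidyr are available. To keep it self-contained, it uses a hand-built, hypothetical tibble that stands in for the melted output after the header fix (4 expected columns instead of 17); it is not the original answer's code.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the melted data after the header fix: one row per
# token, where `row` is the physical line and `col` the token's position.
header_names <- c("id", "text", "flag", "score")
body <- tibble(
  row   = c(2, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5),
  col   = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 2, 3, 4),
  value = c("1", "hello", "TRUE", "5",
            "2", "line one",
            "line two", "FALSE", "3",
            "3", "bye", "FALSE", "7")
) %>%
  mutate(col_name = header_names[col])

# Count the tokens on each physical line.
counted <- body %>%
  add_count(row, name = "n_cols")

# Lines with the expected number of tokens are complete records: pivot them wide.
good <- counted %>%
  filter(n_cols == length(header_names)) %>%
  select(row, col_name, value) %>%
  pivot_wider(names_from = col_name, values_from = value)

# Everything else belongs to records that were split across lines.
bad <- counted %>%
  filter(n_cols != length(header_names))

good
bad
```

All values in `good` are still character; readr::type_convert() can guess proper column types afterwards.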
This leaves us with 61 values in 13 "bad" rows. Diagnosing and fixing these will take more work, which is left as an exercise for the reader.
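As a starting point for diagnosing the bad rows, base R's count.fields() reports the number of fields on every physical line of the raw file. This is a different technique from the melt workflow above, shown here as a sketch on the same kind of hypothetical 4-column toy file. When a field contains one line break, it is split into two tokens, so the field counts of the pieces of that record add up to the expected count plus one.

```r
# Hypothetical toy file; with the real data, use "data.txt" and expected <- 17.
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "id text\tflag\tscore",
  "1\thello\tTRUE\t5",
  "2\tline one",
  "line two\tFALSE\t3",
  "3\tbye\tFALSE\t7"
), tmp)

expected <- 4

# Fields per physical line, splitting on tabs only.
n_fields <- count.fields(tmp, sep = "\t", quote = "", comment.char = "",
                         blank.lines.skip = FALSE)
n_fields
#> [1] 3 4 2 3 4

# Data lines (line 1 is the known-bad header) with the wrong field count.
short_lines <- which(n_fields != expected)
short_lines <- short_lines[short_lines > 1]
short_lines
#> [1] 3 4

# Here both short lines belong to record 2, which contains one line break,
# so their counts sum to expected + 1.
sum(n_fields[short_lines])
#> [1] 5
```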
Created on 2022-11-09 with reprex v2.0.2