Additional columns created when using read.table


Since I can't share the .txt file I'm using, I can only describe the situation.

The text file has no missing values and appears to be tab-separated: when I use a tab delimiter, it seems to load fine. The column headers contain spaces (e.g. Age of Parent).
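One way to confirm that the file really is consistently tab-separated is base R's count.fields(), which reports how many fields each line splits into. This is a minimal sketch: the sample file written to a temp path is a hypothetical stand-in, since the real mydata.txt is not available.

```r
# Hypothetical stand-in for mydata.txt (the real file is not available).
tmp <- tempfile(fileext = ".txt")
writeLines(c("Age of Parent\tName",
             "20\tName1",
             "30\tName2"), tmp)

# Count tab-separated fields on every line; a clean file gives one value.
# quote = "" and comment.char = "" stop quotes or '#' from hiding fields.
n_fields <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")
print(table(n_fields))

# Lines whose field count differs from the header line are suspect.
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)
```

If the table shows more than one field count, some lines have extra or missing tabs.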

When I load the data with the line of code below, everything appears to load properly. However, I end up with a number of duplicate columns.

For example, "Age of Parent" is relabeled as Age.of.Parent (read.table replaces the spaces with dots because check.names = TRUE by default), but there is also a second column with identical values named Age.of.Parent1.
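For reference, here is a small sketch of the renaming rule involved: with check.names = TRUE, read.table() passes the header names through make.names(..., unique = TRUE). The header values below are made up for illustration. Note that make.names() de-duplicates by appending ".1", so a repeated header would come out as Age.of.Parent.1 rather than exactly the suffix described above.

```r
# Made-up header fields; "Age of Parent" appears twice on purpose.
headers <- c("Age of Parent", "Names of Parent", "Age of Parent")

# The same conversion read.table() applies when check.names = TRUE:
# spaces become dots, and repeated names get a ".1", ".2", ... suffix.
syntactic <- make.names(headers, unique = TRUE)
print(syntactic)
# expected values: Age.of.Parent, Names.of.Parent, Age.of.Parent.1
```

So a "duplicate" column with a numeric suffix usually means the same name appears more than once in the header line itself.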

Question: What do I need to do to prevent these duplicate columns from being created? The column Age.of.Parent1 is clearly not in the dataset, yet from roughly 20 columns I end up with 30 (10 new duplicates ending in '1').

read.table('mydata.txt', header = TRUE, stringsAsFactors = FALSE, sep = '\t')
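A minimal sketch of how to diagnose and handle this, assuming (not confirmed by the question) that the header line of the real file repeats some column names. The temp file is a hypothetical stand-in for mydata.txt; the steps read the raw header, load with check.names = FALSE so nothing is silently renamed, and keep the first column for each name.

```r
# Hypothetical stand-in for mydata.txt. ASSUMPTION: the header line itself
# repeats "Age of Parent"; the real file may differ.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Age of Parent\tName\tAge of Parent",
             "20\tName1\t20",
             "30\tName2\t30"), tmp)

# 1. Inspect the raw header before read.table() renames anything.
header_fields <- strsplit(readLines(tmp, n = 1), "\t", fixed = TRUE)[[1]]
repeated <- unique(header_fields[duplicated(header_fields)])
print(repeated)

# 2. check.names = FALSE keeps the header text as written (spaces included)
#    instead of appending a numeric suffix to repeated names.
df <- read.table(tmp, header = TRUE, stringsAsFactors = FALSE, sep = "\t",
                 check.names = FALSE)

# 3. Keep only the first column for each name. This assumes the repeats
#    really are identical copies; check that before dropping real data.
df <- df[!duplicated(names(df))]

# 4. Optionally convert to syntactic names (spaces -> dots) afterwards.
names(df) <- make.names(names(df))
print(names(df))
```

If step 1 prints nothing, the header has no repeats and the extra columns come from somewhere else, so the file itself needs a closer look.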


Answer by AudioBubble

Here is an example showing how to save a data frame to a tab-separated file and read it back.

library(caroline)  # provides write.delim()

Age <- c(20, 30, 50) 
Names <- c("Name1", "Name2", "Name3") 
df <- data.frame(Age, Names)
colnames(df) <- c("Age of Parents", "Names of Parents")

# Write the data frame to a tab-delimited text file
write.delim(df, file = "foo.txt")

# Read the tab-delimited text file back in.
# fill = TRUE (already the read.delim() default): if rows have unequal length, blank fields are implicitly added.
read.delim(file = "foo.txt", header = TRUE, sep = "\t", fill = TRUE)

The output looks like this:

[Image: printed data frame read back from foo.txt]
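If installing caroline is not an option, a similar round trip can be done with base R alone. This sketch uses write.table() with sep = "\t", quote = FALSE and row.names = FALSE, which I believe matches what write.delim() does by default (an assumption worth checking against the caroline docs), and writes to a temp file instead of foo.txt. It also shows how check.names controls whether the spaces in the headers survive.

```r
# Base-R version of the round trip (no caroline needed).
Age <- c(20, 30, 50)
Names <- c("Name1", "Name2", "Name3")
df <- data.frame(Age, Names)
colnames(df) <- c("Age of Parents", "Names of Parents")

tmp <- tempfile(fileext = ".txt")
# Tab-separated, no quotes, no row names.
write.table(df, file = tmp, sep = "\t", quote = FALSE, row.names = FALSE)

# Default check.names = TRUE: spaces in the header become dots.
df_dots <- read.delim(tmp)
print(names(df_dots))

# check.names = FALSE keeps the header text exactly as written.
df_raw <- read.delim(tmp, check.names = FALSE)
print(names(df_raw))
```

Either way, each header field maps to exactly one column, so no extra columns appear unless the file's header repeats a name.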