Ignore trailing delimiters in readr::read_csv()


When I read a CSV file containing a trailing delimiter using readr::read_csv(), I get a message that a new name was created for the last column. Here are the contents of a short example file to show what I mean:

A,B,C,
2,1,1,
14,22,5,
9,-4,8,
17,9,-3,

Note the trailing comma at the end of each row. Now if I load this data with

readr::read_csv("A,B,C,\n2,1,1,\n14,22,5,\n9,-4,8,\n17,9,-3,")

I get the following message:

New names:
• `` -> `...4`

The resulting tibble has an extra fourth column named ...4 consisting of NA values in each row:

# A tibble: 4 × 4
      A     B     C ...4 
  <dbl> <dbl> <dbl> <lgl>
1     2     1     1 NA   
2    14    22     5 NA   
3     9    -4     8 NA   
4    17     9    -3 NA   

Even if I explicitly load only the first three columns with

read_csv(
    "A,B,C,\n2,1,1,\n14,22,5,\n9,-4,8,\n17,9,-3,",
    col_types=cols_only(
        A=col_integer(),
        B=col_integer(),
        C=col_integer()
    )
)

I still get this message.

Is this the expected behavior or is there some way to tell readr::read_csv() that it is supposed to ignore all columns except the ones I specify? Or is there another way to tidy up this (apparently malformed) CSV so that trailing delimiters are deleted/ignored?


There are 2 answers

0
Oliver Frost (best answer)

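I don't think you can with cols_only() alone. According to the documentation it restricts the result to the columns you name, but the message in the question still appears, which suggests the header names are repaired before that selection is applied.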

However, the fread() function from the data.table package lets you select specific columns by name as the file is read in:

DT <- data.table::fread("filename.csv", select = c("colA","colB"))
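
Applied to the data from the question, this might look like the following. It is only a sketch: it assumes a data.table version recent enough to have the text argument of fread(), and it assumes fread() parses the empty trailing field as a fourth, auto-named column that select then leaves out.

```r
library(data.table)

# The example data from the question, including the trailing commas
csv_text <- "A,B,C,\n2,1,1,\n14,22,5,\n9,-4,8,\n17,9,-3,"

# Read from a literal string and keep only the three named columns
DT <- fread(text = csv_text, header = TRUE, select = c("A", "B", "C"))

print(DT)
```

For a file on disk, replace text = csv_text with the file path as the first argument.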

1
AG1

Here's another example, this time with a warning message.

> read_csv("1,2,3\n4,5,6", col_names = c("x", "y"))
Warning: 2 parsing failures.
row # A tibble: 2 x 5 col     row   col  expected    actual         file expected   <int> <chr>     <chr>     <chr>        <chr> actual 1     1  <NA> 2 columns 3 columns literal data file 2     2  <NA> 2 columns 3 columns literal data

# A tibble: 2 x 2
      x     y
  <int> <int>
1     1     2
2     4     5

Here is the fix/hack. Also see the Stack Overflow question "Suppress reader parse problems in r".

> suppressWarnings(read_csv("1,2,3\n4,5,6", col_names = c("x", "y")))
# A tibble: 2 x 2
      x     y
  <int> <int>
1     1     2
2     4     5
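
For the trailing-delimiter data in the question, the "New names" notice is a message rather than a warning, so the analogous hack would be suppressMessages(). The sketch below shows that and, as an alternative, stripping the trailing comma from each line before parsing. It assumes readr 2.0 or later (for I() to mark literal data); the variable names are only for illustration.

```r
library(readr)

# The example data from the question, including the trailing commas
csv_text <- "A,B,C,\n2,1,1,\n14,22,5,\n9,-4,8,\n17,9,-3,"

# Option 1: keep only A, B, C and silence the name-repair message
dat <- suppressMessages(
  read_csv(
    I(csv_text),
    col_types = cols_only(
      A = col_integer(),
      B = col_integer(),
      C = col_integer()
    )
  )
)

# Option 2: remove the trailing comma from every line, then parse normally
lines   <- strsplit(csv_text, "\n", fixed = TRUE)[[1]]
cleaned <- sub(",$", "", lines)
dat2 <- read_csv(I(paste(cleaned, collapse = "\n")), col_types = "iii")

print(dat)
print(dat2)
```

Note that suppressMessages() hides every message from the call, not just the renaming notice, so the second option is the less blunt of the two. For a file on disk, readLines() could supply the lines instead of strsplit().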