I'm sure I'm doing something silly, but I can't quite figure it out. Both read_fwf and vroom_fwf are returning data that lacks one line (the first line, to be precise) when importing fixed-width files.
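There are two files: [test.csv](https://github.com/tidyverse/vroom/files/12156786/test.csv) and [vroom_fwf_test.txt](https://github.com/tidyverse/vroom/files/12156789/vroom_fwf_test.txt).
Suppose that both the fixed-width file and the CSV file are stored in the root directory. The code I used is: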
library(dplyr)
library(vroom)
library(data.table)
test <- fread(
"test.csv",
strip.white = TRUE, header = FALSE, blank.lines.skip = TRUE
) %>%
filter(!is.na(V2)) %>%
mutate(V1 = gsub(" |\\(", ".", gsub("\\)", "", V1)))
## gives one line
vroom::vroom_fwf(
"vroom_fwf_test.txt", fwf_widths(test$V3, test$V1),
n_max = 1000, col_types = cols(.default = "c"), id = "file_name"
)
This will only produce one row of data. But there are two lines in this raw file, as evidenced by
writeLines(read_lines(path)) ## two lines
which produces two lines as expected. If I leave only one line in the raw data, it'll produce zero imported rows.
The existing example in the manual, on the other hand, produces three lines as it should!
fwf_sample <- readr_example("fwf-sample.txt")
writeLines(read_lines(fwf_sample)) ## three lines
read_fwf(fwf_sample, fwf_widths(c(20, 10, 12), c("name", "state", "ssn"))) ## three lines as it should
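A cross-check that does not depend on readr or vroom might look like the sketch below. It compares the raw line count with the parsed row count using base R's utils::read.fwf(). The file contents, widths, and column names are made up for illustration; they are not the ones from test.csv.

```r
# Hypothetical two-record fixed-width file (12-character name, 2-character state).
tmp <- tempfile(fileext = ".txt")
writeLines(c("John Smith  WA", "Mary Hart   CA"), tmp)

# Hypothetical column layout, standing in for test$V3 and test$V1.
widths <- c(12, 2)
col_names <- c("name", "state")

# Count the raw lines, then parse the same file with base R.
raw_lines <- readLines(tmp)
parsed <- utils::read.fwf(
  tmp,
  widths = widths,
  col.names = col_names,
  colClasses = "character",
  strip.white = TRUE
)

# If the two counts differ, the parser is dropping or merging records.
length(raw_lines)
nrow(parsed)
```

If base R returns every record while read_fwf or vroom_fwf returns one fewer on the same local file, that would point at the reader rather than at the column specification.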
I am not sure where I've gone wrong. My session info is as follows:
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] sf_1.0-9 censusxy_1.1.1 tidygeocoder_1.0.5
[4] foreign_0.8-83 lubridate_1.9.0 timechange_0.1.1
[7] data.table_1.14.8 vroom_1.6.0 janitor_2.1.0
[10] readxl_1.4.1 assertthat_0.2.1 here_1.0.1
[13] stringi_1.7.8 forcats_0.5.2 stringr_1.5.0
[16] dplyr_1.1.0 purrr_1.0.0 readr_2.1.3
[19] tidyr_1.2.1 tibble_3.1.8 ggplot2_3.4.0
[22] tidyverse_1.3.2 plyr_1.8.8 MASS_7.3-58.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 class_7.3-20 rprojroot_2.0.3
[4] utf8_1.2.2 R6_2.5.1 cellranger_1.1.0
[7] backports_1.4.1 reprex_2.0.2 e1071_1.7-12
[10] httr_1.4.4 pillar_1.8.1 rlang_1.0.6
[13] googlesheets4_1.0.1 rstudioapi_0.14 googledrive_2.0.0
[16] bit_4.0.5 munsell_0.5.0 proxy_0.4-27
[19] broom_1.0.2 compiler_4.2.2 modelr_0.1.10
[22] pkgconfig_2.0.3 tidyselect_1.2.0 fansi_1.0.3
[25] crayon_1.5.2 tzdb_0.3.0 dbplyr_2.2.1
[28] withr_2.5.0 grid_4.2.2 jsonlite_1.8.4
[31] gtable_0.3.1 lifecycle_1.0.3 DBI_1.1.3
[34] magrittr_2.0.3 units_0.8-1 scales_1.2.1
[37] KernSmooth_2.23-20 cli_3.6.0 renv_0.16.0
[40] fs_1.5.2 snakecase_0.11.0 xml2_1.3.3
[43] ellipsis_0.3.2 generics_0.1.3 vctrs_0.5.2
[46] tools_4.2.2 bit64_4.0.5 glue_1.6.2
[49] hms_1.1.2 parallel_4.2.2 colorspace_2.0-3
[52] gargle_1.2.1 classInt_0.4-8 rvest_1.0.3
[55] haven_2.5.1
Has anybody encountered a similar problem? Thank you very much.
(Opened as an issue on vroom's GitHub repository: https://github.com/tidyverse/vroom/issues/503)
Edit for @jay.sf: yes, it works! But when the .txt file is on my local machine, it behaves differently (I've attached the screenshot and the code used there). Perhaps it's a line-ending problem of some sort?
## Both test.csv and vroom_fwf_test.txt are in the local root directory
library(dplyr)
library(vroom)
library(data.table)
test <- fread(
"test.csv",
strip.white = TRUE, header = FALSE, blank.lines.skip = TRUE
) %>%
filter(!is.na(V2)) %>%
mutate(V1 = gsub(" |\\(", ".", gsub("\\)", "", V1)))
## Originally submitted code: this gives only one line
vroom::vroom_fwf(
file = "vroom_fwf_test.txt",
col_positions = fwf_widths(test$V3, test$V1),
n_max = 1000, col_types = cols(.default = "c"), id = "file_name"
)
## @jay.sf's code: this gives two lines, yes
vroom::vroom_fwf(
file = "https://github.com/tidyverse/vroom/files/12156789/vroom_fwf_test.txt",
col_positions = with(
read.csv(
"https://github.com/tidyverse/vroom/files/12156786/test.csv",
header = F
), vroom::fwf_widths(V3, V1)
),
n_max = 1000,
col_types = vroom::cols(.default = "c"),
id = "file_name"
)
## This gives only one line again (it's the same file)
## The only difference from @jay.sf's code is that the file is local
vroom::vroom_fwf(
file = "vroom_fwf_test.txt",
col_positions = with(
read.csv(
"https://github.com/tidyverse/vroom/files/12156786/test.csv",
header = F
), vroom::fwf_widths(V3, V1)
),
n_max = 1000,
col_types = vroom::cols(.default = "c"),
id = "file_name"
)
## This gives two lines, so the problem is the .txt file?
vroom::vroom_fwf(
file = "https://github.com/tidyverse/vroom/files/12156789/vroom_fwf_test.txt",
col_positions = fwf_widths(test$V3, test$V1),
n_max = 1000,
col_types = vroom::cols(.default = "c"),
id = "file_name"
)
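One way to test the line-ending hypothesis is to look at the raw bytes of the local copy. The sketch below uses only base R; count_line_endings() is a hypothetical helper written for this post, not a vroom or readr function, and the temporary files stand in for vroom_fwf_test.txt.

```r
# Hypothetical helper: count CRLF, bare LF, and bare CR bytes in a file,
# and report whether the file ends with a line terminator.
count_line_endings <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  n <- length(bytes)
  is_cr <- bytes == as.raw(0x0d)
  is_lf <- bytes == as.raw(0x0a)
  # A CR immediately followed by an LF is a Windows-style CRLF pair.
  next_is_lf <- c(is_lf[-1], FALSE)
  crlf <- sum(is_cr & next_is_lf)
  c(
    crlf = crlf,
    lf_only = sum(is_lf) - crlf,
    cr_only = sum(is_cr) - crlf,
    ends_with_newline = as.integer(n > 0 && (is_lf[n] || is_cr[n]))
  )
}

# Stand-in files: one with Windows line endings, one with a Unix line ending
# and no terminator after the last record.
windows_file <- tempfile(fileext = ".txt")
writeBin(charToRaw("ab\r\ncd\r\n"), windows_file)
unix_file <- tempfile(fileext = ".txt")
writeBin(charToRaw("ab\ncd"), unix_file)

count_line_endings(windows_file)  # two CRLF pairs, ends with a newline
count_line_endings(unix_file)     # one bare LF, no trailing newline
```

Running count_line_endings("vroom_fwf_test.txt") on the local file and on a freshly downloaded copy would show whether the two differ in line endings or in a missing final newline.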
Your code is not exactly reproducible. I had to load dplyr and define path and n_max to get the code running. When I import using vroom_fwf, I get two lines.