I have imported multiple Excel files with multiple spreadsheets into R, using the list.files and map_df functions and including the names of the files and sheets. It has worked well except for a problematic character variable, which has a mixture of numeric and alphanumeric values, as shown below. I am baffled why some of the alphanumeric values do not appear in the R dataframe. The structures of the files are the same.
I did the importation with 2 single-sheet files and the result is the same, so having multiple spreadsheets is not the issue. The variable in question is formatted the same way in all files. I am totally at a lost and don't have an idea what the issue might be. Please help!
Strangely, there's no problem using 100 records from the problematic file.
tibble [5,227 × 13] (S3: tbl_df/tbl/data.frame)
Except for one, which is a list (filenames), all 13 variables are character variables
Simple example.
Input
Excel: Variable A <- c("76", "76A", "187B")
Output
R: Variable A <- c("76", NA, NA)
my codes
library(tidyverse)
library(readxl)
library(purrr)
# identify the source and types of files to be merged
files <- list.files(
path = "directory",
pattern = ".xlsx",
full.names = TRUE)
# merging of files
allfiles <- map_df(.x = files,
.f = ~ {
temp_file <- .x
map_df(.x = excel_sheets(.x),
.f = ~ read_excel(path = temp_file,
sheet = .x) %>%
mutate(sheet_name = .x, section=temp_file, across(everything(),as.character)))
})