How to export skimr::skim() results to a file with variable type reset on many data frames?


I have two data frames (more in real life). My goal is to generate summary reports with the skimr package and export each one as a file to a folder, each with a different file name. What makes this less than straightforward is that I need any variable with "DATE" in its name to be converted to the Date type (so I can get the range, median, etc. with dates as the data type). Another variable, "USER_ID", also needs to be converted to character instead of the default numeric.

 df1 <- data.frame(x = rep(3, 3), USER_ID = c(292932, 293923, 392343), CONTACT_DATE = c("4/3/2022", "3/3/2012", "4/3/2011"))

 df2 <- data.frame(x = rep(5, 3), USER_ID = c(292932, 293423, 392343), ORDER_DATE = c("3/4/2012", "4/5/2019", "4/3/2012"))

I am looking for a faster way to accomplish the following:

For df1:

df1$CONTACT_DATE<- as.Date(df1$CONTACT_DATE, "%m/%d/%Y")

df1$USER_ID<-as.character(df1$USER_ID)

df1_summary<-skim(df1)

Followed by a function that can output the df1_summary into a file.

For df2:

df2$ORDER_DATE<- as.Date(df2$ORDER_DATE, "%m/%d/%Y")

df2$USER_ID<-as.character(df2$USER_ID)

df2_summary<-skim(df2)

Followed by a function that can output the df2_summary into a file.

The summary output would ideally contain the entire skim output. It can be in any editable file format.

Thank you in advance!


There are 2 answers

0
Elin On

The best workflow for using skimr is iterative. I would suggest writing a function that converts any column in the data frame with the string DATE in its name, then running skim() on the converted data. You can do the conversion with dplyr's mutate() together with across(), which accepts tidyselect helpers for picking columns by name. Once you have the function, you can use purrr or lapply() to apply it to all of your data frames.

Then run skim() on each converted data frame in the same way, using purrr or one of the apply functions.
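The steps above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper name fix_types is made up for illustration, the date format "%m/%d/%Y" is taken from the question, and the dates are assumed to be stored as character strings. Note that contains() ignores case by default, so ignore.case = FALSE is set here to match only upper-case "DATE".

```r
library(dplyr)
library(skimr)

# Example data from the question, with the dates stored as strings
df1 <- data.frame(
  x = rep(3, 3),
  USER_ID = c(292932, 293923, 392343),
  CONTACT_DATE = c("4/3/2022", "3/3/2012", "4/3/2011")
)
df2 <- data.frame(
  x = rep(5, 3),
  USER_ID = c(292932, 293423, 392343),
  ORDER_DATE = c("3/4/2012", "4/5/2019", "4/3/2012")
)

# Hypothetical helper: convert every *DATE* column to Date and USER_ID to character
fix_types <- function(df) {
  df %>%
    mutate(
      across(contains("DATE", ignore.case = FALSE),
             ~ as.Date(.x, format = "%m/%d/%Y")),
      # any_of() silently skips data frames that have no USER_ID column
      across(any_of("USER_ID"), as.character)
    )
}

# Keep the data frames in a named list so the names can be reused as file names
dfs <- list(df1 = df1, df2 = df2)

# Convert, then skim, each data frame
summaries <- lapply(dfs, function(df) skim(fix_types(df)))
```

With purrr, the last line could equally be written as purrr::map(dfs, ~ skim(fix_types(.x))). The result is a named list with one skim object per data frame.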

How you save the result depends on what you want to keep. Do you want the skim object itself (one giant data frame), or something like what is printed to the console?
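If the skim object itself is what you want, one option is to write each one out as a CSV, which can be opened and edited in a spreadsheet. The sketch below assumes the columns have already been converted; the output folder and the "_summary.csv" file names are placeholders chosen for illustration.

```r
library(skimr)

# Stand-in data frames whose columns already have the desired types
dfs <- list(
  df1 = data.frame(
    x = rep(3, 3),
    USER_ID = c("292932", "293923", "392343"),
    CONTACT_DATE = as.Date(c("2022-04-03", "2012-03-03", "2011-04-03"))
  ),
  df2 = data.frame(
    x = rep(5, 3),
    USER_ID = c("292932", "293423", "392343"),
    ORDER_DATE = as.Date(c("2012-03-04", "2019-04-05", "2012-04-03"))
  )
)

# Placeholder output folder; replace with your own path
out_dir <- file.path(tempdir(), "skim_reports")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

for (nm in names(dfs)) {
  skimmed <- skim(dfs[[nm]])
  # as.data.frame() turns the skim object into a plain data frame before writing
  write.csv(
    as.data.frame(skimmed),
    file = file.path(out_dir, paste0(nm, "_summary.csv")),
    row.names = FALSE
  )
}
```

Each file holds one row per variable, with the statistics for every column type side by side, so cells that do not apply to a given type are NA.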

0
qwr On

You can use sink() to write the exact output that R prints to the console to a file.

sink("skim.txt")
print(my_skim_df)
sink()  # turn off diversion
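For several data frames, the same idea can be wrapped in a loop so that each summary goes to its own file. This is only a sketch: the list name skims, the output folder, and the "_skim.txt" file names are assumptions made for illustration.

```r
library(skimr)

# Stand-in skim results, kept in a named list so the names become file names
skims <- list(
  df1 = skim(data.frame(x = rep(3, 3), USER_ID = c("292932", "293923", "392343"))),
  df2 = skim(data.frame(x = rep(5, 3), USER_ID = c("292932", "293423", "392343")))
)

# Placeholder output folder; replace with your own path
out_dir <- tempdir()

for (nm in names(skims)) {
  sink(file.path(out_dir, paste0(nm, "_skim.txt")))  # divert console output
  print(skims[[nm]])
  sink()  # turn off the diversion before moving to the next file
}
```

If print() fails partway through, the diversion stays active until sink() is called again, so it can be worth wrapping the body in tryCatch(..., finally = sink()).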