I am trying to use DataExplorer to help with quick EDA. I like how it shows univariate distributions. Here is a reproducible example.
library(DataExplorer)
library(dplyr)  # provides the %>% pipe

A <- factor(rep(1:5, 200))  # grouping variable with 5 levels
B <- rnorm(1000)
C <- rnorm(1000, mean = 100, sd = 2)
D <- rnorm(1000, mean = 2, sd = 2)
df <- data.frame(A, B, C, D)

df %>%
  create_report(
    output_file = "trial",
    y = "A",  # to get bar plots, QQ plots and scatterplots by grouping variable "A"
    report_title = "trial_EDA",
    config = configure_report(
      add_plot_density = TRUE  # to add density plots to the report
    )
  )
I want to visualize the density by the grouping variable "A", as shown in the attached picture.
But I don't know how to use the plot density arguments properly to do this. Also, please suggest other packages for easily navigating large datasets in a preliminary analysis. Thanks!
You have not specified which variable (B, C or D) the density graph should apply to. If there is only one, e.g. B, you can plot just that variable.
You can also do it separately for each of the variables on one plot.
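The answer's own code did not survive here, so the following is only a minimal sketch of both options. It assumes ggplot2 and tidyr are acceptable, and the seed, the object names (p_single, df_long, p_all) and the alpha value are illustrative choices rather than anything from the original post.

```r
library(ggplot2)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the sketch is repeatable
df <- data.frame(
  A = factor(rep(1:5, 200)),
  B = rnorm(1000),
  C = rnorm(1000, mean = 100, sd = 2),
  D = rnorm(1000, mean = 2, sd = 2)
)

# One variable: density of B, one semi-transparent curve per level of A
p_single <- ggplot(df, aes(x = B, fill = A)) +
  geom_density(alpha = 0.4)

# All variables at once: reshape to long format, then one panel per variable
df_long <- pivot_longer(df, B:D, names_to = "var", values_to = "val")
p_all <- ggplot(df_long, aes(x = val, fill = A)) +
  geom_density(alpha = 0.4) +
  facet_wrap(~var, scales = "free")  # free scales because C is centred near 100

print(p_single)
print(p_all)
```

Free scales in facet_wrap keep each panel readable, since B, C and D live on very different ranges.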
And if you don't like the fills, you can leave them out.
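One way to drop the fills, sketched under the assumption that ggplot2 is used, is to map A to the outline colour instead of the fill. The name p_lines and the seed are illustrative.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is repeatable
df <- data.frame(A = factor(rep(1:5, 200)), B = rnorm(1000))

# Map A to colour only, so each group is drawn as an unfilled outline
p_lines <- ggplot(df, aes(x = B, colour = A)) +
  geom_density()

print(p_lines)
```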
Update 1
It is possible to create charts for any number of columns. I will show it to you in the example below. First, we'll do it in a very simple, even trivial way.
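The simple approach could look roughly like the sketch below: loop over the column names and build one plot per column. This is an assumed reconstruction, not the original code; vars and plots are hypothetical names, and .data[[v]] is the ggplot2 pronoun for referring to a column by a string.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is repeatable
df <- data.frame(
  A = factor(rep(1:5, 200)),
  B = rnorm(1000),
  C = rnorm(1000, mean = 100, sd = 2),
  D = rnorm(1000, mean = 2, sd = 2)
)

vars <- c("B", "C", "D")  # any number of numeric columns can be listed here

# Build one density plot per column, grouped by A
plots <- lapply(vars, function(v) {
  ggplot(df, aes(x = .data[[v]], fill = A)) +
    geom_density(alpha = 0.4) +
    labs(x = v, title = paste("Density of", v, "by A"))
})
names(plots) <- vars

for (p in plots) print(p)
```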
As you can see, we created three plots for variables B, C and D.
The second way is a bit more difficult to understand. But it will give you some extra bonuses.
Note that after df %>% pivot_longer(B:D, names_to = "var", values_to = "val") your tibble is in long format, with the columns A, var and val.
After df %>% pivot_longer(B:D, names_to = "var", values_to = "val") %>% group_by(var) %>% nest() it has one row per value of var. As you can see, the data has been collapsed into three internal tibbles stored in the variable data. This approach will allow you to easily calculate all statistics for each column separately.
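As an illustration of the per-column statistics, here is a minimal sketch that assumes dplyr, tidyr and purrr. The column names mean_val, sd_val and n, as well as the objects nested and stats, are hypothetical and not taken from the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)  # assumption: fixed seed so the sketch is repeatable
df <- data.frame(
  A = factor(rep(1:5, 200)),
  B = rnorm(1000),
  C = rnorm(1000, mean = 100, sd = 2),
  D = rnorm(1000, mean = 2, sd = 2)
)

# Long format, then one nested tibble (columns A and val) per variable
nested <- df %>%
  pivot_longer(B:D, names_to = "var", values_to = "val") %>%
  group_by(var) %>%
  nest()

# Summaries computed separately inside each nested tibble
stats <- nested %>%
  mutate(
    mean_val = map_dbl(data, ~ mean(.x$val)),
    sd_val   = map_dbl(data, ~ sd(.x$val)),
    n        = map_int(data, nrow)
  ) %>%
  ungroup() %>%
  select(-data)

print(stats)
```

The same map pattern can return a plot instead of a number (for example with purrr::map), which gives one plot per variable stored alongside its statistics.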
Isn't that cool?