Labelling quartiles in a boxplot using R

I was plotting a boxplot and labelling it with the quartiles and the min/max values. It worked fine for a few columns; however, for some columns the summary statistics did not exactly match the values from boxplot.stats().

For example, summary() gave a median of 2320, whereas boxplot.stats() gave 2319.5.

I was using the Statlog (German Credit Data) data set for credit risk scoring.

Dataset link: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)

Answer by dcarlson

Different functions can format values differently when printing them. The printed value depends on options("digits"), which defaults to 7 significant digits (not decimal places), so what you see is usually a rounded version of the stored value rather than the exact value. In addition to this system setting, a function's print method can use its own number of digits: summary() prints with max(3, getOption("digits") - 3) significant digits, which is 4 by default, so a median of 2319.5 is displayed as 2320. To see the stored value with far more precision, use dput():

set.seed(42)
x <- runif(25)
summary(x)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# 0.08244 0.45774 0.65699 0.61295 0.91481 0.98889 
dput(summary(x))
# structure(c(Min. = 0.0824375580996275, `1st Qu.` = 0.45774177624844, 
# Median = 0.656992290401831, Mean = 0.612946688365191, `3rd Qu.` = 0.914806043496355, 
# Max. = 0.988891728920862), class = c("summaryDefault", "table"))
boxplot.stats(x)
# $stats
# [1] 0.08243756 0.45774178 0.65699229 0.91480604 0.98889173
# 
# $n
# [1] 25
# 
# $conf
# [1] 0.5125600 0.8014246
# 
# $out
# numeric(0)
# 
dput(boxplot.stats(x))
# list(stats = c(0.0824375580996275, 0.45774177624844, 0.656992290401831, 
# 0.914806043496355, 0.988891728920862), n = 25L, conf = c(0.51255998195149, 
# 0.801424598852172), out = numeric(0))

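Notice that both functions compute exactly the same median; boxplot.stats() simply prints more digits. The quartiles are a separate issue, because there are different ways of computing them: the quantile() function offers 9 different methods (see ?quantile), and summary() uses its default, type = 7. boxplot.stats(), however, does not call quantile() at all; it takes the box from fivenum(), which returns Tukey's hinges. For some sample sizes the hinges and the type 7 quartiles differ, so the 1st and 3rd quartiles can disagree even at full precision.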
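A minimal sketch in R of the ways summary() and boxplot.stats() can disagree, using small made-up vectors (amount, y and z are hypothetical toy data, not columns of the German credit data). Besides rounding and the quartile definition, it shows a point relevant to labelling min/max values: boxplot.stats() places the whisker ends at the most extreme points within 1.5 times the hinge spread of the box, so they equal the minimum and maximum only when there are no outliers.

```r
# Hypothetical toy vector whose median is exactly 2319.5,
# the value boxplot.stats() reported in the question.
amount <- c(2000, 2319, 2320, 2600)

# 1. Printing precision: the stored median is the same either way.
med <- median(amount)
print(med)                    # default 7 significant digits
print(med, digits = 4)        # the default number of digits summary() prints with

# 2. Quartile definitions: boxplot.stats() takes the box from fivenum()
#    (Tukey's hinges), while summary() uses quantile() with type = 7.
y <- 1:6
hinges    <- boxplot.stats(y)$stats        # whisker, hinge, median, hinge, whisker
quartiles <- quantile(y, c(0.25, 0.75))    # type = 7 is quantile()'s default
print(hinges)
print(quartiles)

# 3. Whiskers vs. min/max: points beyond 1.5 * (upper hinge - lower hinge)
#    from the box are returned in $out instead of being whisker ends.
z  <- c(1:6, 50)
zs <- boxplot.stats(z)
print(zs$stats)               # upper whisker is 6, not 50
print(zs$out)                 # 50 is reported as an outlier
print(range(z))               # the true minimum and maximum

# boxplot() with plot = FALSE returns the same statistics without drawing,
# so labels taken from it match the box that would be plotted.
bp <- boxplot(z, plot = FALSE)
print(bp$stats[, 1])
```

For the real data, substituting the relevant column for the toy vectors and writing the labels from the returned stats keeps the text consistent with the drawn box; labels taken from summary() will only agree when the hinges equal the type 7 quartiles and there are no outliers.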