Using a character vector component as an argument to an R function


To find the best-fitting distribution for a dataset, I need to pass each element of a character vector of candidate distributions (kept intentionally short in this example) to the R function ks.test() as an argument. So my problem relates to statistics and, more generally, to R programming.

install.packages("ISLR")
library(ISLR)
attach(Credit)
distr.list <- c("pbeta","pbinom","pcauchy","pchisq")
p.val <- double(length(distr.list))
for (i in 1:length(distr.list))
   {
    p.val[i] <- ks.test(Income,distr.list[i])$p.value 
   }

I get:

Error in y(sort(x), ...) : argument "shape1" is missing, with no default

What does it mean? Where is my mistake? Many thanks in advance.

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

There is 1 answer

whuber On

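The error means that ks.test() called pbeta() without its required shape parameters: when you pass only the name "pbeta", ks.test() evaluates y(sort(x), ...) with nothing in ..., and pbeta() has no default for shape1. Use explicit distribution functions with their parameters fixed, as in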

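# Each entry is a CDF of x alone, with its parameters fixed in advance.
# The parameter values are placeholders, not recommendations.
distr.list <- list(
  pbeta = function(x) pbeta(x, shape1 = 1, shape2 = 2),
  pbinom = function(x) pbinom(x, size = 8, prob = 0.3),  # prob must lie in [0, 1]
  pcauchy = pcauchy,  # defaults: location = 0, scale = 1
  pchisq = function(x) pchisq(x, df = 4)
)
p.val <- double(length(distr.list))
for (i in seq_along(distr.list))
{
  # Income comes from the attached Credit data, as in the question
  p.val[i] <- ks.test(Income, distr.list[[i]])$p.value
}
# Optional:
names(p.val) <- names(distr.list)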

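Notice the [[ indexing for the list in the loop: distr.list[[i]] extracts the function itself, whereas distr.list[i] would return a one-element list, which ks.test() cannot use as a distribution function. The actual parameter values in this example are pure fabrications -- you will need to supply values suitable for your purposes and assumptions.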
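
As a small illustration of the indexing point, and of an alternative that keeps the distribution name as a string, here is a minimal sketch. It uses simulated data (x is a hypothetical stand-in for Income), and the candidate distributions and parameter values are assumptions chosen only so the code runs on its own.

```r
set.seed(1)
x <- rexp(50)  # simulated stand-in for Income (assumption, not the Credit data)

# Hypothetical candidates; parameter values are illustrative only.
distr.list <- list(
  pexp   = function(q) pexp(q, rate = 1),
  pchisq = function(q) pchisq(q, df = 4)
)

# `[` keeps the list wrapper, `[[` extracts the function itself.
is.function(distr.list[1])    # FALSE
is.function(distr.list[[1]])  # TRUE

# Loop-free version: vapply() iterates over the functions directly
# and carries the list names over to the result.
p.val <- vapply(distr.list, function(cdf) ks.test(x, cdf)$p.value, numeric(1))

# Alternative: keep the name as a string and supply the distribution's
# parameters through the `...` argument of ks.test().
p.alt <- ks.test(x, "pchisq", df = 4)$p.value
```

Both routes evaluate the same one-sample test, so p.alt should match the pchisq entry of p.val. The string form only helps when every candidate's parameters can be supplied by name at the call site.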


This will make your code run, but it won't be statistically correct; that's a different set of issues. In particular, what sense would it make to use discrete distributions like the Binomial, continuous bounded distributions like the Beta, and continuous unbounded distributions like the Cauchy all as reference distributions for evaluating a given set of data?

Extensive comparison (via some distribution test like the KS) of a set of distributions to data, as exemplified by this code, is not usually a good approach to fitting a distribution. Distribution fitting is usually a matter of estimating a set of parameters to pin down a reasonable range of distributions within a family of assumed distributional models. How that's done is a big part of what statistics is about.
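
To make "estimating a set of parameters within a family" concrete, here is a minimal sketch using fitdistr() from the MASS package. The lognormal family and the simulated data are assumptions made purely for illustration; nothing here implies that a lognormal is the right model for Income.

```r
library(MASS)  # provides fitdistr()

set.seed(42)
# Simulated positive, right-skewed data (assumption, not the Credit data).
x <- rlnorm(200, meanlog = 3.5, sdlog = 0.7)

# Maximum-likelihood estimates of the parameters within the assumed family.
fit <- fitdistr(x, "lognormal")
fit$estimate  # named vector: meanlog, sdlog
fit$sd        # approximate standard errors of the estimates

# Caution: passing these estimates to ks.test() on the same data does not
# give a valid p-value, because the test assumes the parameters were
# specified in advance rather than estimated from the sample.
```

Which family to assume is a modelling decision based on what the data represent; the fitting step only pins down parameters once that choice is made.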