How to use broom::tidy for Tobit models?

128 views Asked by At

Dear Stackoverflow community,

I am struggling with using the tidy function from the broom package. I need this function in the context of a multiple imputation.

You can see here a reprex example using a ggplot2 dataset.

library(ggplot2, quietly = T)
library(AER, quietly = T)
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
library(broom, quietly = T)
library(tidyverse, quietly = T)

data <- ggplot2::diamonds

data
#> # A tibble: 53,940 × 10
#>    carat cut       color clarity depth table price     x     y     z
#>    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#>  1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
#>  2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
#>  3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
#>  4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
#>  5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
#>  6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
#>  7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
#>  8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
#>  9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
#> 10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
#> # ℹ 53,930 more rows

fit <- AER::tobit(
  formula = price ~ z,
  left = 500,
  right = 1500,
  data = data
)

fit %>%  summary(tidy)
#> Error in if (correlation) cov2cor(vcov.) else NULL: argument is not interpretable as logical
Created on 2023-07-26 with reprex v2.0.2

I checked my R version and the package version:

R version 4.3.1 (2023-06-16 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19044)

other attached packages: [1] reprex_2.0.2 broom_1.0.5 AER_1.2-10

I identified the following issue and resolution on GitHub but I was not able to transfer it into my in-working project: https://github.com/tidymodels/broom/issues/749 https://github.com/tidymodels/broom/commit/56437bce30841211bfa64074677fe2d9124d99cc

Any help would be very much appreciated!

Thank you in advance!

2

There are 2 answers

4
Ben Bolker On BEST ANSWER

I came up with a workaround, which you should use with caution: class(fit) is c("tobit", "survreg"), and there is a tidy method for survreg objects (see methods("tidy")), but tidy(fit) returns

Error: No tidy method for objects of class tobit

So:

tidy.tobit <- function(x, ...) {
   class(x) <- "survreg"
   tidy(x, ...)
}
 
tidy(fit)
# A tibble: 3 × 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept) -3109.    19.7         -158.       0
2 z            1420.     6.75         210.       0
3 Log(scale)      5.63   0.00551     1021.       0

This appears to match the results of summary():

summary(fit)

Call:
AER::tobit(formula = price ~ z, left = 500, right = 1500, data = data)

Observations:
         Total  Left-censored     Uncensored Right-censored 
         53940           1749          18261          33930 

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.109e+03  1.970e+01  -157.8   <2e-16 ***
z            1.420e+03  6.751e+00   210.3   <2e-16 ***
Log(scale)   5.626e+00  5.511e-03  1020.9   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Scale: 277.6 

Gaussian distribution
Number of Newton-Raphson Iterations: 8 
Log-likelihood: -1.35e+05 on 3 Df
Wald-statistic: 4.423e+04 on 1 Df, p-value: < 2.22e-16 

Note: to make this work in mice you have to explicitly override the version in broom via

assignInNamespace("tidy.tobit",tidy.tobit,ns="broom")

before you run pool().

2
I_O On

fit %>% summary(tidy) won't work as the correct grammar is fit %>% summary %>% tidy. However, there's no applicable tidy-method for a tobit summary, unfortunately.

Anyhow, {broom} offers you a glance at your model:

> glance(fit) ## or: fit |> glance()
# A tibble: 1 x 9
   iter    df statistic   logLik     AIC     BIC df.residual  nobs p.value
  <int> <int>     <dbl>    <dbl>   <dbl>   <dbl>       <int> <int>   <dbl>
1     8     3    72669. -135047. 270099. 270126.       53937 53940       0

... or an augmented (with model predictions) dataframe:

fit |> augment(data)
# A tibble: 53,940 x 12
   carat cut    color clarity depth table price     x     y     z .fitted .resid
   <dbl> <ord>  <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>   <dbl>  <dbl>
 1  0.23 Ideal  E     SI2      61.5    55   326  3.95  3.98  2.43    341.  159. 
 2  0.21 Premi~ E     SI1      59.8    61   326  3.89  3.84  2.31    170.  330.