I'm writing a function where I want the main output to be a data frame (that can be piped to other functions), but I also want to allow users access to an informative list or vector of samples that were omitted from the final result. Are there best practices for how to go about this, or examples of functions/packages that do this well?
Currently I'm exploring returning the information as an attribute and throwing a warning informing users they can access the list with attr(resulting-df, "omitted").
Any advice would be greatly appreciated, thank you!
library(dplyr)
iris <- iris %>%
  mutate(index = 1:nrow(.))
return_filtered <- function(df) {
  res <- filter(df, Sepal.Length > 6)
  omitted <- setdiff(iris$index, res$index)
  attr(res, "omitted") <- omitted
  return(res)
}
iris2 <- return_filtered(iris)
attributes(iris2)
#> $names
#> [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
#> [6] "index"
#>
#> $class
#> [1] "data.frame"
#>
#> $row.names
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51 52 53 54 55 56 57 58 59 60 61
#>
#> $omitted
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
#> [20] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
#> [39] 39 40 41 42 43 44 45 46 47 48 49 50 54 56 58 60 61 62 63
#> [58] 65 67 68 70 71 79 80 81 82 83 84 85 86 89 90 91 93 94 95
#> [77] 96 97 99 100 102 107 114 115 120 122 139 143 150
Created on 2022-04-02 by the reprex package (v2.0.1)
The question is probably a little opinion-based, but I don't think it's off-topic, since there are certainly neater and more formal ways to achieve what you want than your current method.
It's reasonable to hold the extra information as an attribute, but if you are going to do this then it is more idiomatic and extensible to create an S3 class, so that you can hide default printing of attributes, ensure your attributes are protected, and define a getter function for the attributes so that users don't have to sift through multiple attributes to get the correct one.
First, we will tweak your function to work with any data frame, and allow it to take any predicate so that it works as expected with dplyr::filter. We also get the function to add to the returned object's class attribute, so that it returns a new S3 object which inherits from data.frame.
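A minimal sketch of such a function might look like the following. The class name "filtered_df" and the attribute name "filtered" are assumptions chosen for illustration, and the sketch handles a single predicate only:

```r
library(dplyr)

# Sketch only: "filtered_df" and "filtered" are hypothetical names.
return_filtered <- function(df, predicate) {
  # Rows where the predicate is TRUE, exactly as dplyr::filter() keeps them
  kept <- filter(df, {{ predicate }})
  # Everything else, including rows where the predicate evaluates to NA
  omitted <- filter(df, !(({{ predicate }}) %in% TRUE))
  # Store the omitted rows on the result and prepend the new class
  attr(kept, "filtered") <- omitted
  class(kept) <- c("filtered_df", class(kept))
  kept
}

res <- return_filtered(iris, Sepal.Length > 6)
class(res)
```

Storing the omitted rows themselves, rather than only their indices, means the function no longer depends on an index column existing in the input.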
We will define a print method so that the attributes don't show when we print our object.
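One way to write such a print method, again assuming the hypothetical class name "filtered_df" and attribute name "filtered", is to strip both from a copy and delegate to the ordinary data frame printer. The small demo object is built by hand so that the example runs on its own:

```r
# Sketch only: "filtered_df" and "filtered" are hypothetical names.
print.filtered_df <- function(x, ...) {
  out <- x
  attr(out, "filtered") <- NULL                     # hide the stored rows
  class(out) <- setdiff(class(out), "filtered_df")  # fall back to data.frame
  print(out, ...)
  invisible(x)                                      # print methods return x invisibly
}

# Hand-built object standing in for the result of the filtering function
demo <- structure(
  data.frame(id = 1:2, value = c(6.3, 7.1)),
  filtered = data.frame(id = 3L, value = 5.0),
  class = c("filtered_df", "data.frame")
)

print(demo)
```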
To get the filtered-out data from the attributes, we can create a new generic function that will only work on our new class.
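A sketch of such a generic is shown below. The default method and its error message are assumptions added for illustration; without a default method, R would raise its own "no applicable method" error instead:

```r
# Sketch only: "filtered_df" and "filtered" are hypothetical names.
get_filtered <- function(x, ...) UseMethod("get_filtered")

# Method for the new class: return the stored rows
get_filtered.filtered_df <- function(x, ...) {
  attr(x, "filtered", exact = TRUE)
}

# Fallback for every other class: fail with a readable message
get_filtered.default <- function(x, ...) {
  stop("`x` must be a 'filtered_df' object holding filtered-out rows.",
       call. = FALSE)
}

# Hand-built object so the example is self-contained
omitted_rows <- data.frame(id = 3L, value = 5.0)
demo <- structure(
  data.frame(id = 1:2, value = c(6.3, 7.1)),
  filtered = omitted_rows,
  class = c("filtered_df", "data.frame")
)

get_filtered(demo)
```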
So now, when we call return_filtered, it seems to work the same as dplyr::filter, returning what appears to be a normal data frame.
But we can get the filtered-out data from it with our get_filtered function.
And calling get_filtered on a non-filtered data frame returns an informative error.
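Putting the pieces together, the three behaviours described above could be checked with a compact end-to-end sketch like this one. The names "filtered_df" and "filtered" and the wording of the error message are assumptions, not the answer's confirmed code:

```r
library(dplyr)

# Sketch only: class, attribute and message text are hypothetical.
return_filtered <- function(df, predicate) {
  kept <- filter(df, {{ predicate }})
  omitted <- filter(df, !(({{ predicate }}) %in% TRUE))
  attr(kept, "filtered") <- omitted
  class(kept) <- c("filtered_df", class(kept))
  kept
}

print.filtered_df <- function(x, ...) {
  out <- x
  attr(out, "filtered") <- NULL
  class(out) <- setdiff(class(out), "filtered_df")
  print(out, ...)
  invisible(x)
}

get_filtered <- function(x, ...) UseMethod("get_filtered")
get_filtered.filtered_df <- function(x, ...) attr(x, "filtered", exact = TRUE)
get_filtered.default <- function(x, ...) {
  stop("`x` must be a 'filtered_df' object holding filtered-out rows.",
       call. = FALSE)
}

# 1. Behaves like a filtered data frame
res <- return_filtered(iris, Sepal.Length > 6)
nrow(res)                # 61

# 2. The omitted rows are still reachable
nrow(get_filtered(res))  # 89

# 3. A plain data frame triggers the error; capture its message to display it
msg <- tryCatch(get_filtered(iris), error = function(e) conditionMessage(e))
msg
```

Custom attributes are not guaranteed to survive later subsetting or reshaping, so it is safest to call get_filtered directly on the object returned by return_filtered.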