I have a simple operation where I read several CSVs, bind them, and then export, but vroom
is performing much more slowly than the other methods. I must be doing something wrong, but I'm not sure what or why.
library(readr)
library(vroom)
library(data.table)
library(microbenchmark)
write_csv(mtcars, "test.csv")
microbenchmark(
  readr = {
    t <- read_csv("test.csv", col_types = cols())
    write_csv(t, "test.csv")
  },
  data.tabl = {
    t <- fread("test.csv")
    fwrite(t, "test.csv", sep = ",")
  },
  vroom = {
    t <- vroom("test.csv", delim = ",", show_col_types = FALSE)
    vroom_write(t, "test.csv", delim = ",")
  },
  times = 10
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> readr 12.636961 12.662955 15.865400 12.928211 13.503029 41.104583 10
#> data.tabl 2.200815 2.275252 2.633456 2.342797 2.529283 4.830134 10
#> vroom 57.376353 57.915135 64.280365 58.496847 58.966311 117.150837 10
Created on 2021-07-01 by the reprex package (v2.0.0)
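As an aside, since the real task is reading several CSVs and binding them, note that vroom can read and row-bind multiple files in a single call, which avoids a separate bind step. A minimal sketch (the file names here are hypothetical, not from the original post):

```r
library(vroom)

# Hypothetical input files; vroom() accepts a vector of paths and
# row-binds them, optionally recording the source file via `id`.
files <- c("part1.csv", "part2.csv")
combined <- vroom(files, delim = ",", id = "source_file", show_col_types = FALSE)
vroom_write(combined, "combined.csv", delim = ",")
```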
To test with more data, I used the CSV from https://www.datosabiertos.gob.pe/dataset/vacunaci%C3%B3n-contra-covid-19-ministerio-de-salud-minsa, which contains 7.3+ million rows, and used a slight variation of your code:
The results were:
From the results, vroom is at least 2x faster than readr on a big dataset, and data.table is ~1.7x faster than vroom. Perhaps the issue with the original example is that the data is small, and the indexing that vroom performs is contributing to the difference. Just in case, the code and results are at: https://gist.github.com/jmcastagnetto/fef3f3a2778028e7efb6836d6d8e3f8e
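To expand on the indexing point: vroom's speed on large files comes from lazy, ALTREP-backed reading, where columns are only parsed when first accessed. Writing the whole tibble back out immediately forces every column to materialize, so on a tiny file the up-front indexing overhead dominates. A sketch of disabling this, assuming the `altrep` argument (available in vroom >= 1.1; in older versions it was `altrep_opts`):

```r
library(vroom)

# altrep = FALSE parses all columns eagerly instead of lazily,
# trading lazy-read overhead for a plain up-front parse; on small
# files this may narrow the gap seen in the benchmark above.
t <- vroom("test.csv", delim = ",", altrep = FALSE, show_col_types = FALSE)
vroom_write(t, "test.csv", delim = ",")
```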