Use dplyr with database without creating an explicit DBI object

180 views Asked by At

Most code examples showing how to use dplyr with a database involve creating a database connection object:

connStr <- "driver=driver;server=hostname;database=mydatabase;..."
db <- DBI::dbConnect(odbc::odbc(), .connection_string=connStr)

tbl <- tbl(db, "mytable")
tbl %>% verb1 %>% verb2 %>% ...

However, suppose I omit creating a db object:

tbl <- tbl(DBI::dbConnect(odbc::odbc(), .connection_string=connStr), "mytable")
tbl %>% verb1 %>% verb2 %>% ...

Are there any consequences to this? Will I use up database resources/leak memory/etc?

The DBMS I have in mind is SQL Server, and the driver package is odbc, in case it matters.

1

There are 1 answers

0
krlmlr On BEST ANSWER

The new DBI specs assume that the caller frees all connection they allocate with dbConnect() with a corresponding call to dbDisconnect(). Failure to do so will close the connection only during garbage collection (or the end of the R session), thus delaying liberation of resources, or even leak the connection.

The exact behavior depends on the DBI backend involved (in this case the odbc package). According to Jim Hester, the maintainer of odbc,

[it] automatically calls dbDisconnect() when the connection object is garbage collected, so this won't leak connections. If you are opening a large number of connections it is always best to be explicit, if you are just doing this interactively it is probably ok to rely on the garbage collector in this case.