Bulk import a list of tables from SQL server to RStudio

50 views Asked by At

I am trying to import a long list of tables into RStudio. I want to be able to paste a list of table names into RStudio and then easily import all of the tables (doing so manually would take a very long time).

I am able to paste the list of table names into R and create a data frame and a vector that is a list of table names in the structure: 'databasename.schemaname.tablename1' etc

When I use dbReadTable to import any single table using the below, it works fine.

table1 <- dbReadTable(conn, SQL('databasename.schemaname.tablename1'))

I have tried writing a for loop and a function using dbReadTable() to do this for all tables in the list (input_list, a character vector of length n), which I would like to import into objects with the same name. However, I run into errors. This is what I have tried:

Version 1

# v1
# create a function to bulk import from SQL
bulk_sql_import <- function(t_list) {
  for(i in seq_along(t_list)) {
    dbReadTable(conn, 
                SQL(get(t_list[i])))
  }
}
# use the function
bulk_sql_import(input_list)

#Error in h(simpleError(msg, call)) : 
#  error in evaluating the argument 'name' in selecting a method for function 'dbReadTable': invalid first argument

Version 2

v2
# create a function to bulk import from SQL
bulk_sql_import2 <- function(t_list) {
  for(i in seq_along(t_list)) {
    get(t_list[i]) <- dbReadTable(conn, 
                                  SQL(get(t_list[i])))
  }
}
# use the function
bulk_sql_import2(input_list)

#Error in h(simpleError(msg, call)) : 
#  error in evaluating the argument 'name' in selecting a method for function 'dbReadTable': object 'databasename.schemaname.tablename1' not found

This last error is confusing as the table can be found by dbReadTable() normally.

Any help would be greatly appreciated!

Thanks :)

1

There are 1 answers

0
r2evans On
  1. get(..) <- ... isn't a thing. assign is the way to do that, but ... generally using get/assign are not indicators of solid workflows. Generally it is better to work with a list of frames. (Functions that rely on this break the popular and effective "functional paradigm", where functions return things and do not invoke side effect.)

  2. Realize that even if we do assign(get(t_list[i]), dbReadTable(..)) here, it is here, as in "in this function", not in the calling environment. You need to store it in the environment of the calling environment. We'll use envir=parent.frame() in the function definition to remedy this.

  3. More-so in functional programming, your function should never break "scope" and expect a variable to be available that is not explicitly passed to it. Here, conn is mysteriously used without definition. It's bad practice to assume it in the calling environment, much safer to have it passed explicitly (perhaps with a default value) by the caller.

  4. Here's the final thing: you are trying to get(t_list[i]), which means to "find the local variable that is named this" ... but there is no local variable yet, so it will fail with object '**' not found.

As a first cut, this should work for you:

dbfun <- function(conn, vec, envir = parent.frame()) {
  for (i in seq_along(vec)) {
    ret <- dbReadTable(conn, SQL(vec[i]))
    assign(vec[i], ret, envir = envir)
  }
}

I'll spend a moment to evolve this a little, in a way that makes it a bit more programmatic, flexible, and robust. For instance, we don't need to count with i, we just need names; lapply is a little more canonical here, though admittedly it doesn't add a lot but is often a preferred way of doing things in R (and is in alignment with the "list of frames" concept I mentioned earlier).

dbfun2 <- function(conn, vec, envir = parent.frame()) {
  stopifnot(is.null(envir) || inherits(envir, "environment"))
  out <- lapply(setNames(nm = vec), function(tb) {
    ret <- tryCatch(
      dbReadTable(conn, SQL(tb)),
      error = function(e) conditionMessage(e))
    if (!is.null(envir)) assign(tb, ret, envir = envir)
    ret
  })
  invisible(out)
}

# your default, assigning objects to the calling environment
dbfun2(conn, vec_of_table_names)

# similar, but now also accessible as `ret[["db.schm.tablename"]]`, etc
ret <- dbfun2(conn, vec_of_table_names)

# list of frames, no side-effect
ret <- dbfun2(conn, vec_of_table_names, envir = NULL)

Improvements:

  • use envir=NULL if/when you want it to just return, not assign
  • always return a list of frames, using invisible(.) so that if you just call it by itself, it will not trash your console
  • tryCatch so that if one table fails, it will continue; the error text is included in place of the table (as a character string); there are many other ways to approach this, some subjective, feel free to explore this space

Personally, I would set the default to envir=NULL and require that the user explicitly elect side-effect by calling it as

dbfun2(conn, vec_of_table_names, envir = environment())

to indicate the intent to store it in the current (calling) environment.