When writing to an excel sheet, how can I add a new row each time I run the script?

3.7k views Asked by At

I've written a script that pulls and summarizes data from 2 databases regularly. I'm using write.xlsx to write 3 data frames to 3 different tabs:

  • Tab1 is data from one database,
  • Tab2 is data from another, and
  • Tab3 is data summarized from both.

It is fine if Tabs 1&2 are overwritten with each run, but I'd like Tab3 to add a new row (with sys.date in col1) each run, thus giving me a history of the data summaries. Is this possible and how?

append=TRUE is not getting me what I'm looking for. My next option (less desirable) is to just write the summarized data to a separate file with sys.date in the filename.

# Write to spreadsheet w/ 3 tabs
write.xlsx(df1, file = "file_location\\filename.xlsx", sheetName = "dataset1", row.names = FALSE, append = FALSE) #write to tab1

write.xlsx(df2, file = "file_location\\filename.xlsx", sheetName = "dataset2", row.names = FALSE, append = TRUE) #write to tab2

write.xlsx(summary_df, file = "file_location\\filename.xlsx", sheetName = "summarized_data", row.names = FALSE, append = TRUE) #write to next blank row in tab3

The script is writing over the summary data with each run. I would like it to append it to the next blank row.

2

There are 2 answers

0
Marc On

I would check out the openxlsx package - it allows you to write data starting at a given row in an Excel file. The only catch is that you will need to read the current Excel file in order to determine which line is the last one containing data. You can do that with the readxl package and using nrow() of the dataframe you create from the Excel file to determine the number of rows.

0
Jason Johnson On

So the openxlsx package allows you to specify not only the row to which you wish to append the new data but also to extract the last row of the current workbook sheet. The trick is to write the date and the new summary data as DataTable objects which are unique to the openxlsx package. The function writeDataTable will allow you to do this and the only required arguments are the name of workbook object, the name of the sheet in the workbook and any R data.frame object. The workbook object can be obtained using the loadWorkbook function which simply takes the path to the Excel filename. Also you can read in an entire Excel workbook with multiple sheets and then extract and modify the individual worksheets before saving them back to the original Excel workbook.

To create the initial workbook so that the updates are compatible to the original workbook simply replace the write.xlsx statements with writeDataTable statements as follows:

    library(openxlsx)
    wb <- createWorkbook()
    writeDataTable(wb,sheet="dataset1",x=dataset1)
    writeDataTable(wb,sheet="dataset2",x=dataset2)
    writeDataTable(wb,sheet="summarized_dat",x=summary_df)
    saveWorkbook(wb,"file_location\\filename.xlsx")

So after creating the initial workbook you can now simply load that and modify the sheets individually. Since you mentioned you can overwrite dataset1 and dataset2 I will focus on summarized_dat because you can just use the code above to overwrite those previous datasets:

    library(openxlsx)
    wb <- loadWorkbook("file_location\\filename.xlsx")

Now you can use the function below to append the new summarized data with the date. For this I converted the date to a data.frame object but you could also use the writeData function. I like the writeDataTable function because you can use getTables to extract the last row easier. The output from calling getTables looks like this using one of my workbooks as an example with three tables stacked vertically:

    [1] "A1:P61"   "A62:A63"  "A64:T124"

The trick is then to extract the last row number which in this case is 124. So here is a function I just wrote quickly that will do all of this for you automatically and takes a workbook object, name of the sheet you wish to modify, and the updated summary table saved as a data.frame object:

    Update_wb_fun<-function(wb, sheetname, newdata){

        tmp_wb <- wb #creates copy if you wish to keep the original before modifying as a check

        table_cells <- names(getTables(tmp_wb,sheetname)) #Extracts the Excel cells for all DataTables in the worksheet. First run there will only be one but in subsequent runs there will be more and you will want the last one.

        lastrow <- as.numeric(gsub("[A-z]","",unlist(strsplit(table_cells[length(table_cells)],":"))[2])) #Extracts the last row

        start_time_row <- lastrow+1

        writeDataTable(tmp_wb,sheet=sheetname,x=data.frame(Sys.Date()),startRow = start_time_row) #Appending the time stamp as DatTable.

        writeDataTable(tmp_wb,sheet=sheetname,x=newdata,startRow = start_time_row+2) #Appending the new data under the time stamp

        return(tmp_wb)
    }

The code to extract the "lastrow" looks a little complicated but uses base R functions to extract that row number. The length function extracts the last element from example output above (e.g., "A64:T124"). The strsplit separates the string by colon and we have to unlist this to create a vector and obtain the second element (e.g., "T124"). Finally the gsub removes the letters and keeps only the row number (e.g., "124"). The as.numeric converts from character object to numeric object.

So to call this function and update the "summarized_dat" worksheet do the following:

    test_wb <- Update_wb_fun(wb, "summarized_dat", summary_df) #Here I am preserving the original workbook object before modification but you could save to wb directly.

    saveWorkbook(test_wb, "file_location\\filename.xlsx", overWrite=T) #Need to use the overWrite option if filename already exists.

So that is how you could append to the "summarized_dat" sheet. Before the final save you could also update the first two sheets by re-running the writeDataTable statements earlier before we wrote the function. Let me know if this is at all what you are looking for and I could modify this code easily. Good luck!