Save Google App Script state while parsing an object array and continue where left off later on

299 views Asked by At

I am using this simple google app script to parse through all available Google Sites and dump the html content of individual pages. There are quite many pages so the script will eventually run into 6 minute time limit.

Is it possible to somehow use the PropertiesService to save the current progress (especially in the array loops) and continue where left off later on?

    var sites = SitesApp.getAllSites("somedomain.com");
    var exportFolder = DriveApp.getFolderById("a4342asd1242424folderid-");
            
            // Cycle through all sites
        for (var i in sites){
              var SiteName = sites[i].getName();
              var pages = sites[i].getAllDescendants();
              // Create folder in Drive for each site name
              var siteFolder = exportFolder.createFolder(SiteName)
        
        for (var p in pages){
                // Get page name and url
                var PageUrl = pages[p].getUrl();
             
               
                //Dump the raw html content in the text file
                var htmlDump = pages[p].getHtmlContent();
                siteFolder.createFile(PageUrl+".html", htmlDump)
                
              }
        }

I can image how one can use the Properties Service to store current line number in the Spreadsheet, and continute where left off. But how can this be done with array containing objects like Sites or Pages?

2

There are 2 answers

0
iansedano On BEST ANSWER

Using Objects with Properties Service

According to the quotas the maximum size of something you can store in the properties service is 9kb. With a total of 500kb. So if your object is less than this size, it should be no problem. That said, you will need to convert the object to a string with JSON.stringify() and when you retrieve it, use JSON.parse.

Working around the run time limit

What is commonly done to work around the limit is to structure a process around the properties service and triggers. Essentially you make the script keep track of time, and if it starts to take a long time, you get it to save its position and then create a trigger so that the script runs again in 10 seconds (or however long you want), for example:

function mainJob(x) {
  
  let timeStart = new Date()
  console.log("Starting at ", timeStart)
  
  for (let i = x; i < 500000000; i++){ // NOTE THE i = x
    
    // MAIN JOB INSTRUCTIONS
    let j = i
    // ...
  
    // Check Time
    let timeCheck = new Date()
    if (timeCheck.getTime() - timeStart.getTime() > 30000) {
      console.log("Time limit reached, i = ", i)
      
      // Store iteration number
      PropertiesService
          .getScriptProperties()
          .setProperty('PROGRESS', i)
      
      console.log("stored value of i")
      
      // Create trigger to run in 10 seconds.
      ScriptApp.newTrigger("jobContinue")
          .timeBased()
          .after(10000)
          .create()
      
      console.log("Trigger created for 10 seconds from now")
      return 0
    }
  }
  
  // Reset progress counter
  PropertiesService
          .getScriptProperties()
          .setProperty('PROGRESS', 0)
  
  console.log("job complete")
}

function jobContinue() {
  
  console.log("Restarting job")
  
  previousTrigger = ScriptApp.getProjectTriggers()[0]
  ScriptApp.deleteTrigger(previousTrigger)
  console.log("Previous trigger deleted")
  
  triggersRemain = ScriptApp.getProjectTriggers()
  console.log("project triggers", triggersRemain)
  
  let progress = PropertiesService
                   .getScriptProperties()
                   .getProperty('PROGRESS')
  
  console.log("about to start main job again at i = ", progress)
  
  mainJob(progress)
  
}

function startJob() {
  mainJob(0)
}

Explanation

  • This script only has a for loop with 500 million iterations in which it assigns i to j, it is just an example of a long job that potentially goes over the run time limit.
  • The script is started by calling function startJob which calls mainJob(0).
  • Within mainJob
    • It starts by creating a Date object to get the start time of the mainJob.
    • It takes the argument 0 and uses it to initialize the for loop to 0 as you would normally initialise a for loop.
    • At the end of every iteration, it creates a new Date object to compare with the one created at the beginning of mainJob. In the example, it is set to see if the script has been running for 30 seconds, this can obviously be extended but keep it well below the limit.
    • If it has taken more than 30 seconds, it stores the value of i in the properties service and then creates a trigger to run jobContinue in 10 seconds.
  • After 10 seconds, the function jobContinue calls the properties service for the value for i, and calls mainJob with the value returned from the properties service.
  • jobContinue also deletes the trigger it just created to keep things clean.
  • This script should run as-is in a new project, try it out! When I run it, it takes around 80 seconds, so it runs the first time, creates a trigger, runs again, creates a trigger, runs again and then finally finishes the for loop.

References

0
Karan On

If you are able to process all pages of 1 site under 6 minutes then you could try saving the site names first in a sheet or props depending on the number again. And keep processing n-sites per run. Can also try SitesApp.getAllSites(domain, start, max) and save start value in props after incrementing.

Can do something similar for pages if you cannot process them under 6 minutes. SitesApp.getAllDescendants(options)