Caching a downloaded file based on modified time using dogpile


I'm writing a program that downloads a large file (~150 MB) and parses the data into a more useful text-format file. Downloading, and especially parsing, is slow (~20 minutes in total), so I'd like to cache the result.

Downloading produces a bunch of files, and parsing produces a single file, so I could manually check whether these files exist and, if so, check their modified times. However, since I'm already using dogpile with a Redis backend for web-service calls elsewhere in the code, I was wondering whether dogpile could be used for this as well.

So my question is: can dogpile be used to cache a file based on its modified time?
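For what it's worth, a common pattern is to fold the file's modification time into the cache key, so a changed file automatically produces a cache miss; dogpile's `region.get_or_create` can be keyed the same way. Here is a minimal stdlib sketch of the idea (the parse body is a placeholder, not a real parser):

```python
import functools
import os

@functools.lru_cache(maxsize=None)
def _parse_cached(path, mtime):
    # mtime exists only to key the cache: a new modification
    # time makes a new key, so the expensive parse runs again.
    with open(path) as f:
        return f.read().upper()  # placeholder for the real parse

def parse(path):
    # Look up the current mtime and use it as part of the key.
    return _parse_cached(path, os.path.getmtime(path))
```

With dogpile, the equivalent would be building the key string passed to `get_or_create` from the path plus `os.path.getmtime(path)`.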

1 Answer

Answered by Valeriy Solovyov

Why not divide the program into several parts:

  • downloader

  • parser & saver

  • worker with results
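The split above can be sketched as three small functions; the bodies here are hypothetical stand-ins (the real downloader and parser are the slow parts you would cache):

```python
import json

def download(url):
    # Stand-in for fetching the ~150 MB raw data.
    return "1,2,3"

def parse_and_save(raw):
    # Stand-in for the slow parse into a more useful text format.
    return json.dumps([int(x) for x in raw.split(",")])

def work(parsed):
    # Worker that consumes the parsed result.
    return sum(json.loads(parsed))
```

Each stage can then be cached or re-run independently of the others.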

You can use a cache variable to store the value you need, and update it whenever the file is updated.

    import json
    import os
    import threading
    import time

    _lock_services = threading.Lock()
    tmp_file = "/tmp/txt.json"
    update_time_sec = 3300  # 55 minutes

    with _lock_services:
        # If the file is missing or was last modified more than
        # update_time_sec ago, regenerate it; this is where you
        # would re-download and re-parse, then refresh the cache.
        if (not os.path.exists(tmp_file)
                or os.path.getmtime(tmp_file) < time.time() - update_time_sec):
            os.system("echo '{}' > %s" % tmp_file)

        with open(tmp_file, "r") as json_data:
            cache_variable = json.load(json_data)