web2py Scheduled task to recreate (reset) database

367 views Asked by At

I am dealing with a CRON job that places a text file with 9000 lines device names.

The job recreates the file every day with an updated list from a network crawler in our domain.

What I was running into is when I have the following worker running my import into my database the db.[name].id kept growing with this method below

scheduler.py

# -*- coding: utf-8 -*-
from gluon.scheduler import Scheduler
def demo1():
    db(db.asdf.id>0).delete()
    db.commit()
    with open('c:\(project)\devices.list') as f:
        content = f.readlines()
        for line in content:
            db.asdf.insert(asdf = line)
    db.commit()

mysched = Scheduler(db, tasks = dict(demo1 = demo1) )

default.py (initial kickoff)

@auth.requires_membership('!Group-IS_MASTER')
def rgroup():
    mysched.queue_task('demo1',start_time=request.now,stop_time = None,prevent_drift=True,repeats=0,period=86400)

    return 'you are member of a group!'

So the next time the job kicked off it would start at db.[name].id = 9001. So every day the ID number would grow by 9000 or so depending on the crawler's return. It just looked sloppy and I didn't want to run into issues years down the road with database limitations that I don't know about.

(I'm a DB newb (I know, I don't know stuff))

SOOOOOOO.....

This is what I came up with and I don't know if this is the best practice or not. And an issue that I ran into when using db.[name].drop() in the same function that is creating entries is the db tables didn't exist and my job status went to 'FAILED'. So I defined the table in the job. see below:

scheduler.py

from gluon.scheduler import Scheduler
def demo1():
    db.asdf.drop()  #<=====Kill db.asdf
    db.commit()     #<=====Commit Kill
    db.define_table('asdf',Field('asdf'),auth.signature )  #<==== Phoenix Rebirth!!!
    with open('c:\(project)\devices.list') as f:
        content = f.readlines()
        for line in content:
            db.asdf.insert(asdf = line)
    db.commit()     #<=========== Magic

mysched = Scheduler(db, tasks = dict(demo1 = demo1) )

In the line of Phoenix Rebirth in the comments of code above. Is that the best way to achieve my goal?

It starts my ID back at 1 and that's what I want but is that how I should be going about it?

Thanks!

P.S. Forgive my example with windows dir structure as my current non-prod sandbox is my windows workstation. :(

1

There are 1 answers

0
Frédéric On

Why wouldn't you check if the line is present prior to inserting its corresponding record ?

...
with open('c:\(project)\devices.list') as f:
    content = f.readlines()
    for line in content:
        # distinguishing t_ for tables and f_ for fields
        db_matching_entries = db(db.t_asdf.f_asdf==line).select()
        if len(db_matching_entries) == 0:
            db.t_asdf.insert(f_asdf = line)
        else:
            # here you could update your record, just in case ;-)
            pass
db.commit()     #<=========== Magic

Got a similar process that takes few seconds to complete with 2k-3k entries. Yours should not take longer than half a minute.