How should I best structure my web application using job queues [and Perl/Catalyst]?

I'm writing a web application using the Catalyst framework. I'm also using a job queue called TheSchwartz.

I want to use a job queue because I want as much of the application-specific code as possible decoupled from the web application interface code.

Essentially the whole system consists of three main components:

  • GUI (Catalyst web interface)
  • A crawler
  • An "attacking component" (the app is being written to look for XSS and SQLi vulnerabilities in other webapps/sites)

So in theory the GUI creates jobs for the crawler, which in turn creates jobs for the "attacking component".

Currently I have a Model in Catalyst which instantiates a TheSchwartz object so that Controllers in the web app can add jobs to the job queue.

I also need to create some job worker scripts that continuously poll the database for new jobs so they can perform the required actions. Currently the DB-specific setup for TheSchwartz lives in the Model in Catalyst, and I don't think I can easily access that outside of Catalyst.

I don't want to duplicate the DB connection data for the TheSchwartz job queue in the Model and then again in my job worker scripts. Should I wrap the creation of the TheSchwartz object in another class sitting outside of Catalyst, and call that from the Model that currently instantiates the TheSchwartz object? Then I could also use that class in the worker scripts. Or should I keep the DB data in a config file and instantiate new TheSchwartz objects as and when I need them (inside Catalyst and inside the job worker scripts)?

Or am I just overthinking this?

Some links to meaty web app architecture articles may also be useful (I've never built one of moderate complexity before).

Cheers

There are 3 answers

0
hobbs (best answer)

Are you using DBIx::Class? The basic idea here applies even if you're not, but I'm going to go ahead and assume that you are.

A Catalyst model should be a wrapper for another class, providing just enough behavior to interface with Catalyst, and nothing else. For example Catalyst::Model::DBIC::Schema is just a wrapper for DBIx::Class::Schema. It gets the config from Catalyst and passes it to DBIC, it injects the ResultSets into the Model namespace (so that you can do the $c->model('DB::Table') trick), and then it gets out of the way.

The advantage is that since all of the important code lives outside of Catalyst::Model, it's completely independent of Catalyst. You can load up your Schema from a maintenance script or a jobqueue worker or whatever else, pass it some config, tell it to connect and go, without ever invoking Catalyst. All of the information and logic that's in your ResultSets and whatever else is equally available outside of Catalyst as inside.
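The same wrapper idea can be applied to the job queue itself. Below is a minimal sketch, not a definitive implementation: MyApp::JobQueue is a hypothetical class name, and the dsn/user/pass keys are assumed to come from whatever config file the application already uses. The class knows nothing about Catalyst, so a Model and a worker script can both build their TheSchwartz client from it.

```perl
package MyApp::JobQueue;    # hypothetical name; lives outside the Catalyst namespace
use strict;
use warnings;

# Build the object from plain connection settings (dsn/user/pass).
sub new {
    my ($class, %args) = @_;
    die "dsn is required\n" unless defined $args{dsn};
    my $self = {
        dsn  => $args{dsn},
        user => $args{user} // '',
        pass => $args{pass} // '',
    };
    return bless $self, $class;
}

# Return the settings in the shape TheSchwartz->new accepts.
sub schwartz_args {
    my ($self) = @_;
    return (
        databases => [
            { dsn => $self->{dsn}, user => $self->{user}, pass => $self->{pass} },
        ],
    );
}

# Create the TheSchwartz client on first use and cache it.
sub client {
    my ($self) = @_;
    require TheSchwartz;    # loaded lazily, only when a client is needed
    return $self->{client} //= TheSchwartz->new( $self->schwartz_args );
}

1;
```

Under these assumptions, the Catalyst Model would only pass its config to MyApp::JobQueue->new(...) and call ->client->insert('MyApp::Worker::Crawl', { url => $url }) (the worker class name is made up here), while a worker script would build the same object from the same config and call ->client->can_do('MyApp::Worker::Crawl') followed by ->client->work.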

1
Julien

If I understand correctly, your question is "how can I reuse my database connection outside of Catalyst?".

You should have used DBIx::Class within your Catalyst application. You can reuse the same files in any other application. $c->model('DB::MyTable')->search(...) in Catalyst is the same as this outside of Catalyst:

my $schema = MyApp::Model::DB->new();
$schema->resultset('MyTable')->search(...);

Any Model can be called outside of Catalyst like a regular package, e.g. MyApp::Model::Library->new(). You just want to make sure it does not take $c as an argument.

0
Mark Fowler

One of the things you should take a look at is using TheSchwartz::Simple to create jobs rather than TheSchwartz itself (which you really only need in order to process jobs). The advantages are:

  • Lightweight (no need to load the whole of TheSchwartz into your Catalyst app)
  • Accepts a simple database handle to connect to the database, whereas TheSchwartz essentially has its own database wrapper layer and will want you to give it usernames and passwords and manage its own connection (which you've said you don't want it to do)
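As a rough sketch of the second point, job creation through an existing DBI handle could look something like the following. The helper enqueue_crawl and the worker class name MyApp::Worker::Crawl are made up for illustration; only TheSchwartz::Simple->new([ $dbh ]) and ->insert(...) come from the module itself.

```perl
use strict;
use warnings;

# Hypothetical helper: enqueue a crawl job using a handle the app already owns.
sub enqueue_crawl {
    my ($dbh, $url) = @_;
    die "a database handle is required\n" unless $dbh;

    require TheSchwartz::Simple;    # loaded lazily, only when a job is queued
    my $client = TheSchwartz::Simple->new( [ $dbh ] );

    # 'MyApp::Worker::Crawl' is an assumed worker class name.
    return $client->insert( 'MyApp::Worker::Crawl', { url => $url } );
}
```

If the app uses DBIx::Class, the handle could probably be taken from the schema (for example $c->model('DB')->schema->storage->dbh, assuming a model named DB), so no separate usernames or passwords need to be configured for job creation. The worker scripts would still use the full TheSchwartz to process the jobs.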