Django : computationally/data intensive website


I come from the world of object-oriented programming (C++, Python...) and am currently switching to web programming (Django). The websites I want to build are computationally and data intensive (mechanical simulation, bio-engineering, AI...). I have a hard time understanding what databases are used for in web programming and, more generally, the architecture of web apps (or, say, computationally intensive websites). Here is my understanding:

  • Databases only provide a persistence service (in a very organised and secure way, though). They offer some computational capability through querying, but this is not meant for complex computations like the examples above?
  • If heavy computation is required on the data, Django first queries the data from the DB and loads it into a Python "object" data model; the computation then happens in plain Python and the result is sent to the client through Django's templates?

Am I right?

Also, why are databases used to store data in web development? Why aren't standard persistence solutions (like the methods used in C++ or Python) used for web development?

Additionally, I expect the data to be loaded at the beginning of a session. Architecturally speaking, where in Django should the DB query and the deserialisation of the data happen?


EXTRA DETAILS AND A FINE TUNED QUESTION

I definitely need persistence and the data definitely needs to be read/write as it will intensively be modified by the user.

I can think of 3 computation architectures:

  • Computations performed by the database
  • Server-side computing
  • Client-side computing

If my understanding is right and database computing is not intended for complex computations (mechanical simulation, bio-engineering, AI...), then I guess I would have to fall back on server-side or client-side computing. Say I choose server-side computing, so that I do not have to learn about client-side solutions. In that case, I would consider querying all the data from the DB into a Python object architecture, displaying the data in the front end, performing operations in the back end according to user instructions received from the front end, and storing the data in the database again at the end of the session or on user request. Is this a good approach?

2

There are 2 answers

0
Ezon Zhao On

This answer is a little opinionated.

For your 1st question, I would go for the opposite side: leave the computation to the database.

Most databases are optimized for both low memory use and fast computation. Django also provides a bunch of tools to help you build querysets such that most of the computation is done in the database.

For example, in a banking app, suppose I want to know the current balance of each account of a given client, with the following model setup:

from django.db import models

class Client(models.Model):
    name = models.CharField(max_length=50)

class Account(models.Model):
    client = models.ForeignKey(Client, on_delete=models.CASCADE)

class Journal(models.Model):
    DEBIT, CREDIT = 1, -1
    CATEGORIES = [(DEBIT, 'debit'), (CREDIT, 'credit')]
    account = models.ForeignKey(Account, on_delete=models.CASCADE)
    category = models.IntegerField(choices=CATEGORIES)
    value = models.DecimalField(max_digits=12, decimal_places=2)

Then the view function would look like this:

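from django.db.models import DecimalField, F, Sum
from django.shortcuts import render

def client_accounts_view(request, client_id):
    # Grouped by account id (Account has no 'name' field); summed in the database.
    accounts = Account.objects.filter(client__id=client_id).values('id').annotate(
        balance=Sum(F('journal__category') * F('journal__value'), output_field=DecimalField()))
    return render(...)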

What if I calculate the balance within python? I could not come up with an elegant solution.
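
For comparison, here is a minimal sketch of how a Python-side version might look once the journal rows have already been fetched. The `(account_id, category, value)` tuples are a hypothetical stand-in for what something like `Journal.objects.values_list('account_id', 'category', 'value')` would return, and `balances` is an illustrative name, not part of Django.

```python
from collections import defaultdict
from decimal import Decimal

DEBIT, CREDIT = 1, -1

def balances(rows):
    """Sum category * value per account from (account_id, category, value) rows."""
    totals = defaultdict(Decimal)  # Decimal() is zero, so missing accounts start at 0
    for account_id, category, value in rows:
        totals[account_id] += category * value
    return dict(totals)

# Hypothetical rows standing in for data fetched from the Journal table.
rows = [
    (1, DEBIT, Decimal("100.00")),
    (1, CREDIT, Decimal("30.50")),
    (2, DEBIT, Decimal("12.00")),
]
result = balances(rows)
```

This works, but every journal row has to travel from the database to Python before it is summed, whereas the annotated queryset returns a single row per account.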

For your 2nd question, I have no idea how standard persistence works, but databases usually handle concurrent requests quite well, have well-developed backup and restore functionality, and even support hot standby servers. I guess you would have a lot more work to do to achieve those features with standard persistence.

For your 3rd question: query right before the data is actually accessed, on each request. If you serve your website over HTTP, each request should be treated separately, since HTTP is stateless. Besides, Django's querysets are lazy: they are not evaluated until their results are needed, which minimizes database hits and saves memory.
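
The idea of laziness can be illustrated without Django. The sketch below is only an analogy built on the standard `sqlite3` module, not Django's actual implementation: `LazyQuery` is a hypothetical class that accumulates filter conditions and runs the SQL only when it is iterated. The table and column names are made up for the example.

```python
import sqlite3

class LazyQuery:
    """Toy stand-in for a lazy queryset: builds SQL now, runs it on iteration."""

    def __init__(self, conn, table, conditions=(), params=()):
        self.conn = conn
        self.table = table
        self.conditions = tuple(conditions)
        self.params = tuple(params)

    def filter(self, condition, *params):
        # Returns a new query object; nothing is sent to the database here.
        return LazyQuery(self.conn, self.table,
                         self.conditions + (condition,),
                         self.params + params)

    def __iter__(self):
        # The database is hit only when the results are actually consumed.
        sql = f"SELECT * FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return iter(self.conn.execute(sql, self.params).fetchall())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, client_id INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])

statements = []
conn.set_trace_callback(statements.append)  # record each SQL statement as it runs

query = LazyQuery(conn, "account").filter("client_id = ?", 10)
executed_before = len(statements)  # building the query has not run any SQL
rows = list(query)                 # iterating triggers the SELECT
executed_after = len(statements)
```

In a Django view the same principle means that constructing and filtering a queryset is cheap, and the query itself runs at the point in the request where the data is first iterated or rendered.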

5
Stephen C On

Ideally:

  • The data intensive stuff should be done by the database ... to minimize moving (persistent) data that doesn't need to be moved.
  • The computation intensive stuff should be done by the front-end ... to avoid overloading the database. (Assuming that the database is shared by multiple front-end servers.)

In practice:

  • Some applications are both data and computationally intensive.
  • In other applications, the computations are difficult to express as efficient SQL.

So ... depending on how data intensive versus computationally intensive the application is ... sometimes the most efficient solution is to do the work in the database, and at other times in the front end. If peak efficiency is your primary design goal, the in-database vs in-memory design choice will be determined by the application.
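
A minimal sketch of that split, assuming a made-up `sample` table and a toy spring simulation (neither comes from the question): the database does the filtering and aggregation, so only one small row leaves it, and the iterative computation runs in application code (what this answer calls the front end). It uses the standard `sqlite3` module so that it runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (run_id INTEGER, mass REAL, stiffness REAL)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [(1, 2.0, 8.0), (1, 4.0, 16.0), (2, 1.0, 9.0)],
)

# Data-intensive step in the database: filter and aggregate there, so that
# only a single row is moved into Python.
mass, stiffness = conn.execute(
    "SELECT AVG(mass), AVG(stiffness) FROM sample WHERE run_id = ?", (1,)
).fetchone()

def simulate(mass, stiffness, x0=1.0, v0=0.0, dt=0.001, steps=1000):
    """Compute-intensive step in Python: semi-implicit Euler for m*x'' = -k*x."""
    x, v = x0, v0
    for _ in range(steps):
        v += -(stiffness / mass) * x * dt  # update velocity from the spring force
        x += v * dt                        # then update position with the new velocity
    return x

final_position = simulate(mass, stiffness)
```

The time-stepping loop would be awkward to express in SQL, while the averaging would be wasteful to do in Python if the table were large, which is the trade-off described above.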

Web apps are no different from other kinds of applications in this respect, and Django is no different from other web application frameworks / containers, except that Django and other frameworks with ORMs at their base make it easier for the programmer to avoid making the design choice at all.


For your application, it seems that you have already concluded that doing the computation in the database is not going to work. But you haven't said (one way or another) whether your application needs a database at all. For instance, if the data will all fit in memory AND it is read only AND the kind of queries / computations you are performing are not a good match for (SQL-like) set logic, then using a database to hold the data seems pointless.

Either way, you could use Django without a Django model. However that will mean that all of the "model-based" UI classes will be unavailable to you. And at that point, I would (personally) question the value of using Django at all.


But the bottom line is that we can't really give an answer that is relevant to your application without (fully) understanding the parameters of your application, and your reasons for choosing Django.

And most of your general "why" questions are best answered with:

It depends on the application.

(And this is one reason why we don't hear much about fifth-generation programming languages (5GLs) anymore.)