Running long processes in Django

My original issue was that one piece of code in my Django project took far too long to run, sometimes long enough for the page load to time out. The situation comes up infrequently (at most once a week), so it was really just a matter of getting the process to run successfully. Along the way, though, I learned a lot about spawning processes, distributed computing, and the options available in the Python community. All of the approaches below are simply ways to get the processing done outside the Django process.


cron and queue

cron
I'll start with the commonly recommended way of taking care of this issue. You can set up a cron job that runs every minute, checks whether there's anything to process, and then runs the appropriate code. Cron jobs are pretty simple to set up and pretty effective. All you need to do is edit the crontab with the time intervals at which you want the job to run, and your code takes care of the rest.

django-queue-service
The cron part of this solution takes care of when the processing happens, but what tells it that there is anything to process? For that you need some way to record that work is waiting. There are of course multiple ways to handle this: update a table in your database, update a file, or a folder… One way is to use django-queue-service. This method requires you to run the queue service as another Django instance and then make requests to it. The sample code from the project's page looks like this:

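import httplib2

client = httplib2.Http()
# message_to_be_inserted_in_queue is whatever string you want to enqueue
body_string = "message=%s" % (message_to_be_inserted_in_queue,)
uri = 'http://dqshost:8000/q/queue_name_here/put'
(response, content) = client.request(uri=uri,
                                     method='POST',
                                     body=body_string)
# response.status is an integer; response['status'] would be the string '200'
if response.status != 200:
    print("Queue didn't accept the input")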

While this method makes the most use of Django of all the methods I'll discuss, I really have problems with it. It's heavy-handed and unnecessary, and I can't even tell whether there are security concerns. Let's say that someone set this method up improperly and exposed this Django instance to the outside world…

queue: Python module
There was a Python module I came across, which I can't seem to find now. When I find it I'll post a link. It was a file-based queue: it added files to folders, and there were five different folders based on the status of the item in the queue. Ready, active, complete… In my opinion this was a better way to handle the queue than django-queue-service, but it still felt a bit uncomfortable.
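Since I can't point to the module itself, here is a minimal sketch of the general idea of a folder-per-status queue, not that module's actual code. It assumes Python 3 and uses only three of the statuses; the names `init_queue`, `put`, `claim`, and `complete` are hypothetical. Items move between folders with `os.rename`, so a worker either gets an item or it doesn't.

```python
import os
import uuid

# Hypothetical subset of the status folders
STATES = ("ready", "active", "complete")


def init_queue(root):
    """Create one folder per status under the queue root."""
    for state in STATES:
        os.makedirs(os.path.join(root, state), exist_ok=True)


def put(root, message):
    """Write the message to a temp file, then move it into ready/."""
    name = uuid.uuid4().hex
    tmp_path = os.path.join(root, name + ".tmp")
    with open(tmp_path, "w") as handle:
        handle.write(message)
    # The rename makes the finished file appear in ready/ in one step
    os.rename(tmp_path, os.path.join(root, "ready", name))
    return name


def claim(root):
    """Move one ready item to active/ and return (name, message), or None."""
    for name in sorted(os.listdir(os.path.join(root, "ready"))):
        src = os.path.join(root, "ready", name)
        dst = os.path.join(root, "active", name)
        try:
            os.rename(src, dst)
        except OSError:
            continue  # another worker claimed this item first
        with open(dst) as handle:
            return name, handle.read()
    return None


def complete(root, name):
    """Mark an active item as finished by moving it to complete/."""
    os.rename(os.path.join(root, "active", name),
              os.path.join(root, "complete", name))
```

Items are claimed in file-name order, which is not arrival order here because the names are random. A real queue would need a naming scheme that preserves ordering if that matters.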


Asynchronous Messaging in Python

The most natural approach, or at least what someone in my ADD generation wants, is to get things done on demand. Why wait a minute or ten for the cron job to call my code? Why can't my code run when I want it to? This approach makes more sense to me, and the usual reasons to stay away from it (namely complexity and processing power) go out the window, I think, since the other approaches are really no easier than the one I ended up with (which I'm actually really happy with).

When considering asynchronous communication the natural choices are the following:

Pyro: Very light and very simple to set up. In a word, perfect. Pyro fit the bill for what I was looking for. What I liked best about this library is that it's native to Python, so my objects are sent and received as real Python objects. There's no parsing or manipulating of data, which takes all the fuss out of this type of interaction. Very cool stuff!
XML-RPC: A very strong contender and something I will probably run into in the near future. XML-RPC is well supported in Python and has a couple of different implementations. It seems like a very sane choice for this type of messaging.
Twisted: A very dependable project whose only issue for this task was that it seemed too complicated to set up for the job at hand.
CORBA: CORBA has been around for a while, and it can communicate with almost every language under the sun, but I don't really need that kind of power. Also, since it's not native to Python, there'd be a lot of translation going on.
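For comparison, here is a minimal sketch of the XML-RPC option using only the standard library. It is not the approach I ended up using. It assumes the Python 3 module names `xmlrpc.server` and `xmlrpc.client` (Python 2 calls them `SimpleXMLRPCServer` and `xmlrpclib`), and the names `process`, `start_server`, and `call_process` are hypothetical.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def process(data):
    """Hypothetical stand-in for the long-running work."""
    return "processed %s" % data


def start_server():
    """Start an XML-RPC server on a free local port in a background thread."""
    # Port 0 asks the operating system for any free port
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(process, "process")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def call_process(port, data):
    """What a Django view might do: call the remote function by name."""
    proxy = ServerProxy("http://127.0.0.1:%d" % port)
    return proxy.process(data)
```

Note that a plain XML-RPC call like this blocks until the remote function returns, so on its own it moves the work to another process but does not free the view from waiting.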

Pyro
In the end, as mentioned above, I went with Pyro, and I am very pleased with the results. If you've never looked at Pyro code before, I think you'll be surprised by how little code I needed to get this all working.

Important code from my view:

import Pyro.naming, Pyro.core
from Pyro.errors import NamingError
...
try:
    # find the Pyro name server
    locator = Pyro.naming.NameServerLocator()
    ns = locator.getNS()
except Exception:
    message = "Problem encountered, please try again later"
else:
    try:
        # resolve Pyro object
        URI = ns.resolve('name_of_pyro_object')
        pyroObject = Pyro.core.getProxyForURI(URI)
        # every method that is one way has to be listed on this statement
        pyroObject._setOneway('name_of_pyro_function')
        # function call
        pyroObject.name_of_pyro_function(input_to_pyro_function)
    except NamingError as x:
        pass # handle error situation
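The point of marking the method as one-way is that the call returns without waiting for the work to finish. The following is only a stdlib analogy of that behaviour using a thread, assuming Python 3. It is not how Pyro implements it, and `slow_task`, `call_oneway`, and `results` are hypothetical names.

```python
import threading
import time

results = []


def slow_task(value):
    """Stand-in for the long-running processing."""
    time.sleep(0.2)
    results.append(value)


def call_oneway(func, *args):
    """Start func in the background and return without waiting for it."""
    thread = threading.Thread(target=func, args=args)
    thread.start()
    return thread
```

With Pyro the work runs in a separate server process rather than a thread inside Django, which is the whole reason for using it here.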

Code from my Pyro file:

import Pyro.naming
import Pyro.core
from Pyro.errors import PyroError, NamingError

# set up the Django environment so project code can be imported
from django.core.management import setup_environ
import settings

setup_environ(settings)

from django.core.mail import send_mail

from project.app.file import Func

class pyroClass(Pyro.core.ObjBase):
    def name_of_pyro_function(self, input_to_pyro_function):
        # handle_processing here
        pass

def main():
    Pyro.core.initServer()
    daemon = Pyro.core.Daemon()
    # locate the NS
    locator = Pyro.naming.NameServerLocator()
    print('searching for Name Server...')
    ns = locator.getNS()
    daemon.useNameServer(ns)

    # 'name_of_pyro_object' is the name by which our object will be known to the outside world
    # connect a new object implementation (first unregister previous one)
    try:
        ns.unregister('name_of_pyro_object')
    except NamingError:
        pass
    daemon.connect(pyroClass(), 'name_of_pyro_object')

    # enter the server loop.
    daemon.requestLoop()

if __name__ == "__main__":
    main()

After that you’ll have to run two things along with your Django process:
pyro-ns
python name_of_pyro_file.py

That’s it. How cool is that!

Categories: Django, Python
  1. February 3rd, 2009 at 08:41 | #1

    Interesting… I wasn’t sure how to solve this for a long running process (reading through lots of data).

    In the end, I made a class BackgroundJob, in __init__ I start a thread, which runs a passed in function.
    The thread goes into a dictionary.

    Different classes extend BackgroundThread, and can do different things + report back their status.

    There is an autorefreshing view that tracks the status.

    I need to generalise the stuff and put some code up on django snippets at some point :)

  2. February 5th, 2009 at 02:56 | #2

    Interesting, but you can use signals for this, or the standard threading module ;)

  3. Tal
    August 30th, 2009 at 12:38 | #3

    How do I run these along with my Django process:
    pyro-ns
    python name_of_pyro_file.py
    ?

    Thanks,

    Tal.

  4. August 31st, 2009 at 08:33 | #4

    Tal, you need to run both of those processes on their own in the background on your server.

  5. star
    February 19th, 2010 at 01:11 | #5

    Firstly, thank you for this interesting post. I would like to know if it is possible to set up a Django project with RabbitMQ AND Pyro. I mean using Django to build the entire project (with views, database…), RabbitMQ to encapsulate data in queues, and Pyro to synchronize communication, as it already uses the pickle protocol?

  6. Jan
    February 1st, 2011 at 06:25 | #6

    You should also check out http://code.google.com/p/django-tasks/

  7. Lior sion
    March 6th, 2011 at 04:30 | #7

    Have you looked at gevent and celery?

  8. March 7th, 2011 at 11:57 | #8

    Lior, ya, when I wrote this post, I don’t think Celery was out yet and I hadn’t heard of greenlet or gevent… Celery looks pretty cool; I haven’t had a chance to play with it yet. Gevent seems a bit heavy-handed for these purposes…
