Iraniblog

Revisiting long running processes in Django

March 2nd, 2009 michael

So in my last post on this topic I discussed different ways to handle the problem of running code that would otherwise cripple your web server, by pushing it off to another process.

It was mentioned that there may be other ways to attack the problem, either through Python threads or Django’s signals. This post is a review of those two suggestions.

Django’s signals

I’ll attack the simpler option first, since the response to whether this solution is viable takes all of one word: no. Signals are not asynchronous by nature, so on their own they do not solve the problem. It is true that you can set up signals to work asynchronously, but you’d have to employ one of the methods already discussed to do that. So if you were to use signals for asynchronous work, all they would really be doing is changing the layout of the code. You’d still need something else to actually handle the asynchronous work.
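A minimal sketch of the point above, written for Python 3 rather than the Python 2 used in the rest of this post. It uses Django's Signal if Django happens to be installed; otherwise it falls back to a toy stand-in class that I made up for illustration (it is not Django's implementation, it only mimics calling each receiver in turn). The names slow_receiver and events are hypothetical. The idea is simply that send does not return until every receiver has finished.

```python
import time

try:
    from django.dispatch import Signal  # the real thing, if Django is installed
except ImportError:
    class Signal:
        """Toy stand-in (NOT Django's code): calls receivers one after another."""

        def __init__(self):
            self._receivers = []

        def connect(self, receiver):
            self._receivers.append(receiver)

        def send(self, sender, **named):
            return [(r, r(signal=self, sender=sender, **named))
                    for r in self._receivers]

events = []
test_sig = Signal()

def slow_receiver(sender, **kwargs):
    # Stands in for the long running work.
    time.sleep(0.2)
    events.append("receiver finished")

test_sig.connect(slow_receiver)

start = time.monotonic()
test_sig.send(sender=None)      # blocks while slow_receiver runs
events.append("send returned")
elapsed = time.monotonic() - start

print(events)
print("send took %.2fs" % elapsed)
```

If signals were asynchronous, "send returned" would be recorded first and the call would take almost no time. In a view, that delay would be added straight onto the response time.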

I’ll offer up a simple example of how signal code looks just for completeness.

There are three pieces to signals.
1. The signal itself
2. The sending of the signal
3. The listeners of the signal

1. Setting up the signal is as simple as including a signals.py file in your application.

import django.dispatch
test_sig = django.dispatch.Signal()

2. All you need to do to send out the request to all the listeners is call “send” on the signal. In the following code snippet I’m calling send from one of the view functions.

from django.shortcuts import render_to_response
from django.template import RequestContext
from my_project.my_app import signals
def home(request):
    signals.test_sig.send(sender=None)
    return render_to_response('home.html',
                              context_instance=RequestContext(request),)

3. To set up a listener you call the connect function on the signal to link the listener (an instance method or a function) to it. Beware: the connect call has to come after the function definition. This code can really sit anywhere within your project.

from my_project.my_app.signals import test_sig

def signal_func(*args, **kwargs):
    print "Do something here"

test_sig.connect(receiver=signal_func)

Python’s threads

(Update: See Malcolm’s comments below, which kind of nullify the positive things I wrote about threads and this problem.)

Threads are actually a very fitting way to handle this scenario. The thought of using threads did cross my mind when I first attacked this problem, but it’s the kind of option I tend to ignore because I know you can get yourself in trouble when you start fiddling with an animal like this. At the same time, threads are very powerful and useful if used properly.

It’s really a very simple way to resolve this problem: just push the work off to another thread within the same Python application. And in Python, making use of threads is actually quite simple.

Here’s a simple intro to Python threads. There are two levels of thread libraries. The lower level module is called ‘thread’ and the higher level module is called ‘threading’. In most cases you will probably be working with the ‘threading’ module.

Another thing to know about is the GIL (Global Interpreter Lock). Since threads seem to all be working at the same time, there has to be a way to ensure that they are not both using the same objects at the same time. The GIL locks the interpreter so that only one thread at a time executes Python bytecode, which lets each thread safely use Python’s internal data structures. The lock keeps moving between threads pretty frequently (by default every 100 bytecode instructions, not bytes, and also whenever a thread blocks on I/O or sleeps).
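To make the GIL a little more concrete, here is a small sketch. It assumes Python 3.2 or later, where the hand-off is time-based and exposed through sys.getswitchinterval() (the Python 2 used elsewhere in this post has sys.getcheckinterval() instead). The nap function is a made-up name. It also shows that blocking calls such as time.sleep release the lock, so two sleeping threads overlap instead of running back to back.

```python
import sys
import threading
import time

def nap(seconds):
    # Blocking calls like sleep release the GIL while they wait.
    time.sleep(seconds)

# How often the interpreter asks the running thread to give up the GIL.
interval = sys.getswitchinterval()
print("switch interval (seconds):", interval)

threads = [threading.Thread(target=nap, args=(0.3,)) for _ in range(2)]

start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# Roughly 0.3s rather than 0.6s, because the naps overlap.
print("two 0.3s naps took %.2fs" % elapsed)
```

CPU-bound pure Python code would not overlap like this, since only one thread can hold the GIL at a time.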

Important functions to know about from threading:
__init__: initializes thread
start: starts the thread
run: code that actually runs when thread is activated
join: when called waits for thread to finish before continuing
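The four functions listed above can be sketched together like this. It is written for Python 3, and the Worker class, its delay argument, and its result attribute are names I made up for the illustration.

```python
import threading
import time

class Worker(threading.Thread):
    def __init__(self, delay):
        # Call the parent initializer before doing anything else.
        super().__init__()
        self.delay = delay
        self.result = None

    def run(self):
        # Executed in the new thread once start() has been called.
        time.sleep(self.delay)
        self.result = "slept %.1fs in %s" % (self.delay, self.name)

worker = Worker(0.1)
worker.start()                        # returns right away; run() works in the background
alive_after_start = worker.is_alive()
worker.join()                         # wait here until run() has returned
alive_after_join = worker.is_alive()

print(alive_after_start, alive_after_join, worker.result)
```

Note that you call start, never run, directly. Calling run yourself would just execute the method in the current thread.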

A very simple example:

from django.shortcuts import render_to_response
from django.template import RequestContext
import threading

class TestThread(threading.Thread):
    def run(self):
        print "%s starts" % (self.getName(),)
        import time
        time.sleep(5)
        print "%s ends" % (self.getName(),)

def threadView(request):
    testThread = TestThread()
    testThread.start()
#    testThread.join() # if you remove the first pound sign on this line the view becomes synchronous.
    print "Prints right when requested just to show that the other thread is off on its own"
    
    return render_to_response('test.html',
                              context_instance=RequestContext(request),)

Categories: Django, Python

Comments (7)

Malcolm Tredinnick

March 2nd, 2009 at 14:20 | #1

Have to disagree. Threads are not a good solution to this problem. The issue is process management. As written, your threads will never be rejoined. Webserver processes have a lifecycle uncontrollable by you (the MaxRequestsPerChild Apache parameter and similar things in other servers) and you are messing with that by using threads.

If you need a process with a lifecycle that is not matched by the request-response path (something long running and independent of the response), a completely separate process is definitely the right model to use. Using a thread is tying it to the response lifecycle, which will have unintended side-effects.
Malcolm Tredinnick

March 2nd, 2009 at 14:28 | #2

I should clarify something in the previous comment: the fact that threads will never be rejoined is separate from the lifecycle management (since forcibly killing the Python process will kill the threadlets as well). But it’s two pieces of uncleanliness.
michael

March 2nd, 2009 at 14:41 | #3

Exact type of problems that steer me away from using threads in my projects. There’s just so much to know about before really using them properly.
Jökull

March 3rd, 2009 at 02:10 | #4

I’ve used signals and threads together for things like sending off emails. I would recommend jumping straight to a proper worker queue system like beanstalkd. You get prioritization and you can work through the queue at a set pace which uses the same amount of resources. If your queue is growing faster than your worker can eat through it, you just throw another worker at it (with more resources).
michael

March 3rd, 2009 at 10:49 | #5

Jokull, I discussed using a queue in the first post on this topic (http://iraniweb.com/blog/?p=56). Never came across beanstalkd though, seems pretty interesting, thanks.
Igor G.

April 14th, 2010 at 09:55 | #6

michael :
Exact type of problems that steer me away from using threads in my projects. There’s just so much to know about before really using them properly.

Michael, I agree!
Edgar

March 18th, 2012 at 06:26 | #7

I tried your threaded method but the site still hangs until the process is finished. Could it be that I’m running “manage.py runserver” instead of mod_wsgi?