Running long processes in Django
My original issue was that I had a piece of code in my Django project that was taking way too long to run, sometimes even causing the page load to time out. The situation comes up very infrequently (at most once a week), so it was just a matter of getting the process to run successfully. Along the way, though, I learned a lot about spawning processes, distributed computing, and the different options we have in the Python community. The different approaches are all, of course, just ways to get the processing done outside the Django process.
cron and queue
I will start with the recommended way of taking care of this issue. You can set up a cron job that runs every minute, checks whether there’s anything to process, and then runs the appropriate code. Cron jobs are pretty simple to set up and pretty effective. All you need to do is edit the crontab with the time intervals at which you want the job to run, and your code takes care of the rest.
The cron part of this solution takes care of when the processing happens, but what handles why it happens? For that you’ll need some way to know when there is processing to be done. There are of course multiple ways to handle this: update a table in your database, update a file or a folder… One way is to use django-queue-service. This method requires you to run the queue service as another Django instance and then make requests to it. The sample code from the project’s page looks like this:
    import httplib2

    client = httplib2.Http()
    body_string = "message=%s" % (message_to_be_inserted_in_queue,)
    uri = 'http://dqshost:8000/q/queue_name_here/put'
    (response, content) = client.request(uri=uri, method='POST', body=body_string)
    # response.status is an int, while response['status'] is the string '200'
    if response.status != 200:
        print("Queue didn't accept the input")
While this method makes the most use of Django of all the methods I’ll discuss, I really have problems with it. It’s heavy-handed and unnecessary, and I can’t even tell if there are security concerns. Let’s say that someone set this up improperly and exposed this Django instance to the outside world…
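For comparison, the plainer “update a table in your database” option mentioned above fits in a few lines. This is only a minimal sketch under assumptions of mine: the jobs table, its columns, the process_pending function, and the crontab path are all hypothetical names, sqlite3 stands in for whatever database the project uses, and the code is written for Python 3.

```python
import sqlite3


def process_pending(conn, handler):
    """Run handler on every pending job, mark it done, and return the count."""
    rows = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    for job_id, payload in rows:
        handler(payload)
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
    conn.commit()
    return len(rows)


# Demo against an in-memory database. A real script would open the project's
# database and be started by a crontab entry such as (hypothetical path):
#   * * * * * python /path/to/process_jobs.py
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO jobs (payload, status) VALUES (?, 'pending')",
    [("report-1",), ("report-2",)],
)
handled = []
first_run = process_pending(conn, handled.append)
second_run = process_pending(conn, handled.append)  # nothing left to process
```

The Django view only has to insert a row with status pending; the next cron run picks it up, so the slow work never happens inside the request.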
queue: Python module
There was a Python module I came across, which I can’t seem to find now. When I find it I’ll post a link. It was a file-based queue: it added files to folders, and there were five different folders based on the status of the item in the queue (ready, active, complete…). In my opinion this was a better way to handle the queue than django-queue-service, but it still seemed a bit uncomfortable.
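Since I can’t point to the module itself, here is a rough sketch of how a folder-per-status queue can work. It is not that module’s code: the function names are made up for illustration, only the three folder names I remember are used, and it assumes Python 3 on a system where renaming a file within one filesystem is atomic (as on POSIX).

```python
import os
import tempfile

STATES = ("ready", "active", "complete")  # only the folder names recalled above


def init_queue(root):
    """Create one folder per status under root."""
    for state in STATES:
        os.makedirs(os.path.join(root, state), exist_ok=True)


def enqueue(root, name, data):
    """Add an item by writing a file into the ready folder."""
    with open(os.path.join(root, "ready", name), "w") as handle:
        handle.write(data)


def claim_next(root):
    """Move the first ready item (by file name) to active; return its name or None."""
    for name in sorted(os.listdir(os.path.join(root, "ready"))):
        try:
            os.rename(
                os.path.join(root, "ready", name),
                os.path.join(root, "active", name),
            )
        except FileNotFoundError:
            continue  # another worker claimed this item first
        return name
    return None


def complete(root, name):
    """Mark a claimed item as finished by moving it to the complete folder."""
    os.rename(
        os.path.join(root, "active", name),
        os.path.join(root, "complete", name),
    )


root = tempfile.mkdtemp()
init_queue(root)
enqueue(root, "001-report.txt", "build weekly report")
claimed = claim_next(root)
complete(root, claimed)
remaining = claim_next(root)
finished = os.listdir(os.path.join(root, "complete"))
```

The status of an item is simply the folder it sits in, so you can inspect the queue with nothing more than a directory listing.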
Asynchronous Messaging in Python
The most natural approach, or at least what someone in my ADD generation wants, is to get things done on demand. Why wait a minute or ten minutes for the cron job to call my code? Why can’t my code run when I want it to? This approach makes more sense to me, and the reasons to stay away from it (namely complexity and processing power) go totally out the window (I think), since the other approaches are really not any easier than the approach I ended up with (which I’m actually really happy with).
When considering asynchronous communication the natural choices are the following:
Pyro: Very light, very simple to set up. In a word, perfect. Pyro fit the bill for what I was looking for. The thing I liked best about this library is that it’s native to Python, so my objects are sent and received as real Python objects. There’s no analyzing or manipulating data, which takes all the fuss out of this type of interaction. Very cool stuff!
XML-RPC: A very strong contender and something I will probably run into in the near future. XML-RPC is well supported in Python and has a couple of different implementations. Seems like a very sane choice for this type of messaging.
Twisted: A very dependable project, whose only issue for this task was that it seemed too complicated to set up for the job at hand.
CORBA: CORBA’s been around for a while, and while it can communicate with almost every language under the sun, I don’t really need that kind of power. Also, since it’s not native to Python, there’d be a lot of translation going on.
In the end, as mentioned above, I went with Pyro, and I am so pleased with the results. If you’ve never looked at Pyro code before, I think you’ll be very surprised by how little code I needed to get this all working.
Important code from my view:
    import Pyro.naming
    import Pyro.core
    from Pyro.errors import NamingError
    ...
    try:
        locator = Pyro.naming.NameServerLocator()
        ns = locator.getNS()
    except Exception:
        # name server could not be reached
        message = "Problem encountered, please try again later"
    else:
        try:
            # resolve Pyro object
            URI = ns.resolve('name_of_pyro_object')
            pyroObject = Pyro.core.getProxyForURI(URI)
            # every method that is one way has to be listed on this statement
            pyroObject._setOneway('name_of_pyro_function')
            # function call
            pyroObject.name_of_pyro_function(input_to_pyro_function)
        except NamingError:
            # the object is not registered with the name server
            pass  # handle error situation
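The _setOneway line is what keeps the view from waiting: a one-way call hands the work off and returns right away, with no result coming back. If you want to see that behaviour without installing Pyro, the sketch below imitates it with a plain thread. To be clear, this is not Pyro and it does not move the work out of the process; SlowWorker and OneWayCaller are hypothetical names used only to show the calling semantics (Python 3).

```python
import threading


class SlowWorker:
    """Hypothetical stand-in for the remote object doing the heavy lifting."""

    def __init__(self):
        self.finish = threading.Event()  # lets the demo decide when work ends
        self.done = []

    def process(self, item):
        self.finish.wait()  # simulates a long-running job
        self.done.append(item)


class OneWayCaller:
    """Calls a method without waiting for it to finish or return a value."""

    def __init__(self, target):
        self._target = target
        self._threads = []

    def call(self, method_name, *args):
        method = getattr(self._target, method_name)
        thread = threading.Thread(target=method, args=args)
        thread.start()
        self._threads.append(thread)
        # Deliberately returns nothing: a one-way call has no result.

    def wait_all(self):
        for thread in self._threads:
            thread.join()


worker = SlowWorker()
caller = OneWayCaller(worker)
caller.call("process", "weekly-report")
still_running = worker.done == []  # the caller got control back first
worker.finish.set()                # let the simulated job complete
caller.wait_all()
```

The trade-off is the same as in the view: because nothing comes back, the caller never learns whether the job succeeded, so any error handling has to live on the worker side.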
Code from my Pyro file:
    import Pyro.naming
    import Pyro.core
    from Pyro.errors import PyroError, NamingError

    # load the Django settings before importing anything that needs them
    from django.core.management import setup_environ
    import settings
    setup_environ(settings)

    from django.core.mail import send_mail
    from project.app.file import Func


    class pyroClass(Pyro.core.ObjBase):
        def name_of_pyro_function(self, input_to_pyro_function):
            # handle_processing here
            pass


    def main():
        Pyro.core.initServer()
        daemon = Pyro.core.Daemon()

        # locate the NS
        locator = Pyro.naming.NameServerLocator()
        print('searching for Name Server...')
        ns = locator.getNS()
        daemon.useNameServer(ns)

        # 'name_of_pyro_object' is the name by which our object will be known to the outside world
        # connect a new object implementation (first unregister previous one)
        try:
            ns.unregister('name_of_pyro_object')
        except NamingError:
            pass
        daemon.connect(pyroClass(), 'name_of_pyro_object')

        # enter the server loop.
        daemon.requestLoop()


    if __name__ == "__main__":
        main()
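With the view and the Pyro file in place, you’ll have to run two things along with your Django process: the Pyro name server, which both pieces of code look up through NameServerLocator, and the Pyro file itself.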
That’s it. How cool is that!