Jug allows you to write code that is broken up into tasks and run different tasks on different processors.
It currently has two backends. The first uses the filesystem to communicate between processes and works correctly over NFS, so you can coordinate processes on different machines. The second is based on Redis, so the processes only need to be able to connect to a common Redis server.
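The filesystem backend needs some way for independent processes, possibly on different machines, to agree on who runs each task. A common technique is to claim a task by atomically creating a lock file. The sketch below illustrates that general idea; it is not jug's actual code, and it assumes that each task is identified by a hash string. The helpers `try_claim` and `release` and the file layout are hypothetical. Note that the Linux open(2) manual page states that `O_EXCL` is only supported on NFSv3 or later, so the atomicity guarantee depends on the NFS version.

```python
import errno
import os
import socket
import tempfile

def try_claim(lock_dir, task_hash):
    """Atomically create a lock file for task_hash (hypothetical helper).

    Returns True if this process created the lock (and should run the task),
    or False if some other process already holds it.
    """
    path = os.path.join(lock_dir, task_hash + ".lock")
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so at most one
        # process can create it, even when many try at the same moment.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError as e:
        if e.errno == errno.EEXIST:
            return False
        raise
    with os.fdopen(fd, "w") as f:
        # Record which machine and process hold the lock, to help with debugging.
        f.write("%s pid=%d\n" % (socket.gethostname(), os.getpid()))
    return True

def release(lock_dir, task_hash):
    """Remove the lock once the task's result has been saved (hypothetical helper)."""
    os.remove(os.path.join(lock_dir, task_hash + ".lock"))

lock_dir = tempfile.mkdtemp()
got_first = try_claim(lock_dir, "is_prime-7")   # True: lock created
got_second = try_claim(lock_dir, "is_prime-7")  # False: already locked
release(lock_dir, "is_prime-7")
got_third = try_claim(lock_dir, "is_prime-7")   # True: free again
```

Because creation either succeeds or fails as a single step, two workers that reach the same task at once cannot both believe they own it; the loser simply moves on to another task.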
Jug also takes care of saving all the intermediate results to the backend in a way that allows them to be retrieved later.
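Conceptually, saving intermediate results works like a persistent cache: each task's result is stored under a key derived from the function and its arguments, so a later run (or a different process) can load it instead of recomputing it. The toy sketch below assumes a local temporary directory as the store and uses the hypothetical helpers `task_key` and `run_cached`; it is not jug's actual hashing or storage scheme.

```python
import hashlib
import os
import pickle
import tempfile

# Hypothetical stand-in for a jug data directory.
STORE = tempfile.mkdtemp()
# Records each real computation, to show when a cached result is reused instead.
calls = []

def task_key(func, args):
    """Derive a hex key from the function's module, name and arguments."""
    payload = pickle.dumps((func.__module__, func.__name__, args))
    return hashlib.sha1(payload).hexdigest()

def run_cached(func, *args):
    """Return the stored result for func(*args), computing and saving it if needed."""
    path = os.path.join(STORE, task_key(func, args) + ".pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = func(*args)
    # Write to a temporary name first, then rename, so that readers never
    # see a half-written file (rename is atomic on POSIX filesystems).
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(result, f)
    os.rename(tmp_path, path)
    return result

def square(x):
    calls.append(x)
    return x * x

r1 = run_cached(square, 12)  # computed and saved
r2 = run_cached(square, 12)  # loaded from STORE
```

Here `square(12)` is computed only once: the second call loads the pickled 144 from disk, so `calls` ends up as `[12]`.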
Here is a one-minute example. Save the following to a file called primes.py:
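from jug import TaskGenerator
from time import sleep

@TaskGenerator
def is_prime(n):
    sleep(1.)  # artificial delay so that each task takes a noticeable amount of time
    for j in xrange(2, n - 1):
        if (n % j) == 0:
            return False
    return True

# One task per number from 2 to 100 (99 tasks in total)
primes100 = map(is_prime, xrange(2, 101))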
Of course, this is only for didactic purposes; normally you would use a better method. Similarly, the call to sleep is there only so that the tasks do not run too fast.
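One such better method, for instance, is the sieve of Eratosthenes. The sketch below (the name `prime_flags` is hypothetical) produces the same list of booleans for 2 to 100 as the example. It computes everything in a single pass, so unlike the example it would not split naturally into 99 independent jug tasks.

```python
def prime_flags(limit):
    """Return [is_prime(n) for n in range(2, limit + 1)] using a sieve of Eratosthenes.

    Assumes limit >= 1.
    """
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    p = 2
    while p * p <= limit:
        if sieve[p]:
            # Every multiple of p from p*p upwards is composite.
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
        p += 1
    # Drop the entries for 0 and 1 so that index 0 corresponds to n = 2.
    return sieve[2:]

flags = prime_flags(100)
```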
Now type jug status primes.py to get:
Task name                        Waiting       Ready    Finished     Running
----------------------------------------------------------------------------
primes.is_prime                        0          99           0           0
............................................................................
Total:                                 0          99           0           0
This tells you that you have 99 tasks called primes.is_prime ready to run. So run jug execute primes.py &. You can even run multiple instances in the background (if you have multiple cores, for example). After starting 4 instances and waiting a few seconds, you can check the status again (with jug status primes.py):
Task name                        Waiting       Ready    Finished     Running
----------------------------------------------------------------------------
primes.is_prime                        0          63          32           4
............................................................................
Total:                                 0          63          32           4
Now you have 32 tasks finished, 4 running, and 63 still ready. Eventually, they will all finish and you can inspect the results with jug shell primes.py. This will give you an IPython shell. The primes100 variable is available, but it is an ugly list of jug.Task objects. To get the actual value, you call the value function:
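The difference between a jug.Task and its value can be illustrated with a toy model: a task object only records a function and its arguments, and a value-like helper walks a data structure and replaces each task with its result. The `Task` class and `value` function below are hypothetical stand-ins written for this sketch, not jug's implementation; in particular, jug retrieves results that jug execute has already saved to the backend, whereas this toy simply computes them on demand.

```python
class Task(object):
    """Hypothetical stand-in for a deferred computation (not jug's real Task class)."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def run(self):
        # Resolve any Task arguments first, then call the function.
        return self.func(*[value(a) for a in self.args])

def value(obj):
    """Recursively replace Task objects inside obj with their computed results."""
    if isinstance(obj, Task):
        return obj.run()
    if isinstance(obj, list):
        return [value(x) for x in obj]
    if isinstance(obj, tuple):
        return tuple(value(x) for x in obj)
    if isinstance(obj, dict):
        return dict((k, value(v)) for k, v in obj.items())
    return obj

def is_even(n):
    return n % 2 == 0

tasks = [Task(is_even, n) for n in range(2, 7)]  # a list of Task objects, not booleans
results = value(tasks)                            # [True, False, True, False, True]
```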
In [1]: primes100 = value(primes100)
In [2]: primes100[:10]
Out[2]: [True, True, False, True, False, True, False, False, False, True]
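The list holds one boolean per number, starting at 2. To turn it into the primes themselves, pair each flag with its number. In the shell you would apply the same expression to the full primes100 list; here the ten values from Out[2] are copied so that the snippet runs on its own.

```python
# The first ten flags, as shown in Out[2]; they correspond to n = 2..11.
flags = [True, True, False, True, False, True, False, False, False, True]

# Pair each flag with its n and keep the numbers flagged as prime.
primes = [n for n, is_p in zip(range(2, 2 + len(flags)), flags) if is_p]
```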
You can either get the git repository at
http://github.com/luispedro/jug
or download the package from PyPI. You can use easy_install jug or pip install jug if you’d like.
Version 0.8:
- Tasklets
- Fix bugs in sleep-until and cleanup
- Fix bugs with CompoundTask (you needed to run jug execute twice before)
Version 0.7.4:
- Fix case where ~/.jug/configrc does not exist
- Print host name to lock file on file_store
- Refactored implementation of options
- Fix unloading tasks that have not run
- Fix mapreduce for empty input
Version 0.7.3:
- Parse ~/.jug/configrc
- Fix bug with waiting times
- Special case saving of numpy arrays
- Add more expressive jugdir syntax
- Save dict_store backend to disk
Version 0.7.2:
- Included missing files in the distribution
Version 0.7.1:
- sleep-until subcommand
- Bugfixes
Version 0.7 (starting with 0.6.9 in testing):
- barrier()
- Better shell command
It is a pure Python package. I have tested it with Python 2.5 and 2.6. I do not expect Python 2.4 or earlier to work (this is not a priority). Python 3.0 will not work either (this is expected to change in the future; patches are welcome).
Beta (or thereabouts).
This is still in development and the APIs are not fixed, but they are in less flux than they were earlier in the project.
It is very usable, though. I have used it for my academic projects for the past two years and wouldn’t now start any other project without using jug. It’s become a major part of the way I handle projects that involve a large number of computations and use a cluster.