1. Tutorial

1.1. Setup

If you don’t have a MySQL server, you’ll need to install and run one. doloop uses fairly basic SQL, and should work on MySQL versions as early as 5.0, if not earlier.

You’ll also want to install a Python MySQL driver, such as PyMySQL.

Next, you’ll want to create at least one doloop table:

create-doloop-table user_loop | mysql -D test  # or a db of your choice

This table is used to keep track of which IDs we care about, and how recently they’ve been updated. You’ll want one table per kind of update on each kind of thing.

For example, if you want to separately update users’ profile pages and their friend recommendations, you’d want two tables, named something like user_profile_loop and user_friend_loop.

By default, doloop assumes IDs are INTs, but you can use any column type that can be a primary key. For example, if your IDs are 64-character ASCII strings:

create-doloop-table -i 'CHAR(64) CHARSET ascii' user_loop | mysql -D test

You can also create tables programmatically using doloop.create() and doloop.sql_for_create().

1.2. Adding and removing IDs

Next, you’ll want to make sure the IDs of the things you want to keep updated are in your doloop table. Use doloop.add() to add them:

import MySQLdb  # PyMySQL works too, via pymysql.connect(...)
import doloop

dbconn = MySQLdb.connect(...)

for user_id in ...:  # your function to stream all user IDs
    doloop.add(dbconn, 'user_loop', user_id)

You’ll also want to add a call to doloop.add() to your user creation code. doloop.add() uses INSERT IGNORE, so it’s fine to call it several times for the same ID.
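If you’re curious why repeated calls are harmless, here’s the same idea sketched in SQLite, whose INSERT OR IGNORE behaves like MySQL’s INSERT IGNORE for this purpose. The add_ids() helper and the one-column table are hypothetical stand-ins, not doloop’s actual schema or implementation.

```python
import sqlite3

def add_ids(conn, table, ids):
    """Hypothetical stand-in for doloop.add(): inserts IDs, silently
    skipping any already present (the effect of MySQL's INSERT IGNORE)."""
    conn.executemany(
        f"INSERT OR IGNORE INTO {table} (id) VALUES (?)",
        [(i,) for i in ids],
    )

def count_ids(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_loop (id INTEGER PRIMARY KEY)")

add_ids(conn, "user_loop", [1, 2, 3])
add_ids(conn, "user_loop", [2, 3, 4])  # 2 and 3 are already there
add_ids(conn, "user_loop", [4])        # adding the same ID again is harmless
print(count_ids(conn, "user_loop"))    # 4
```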

Each call to doloop.add() gets a write lock on user_loop, so it’s much more efficient to add chunks of several IDs at a time:

for list_of_user_ids in ...:  # your function to stream lists of user IDs
    doloop.add(dbconn, 'user_loop', list_of_user_ids)
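The ... in the loop above is whatever yields your chunks. If what you have is a flat stream of IDs, a small generic generator like this one can batch them. chunks() is a hypothetical helper (not part of doloop), and a chunk size of 1000 is just an arbitrary assumption.

```python
from itertools import islice

def chunks(iterable, size):
    """Yield lists of at most `size` items from any iterable, lazily."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Hypothetical usage with doloop (stream_user_ids() is a placeholder):
# for list_of_user_ids in chunks(stream_user_ids(), 1000):
#     doloop.add(dbconn, 'user_loop', list_of_user_ids)

print(list(chunks(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```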

If something no longer needs to be updated (e.g. the user closes their account), you can remove the ID with doloop.remove().

1.3. Doing updates

The basic workflow is to use doloop.get() to grab the IDs of the things that have gone the longest without being updated, perform your updates, and then mark them as done with doloop.did():

user_ids = doloop.get(dbconn, 'user_loop', 1000)

for user_id in user_ids:
    ... # run your update logic

doloop.did(dbconn, 'user_loop', user_ids)
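To make “gone the longest without being updated” concrete, here’s a toy in-memory model of the get()/did() cycle. It is not doloop’s code: it ignores locking and min_loop_time, stores only a last-updated timestamp per ID, and assumes never-updated IDs come before everything else.

```python
def get(rows, limit):
    """Toy get(): up to `limit` IDs, stalest first.
    rows maps ID -> last-updated timestamp (None = never updated)."""
    def staleness(item):
        _, last = item
        # Never-updated first (assumption), then oldest timestamp first.
        return (last is not None, last if last is not None else 0)
    return [i for i, _ in sorted(rows.items(), key=staleness)[:limit]]

def did(rows, ids, now):
    """Toy did(): record that these IDs were just updated."""
    for i in ids:
        rows[i] = now

rows = {1: 500.0, 2: None, 3: 100.0, 4: 300.0}
batch = get(rows, 2)
print(batch)                  # [2, 3]
did(rows, batch, now=1000.0)
print(get(rows, 2))           # [4, 1]
```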

A good, low-effort way to set up workers is to write a script and run it from cron. It’s perfectly safe (and encouraged) to run several workers concurrently; doloop.get() will lock the IDs it grabs so that other workers don’t try to update the same things.
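Here’s a toy model of why concurrent workers don’t trip over each other: get() locks what it returns, and locked IDs are skipped until the lock expires. It’s a sketch under simplifying assumptions (in-memory rows, explicit now timestamps, no bump() priority), not doloop’s actual SQL.

```python
HOUR = 60 * 60

def get(rows, limit, now, lock_for=HOUR):
    """Toy get(): take up to `limit` unlocked IDs, stalest first, and lock them.
    Each row is {"last_updated": float | None, "lock_until": float | None}."""
    unlocked = [
        i for i, r in rows.items()
        if r["lock_until"] is None or r["lock_until"] <= now
    ]
    # Never-updated IDs first (assumption), then least recently updated.
    unlocked.sort(key=lambda i: (rows[i]["last_updated"] is not None,
                                 rows[i]["last_updated"] or 0))
    picked = unlocked[:limit]
    for i in picked:
        rows[i]["lock_until"] = now + lock_for
    return picked

rows = {i: {"last_updated": None, "lock_until": None} for i in range(1, 5)}
worker_a = get(rows, 2, now=0)
worker_b = get(rows, 2, now=10)   # runs while worker A's locks are held
worker_c = get(rows, 2, now=20)
print(worker_a, worker_b, worker_c)  # [1, 2] [3, 4] []
```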

You should make sure that your update logic can be safely called twice concurrently for the same ID. In fact, it’s totally cool for code that has never called doloop.get() to update arbitrary things and then call did() on their IDs to let the workers know. It’s also a good idea for your update code to gracefully handle nonexistent IDs.
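As an illustration of update logic that meets both requirements, here’s a hypothetical refresh_profile() (the profiles dict and its fields are made up for the example). It recomputes a cached value purely from source data, so running it twice for the same ID just writes the same value again, and it returns quietly when the ID no longer exists.

```python
def refresh_profile(profiles, user_id):
    """Hypothetical update logic for one user ID.

    Idempotent: the cached value depends only on the source fields, so a
    second run writes the same result. Missing IDs are skipped, not errors.
    """
    user = profiles.get(user_id)
    if user is None:
        return False  # e.g. the account was deleted after the ID was fetched
    user["display_name"] = f"{user['first']} {user['last']}"
    return True

profiles = {1: {"first": "Ada", "last": "Lovelace"}}
print(refresh_profile(profiles, 1), refresh_profile(profiles, 1))  # True True
print(refresh_profile(profiles, 99))                                # False
print(profiles[1]["display_name"])                                 # Ada Lovelace
```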

How many workers you want and when they run is up to you. If there turn out not to be enough workers, things will simply be updated less often than you’d like. You can set a limit on how frequently the same ID will be updated using the min_loop_time argument to get(); by default, this is one hour.
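A toy filter showing the effect of min_loop_time: an ID that was updated less than min_loop_time seconds ago is simply not a candidate. This is a model under stated assumptions (locks are ignored, and an ID updated exactly min_loop_time ago counts as eligible), not doloop’s real query.

```python
HOUR = 60 * 60

def candidates(last_updated, now, min_loop_time=HOUR):
    """Toy model: IDs get() could return, given each ID's last update time
    (None = never updated). Locking is deliberately left out."""
    return sorted(
        i for i, t in last_updated.items()
        if t is None or t <= now - min_loop_time
    )

rows = {1: None, 2: 0.0, 3: 9000.0}
print(candidates(rows, now=10000.0))                   # [1, 2]: ID 3 is too fresh
print(candidates(rows, now=10000.0, min_loop_time=0))  # [1, 2, 3]
```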

Also, don’t worry too much about your workers crashing. By default, IDs are locked for an hour (also configurable, with the lock_for argument to get()), so they’ll eventually get unlocked and fetched by another worker. Conversely, if there is a problem ID that always causes a crash, that problem ID won’t bother your workers for another hour.

You can also explicitly unlock IDs, without marking them as updated, using doloop.unlock().
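Here’s a toy model of those lock semantics for a single row, assuming a lock is just an expiry timestamp (lock_until) stored next to the row’s last-updated time. It’s an illustration, not doloop’s schema: a crashed worker’s lock simply runs out, and an explicit unlock releases the ID early without touching last_updated.

```python
HOUR = 60 * 60

# Hypothetical row model (NOT doloop's schema).
row = {"last_updated": None, "lock_until": None}

def lock(row, now, lock_for=HOUR):
    row["lock_until"] = now + lock_for

def unlock(row):
    """Release the lock without marking the row as updated."""
    row["lock_until"] = None

def is_available(row, now):
    return row["lock_until"] is None or row["lock_until"] <= now

lock(row, now=0)                       # a worker grabs the ID, then crashes
print(is_available(row, now=30 * 60))  # False: still locked
print(is_available(row, now=HOUR))     # True: the lock expired on its own

lock(row, now=HOUR)                    # another worker grabs it...
unlock(row)                            # ...and gives it back explicitly
print(is_available(row, now=HOUR + 1), row["last_updated"])  # True None
```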

1.4. Prioritization

So, this is a great system for making sure every user gets updated eventually, but some users are more active than others. You can use doloop.bump() to prioritize certain ID(s):

def user_do_something_noteworthy(user_id):
    ... # your logic for the user doing something noteworthy

    doloop.bump(dbconn, 'user_loop', user_id)

doloop has an elegant (or, depending on how you look at it, too-magical) rule that IDs which are locked get highest priority once the lock expires. By default, bump() sets the lock to expire immediately, so we get priority without any waiting.
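A toy sort key makes the rule concrete. In this model (not doloop’s actual SQL), IDs whose lock has expired come first, earliest expiry first; the relative order of never-updated and merely stale IDs after that is an assumption.

```python
def priority_order(rows, now):
    """Toy ordering of available IDs under the rule described above.
    Each row is {"last_updated": float | None, "lock_until": float | None}."""
    def key(i):
        r = rows[i]
        if r["lock_until"] is not None:   # lock has expired: top priority
            return (0, r["lock_until"])
        if r["last_updated"] is None:     # never updated (assumed next)
            return (1, 0)
        return (2, r["last_updated"])     # then least recently updated
    available = [i for i, r in rows.items()
                 if r["lock_until"] is None or r["lock_until"] <= now]
    return sorted(available, key=key)

rows = {
    1: {"last_updated": 100.0, "lock_until": None},
    2: {"last_updated": None, "lock_until": None},
    3: {"last_updated": 5000.0, "lock_until": None},
}
# As if bump() were called on ID 3 at t=6000 with the default lock_for:
rows[3]["lock_until"] = 6000.0
print(priority_order(rows, now=6000.0))  # [3, 2, 1]
```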

However, in real life, users are likely to do several noteworthy things in one session (well, depending on your users). You can avoid updating the same user several times by setting lock_for. For example, the first time a user does something noteworthy, this code will keep them locked for an hour, after which they’ll be prioritized:

def user_do_something_noteworthy(user_id):
    ...

    doloop.bump(dbconn, 'user_loop', user_id, lock_for=60*60)

If a particularly special user did noteworthy things continuously, they’d still get updated more or less hourly; you can’t repeatedly bump() things into the future.
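Here’s a toy model of that behavior (an interpretation, not doloop’s code): bump() may pull an existing lock’s expiry earlier but never pushes it later, so a user bumped every ten minutes still becomes due one hour after the first bump. Workers grabbing and finishing the ID are left out of the model.

```python
HOUR = 60 * 60

def bump(rows, user_id, now, lock_for=0):
    """Toy bump(): a lock can be pulled earlier, never pushed later."""
    # Add the ID if it isn't there yet.
    row = rows.setdefault(user_id, {"last_updated": None, "lock_until": None})
    new_lock = now + lock_for
    if row["lock_until"] is None or new_lock < row["lock_until"]:
        row["lock_until"] = new_lock

rows = {}
for t in range(0, 3 * HOUR, 10 * 60):   # something noteworthy every 10 minutes
    bump(rows, 42, now=t, lock_for=HOUR)
print(rows[42]["lock_until"])  # 3600: later bumps did not extend the lock
```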

If for some reason you forgot to add a user, bump() will automatically add them before bumping them (as will did() and unlock()). An alternate way to use doloop is to bump() every time something changes, secure in the knowledge that if you forgot to add a call to bump() somewhere, things will still get updated eventually.

Also, due to doloop’s elegant/too-magical semantics, you can give ID(s) super-high priority by setting lock_for to a negative number. At a certain point, though, you should just do the update immediately and call did().
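In the same toy terms (assuming, as the rule above implies, that expired locks are served earliest-expiry first), a negative lock_for just back-dates the expiry, so the ID sorts ahead of anything bumped normally. The ID names below are made up for the example.

```python
def expired_lock_order(locks, now):
    """Among IDs whose lock has expired, earliest lock_until wins
    (a model of the rule described above, NOT doloop's SQL)."""
    ready = {i: t for i, t in locks.items() if t <= now}
    return sorted(ready, key=ready.get)

now = 10_000.0
locks = {
    "bumped_earlier": now - 60,      # bump(..., lock_for=0) a minute ago
    "bumped_now": now + 0,           # bump(..., lock_for=0) just now
    "urgent": now - 24 * 60 * 60,    # bump(..., lock_for=-24*60*60)
}
print(expired_lock_order(locks, now))  # ['urgent', 'bumped_earlier', 'bumped_now']
```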

1.5. Auditing

If you want to check on a particular ID or set of IDs, for example to see how long it’s gone without being updated, you can use doloop.check().

To check on the status of the task loop as a whole, use doloop.stats(). Among other things, this can tell you how many IDs have gone more than a day/week without being updated.
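As a rough illustration of the day/week numbers, here’s a hypothetical staleness_counts() that computes them from a mapping of ID to last-update time. It is not doloop.stats() and does not mimic its return format; counting never-updated IDs as stale is an assumption.

```python
DAY = 24 * 60 * 60
WEEK = 7 * DAY

def staleness_counts(last_updated, now):
    """Hypothetical summary: how many IDs have gone more than a day/week
    without an update. Never-updated IDs (None) count as stale in both."""
    def older_than(age):
        return sum(1 for t in last_updated.values()
                   if t is None or now - t > age)
    return {"more_than_a_day": older_than(DAY),
            "more_than_a_week": older_than(WEEK)}

now = 30 * DAY
rows = {1: now - 60, 2: now - 2 * DAY, 3: now - 10 * DAY, 4: None}
print(staleness_counts(rows, now))  # {'more_than_a_day': 3, 'more_than_a_week': 2}
```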