2. Function reference (“API”)

2.1. Setup

doloop.create(dbconn, table, id_type='int', engine='InnoDB')

Create a task loop table. It has a schema like this:

CREATE TABLE `...` (
    `id` INT NOT NULL,
    `last_updated` INT DEFAULT NULL,
    `lock_until` INT DEFAULT NULL,
    PRIMARY KEY (`id`),
    KEY `lock_until` (`lock_until`, `last_updated`)
) ENGINE=InnoDB
  • id is the ID of the thing you want to update. It can refer to anything that has a unique ID (doesn’t need to be another table in this database). It also need not be an INT; see id_type, below.
  • last_updated: a unix timestamp; when the thing was last updated, or NULL if it never was
  • lock_until is also a unix timestamp. It’s used to keep workers from grabbing the same IDs, and prioritization. See get() for details.
Parameters:
  • dbconn – any DBI-compliant MySQL connection object
  • table (str) – name of your task loop table. Something ending in _loop is recommended.
  • id_type (str) – alternate type for the id field (e.g. 'VARCHAR(64)')
  • engine (str) – alternate storage engine to use (e.g. 'MyISAM')

There is no drop() function because programmatically dropping tables is risky. The relevant SQL is just DROP TABLE `...`.

doloop.sql_for_create(table, id_type='int', engine='InnoDB')

Get SQL used by create().

Useful to power create-doloop-table (included with this package), which you can use to pipe CREATE statements into mysql.

2.2. Adding and removing IDs

doloop.add(dbconn, table, id_or_ids, updated=False, test=False)

Add IDs to this task loop.

Parameters:
  • dbconn – any DBI-compliant MySQL connection object
  • table (str) – name of your task loop table
  • id_or_ids – ID or list of IDs to add
  • updated – Set this to true if these IDs have already been updated; this will last_updated to the current time rather than NULL.
  • test – If True, don’t actually write to the database
Returns:

number of IDs that are new

Runs this query with a write lock on table:

INSERT IGNORE INTO `...` (`id`, `last_updated`)
    VALUES (...), ...

(last_updated is omitted if updated is False.)

doloop.remove(dbconn, table, id_or_ids, test=False)

Remove IDs from this task loop.

Parameters:
  • dbconn – any DBI-compliant MySQL connection object
  • table (str) – name of your task loop table
  • id_or_ids – ID or list of IDs to add
  • test – If True, don’t actually write to the database
Returns:

number of IDs removed

Runs this query with a write lock on table:

DELETE FROM `...` WHERE `id` IN (...)

2.3. Doing updates

doloop.get(dbconn, table, limit, lock_for=3600, min_loop_time=3600, test=False)

Get some IDs of things to update, and lock them.

Generally, after you’ve updated IDs, you’ll want to pass them to did().

The rules for fetching IDs are:

  • First, fetch IDs which are locked but whose locks have expired. Start with the ones that have been locked the longest.
  • Then, fetch unlocked IDs. Start with those that have never been updated, then fetch the ones that have gone the longest without being updated.

Ties (e.g. for newly inserted IDs) are broken arbitrarily by the database.

Note that because IDs whose locks have expired are selected first, the lock_until column can also be used to prioritize IDs; see bump().

Parameters:
  • dbconn – any DBI-compliant MySQL connection object
  • table (str) – name of your task loop table
  • limit (int) – max number of IDs to fetch
  • lock_for – a conservative upper bound for how long we expect to take to update this ID, in seconds. Default is one hour. Must be positive.
  • min_loop_time – If a job is unlocked, make sure it was last updated at least this many seconds ago, so that we don’t spin on the same IDs.
  • test – If True, don’t actually write to the database
Returns:

list of IDs

Runs this query with a write lock on table:

SELECT `id` FROM `...`
    WHERE `lock_until` <= UNIX_TIMESTAMP()
    ORDER BY `lock_until`, `last_updated`
    LIMIT ...

SELECT `id` FROM `...`
    WHERE `lock_until` IS NULL
    AND (`last_updated` IS NULL
         OR `last_updated` <= UNIX_TIMESTAMP() - ...)
    ORDER BY `last_updated`
    LIMIT ...

UPDATE `...` SET `lock_until` = UNIX_TIMESTAMP() + ...
    WHERE `id` IN (...)
doloop.did(dbconn, table, id_or_ids, auto_add=True, test=False)

Mark IDs as updated and unlock them.

Usually, these will be IDs that you grabbed using get(), but it’s perfectly fine to update arbitrary IDs on your own initiative, and mark them as done.

Parameters:
  • dbconn – any DBI-compliant MySQL connection object
  • table (str) – name of your task loop table
  • id_or_ids – ID or list of IDs that we just updated
  • auto_add (bool) – Add any IDs that are not already in the table.
  • test – If True, don’t actually write to the database
Returns:

number of rows updated (mostly useful as a sanity check)

Runs this query with a write lock on table:

INSERT IGNORE INTO `...` (`id`) VALUES (...), ...

UPDATE `...`
    SET `last_updated` = UNIX_TIMESTAMP(),
        `lock_until` = NULL
    WHERE `id` IN (...)

(INSERT IGNORE is only run if auto_add is True.)

doloop.unlock(dbconn, table, id_or_ids, auto_add=True, test=False)

Unlock IDs without marking them updated.

Useful if you get() IDs, but are then unable or unwilling to update them.

Parameters:
  • dbconn – any DBI-compliant MySQL connection object
  • table (str) – name of your task loop table
  • id_or_ids – ID or list of IDs
  • auto_add (bool) – Add any IDs that are not already in the table.
  • test – If True, don’t actually write to the database
Returns:

Either: number of rows updated/added or number of IDs that correspond to real rows. (MySQL unfortunately returns different row counts for UPDATE statements depending on how connections are configured.) Don’t use this for anything more critical than sanity checks and logging.

Runs this query with a write lock on table:

INSERT IGNORE INTO `...` (`id`) VALUES (...), ...

UPDATE `...` SET `lock_until` = NULL
    WHERE `id` IN (...)

(INSERT IGNORE is only run if auto_add is True)

2.4. Prioritization

doloop.bump(dbconn, table, id_or_ids, lock_for=0, auto_add=True, test=False)

Bump priority of IDs.

Normally we set lock_until to the current time, which gives them priority without actually locking them (see get() for why this works).

You can make IDs super-high-priority by setting lock_for to a negative value. For example, bumping an ID with lock_for=-600 will give it the same priority as an ID that was bumped 600 seconds ago.

You can also lock IDs for a little while, then prioritize them, by setting lock_for to a positive value. This can be useful in situations where you expect IDs might be bumped again in the near future, and you only want to run your update function once.

This function will only ever decrease lock_until; it’s not possible to keep something locked forever by continually bumping it.

Parameters:
  • dbconn – any DBI-compliant MySQL connection object
  • table (str) – name of your task loop table
  • id_or_ids – ID or list of IDs
  • lock_for – Number of seconds that the IDs should stay locked.
  • auto_add (bool) – Add any IDs that are not already in the table.
  • test – If True, don’t actually write to the database
Returns:

number of IDs bumped (mostly useful as a sanity check)

Runs this query with a write lock on table:

INSERT IGNORE INTO `...` (`id`) VALUES (...), ...

UPDATE `...`
    SET `lock_until` = UNIX_TIMESTAMP() + ...
    WHERE (`lock_until` IS NULL OR
           `lock_until` > UNIX_TIMESTAMP() + ...)
          AND `id` IN (...)

(INSERT IGNORE is only run if auto_add is True)

2.5. Auditing

doloop.check(dbconn, table, id_or_ids)

Check the status of particular IDs.

Parameters:
  • dbconn – any DBI-compliant MySQL connection object
  • table (str) – name of your task loop table
  • id_or_ids – ID or list of IDs

Returns a dictionary mapping ID to a tuple of (since_updated, locked_for), that is, the current time minus last_updated, and lock_for minus the current time (both of these in seconds).

This function does not require write access to your database and does not lock tables.

Runs this query:

SELECT `id`,
       UNIX_TIMESTAMP() - `last_updated`,
       `lock_until` - UNIX_TIMESTAMP()`
    FROM `...`
    WHERE `id` IN (...)
doloop.stats(dbconn, table)

Get stats on the performance of the task loop as a whole.

Parameters:
  • dbconn – any DBI-compliant MySQL connection object
  • table (str) – name of your task loop table
  • delay_thresholds – enables the delayed stat; see below

It returns a dictionary containing these keys:

  • bumped: number of IDs where lock_until is now or in the past. (These IDs have the highest priority; see get().)
  • locked: number of IDs where lock_until is in the future
  • min_bump_time/max_bump_time: min/max number of seconds that any ID has been prioritized (lock_until now or in the past)
  • min_id/max_id: min and max IDs (or None if table is empty)
  • min_lock_time/max_lock_time: min/max number of seconds that any ID is locked for
  • min_update_time/max_update_time: min/max number of seconds that an ID has gone since being updated

For convenience and readability, all times will be floating point numbers.

Only min_id and max_id can be None (when the table is empty). Everything else defaults to zero.

This function does not require write access to your database and does not lock tables.

stats() only scans locked/bumped rows and use indexes for everything else, so it should be very fast except in pathological cases. It runs these queries in a single transaction:

SELECT MIN(`id`), MAX(`id`), UNIX_TIMESTAMP() FROM `...`

SELECT MIN(`last_updated`),
       MAX(`last_updated`),
       FROM `...`
       WHERE `lock_until` IS NULL;

SELECT COUNT(*),
       MIN(`last_updated`),
       MAX(`last_updated`),
       MIN(`lock_until`),
       MAX(`lock_until`),
    FROM `...`
    WHERE `lock_until` > ...

SELECT COUNT(*),
       MIN(`last_updated`),
       MAX(`last_updated`),
       MIN(`lock_until`),
       MAX(`lock_until`),
    FROM `...`
    WHERE `lock_until` <= ...