Tutorial

In this tutorial we will demonstrate what goes into creating an application that uses u1db as a backend. We will use code samples from the simple todo list application ‘Cosas’ as our example. The full source code to Cosas can be found in the u1db source tree. It comes with a user interface, but we will only focus on the code that interacts with u1db here.

Defining the Task Object

First we need to define what we’ll actually store in u1db. For a todo list application, it makes sense to have each todo item or task be a single document in the database, so that we can use indexes to find individual tasks with specific properties.

We’ll subclass Document, and define some properties that we think our tasks need to have. There are no schemas in u1db, which means we can always change the structure of the underlying JSON document at a later time. (That does likely mean we will have to migrate older documents for them to keep working with the new code.)

Let’s give our Task objects a title, a (boolean) done property, and a list of tags, so that the JSON representation of a task would look something like this:

'{"title": "the task at hand",
  "done": false,
  "tags": ["urgent", "priority 1", "today"]}'

We can define Task as follows:

import u1db

class Task(u1db.Document):
    """A todo item."""

    def _get_title(self):
        """Get the task title."""
        return self.content.get('title')

    def _set_title(self, title):
        """Set the task title."""
        self.content['title'] = title

    title = property(_get_title, _set_title, doc="Title of the task.")

    def _get_done(self):
        """Get the status of the task."""
        return self.content.get('done', False)

    def _set_done(self, value):
        """Set the done status."""
        self.content['done'] = value

    done = property(_get_done, _set_done, doc="Done flag.")

    def _get_tags(self):
        """Get tags associated with the task."""
        return self.content.setdefault('tags', [])

    def _set_tags(self, tags):
        """Set tags associated with the task."""
        self.content['tags'] = list(set(tags))

    tags = property(_get_tags, _set_tags, doc="Task tags.")

As you can see, Document objects come with a .content property, which is a Python dictionary. This is where we look up or store all data pertaining to the task.

We can now create tasks, set their titles:

>>> example_task = Task()
>>> example_task.title = "Create a Task class."
>>> example_task.title
'Create a Task class.'

their tags:

>>> example_task.tags
[]
>>> example_task.tags = ['development']
>>> example_task.tags
['development']

and their done status:

>>> example_task.done
False
>>> example_task.done = True
>>> example_task.done
True

This is all we need the task object to do: as long as we have a way to store all its data in the .content dictionary, the superclass will take care of converting that into JSON so it can be stored in the database.
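To get a feel for what ends up stored, the content dictionary can be run through Python's standard json module. This is only an illustration of how Python values map to JSON values; it makes no claim about how u1db serializes documents internally.

```python
import json

content = {
    "title": "the task at hand",
    "done": False,
    "tags": ["urgent", "priority 1", "today"],
}

# Python's False becomes JSON's false; the list becomes a JSON array.
serialized = json.dumps(content, sort_keys=True)

# Loading the string again gives back an equal dictionary.
restored = json.loads(serialized)
```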

For convenience, we can create a function that returns a fresh copy of the content that would make up an empty task:

import copy

EMPTY_TASK = {"title": "", "done": False, "tags": []}

def get_empty_task():
    return copy.deepcopy(EMPTY_TASK)
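The deep copy matters because the template contains a mutable list. The following sketch, which uses only the standard library, shows what could go wrong if tasks shared the template directly, and how a deep copy avoids it:

```python
import copy

EMPTY_TASK = {"title": "", "done": False, "tags": []}

# Sharing the template directly: both names refer to the same dict and list.
shared = EMPTY_TASK
shared["tags"].append("urgent")
leaked = list(EMPTY_TASK["tags"])     # the template itself now carries the tag

# Reset the template, then work on a deep copy instead.
EMPTY_TASK["tags"] = []
fresh = copy.deepcopy(EMPTY_TASK)
fresh["tags"].append("urgent")
untouched = list(EMPTY_TASK["tags"])  # the template is still empty
```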

Defining Indexes

Now that we have tasks defined, we will probably want to query the database using their properties. To that end, we will need to use indexes. Let’s define two for now, one to query by tags, and one to query by done status. We’ll define some global constants with the name and the definition of the indexes, which will make them easier to refer to in the rest of the code:

TAGS_INDEX = 'tags'
DONE_INDEX = 'done'
INDEXES = {
    TAGS_INDEX: ['tags'],
    DONE_INDEX: ['bool(done)'],
}

INDEXES is just a regular dictionary, with the names of the indexes as keys and the index definitions, which are lists of expressions, as values. (We use lists because an index can be defined on multiple fields, though both of the indexes defined above only index a single field.)

The tags index will index any document that has a top-level field tags, and index its value. Our tasks have a list value under tags, which means that u1db will index each task once for each of the values in the list. So a task with the following content:

{
    "title": "Buy sausages and vimto",
    "tags": ["shopping", "food"],
    "done": false
}

would be indexed under both "food" and "shopping".
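As a rough mental model only, a single-field index over a list value can be pictured as producing one entry per list element. The sketch below is plain Python, not u1db's implementation, and index_keys is a hypothetical helper introduced for illustration:

```python
def index_keys(content, field):
    """Return the values a document would be indexed under for one field.

    Hypothetical helper: it only mimics the behaviour described above for
    top-level fields holding a string or a list of strings.
    """
    if field not in content:
        return []             # documents without the field are not indexed
    value = content[field]
    if isinstance(value, list):
        return sorted(value)  # one index entry per list element
    return [value]


task = {
    "title": "Buy sausages and vimto",
    "tags": ["shopping", "food"],
    "done": False,
}
keys = index_keys(task, "tags")
```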

The done index will index any document that has a boolean value in a top-level field named done.

We will see how the indexes are actually created and queried below.

Storing and Retrieving Tasks

To store and retrieve our task objects we’ll need a u1db Database. We can make a little helper function to get a reference to our application’s database, and create it if it doesn’t already exist:

import os

from dirspec.basedir import save_data_path

def get_database():
    """Get a reference to the application's database, creating it if needed."""
    return u1db.open(
        os.path.join(save_data_path("cosas"), "cosas.u1db"), create=True,
        document_factory=Task)

There are a few things to note here: First of all, we use lp:dirspec to handle where to find or put the database in a way that works across platforms. This is not something specific to u1db, so you could choose to use it for your own application or not: u1db.open() will happily take any filesystem path. Secondly, we pass our Task class into the document_factory argument of u1db.open(). This means that any time we get documents from the database, it will return Task objects, so we don’t have to do the conversion in our code.

Now we create a TodoStore class that will handle all interactions with the database:

class TodoStore(object):
    """The todo application backend."""

    def __init__(self, db):
        self.db = db

    def initialize_db(self):
        """Initialize the database."""
        # Ask the database for currently existing indexes.
        db_indexes = dict(self.db.list_indexes())
        # Loop through the indexes we expect to find.
        for name, expression in INDEXES.items():
            if name not in db_indexes:
                # The index does not yet exist.
                self.db.create_index(name, *expression)
                continue
            if expression == db_indexes[name]:
                # The index exists and is up to date.
                continue
            # The index exists but the definition is not what we expected, so
            # we delete it and add the proper index expression.
            self.db.delete_index(name)
            self.db.create_index(name, *expression)

The initialize_db() method checks whether the database already has the indexes we defined above and if it doesn’t or if the definition is different than the one we have, the index is (re)created. We will call this method every time we start the application, to make sure all the indexes are up to date. Creating an index is a matter of calling create_index() with a name and the expressions that define the index. This will immediately index all documents already in the database, and afterwards any that are added or updated.
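To see the three branches in isolation, here is a sketch that runs the same logic against a tiny in-memory stand-in. FakeDatabase is hypothetical and implements only the three index methods used above, with list_indexes() assumed to return (name, [expressions]) pairs as the code above implies; it is not part of u1db.

```python
INDEXES = {"tags": ["tags"], "done": ["bool(done)"]}


class FakeDatabase(object):
    """Hypothetical stand-in that only records index definitions."""

    def __init__(self):
        self.indexes = {}
        self.calls = []

    def list_indexes(self):
        return [(name, list(exprs)) for name, exprs in self.indexes.items()]

    def create_index(self, name, *expressions):
        self.calls.append(("create", name))
        self.indexes[name] = list(expressions)

    def delete_index(self, name):
        self.calls.append(("delete", name))
        del self.indexes[name]


def initialize_db(db):
    """Same logic as TodoStore.initialize_db(), written as a function."""
    db_indexes = dict(db.list_indexes())
    for name, expression in INDEXES.items():
        if name not in db_indexes:
            # Missing index: create it.
            db.create_index(name, *expression)
        elif expression != db_indexes[name]:
            # Outdated definition: drop and recreate.
            db.delete_index(name)
            db.create_index(name, *expression)


db = FakeDatabase()
db.indexes["done"] = ["done"]  # a stale definition from an older version
initialize_db(db)              # creates "tags", recreates "done"
first_run = list(db.calls)
initialize_db(db)              # a second run finds nothing to do
```

Because the second call records no further operations, calling the method on every start-up is cheap once the indexes are in place.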

def get_all_tags(self):
    """Get all tags in use in the entire database."""
    return [key[0] for key in self.db.get_index_keys(TAGS_INDEX)]

The get_index_keys() method gets a list of all indexed values from an index. In this case it will give us a list of all tags that have been used in the database, which can be useful if we want to present them in the user interface of our application.

def get_tasks_by_tags(self, tags):
    """Get all tasks that have every tag in tags."""
    if not tags:
        # No tags specified, so return all tasks.
        return self.get_all_tasks()
    # Get all tasks for the first tag.
    results = dict(
        (doc.doc_id, doc) for doc in
        self.db.get_from_index(TAGS_INDEX, tags[0]))
    # Now loop over the rest of the tags (if any) and remove from the
    # results any document that does not have that particular tag.
    for tag in tags[1:]:
        # Get the ids of all documents with this tag.
        ids = [
            doc.doc_id for doc in self.db.get_from_index(TAGS_INDEX, tag)]
        # Iterate over a copy of the keys, since entries are deleted below.
        for key in list(results.keys()):
            if key not in ids:
                # Remove the document from result, because it does not have
                # this particular tag.
                del results[key]
                if not results:
                    # If results is empty, we're done: there are no
                    # documents with all tags.
                    return []
    return list(results.values())

This method gives us a way to query the database by a set of tags. We loop through the tags one by one and then filter out any documents that don’t have that particular tag.
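The same filtering can also be expressed as a set intersection over document ids. The sketch below uses a plain dictionary in place of the tags index, so TAG_TO_IDS and ids_with_all_tags are illustrative names rather than anything from Cosas or u1db:

```python
# Stand-in for the tags index: tag -> ids of the documents carrying it.
TAG_TO_IDS = {
    "shopping": {"doc-1", "doc-2"},
    "food": {"doc-1", "doc-3"},
    "urgent": {"doc-2"},
}


def ids_with_all_tags(tags):
    """Return the ids of documents that carry every tag in tags."""
    if not tags:
        # Mirror get_tasks_by_tags(): no tags means no filtering.
        return set().union(*TAG_TO_IDS.values())
    # Start from a copy so the stand-in index is never modified.
    result = set(TAG_TO_IDS.get(tags[0], ()))
    for tag in tags[1:]:
        result &= TAG_TO_IDS.get(tag, set())
        if not result:
            break  # nothing can match any more
    return result
```

With real data, each set would be built from the doc_id of every document returned by get_from_index() for that tag.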

def get_task(self, doc_id):
    """Get a task from the database."""
    task = self.db.get_doc(doc_id)
    if task is None:
        # No document with that id exists in the database.
        raise KeyError("No task with id '%s'." % (doc_id,))
    if task.is_tombstone():
        # The document id exists, but the document's content was previously
        # deleted.
        raise KeyError("Task with id %s was deleted." % (doc_id,))
    return task

get_task is a thin wrapper around get_doc() that takes care of raising appropriate exceptions when a document does not exist or has been deleted. (Deleted documents leave a ‘tombstone’ behind, which is necessary to make sure that synchronisation of the database with other replicas does the right thing.)

def new_task(self, title=None, tags=None):
    """Create a new task document."""
    if tags is None:
        tags = []
    # We make a fresh copy of a pristine task with no title.
    content = get_empty_task()
    # If we were passed a title or tags, or both, we set them in the object
    # before storing it in the database.
    if title or tags:
        content['title'] = title
        content['tags'] = tags
    # Store the document in the database. Since we did not set a document
    # id, the database will store it as a new document, and generate
    # a valid id.
    return self.db.create_doc(content)

Here we use the convenience function defined above to initialize the content, and then set the properties that were passed into new_task(). We call create_doc() to create a new document from the content. This stores the document in the database, assigns it a new unique id (unless we pass one in), and returns a fully initialized Task object, since we made Task the database’s document factory.
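One detail worth noticing: because new_task() assigns title and tags together, passing only tags stores None as the title. If that is not wanted, the content could be built so that each field is set independently. The following is a sketch of such a variant; build_task_content is a hypothetical helper, not Cosas code, and it does not touch a database:

```python
import copy

EMPTY_TASK = {"title": "", "done": False, "tags": []}


def build_task_content(title=None, tags=None):
    """Build the content dict for a new task, one field at a time."""
    content = copy.deepcopy(EMPTY_TASK)
    if title is not None:
        content["title"] = title
    if tags is not None:
        # Copy the caller's list so later changes to it do not leak in.
        content["tags"] = list(tags)
    return content


only_tags = build_task_content(tags=["food"])
both = build_task_content("Buy vimto", ["shopping"])
```

The resulting dictionary could then be handed to create_doc() exactly as in new_task().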

def get_all_tasks(self):
    return self.db.get_from_index(DONE_INDEX, "*")

Since the DONE_INDEX indexes anything that has a value in the field “done”, and all tasks do (either True or False), it’s a good way to get all tasks out of the database, especially since it will sort them by done status, so we’ll get all the active tasks first.

Synchronisation and Conflicts

Synchronisation has to be initiated by the application, either periodically, while it’s running, or by having the user initiate it. Any u1db.Database can be synchronised with any other, either by file path or URL. Cosas gives the user the choice between manually synchronising or having it happen automatically, every 30 minutes, for as long as it is running.

from ubuntuone.platform.credentials import CredentialsManagementTool

    def get_ubuntuone_credentials(self):
        cmt = CredentialsManagementTool()
        return cmt.find_credentials()

    def _synchronize(self, creds=None):
        target = self.sync_target
        assert target.startswith('http://') or target.startswith('https://')
        if creds is not None:  # convert into expected form
            creds = {'oauth': {
                'token_key': creds['token'],
                'token_secret': creds['token_secret'],
                'consumer_key': creds['consumer_key'],
                'consumer_secret': creds['consumer_secret']
                }}
        self.store.db.sync(target, creds=creds)
        # refresh the UI to show changed or new tasks
        self.refresh_filter()

    def synchronize(self, finalize):
        if self.sync_target == 'https://u1db.one.ubuntu.com/~/cosas':
            d = self.get_ubuntuone_credentials()
            d.addCallback(self._synchronize)
            d.addCallback(finalize)
        else:
            self._synchronize()
            finalize()

When synchronising over http(s), servers can (and usually will) require OAuth authentication. The code above shows how to acquire and pass in the OAuth credentials for the Ubuntu One server, in case you want your application to synchronise with it.
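The credential reshaping inside _synchronize() can be factored into a small pure function, which makes it easy to check in isolation. to_oauth_creds is a hypothetical name, and the input keys are simply the ones the code above reads:

```python
def to_oauth_creds(creds):
    """Reshape credentials into the nested form passed to sync() above."""
    if creds is None:
        return None
    return {"oauth": {
        "token_key": creds["token"],
        "token_secret": creds["token_secret"],
        "consumer_key": creds["consumer_key"],
        "consumer_secret": creds["consumer_secret"],
    }}


# Made-up placeholder values, for illustration only.
example = to_oauth_creds({
    "token": "t-key",
    "token_secret": "t-secret",
    "consumer_key": "c-key",
    "consumer_secret": "c-secret",
})
```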

After synchronising with another replica, it is possible that one or more conflicts have arisen, if both replicas independently made changes to the same document. Your application should probably check for conflicts after every synchronisation, and offer the user a way to resolve them.

Look at the Conflicts class in cosas/ui.py to see an example of how this could be presented to the user. The idea is that you show the conflicting versions to the user, let them pick one, and then call resolve_doc() with the preferred version, and all the revisions of the conflicting versions it is meant to resolve.

def resolve(self, doc, revs):
    self.store.db.resolve_doc(doc, revs)
    # refresh the UI to show the resolved version
    self.refresh_filter()
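The revisions handed to resolve() above have to come from somewhere. The sketch below shows one way they might be gathered, assuming the database offers get_doc_conflicts(doc_id) returning the conflicting documents and that each document exposes its revision as .rev. FakeDoc, FakeDatabase and resolve_with are hypothetical names used only to make the example runnable without u1db:

```python
class FakeDoc(object):
    """Hypothetical document stand-in with an id, a revision and content."""

    def __init__(self, doc_id, rev, content):
        self.doc_id, self.rev, self.content = doc_id, rev, content


class FakeDatabase(object):
    """Hypothetical stand-in exposing only the two calls used below."""

    def __init__(self, conflicts):
        self._conflicts = conflicts  # doc_id -> list of conflicting docs
        self.resolved = []

    def get_doc_conflicts(self, doc_id):
        return list(self._conflicts.get(doc_id, []))

    def resolve_doc(self, doc, revs):
        self.resolved.append((doc.rev, list(revs)))
        self._conflicts.pop(doc.doc_id, None)


def resolve_with(db, doc_id, choose):
    """Resolve the conflict on doc_id using a caller-supplied chooser."""
    conflicts = db.get_doc_conflicts(doc_id)
    if not conflicts:
        return None                       # nothing to resolve
    winner = choose(conflicts)            # e.g. the version the user picked
    revs = [doc.rev for doc in conflicts]
    db.resolve_doc(winner, revs)
    return winner


db = FakeDatabase({"t1": [
    FakeDoc("t1", "A:2", {"title": "local edit"}),
    FakeDoc("t1", "B:1", {"title": "remote edit"}),
]})
winner = resolve_with(db, "t1", lambda docs: docs[1])
```

In an application, the chooser would be replaced by whatever dialog lets the user pick the version to keep.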

Full Cosas Documentation and Source Code

cosas example application.

class cosas.cosas.Task(doc_id=None, rev=None, json='{}', has_conflicts=False)

A todo item.

done

Done flag.

tags

Task tags.

title

Title of the task.

class cosas.cosas.TodoStore(db)

The todo application backend.

delete_task(task)

Delete a task from the database.

get_all_tags()

Get all tags in use in the entire database.

get_task(doc_id)

Get a task from the database.

get_tasks_by_tags(tags)

Get all tasks that have every tag in tags.

initialize_db()

Initialize the database.

new_task(title=None, tags=None)

Create a new task document.

save_task(task)

Save task to the database.

tag_task(task, tags)

Set the tags of a task.

cosas.cosas.extract_tags(text)

Extract the tags from the text.

cosas.cosas.get_database()

Get a reference to the application's database, creating it if needed.

User interface for the cosas example application.

class cosas.ui.Main(in_memory=False)

Main window of our application.

add_tag(doc_id, tag)

Create a link between the task with id doc_id and the tag, and add a new button for tag if it was not already there.

add_task(task)

Add a new todo item.

connect_events()

Hook up all the signal handlers.

delete()

Delete a todo item.

get_tag_color()

Get a color number to use for a new tag.

hide_buttons()

Hide the frame with the tag buttons.

refresh_filter()

Remove all tasks, and show only those that satisfy the new filter.

remove_tag(doc_id, tag)

Remove the link between the task with id doc_id and the tag, and remove the button for tag if it no longer has any tasks associated with it.

show_buttons()

Show the frame with the tag buttons.

update()

Either add a new task or update an existing one.

update_tags(item, old_tags, new_tags)

Process any changed tags for this item.

class cosas.ui.UITask(task, parent, store, font, main_window)

Task list item.

update_task_text(text)

Edit an existing todo item.