Purpose and OverviewΒΆ

CKAN is a metadata registry for datasets. These datasets can be organised into groups for curation purposes. Often these goups have criteria that must be met in order for datasets to be included. This tool is intended to help automate checking the metadata to see if a particular dataset should be included.

The design of this tool is to take a set of rules, expressed in Notation 3 and to compare the dataset description against these rules. The output is an RDF graph containing statements generated by the rules. Inclusion in a group depends on contents of this graph.

An example may make this clearer. Suppose we consider the LOD Cloud. Inclusion in the cloud means inclusion in the lodcloud group on CKAN. Amongst the criteria for inclusion are that a dataset must contain dereferenceable resources. So we make a simple ruleset:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix void: <http://rdfs.org/ns/void#>.
@prefix curate: <http://eris.okfn.org/ww/2010/12/curate#>.

{ ?dataset void:exampleResource ?example } =>
{ ?example curate:httpReq "HEAD" }.

This says that if something, the ?dataset has an example resource, ?example then we evaluate the special predicate curate:httpReq (of which more later). This and other simple rulesets can be found in the examples directory in the source distribution.

The result of using the tool to evaluate the record for DBpedia produces a graph that contains a transcript of the HTTP session, including headers and status codes and suchlike.

The command line used to do this was:

curate -v \
    -r https://bitbucket.org/okfn/curate/raw/tip/examples/dereference.n3 \
    dbpedia

A more advanced example can do things other than simply infer triples. We can actually have consequent actions using the builtins. One thing that is important to remember is that we cannot use builtins in the head (consequent) rules and the head must have at least one inferred triple or action.

Continuing with an example, suppose the criteria for membership in a group, such as the curation testing group is simply to have a dereferenceable example resource. So we may make a ruleset that looks like this:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix void: <http://rdfs.org/ns/void#>.
@prefix http: <http://www.w3.org/2006/http#>.
@prefix curate: <http://eris.okfn.org/ww/2010/12/curate#>.

{ ?dataset void:exampleResource ?example } =>
{ ?example curate:httpReq "HEAD" }.

{ ?dataset void:exampleResource ?example .
  ?req a http:Request .
  ?req http:requestURI ?uri .
  ?req http:resp ?resp .
  ?resp http:statusCodeNumber "200" } =>
{ ?dataset curate:addGroup <http://ckan.net/group/curation_testing> }.

This example contains two rules that will chain together. The first rule is identical to the previous example. The second rule looks at the result of the request and checks that the status code was “200”. In this case it calls the curate:addGroup action to add the dataset to the specified group. This action calls curate.actions.addGroup which will queue a call to the CKAN API that adds the given dataset to the group in question.

The command line for executing this one, because it does an authenticated write operation on the CKAN catalogue, needs an extra parameter to specify the API key to use, and needs to be told explicitly to save the metadata back via the API.

curate -v -s \
    -k API_KEY \
    -r https://bitbucket.org/okfn/curate/raw/tip/examples/groups.n3 \
    dbpedia

For more complete documentation on the options and behaviours, see the documentation for the command line tool in this manual.

Previous topic

CKAN Curation Tool

Next topic

Installation

This Page