1. Tutorial

This library is meant for the analysis of activity data recorded by wildlife telemetry collars, but at its core it deals with time series: it loads data from files, offers several ways to visualize it, and provides transformations that reduce the usually massive amounts of data to smaller units that are easier to handle. The built-in charts are limited to specialized ones used in biological applications; common ones such as line charts are meant to be plotted with matplotlib, the standard Python plotting library for scientific work. If you have never used matplotlib before, learning it will be time well spent if you intend to work in a scientific environment with Python.

In this document we will look at a few examples of common tasks.

1.1. Loading data

Currently, CSV files are the only supported data source. To open an activity dataset from a Vectronic Aerospace collar that was converted to text with GPS Plus, you could do this:

import timbre.readdata as rd

dataset = rd.CSVReader('data.txt', format='vas')

Input files are not limited to the few formats known to CSVReader; any well-structured format can be read. To load the same file without the predefined format, we have to specify manually which columns to read and how to convert their values to the correct types:

import timbre.readdata as rd
from datetime import datetime

# define columns
columns = {'date': 'UTC_Date',
           'time': 'UTC_Time',
           'x': 'ActivityX',
           'y': 'ActivityY',
           'temp': 'Temp'}

# define type conversions
def coltypes(row):
    day, mon, year = row['date'].split('.')
    hour, min_, sec = row['time'].split(':')
    # convert values from string to int
    year, mon, day = int(year), int(mon), int(day)
    hour, min_, sec = int(hour), int(min_), int(sec)

    result = {'time': datetime(year, mon, day, hour, min_, sec),
              'x': int(row['x']),
              'y': int(row['y']),
              'temp': int(row['temp'])}

    return result

dataset = rd.CSVReader('data.txt', columns=columns, coltypes=coltypes, delimiter=' ', skiplines=1)
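The manual parsing in coltypes can also be written with datetime.strptime from the standard library. The sketch below assumes, as the split() calls above do, that dates look like DD.MM.YYYY and times like HH:MM:SS; it is only an alternative formulation of the same conversion function, and the sample row is made up for illustration.

```python
from datetime import datetime

# Same contract as coltypes above: receives a dict of raw strings keyed by
# the names defined in `columns`, returns a dict of typed values.
def coltypes(row):
    # Assumed formats: 'DD.MM.YYYY' for the date and 'HH:MM:SS' for the time
    timestamp = datetime.strptime(row['date'] + ' ' + row['time'],
                                  '%d.%m.%Y %H:%M:%S')
    return {'time': timestamp,
            'x': int(row['x']),
            'y': int(row['y']),
            'temp': int(row['temp'])}

# Usage with a made-up row, shaped like one line of the input file
row = {'date': '01.04.2008', 'time': '12:30:00',
       'x': '17', 'y': '42', 'temp': '21'}
print(coltypes(row))
```

Letting strptime do the splitting also gives a clear ValueError when a line does not match the expected format.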

For more information, see the documentation of CSVReader.

1.2. Charts

Actograms are useful for getting a quick overview of a dataset, since they show its complete content, including gaps, and for picking a range of dates for further analysis. Here is how to create an actogram():

from timbre.plot import actogram
import timbre.readdata as rd

dataset = rd.CSVReader('data.txt', format='vas')
actogram(dataset, 'x')

dataset is a dataset object, e.g. an instance of CSVReader as shown in the previous section, and 'x' is the measurement variable from the dataset that you want to plot. Other charts (e.g. activity_dist(), spectrogram()) follow the same parameter pattern: a dataset and a measurement variable are always required, while further optional parameters are either shared by all charts or specific to a single chart. The complete documentation can be found under plot.
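An actogram is essentially a grid with one row per day and one column per time-of-day slot, which is why gaps in the recording stand out. The following sketch shows that arrangement in plain Python. It is a hypothetical illustration, not how timbre's actogram() is implemented: the function name actogram_grid, the (datetime, value) input format, the one-hour slot size and the choice to sum values sharing a slot are all assumptions.

```python
from datetime import datetime, timedelta

def actogram_grid(samples, slot_seconds=3600):
    """Arrange (datetime, value) samples into rows of days and columns of
    time-of-day slots. Slots without any sample stay None, marking gaps."""
    if not samples:
        return [], []
    slots_per_day = 86400 // slot_seconds
    first = min(t for t, _ in samples).date()
    last = max(t for t, _ in samples).date()
    days = [first + timedelta(days=i) for i in range((last - first).days + 1)]
    grid = [[None] * slots_per_day for _ in days]
    for t, value in samples:
        row = (t.date() - first).days
        seconds = t.hour * 3600 + t.minute * 60 + t.second
        col = seconds // slot_seconds
        # Several samples in one slot are summed (one possible choice)
        grid[row][col] = value if grid[row][col] is None else grid[row][col] + value
    return days, grid

samples = [(datetime(2008, 4, 1, 0, 15), 10),
           (datetime(2008, 4, 1, 0, 45), 5),
           (datetime(2008, 4, 3, 23, 0), 7)]
days, grid = actogram_grid(samples)
for d, row in zip(days, grid):
    print(d, row)
```

In this example 2 April has no samples, so its whole row stays None; a plot of the grid shows that day as an empty band.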

1.3. Statistics

Internally, datasets are lists of samples, each consisting of a timestamp (date and time of observation) and one or more measurement variables. To perform statistical calculations with them, use the functions provided in the module analysis. The first step is always to convert the dataset into the form that the functions in analysis expect. This is done with extract():

import timbre.readdata as rd
import timbre.analysis as an
from datetime import datetime

dataset = rd.CSVReader('data.txt', format='vas')
data = an.extract(dataset, 'x', start=datetime(2008, 1, 1), end=datetime(2009, 1, 1))

Calling extract() with the start and end parameters restricts the selected values to the specified range. The resulting list data contains tuples (pairs of values): the first value represents the x coordinate, and the second is a list of one or more values that either are the y coordinate or can be used to derive it and related values (e.g. a mean and its variance or confidence interval).
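To make the shape of that result concrete, here is a hypothetical plain-Python equivalent named extract_sketch. It assumes that samples are dicts with a 'time' key (like those produced by coltypes above) and treats end as exclusive; whether timbre's extract() includes the end point is not stated here, so check its documentation.

```python
from datetime import datetime

def extract_sketch(samples, variable, start=None, end=None):
    """Hypothetical stand-in for an.extract(): select one variable and
    return (timestamp, [value]) pairs, optionally restricted to a range."""
    result = []
    for s in samples:
        t = s['time']
        if start is not None and t < start:
            continue
        if end is not None and t >= end:  # assumption: end is exclusive
            continue
        result.append((t, [s[variable]]))
    return result

samples = [{'time': datetime(2007, 12, 31, 23, 0), 'x': 3},
           {'time': datetime(2008, 6, 1, 12, 0), 'x': 8},
           {'time': datetime(2009, 1, 1, 0, 0), 'x': 5}]
data = extract_sketch(samples, 'x', start=datetime(2008, 1, 1), end=datetime(2009, 1, 1))
print(data)
```

Each y value is wrapped in a list so that later steps can pool several values under one x coordinate.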

Since working with activity data usually involves either mean daily distributions or series of values that each represent a whole day (like daily activity means over a year), analysis provides two functions for these cases: by_date() and by_time().
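Conceptually, the two differ in the key used to pool samples: the calendar date for by_date(), the time of day for by_time(). A minimal sketch of that idea, assuming (timestamp, [value]) pairs like those returned by extract(); the helper names pool_by, date_key and time_of_day_key are hypothetical, and representing the time of day as a timedelta since midnight is an assumption of this sketch.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def pool_by(data, key):
    """Group (timestamp, [values]) pairs by key(timestamp) and return
    sorted (key, [all values]) pairs."""
    pools = defaultdict(list)
    for t, values in data:
        pools[key(t)].extend(values)
    return sorted(pools.items())

def date_key(t):
    return t.date()

def time_of_day_key(t):
    # Offset from midnight as a timedelta
    return timedelta(hours=t.hour, minutes=t.minute, seconds=t.second)

data = [(datetime(2008, 4, 1, 6, 0), [4]),
        (datetime(2008, 4, 1, 18, 0), [9]),
        (datetime(2008, 4, 2, 6, 0), [6])]
print(pool_by(data, date_key))         # one entry per day
print(pool_by(data, time_of_day_key))  # one entry per time of day
```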

by_date() pools all samples of each date in the specified date range. You can then reduce each day's samples to a single value, here their mean, and plot the results as a line chart with error bars:

import timbre.readdata as rd
import timbre.analysis as an
from timbre.stats import mean_ci
import matplotlib.pyplot as plt

# load dataset
dataset = rd.CSVReader('data.txt', format='vas')
# subdivide into days
data = an.by_date(dataset, 'x')
# replace sample lists for each day by their means and confidence intervals
data = an.recode(data, fy=lambda y: mean_ci(y, 0.05))
# separate x, y and confidence interval values into lists
x, (y, ci_y) = an.unpack(data)

# plot results with matplotlib
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.errorbar(x, y, ci_y)
plt.show()

The mean and its confidence interval are calculated using mean_ci() from the module stats, which takes a list and an alpha value and returns the mean and half the range of the confidence interval.
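For readers who want to see what such a function computes, here is a sketch named mean_ci_sketch built on the standard library's statistics module. It uses the normal approximation (a z quantile); timbre's mean_ci() may well use Student's t distribution instead, which gives wider intervals for small samples, so treat this as an illustration of the idea only.

```python
import math
import statistics

def mean_ci_sketch(values, alpha):
    """Return (mean, half_width) of a two-sided 1 - alpha confidence
    interval for the mean, using the normal approximation."""
    n = len(values)
    m = statistics.mean(values)
    if n < 2:
        return m, float('nan')  # spread is undefined for a single value
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return m, z * sem

mean, half = mean_ci_sketch([1, 2, 3, 4, 5], 0.05)
print(mean, half)
```

Returning the half width rather than the interval bounds fits matplotlib's errorbar(), which expects the error as a distance from y.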

Calculating a mean daily distribution of activity works similarly, but since we are interested in the time of day of the samples rather than their date, we use by_time():

import timbre.readdata as rd
import timbre.analysis as an
from timbre.stats import mean_ci
import matplotlib.pyplot as plt
from datetime import datetime

# load dataset
dataset = rd.CSVReader('data.txt', format='vas')
# extract one month and aggregate by time
data = an.by_time(dataset, 'x', start=datetime(2008, 4, 1), end=datetime(2008, 5, 1))
# subdivide further into bins of one hour each, then calculate means and CIs
data = an.quant_time(data, size=3600, fy=lambda y: mean_ci(y, 0.05))
# separate x, y and confidence interval values into lists
x, (y, ci_y) = an.unpack(data)
# x is a list of timedelta values; add them to an arbitrary reference date to get datetime values, so that matplotlib labels the x axis correctly
x = [datetime(2000, 1, 1) + xi for xi in x]

# plot results with matplotlib
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.errorbar(x, y, ci_y)
plt.show()
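The binning step performed by quant_time() can be pictured as follows: each time-of-day offset is assigned to a bin of size seconds, and the values in each bin are pooled and then reduced by fy. This is a hypothetical plain-Python sketch of that idea (the name quant_time_sketch and the (timedelta, [values]) input format are assumptions, and a plain mean stands in for mean_ci()). It also shows the trick of adding the offsets to an arbitrary reference date.

```python
from collections import defaultdict
from datetime import datetime, timedelta
import statistics

def quant_time_sketch(data, size, fy):
    """Pool (timedelta, [values]) pairs into bins of `size` seconds
    and reduce each bin's values with fy."""
    bins = defaultdict(list)
    for offset, values in data:
        start = int(offset.total_seconds()) // size * size  # bin start in seconds
        bins[start].extend(values)
    return [(timedelta(seconds=s), fy(v)) for s, v in sorted(bins.items())]

data = [(timedelta(minutes=10), [2]), (timedelta(minutes=50), [4]),
        (timedelta(hours=1, minutes=5), [7])]
binned = quant_time_sketch(data, size=3600, fy=statistics.mean)
# Anchor the offsets to an arbitrary date so they become datetime values
x = [datetime(2000, 1, 1) + offset for offset, _ in binned]
y = [value for _, value in binned]
print(x, y)
```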

For further information about recode(), unpack() and related functions please see the documentation of analysis.
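As a rough mental model of those two helpers (a hypothetical sketch with made-up names, not timbre's code): recode() applies fy to the y part of every (x, y) pair, and unpack() transposes the list of pairs into a list of x values plus one list per y component, which matches how the examples write x, (y, ci_y) = an.unpack(data).

```python
def recode_sketch(data, fy):
    """Apply fy to the y part of each (x, y) pair, leaving x unchanged."""
    return [(x, fy(y)) for x, y in data]

def unpack_sketch(data):
    """Split [(x, (a, b, ...)), ...] into ([x, ...], ([a, ...], [b, ...], ...))."""
    xs = [x for x, _ in data]
    ys = tuple(list(component) for component in zip(*(y for _, y in data)))
    return xs, ys

# Hypothetical reducer returning (mean, spread) for each pool of values
data = [(1, [2, 4]), (2, [5, 7, 9])]
data = recode_sketch(data, fy=lambda v: (sum(v) / len(v), max(v) - min(v)))
x, (y, spread) = unpack_sketch(data)
print(x, y, spread)
```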
