dendropy.dataobject.char – Character Data

This module handles the core definition of phylogenetic character data.

class dendropy.dataobject.char.CharacterDataCell(value=None, character_type=None)

A container that holds the value for a particular cell in a matrix.

The attributes of CharacterDataCell are:

  • value: an instance of StateAlphabetElement
  • character_type: a CharacterType instance, or None

class dendropy.dataobject.char.CharacterDataMap

An annotatable dictionary with Taxon objects as keys and CharacterDataVector objects as values.

extend(other_map, overwrite_existing=False, extend_existing=False)

Extends this matrix by adding taxa and characters from the given matrix to this one. If overwrite_existing is True and a taxon in the other matrix is already present in the current one, then the sequence associated with the taxon in the second matrix replaces the sequence in the current one. If extend_existing is True and a taxon in the other matrix is already present in the current one, then the sequence associated with the taxon in the second matrix will be added to the sequence in the current one. If both are True, then an exception is raised. If neither is True, and a taxon in the other matrix is already present in the current one, then the sequence is ignored. Note that the taxa of the containing CharacterMatrix have to be normalized after this operation.
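The possible outcomes for a taxon that is already present can be modeled with plain dictionaries. The following is a minimal sketch of the rules described above, not DendroPy's implementation: the function name `extend_sketch`, the use of taxon labels as keys, and the choice of `ValueError` as the exception type are assumptions made for illustration.

```python
def extend_sketch(current, other, overwrite_existing=False, extend_existing=False):
    """Merge `other` into `current`; both map taxon label -> list of states."""
    if overwrite_existing and extend_existing:
        # The reference only says "an exception is raised"; ValueError is assumed.
        raise ValueError("overwrite_existing and extend_existing are mutually exclusive")
    for taxon, seq in other.items():
        if taxon not in current:
            current[taxon] = list(seq)   # new taxon: always added
        elif overwrite_existing:
            current[taxon] = list(seq)   # replace the existing sequence
        elif extend_existing:
            current[taxon].extend(seq)   # append to the existing sequence
        # otherwise the incoming sequence is ignored
    return current


def fresh_base():
    return {"A": list("ACGT"), "B": list("AAAA")}


incoming = {"B": list("GG"), "C": list("TTTT")}

ignored = extend_sketch(fresh_base(), incoming)
replaced = extend_sketch(fresh_base(), incoming, overwrite_existing=True)
appended = extend_sketch(fresh_base(), incoming, extend_existing=True)

print("".join(ignored["B"]), "".join(replaced["B"]), "".join(appended["B"]))
```

In all three calls the new taxon "C" is added; only the treatment of the shared taxon "B" differs.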

extend_characters(other_map)

Extends this matrix by adding characters from sequences of taxa in the given matrix to sequences of taxa with corresponding labels in this one. Taxa in the second matrix that do not exist in the current one are ignored.
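As a rough illustration of this rule, the sketch below appends characters only for taxa that both containers share. It uses plain dictionaries keyed by taxon label; the function name `extend_characters_sketch` is hypothetical and the code is not taken from DendroPy.

```python
def extend_characters_sketch(current, other):
    """Append sequences from `other` to matching taxa in `current`."""
    for taxon, seq in other.items():
        if taxon in current:
            current[taxon].extend(seq)
        # taxa that exist only in `other` are skipped
    return current


matrix = {"A": list("AC"), "B": list("GT")}
extend_characters_sketch(matrix, {"A": list("TT"), "Z": list("CC")})

for label, seq in matrix.items():
    print(label, "".join(seq))
```

In this sketch, a taxon that is missing from the second container keeps its original length, so rows can end up with different lengths after the call.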

vector_size

Returns the number of characters in the first sequence.

class dendropy.dataobject.char.CharacterDataVector(*args, **kwargs)

A list of character data values for a taxon – a row of a Character Matrix.

The CharacterDataVector typically contains elements that are instances of CharacterDataCell.

set_cell_by_index(column_index, cell)

Sets the cell at a particular column position.

class dendropy.dataobject.char.CharacterMatrix(*args, **kwargs)

Character data container/manager.

__init__ calls TaxonSetLinked.__init__ for handling of oid, label and taxon_set keyword arguments.

Can be initialized with:

  • source keyword arguments (see Readable.process_source_kwargs), or
  • a single unnamed CharacterMatrix instance (which will be deep-copied).

add_character_subset(char_subset)

Adds a CharacterSubset object. Raises an error if one already exists with the same label.

clear()

Deletes all items from the character map dictionary.

clone_from(*args)

TODO: may need to check that we are not overwriting oid

classmethod concatenate(char_matrices)

Creates and returns a single character matrix from multiple CharacterMatrix objects specified as a list, ‘char_matrices’. All the CharacterMatrix objects in the list must be of the same type, and share the same TaxonSet reference. All taxa must be present in all alignments, and all alignments must be of the same length. Component parts will be recorded as character subsets.
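The idea of joining alignments while remembering where each component landed can be sketched in plain Python. This is an illustrative model only: `concatenate_sketch`, the "locus%d" subset labels, and the representation of alignments as dictionaries of strings are assumptions, and the sketch additionally assumes that every row within one component has the same length.

```python
def concatenate_sketch(alignments):
    """Join alignments (dicts of taxon label -> string) side by side.

    Returns the combined alignment and a list of (label, column indices)
    pairs describing which columns came from which component.
    """
    taxa = set(alignments[0])
    combined = {taxon: "" for taxon in alignments[0]}
    subsets = []
    offset = 0
    for i, aln in enumerate(alignments):
        if set(aln) != taxa:
            raise ValueError("all taxa must be present in all alignments")
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("rows within an alignment must have equal length")
        width = lengths.pop()
        for taxon, seq in aln.items():
            combined[taxon] += seq
        # Record the 0-based columns occupied by this component.
        subsets.append(("locus%d" % i, list(range(offset, offset + width))))
        offset += width
    return combined, subsets


gene1 = {"A": "ACGT", "B": "ACGA"}
gene2 = {"A": "TT", "B": "TC"}
combined, subsets = concatenate_sketch([gene1, gene2])
print(combined)
print(subsets)
```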

classmethod concatenate_from_paths(paths, schema, **kwargs)

Read a character matrix from each file path given in paths, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.

classmethod concatenate_from_streams(streams, schema, **kwargs)

Read a character matrix from each file object given in streams, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.

create_taxon_to_state_set_map(char_indices=None)

Returns a dictionary that maps taxon objects to lists of sets of state indices. If char_indices is not None, it should be an iterable collection of character indices to include.
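The shape of the returned structure may be easier to see with a small model. The sketch below assumes that a "state index" is the position of a fundamental state in the alphabet and that an ambiguous state expands to the indices of its member states; both are assumptions about the intended meaning, and the names `FUNDAMENTAL`, `AMBIGUITY` and `taxon_to_state_set_map` are hypothetical.

```python
FUNDAMENTAL = "ACGT"                                   # index 0..3
AMBIGUITY = {"R": "AG", "Y": "CT", "N": "ACGT"}        # partial table for illustration


def state_index_set(symbol):
    """Expand a symbol to the set of indices of its fundamental states."""
    members = AMBIGUITY.get(symbol, symbol)
    return {FUNDAMENTAL.index(s) for s in members}


def taxon_to_state_set_map(matrix, char_indices=None):
    """Map each taxon to a list with one set of state indices per column."""
    selected = None if char_indices is None else list(char_indices)
    result = {}
    for taxon, seq in matrix.items():
        columns = range(len(seq)) if selected is None else selected
        result[taxon] = [state_index_set(seq[i]) for i in columns]
    return result


matrix = {"A": "ARG", "B": "CYN"}
full = taxon_to_state_set_map(matrix)
partial = taxon_to_state_set_map(matrix, char_indices=[0, 2])
print(full["A"])
print(partial["B"])
```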

description(depth=1, indent=0, itemize='', output=None)

Returns description of object, up to level depth.

export_character_indices(indices)

Returns a new CharacterMatrix (of the same type) consisting only of columns given by the 0-based indices in indices. Note that this new matrix will still reference the same taxon set.
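A minimal sketch of column export, under the assumption that a matrix can be modeled as a dictionary from taxon objects to lists of states. The `Taxon` class and `export_columns` function here are stand-ins written for this example, not DendroPy classes.

```python
class Taxon:
    """Stand-in for a taxon object; only carries a label."""

    def __init__(self, label):
        self.label = label


def export_columns(matrix, indices):
    """Build a new mapping holding only the requested 0-based columns."""
    indices = list(indices)
    return {taxon: [seq[i] for i in indices] for taxon, seq in matrix.items()}


a, b = Taxon("A"), Taxon("B")
matrix = {a: list("ACGT"), b: list("TGCA")}
subset = export_columns(matrix, [0, 3])

print("".join(subset[a]), "".join(subset[b]))
# The keys of the new mapping are the very same taxon objects.
print(all(any(k is t for t in matrix) for k in subset))
```

The sequences are copied, but the taxon objects are shared, which mirrors the note that the new matrix still references the same taxon set.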

export_character_subset(character_subset)

Returns a new CharacterMatrix (of the same type) consisting only of columns given by the CharacterSubset, character_subset. Note that this new matrix will still reference the same taxon set.

extend(other_matrix, overwrite_existing=False, extend_existing=False)

Extends this matrix by adding taxa and characters from the given matrix to this one. If overwrite_existing is True and a taxon in the other matrix is already present in the current one, then the sequence associated with the taxon in the second matrix replaces the sequence in the current one. If extend_existing is True and a taxon in the other matrix is already present in the current one, then the sequence associated with the taxon in the second matrix will be added to the sequence in the current one. If both are True, then an exception is raised. If neither are True, and a taxon in the other matrix is already present in the current one, then the sequence is ignored.

extend_characters(other_matrix)

Extends this matrix by adding characters from sequences of taxa in the given matrix to sequences of taxa with corresponding labels in this one. Taxa in the second matrix that do not exist in the current one are ignored.

extend_map(other_map, overwrite_existing=False, extend_existing=False)

Extends this matrix by adding taxa and characters from the given map to this one. If overwrite_existing is True and a taxon in the other map is already present in the current one, then the sequence associated with the taxon in the second map replaces the sequence in the current one. If extend_existing is True and a taxon in the other map is already present in the current one, then the sequence associated with the taxon in the second map will be added to the sequence in the current one. If both are True, then an exception is raised. If neither is True, and a taxon in the other map is already present in the current one, then the sequence is ignored.

get(key, def_val=None)

Gets an item from character map by its key, returning default if key not present.

has_key(key)

Returns true if character map has key, regardless of case.

id_chartype_map()

Returns dictionary of element id to corresponding character definition.

items()

Returns character map key, value pairs in key-order.

iteritems()

Returns an iterator over the character map’s (key, value) pairs.

iterkeys()

Dictionary interface implementation for direct access to character map.

itervalues()

Dictionary interface implementation for direct access to character map.

keys()

Returns a copy of the ordered list of character map keys.

new_character_subset(label, character_indices)

Defines a set of character (columns) that make up a character set. Raises an error if one already exists with the same label. Column indices are 0-based.
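The duplicate-label rule and the 0-based indexing can be illustrated with a small registry. This is a sketch under stated assumptions: the class name `SubsetRegistry` is hypothetical, and `KeyError` is an assumed exception type, since the reference does not say which error is raised.

```python
class SubsetRegistry:
    """Toy container that stores named lists of 0-based column indices."""

    def __init__(self):
        self.character_subsets = {}

    def new_character_subset(self, label, character_indices):
        if label in self.character_subsets:
            # Assumed exception type; the reference only says "raises an error".
            raise KeyError("character subset %r already exists" % label)
        self.character_subsets[label] = list(character_indices)
        return self.character_subsets[label]


registry = SubsetRegistry()
# First codon positions of a 9-column alignment: columns 0, 3 and 6.
first_positions = registry.new_character_subset("pos1", range(0, 9, 3))
print(first_positions)

try:
    registry.new_character_subset("pos1", [1, 4, 7])
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True
print(duplicate_rejected)
```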

pop(k[, x])

Removes key k and returns its value if k is in the character map; otherwise returns x.

popitem()

Removes and returns the last (key, value) pair.

prune_taxa(taxa, update_taxon_set=False)

Removes the given taxa from the matrix. If update_taxon_set is True, then the taxa are removed from the associated TaxonSet object as well. Otherwise the TaxonSet is not modified (default).
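The difference between the two settings of the update_taxon_set flag from the signature can be modeled as follows. The sketch uses a dictionary for the matrix and a list for the taxon set; `prune_taxa_sketch` is a hypothetical name and the code does not reproduce DendroPy's implementation.

```python
def prune_taxa_sketch(matrix, taxon_set, taxa, update_taxon_set=False):
    """Drop rows for `taxa`; optionally drop them from `taxon_set` too."""
    for taxon in taxa:
        matrix.pop(taxon, None)              # remove the row if present
        if update_taxon_set and taxon in taxon_set:
            taxon_set.remove(taxon)          # also forget the taxon itself


matrix = {"A": "AC", "B": "GT", "C": "TT"}
taxon_set = ["A", "B", "C"]

prune_taxa_sketch(matrix, taxon_set, ["B"])
rows_after_first = sorted(matrix)
taxa_after_first = list(taxon_set)

prune_taxa_sketch(matrix, taxon_set, ["C"], update_taxon_set=True)
print(rows_after_first, taxa_after_first)
print(sorted(matrix), taxon_set)
```

With the default setting, the pruned taxon "B" remains in the taxon set even though it no longer has a row in the matrix.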

read(stream, schema, **kwargs)

Populates objects of this type from schema-formatted data in the file-like object stream, replacing all current data. If multiple character matrices are in the data source, a 0-based index of the character matrix to use can be specified using the matrix_offset keyword (defaults to 0, i.e., first character matrix).

reindex_subcomponent_taxa()

Synchronizes Taxon objects of map to taxon_set of self.

setdefault(key, def_val=None)

Sets the default value to return if key not present.

update_taxon_set()

Updates local taxa block by adding taxa not already managed. Mainly for use after map extension.

values()

Returns list of values.

vector_size

Returns the number of characters in the first sequence.

vectors()

Returns list of vectors.

write(stream, schema, **kwargs)

Writes out this object’s data to stream, a file-like object opened for writing.

class dendropy.dataobject.char.CharacterSubset(*args, **kwargs)

Tracks definition of a subset of characters.

Keyword arguments:

  • label: name of this subset
  • character_indices: list of 0-based (integer) indices of column positions that constitute this subset.

class dendropy.dataobject.char.CharacterType(*args, **kwargs)

A character format or type of a particular column: i.e., maps a particular set of character state definitions to a column in a character matrix.

class dendropy.dataobject.char.ContinuousCharacterMatrix(*args, **kwargs)

Character data container/manager.

See CharacterMatrix.__init__ documentation

class dendropy.dataobject.char.DiscreteCharacterMatrix(*args, **kwargs)

Character data container/manager that adds the attributes self.state_alphabets (a list of alphabets) and self.default_state_alphabet.

See CharacterMatrix.__init__ documentation for kwargs.

Unnamed args are passed to clone_from.

class dendropy.dataobject.char.DnaCharacterMatrix(*args, **kwargs)

DNA nucleotide data.

See CharacterMatrix.__init__ documentation for kwargs.

Unnamed args are passed to clone_from.

class dendropy.dataobject.char.InfiniteSitesCharacterMatrix(*args, **kwargs)

Infinite sites data.

See CharacterMatrix.__init__ documentation for kwargs.

Unnamed args are passed to clone_from.

class dendropy.dataobject.char.NucleotideCharacterMatrix(*args, **kwargs)

Generic nucleotide data.

Initializes the matrix. Handles the keyword arguments oid, label and taxon_set.

class dendropy.dataobject.char.ProteinCharacterMatrix(*args, **kwargs)

Protein / amino acid data.

Initializes the matrix. Handles the keyword arguments oid, label and taxon_set.

class dendropy.dataobject.char.RestrictionSitesCharacterMatrix(*args, **kwargs)

Restriction sites data.

See CharacterMatrix.__init__ documentation for kwargs.

Unnamed args are passed to clone_from.

class dendropy.dataobject.char.RnaCharacterMatrix(*args, **kwargs)

RNA nucleotide data.

See CharacterMatrix.__init__ documentation for kwargs.

Unnamed args are passed to clone_from.

class dendropy.dataobject.char.SitePatterns(matrix=None)

Tracks distinct site patterns in a character matrix. Useful for efficient computations.
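A site pattern is the tuple of states found in one column across all taxa. The sketch below counts distinct patterns with `collections.Counter`; it is one plausible reading of why tracking patterns helps (a per-column computation can be done once per distinct pattern and weighted by its count) and is not the SitePatterns implementation. The name `count_site_patterns` is hypothetical.

```python
from collections import Counter


def count_site_patterns(matrix):
    """Count identical columns in an alignment given as taxon -> string."""
    rows = list(matrix.values())
    columns = zip(*rows)          # transpose rows into columns
    return Counter(columns)


matrix = {
    "A": "AACGA",
    "B": "AACTA",
    "C": "AAGTA",
}
patterns = count_site_patterns(matrix)

for pattern, count in patterns.items():
    print("".join(pattern), count)
```

Here five columns collapse to three distinct patterns, so a per-site calculation would only need to run three times.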

class dendropy.dataobject.char.StandardCharacterMatrix(*args, **kwargs)

Standard data.

See CharacterMatrix.__init__ documentation for kwargs.

Unnamed args are passed to clone_from.

extend(other_matrix, overwrite_existing=False, extend_existing=False)

Extends this matrix by adding taxa and characters from the given matrix to this one. If overwrite_existing is True and a taxon in the other matrix is already present in the current one, then the sequence associated with the taxon in the second matrix replaces the sequence in the current one. If extend_existing is True and a taxon in the other matrix is already present in the current one, then the sequence associated with the taxon in the second matrix will be added to the sequence in the current one. If both are True, then an exception is raised. If neither are True, and a taxon in the other matrix is already present in the current one, then the sequence is ignored.

class dendropy.dataobject.char.StateAlphabet(*args, **kwargs)

A list of states available for a particular character type/format.

ambiguous_states()

Returns a list of the ambiguous states of this alphabet.

fundamental_states()

Returns a list of the fundamental states of this alphabet.

get_state(attr_name, value)

Returns state in self in which attr_name equals value.

get_states(oids=None, symbols=None, tokens=None)

Returns list of states with ids/symbols/tokens equal to values given in a list of ids/symbols/tokens (exact matches, one-to-one correspondence between state and attribute value in list).

get_states_as_cells(oids=None, symbols=None, tokens=None)

Returns (plain) list of CharacterDataCell objects with values set to states corresponding to symbols given by symbols.

get_states_as_vector(oids=None, symbols=None, tokens=None, **kwargs)

Returns CharacterDataVector object, with member CharacterDataCell objects with values set to states corresponding to symbols given by symbols. If taxon is given in keyword arguments, its value will be assigned to the taxon property of the CharacterDataVector.

id_state_map()

Returns a dictionary of element ids to state objects.

is_gap_state(el)

Returns True if the Alphabet has an element designated as the gap “state” and el is this element.

match_state(oids=None, symbols=None, tokens=None)

Returns the single state that has the given ids/symbols/tokens as member states.
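One way to picture this lookup is as a reverse search from a set of member symbols to the one state that stands for exactly that set. The sketch below uses a partial table of nucleotide codes; the table `MEMBERS`, the function `match_state_sketch`, and the choice to return None when nothing matches are assumptions for illustration only.

```python
# Partial table: state symbol -> symbols of its member states.
MEMBERS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}


def match_state_sketch(symbols):
    """Return the symbol of the state whose members are exactly `symbols`."""
    wanted = frozenset(symbols)
    for symbol, members in MEMBERS.items():
        if frozenset(members) == wanted:
            return symbol
    return None   # assumed behavior when no state matches


print(match_state_sketch(["A", "G"]))
print(match_state_sketch(["T", "C"]))
```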

multi_states()

Returns a list of the multistate states of this alphabet.

polymorphic_states()

Returns a list of the polymorphic states of this alphabet.

state_for_symbol(symbol)

Returns a StateAlphabetElement object corresponding to given symbol.

state_index_for_symbol(symbol)

Returns index of the StateAlphabetElement object corresponding to the given symbol.

symbol_state_map()

Returns dictionary with symbols as keys and StateAlphabetElement objects as values.

class dendropy.dataobject.char.StateAlphabetElement(oid=None, label=None, symbol=None, token=None, multistate=0, member_states=None)

A character state definition, which can either be a fundamental state or a mapping to a set of other character states (for polymorphic or ambiguous characters).

fundamental_ids

Returns the set of ids of all fundamental states to which this state maps.

fundamental_states

Returns the value of self in terms of a set of fundamental states (i.e., set of single states) that correspond to this state.

fundamental_symbols

Returns the set of symbols of all fundamental states to which this state maps.

fundamental_tokens

Returns the set of tokens of all fundamental states to which this state maps.
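The relationship between a state and its fundamental states can be sketched with a small class. This is a simplified model, not the StateAlphabetElement class itself: the `State` class below keeps only a symbol and a member_states list, and it assumes that member states may themselves be multistate, so the expansion is done recursively.

```python
class State:
    """Toy state: fundamental if it has no member states."""

    def __init__(self, symbol, member_states=None):
        self.symbol = symbol
        self.member_states = member_states or []

    @property
    def fundamental_states(self):
        if not self.member_states:
            return {self}                      # a fundamental state maps to itself
        result = set()
        for member in self.member_states:
            result |= member.fundamental_states
        return result

    @property
    def fundamental_symbols(self):
        return {state.symbol for state in self.fundamental_states}


A, C, G, T = (State(s) for s in "ACGT")
R = State("R", [A, G])          # purine: A or G
Y = State("Y", [C, T])          # pyrimidine: C or T
N = State("N", [R, Y])          # built from two multistate members

print(sorted(R.fundamental_symbols))
print(sorted(N.fundamental_symbols))
```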
