Representor object

class ibidas.representor.Representor

Representor is the primary object in Ibidas. It represents a data set, accessible through slices.

Properties can be accessed to obtain information about this object:

  • Names: Slice names
  • Type: Data type
  • Slices: List of slices
  • I: Info on slices/types/dims without executing the query
  • Depth: Maximum number of dimensions in slices

Slices can be accessed as attributes, e.g: obj.slicename

Note that all slice names should follow the Python syntax rules for variable names, AND use only lower case letters (to distinguish them from method names, which all start with an uppercase letter).
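
The naming rule above can be sketched as a small check. This is an illustration only: `is_valid_slice_name` is a hypothetical helper, not part of Ibidas, and it assumes that digits and underscores are acceptable as long as no uppercase letter appears.

```python
import keyword


def is_valid_slice_name(name):
    """Hypothetical helper: check a name against the slice naming rule."""
    return (
        name.isidentifier()              # valid Python variable name
        and not keyword.iskeyword(name)  # not a reserved word
        and name == name.lower()         # no uppercase letters
    )


print(is_valid_slice_name("gene_id"))  # True
print(is_valid_slice_name("GeneId"))   # False
```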

Special attribute access can be obtained through so-called axis specifiers:

  • Bookmarks: obj.Bbookmarkname Access set of slices with certain bookmark (see Bookmark method)

  • Dimensions: obj.Ddimname Access all slices with a certain dimension

  • Elements: obj.E[dimname] Access Elements of packed arrays. Optional dimname specifies which dimensions to unpack.

    Slices without that dimension as outermost dimension are not unpacked.

  • Fields: obj.Ffieldname Access Fields of packed tuples. obj should have only one slice.

  • Left/Right: obj.L, obj.R Special nested bookmarks set by e.g. the Match operation to allow backtracking to separate sources, e.g.:

    >>> ((x |Match| y) |Match| z).LR  
    

    gives all slices of y (first go Left to get xy, then Right to get y).
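
The backtracking idea behind the Left/Right bookmarks can be pictured with a tiny stand-in object. This is a sketch only: `Combined` is a hypothetical class, strings stand in for sets of slices, and the chained `.L.R` access here plays the role of the `.LR` specifier.

```python
class Combined:
    """Hypothetical stand-in for a Match result that remembers both sources."""

    def __init__(self, left, right):
        self.L = left    # everything that came from the left operand
        self.R = right   # everything that came from the right operand


x, y, z = "slices of x", "slices of y", "slices of z"
xy = Combined(x, y)      # plays the role of x |Match| y
xyz = Combined(xy, z)    # plays the role of (x |Match| y) |Match| z

# First go Left (back to xy), then Right (back to y).
print(xyz.L.R)  # slices of y
```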

Representor objects can be created from python data objects using the ‘Rep’ function, e.g.:

>>> Rep([('a',3),('b',4)])
AddSlice(name, data, dtype=None)
All(dim=None)
Any(dim=None)
Argmax(dim=None)
Argmin(dim=None)
Argsort(dim=None, descend=False)
Argunique(dim=None)
Array(tolevel=None)

Packages dimension into array type

Bookmark(*names, **kwds)

Bookmarks slices with a name. Slices can later be accessed using attribute access, with axis indicator “B”.

Example:
>>> x = x.Bookmark("myslices")
>>> x.myslices  
>>> x.Bmyslices   #only necessary to use the 'B' prefix in case there is also a slice named 'myslices'
Cast(*newtypes, **kwds)

Cast data to new type.

Allowed formats:

  • single type for all slices

    >>> x.Cast("int32")
    
  • type for each slice

    >>> x.Cast(("int64","int8"))
    
  • type for named slices

    >>> x.Cast(your_slice="int8")
    
Contains(elems)
Copy(log=False, debug=False)

Executes the current query.

Normally, query operations (e.g. obj + 3) are not executed immediately. Instead these operations are performed simultaneously when output is requested. This allows us to optimize these operations all together, or e.g. translate them into a SQL query.

However, this behaviour is not always what is needed, e.g:

>>> x = very expensive query
>>> print x[10:20]
>>> print x[10:30]

would execute the query saved in x twice: because Ibidas runs inside an interpreted language, it cannot analyze the whole script to determine that the output is required twice.

To prevent this, one can instead do:

>>> x = (very expensive query).Copy()

executing the expensive part of the query only once.
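
The effect of deferred execution can be sketched in plain Python. This is not the Ibidas implementation: `LazyQuery`, its `copy` method and the `expensive` function are hypothetical names, and the sketch simply assumes that every output request re-runs the stored computation unless the result has been materialized first.

```python
calls = {"expensive": 0}


def expensive():
    """Stand-in for a very expensive query; counts how often it runs."""
    calls["expensive"] += 1
    return [i * i for i in range(100)]


class LazyQuery:
    """Hypothetical deferred query: work happens only when output is requested."""

    def __init__(self, compute):
        self._compute = compute

    def __getitem__(self, key):
        # Every output request re-runs the stored computation.
        return self._compute()[key]

    def copy(self):
        # Run the computation once and wrap the materialized result.
        data = self._compute()
        return LazyQuery(lambda: data)


x = LazyQuery(expensive)
x[10:20]
x[10:30]
without_copy = calls["expensive"]

calls["expensive"] = 0
x = LazyQuery(expensive).copy()
x[10:20]
x[10:30]
with_copy = calls["expensive"]

print(without_copy, with_copy)  # 2 1
```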

Parameters:
  • log – Setting this to true will print the amount of time that is spent in each of the passes of the query optimizer (default: False)
  • debug – Setting this to true will output the query tree before optimization and after processing, through XML-RPC for visualization in Cytoscape. This requires that Cytoscape is running, with an activated XML-RPC plugin listening at port 9000.
Count()
CumSum(dim=None)
Depth

Returns max dimension depth (number of dimensions) of slices in this representor.

Detect(*args, **kwargs)

Detects types of slices, and casts result to this type

Parameters:
  • only_unknown – Only detect slices with unknown types [default: False]
  • allow_convert – Converts also types (e.g. bytes to integers/floats where possible) [default: True]
Dict(with_missing=False)

Combines slices into a tuple type

Difference(other, slices='COMMONPOS', dims=-1L, mode='dim')

Difference compares dataset A and B, giving only rows that do not occur in both. For further documentation, see ‘Intersect’

DimRename(*names, **kwds)

Rename dimensions. Similar to Rename for slices.

When using a tuple to supply names for the dimensions, we keep the ordering as given by the DimsUnique property of a representor object. For clarity, however, it might be better to supply a dictionary with a name mapping.

Shortcut: use % operator

Dims

Returns dims, ordered according to the order shown below a dataset printout.

Note that dimensions that occur multiple times in the same slice will be repeated (if this is not what is needed, use DimsUnique).

DimsUnique

Returns list of unique dims, ordered as used by DimRename.

Each(eachfunc, named_params=False, dtype=?, **kwargs)

Applies ‘eachfunc’ to each element in this representor.

Parameters:
  • eachfunc – can be any (self-defined) python (lambda) function, or a context operation (e.g. _ + 3).
  • dtype – expected output type. If not given, this type is automatically detected if necessary for subsequent operations (which is slower).

Elem(name=None)

Unpacks array type into dimension

Elems(name=None)

Unpacks array type into dimension

Except(other, slices='COMMONPOS', dims=-1L, mode='dim')

Except compares dataset A and B, giving only rows from A that do not occur in B. For further documentation, see ‘Intersect’

Fields(name=None)

Unpacks tuple type into slices

Filter(condition, dim=-1L, mode=None)

Performs filtering on this dataset using condition.

Parameters:
  • condition

    condition to filter on

    • condition should have only a single slice.

    Various data types can be used:

    • Bool: last dim of condition should be equal to a dim in this representor. Filtering occurs on the matching dim.
    • Integer: selects element from a dimension (see below how this is specified). Collapses the dimension it is applied on.
    • Array (of integers): selects positions from a dimension indicated by integers in array.
    • Slice: selects slice of elements from dimension (note that we refer here to the Python slice object, e.g. slice(0,3), not the Ibidas slice concept).
  • dim

    Dim to apply the filtering on.

    • If no dim given, filtering is performed on the last common dimension of the slices (except for bool types, where the dimension of the condition specifies the filtered dimension).
    • Integer: identifies dimension according to dim order (printed at the end of a representor printout)
    • Long: identifies dimension according to common dimensions shared by all slices (default: -1L)
    • String: dimension name
    • Dim object: x.Slices[0].dims[2]
    • Dimpath object: e.g. x.Slices[0].dims[:2]
  • mode

    Determines broadcasting method.

    • “pos” Position-based broadcasting (‘numpy’-like broadcasting), not based on dimension identity.
    • “dim” Dimension-identity based broadcasting (normal ‘ibidas’ broadcasting)
    • None: (Default). Determined based on input. Non-representation objects use position-based broadcasting
      Representation objects by default use dimension-based, except if they are prepended by a ‘+’ operator, e.g:
      >>> x.Filter(+conditionrep)
      

What is done to dimensions in the conditions that are not in the data source? Here we follow the default rules in Ibidas for broadcasting.

  • First, the dimension in the source that is going to be filtered is identified (see previous sections)
  • Secondly, we match this dimension to the last dimension in the condition.
  • All remaining dimensions are broadcasted against each other.

The examples use the following dataset:

>>> x = Rep([[1,2,3],[4,5,6]]) 
Slices: | data     
-------------------
Type:   | int64    
Dims:   | d1:2<d2:3
Data:   |          
        | [1 2 3]  
        | [4 5 6]  

Dim order: d1:2<d2:3
  • Example: integer filtering

    Filtering the first element:

    >>> x.Filter(0) 
    Slices: | data
    ---------------
    Type:   | int64
    Dims:   | d1:2
    Data:   |
            | 1
            | 4
    
    Dim order: d1:2
    

    This example matches the last common dimension (d2), and selects the first element. This collapses dimension d2.

    Note that if no special keywords are required, one can also use brackets to specify the filter operation:

    >>> x[0]
    

    is equivalent to the previous filtering operation.

    Using the Filter command, we can however also specify that we want to filter a specific dimension:

    >>> x.Filter(0, dim='d1')
    Slices: | data 
    ---------------
    Type:   | int64
    Dims:   | d2:3 
    Data:   |      
            | 1    
            | 2    
            | 3    
    
    Dim order: d2:3
    

    One can also use positional indices for the dimension (according to Dim order in the printout):

    >>> x.Filter(0, dim=0)
    
  • Example: Boolean filtering

    Filtering on boolean constraints:

    >>> x.Filter(_ > 2)
    Slices: | data      
    --------------------
    Type:   | int64     
    Dims:   | d1:2<fd2:~
    Data:   |           
            | [3]       
            | [4 5 6]   
    
    Dim order: d1:2<fd2:~
    

    Here, the _ operator refers to the enclosing scope, i.e. ‘x’. Equivalent is:

    >>> x[_ > 2]
    
  • Example: Slice filtering

    One can also filter using Python slices:

    >>> x.Filter(slice(0,2))
    
    Slices: | data      
    --------------------
    Type:   | int64     
    Dims:   | d1:2<fd2:*
    Data:   |           
            | [1 2]     
            | [4 5]     
    
    Dim order: d1:2<fd2:*
    

    Note that this is equivalent to:

    >>> x[0:2]
    

    (here we do not have to explicitly construct the slice object, as python accepts for this the x:y syntax. Unfortunately, this syntax is not allowed outside brackets).

  • Example: Integer filtering (with arrays)

    Filtering on array:

    >>> x.Filter([0,1])
    Slices: | data
    ---------------
    Type:   | int64
    Dims:   | d1:2
    Data:   |
            | 1
            | 5
    
    Dim order: d1:2
    

    This may not be what most users expect. Note that the filtering is applied on dimension ‘d2’. The dimension of the [0,1] array is mapped to dimension ‘d1’. Thus, from the first position along ‘d1’ (first row), we select element 0 along dim d2, and from the second position along ‘d1’, we select element 1 along dim d2.

    We used here positional broadcasting, as the input was not an Ibidas object. That is, the dimension of [0,1] was mapped to the dimension ‘d1’, even though these do not have the same identity. We can however also specify that we want to do identity based broadcasting:

    >>> x.Filter([0,1],mode='dim')
    
    Slices: | data            
    --------------------------
    Type:   | int64           
    Dims:   | d1:2<d3:2
    Data:   |                 
            | [1 2]           
            | [4 5]           
    
    Dim order: d1:2<d3:2
    

    This applies the [0,1] array as filter on the d2 dimension, transforming it into dimension d3.

    What actually happens is slightly more complicated however: the integers in the [0,1] list are mapped as filters to dimension ‘d2’. This filtering is however done for each element in the [0,1] list, which has dimension ‘d3’. As this dimension is not equal to dimension ‘d1’, it is broadcasted: virtually, the dataset is converted into one with dimensions d1:2<d3:2<d2:2. Upon applying the filter, dimension ‘d2’ collapses, resulting in a dataset with dimension ‘d1:2<d3:2’. Of course, in practice, we optimize this broadcasting step away.

    Such broadcasting can also happen when using position-based broadcasting, e.g.:

    >>> x.Filter([[0,1],[0,2]])
    
    Slices: | data
    -------------------
    Type:   | int64
    Dims:   | d3:2<d1:2
    Data:   |
            | [1 5]
            | [1 6]
    
    Dim order: d3:2<d1:2
    

    First, we do the same positional broadcasting, filtering dimension ‘d2’, and mapping the second dimension of [[0,1],[0,2]] to dimension ‘d1’. But then we are left with the extra first dimension of [[0,1],[0,2]], which is called ‘d3’. This dimension is broadcasted. As ‘d1’ is already mapped to, the dimension is put in front of ‘d1’.

    We can make this quite complicated, e.g.:

    >>> x.Filter([[0,1],[0,2,1]],mode='dim')
    
    Slices: | data           
    -------------------------
    Type:   | int64          
    Dims:   | d1:2<d4:2<d3:~ 
    Data:   |                
            | [[1 2] [1 3 2]]
            | [[4 5] [4 6 5]]
    
    Dim order: d1:2<d4:2<d3:~

    or:

    >>> x.Filter([[0,1],[0,1,1]],mode='dim',dim='d1')
    
    Slices: | data
    ---------------------------------------
    Type:   | int64
    Dims:   | d4:2<d3:~<d2:3
    Data:   |
            | [[1 2 3];  [4 5 6]]
            | [[1 2 3];  [4 5 6];  [4 5 6]]
    
    Dim order: d4:2<d3:~<d2:3
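
The filtering examples above can be mirrored with nested Python lists. This is a sketch under the assumption that a list of rows stands in for the d1:2<d2:3 dataset; it reproduces the printed results only and says nothing about how Ibidas implements filtering or broadcasting internally.

```python
x = [[1, 2, 3], [4, 5, 6]]  # stands in for dims d1:2<d2:3

# Integer: pick position 0 along the last dim (d2); d2 collapses.
int_last = [row[0] for row in x]                    # like x.Filter(0)

# Integer on a named dim: pick position 0 along d1; d1 collapses.
int_first = x[0]                                    # like x.Filter(0, dim='d1')

# Boolean: keep the elements along d2 for which the condition holds.
bool_kept = [[v for v in row if v > 2] for row in x]  # like x.Filter(_ > 2)

# Python slice: applied along d2 in every row.
sliced = [row[0:2] for row in x]                    # like x.Filter(slice(0,2))

# Array, positional broadcasting: index i pairs with row i.
positional = [row[i] for row, i in zip(x, [0, 1])]  # like x.Filter([0,1])

# Array, identity broadcasting: every index is applied to every row.
by_identity = [[row[i] for i in [0, 1]] for row in x]  # like mode='dim'

# Nested array, positional: the extra outer dim ends up in front of d1.
nested = [[row[i] for row, i in zip(x, idx)] for idx in [[0, 1], [0, 2]]]

print(int_last)     # [1, 4]
print(int_first)    # [1, 2, 3]
print(bool_kept)    # [[3], [4, 5, 6]]
print(sliced)       # [[1, 2], [4, 5]]
print(positional)   # [1, 5]
print(by_identity)  # [[1, 2], [4, 5]]
print(nested)       # [[1, 5], [1, 6]]
```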
Flat(dim=-1, name=None)

Flattens (merges) a dimension with previous(parent) dim.

Parameters:
  • dim – Dim to flatten. By default, last dim.
  • name – Name of merged dim. By default, merged name of two dimensions.
FlatAll(name=None)

Flattens all dimensions into one dimension.

Parameters:
  • name – Name of new merged dimension. By default, merged names of all dimensions.

Note that this operation is slightly different from Flat, in that it converts all slices to have 1 dimension, even those which have 0 dimensions.

Example:

>>> x = Rep(([[1,2,3],[4,5,6]],'a'))

Slices: | f0        | f1
------------------------------
Type:   | int64     | bytes[1]
Dims:   | d1:2<d2:3 |
Data:   |           |
        | [1 2 3]   | a
        | [4 5 6]   |

Dim order: d1:2<d2:3

>>> x.Flat()
         
Slices: | f0      | f1
----------------------------
Type:   | int64   | bytes[1]
Dims:   | d1_d2:6 |
Data:   |         |
        | 1       | a
        | 2       |
        | 3       |
        | 4       |
        | 5       |
        | 6       |

Dim order: d1_d2:6

>>> x.FlatAll()

Slices: | f0      | f1      
----------------------------
Type:   | int64   | bytes[1]
Dims:   | d1_d2:6 | d1_d2:6 
Data:   |         |         
        | 1       | a       
        | 2       | a       
        | 3       | a       
        | 4       | a       
        | 5       | a       
        | 6       | a       

Dim order: d1_d2:6
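
The difference between the two operations can be mirrored with plain lists. This is a sketch only, with a nested list standing in for slice f0 and a bare string for the dimensionless slice f1.

```python
f0 = [[1, 2, 3], [4, 5, 6]]  # dims d1:2<d2:3
f1 = "a"                     # no dimensions

# Flat: merge d2 into d1 for slices that have those dims; f1 is left alone.
flat_f0 = [v for row in f0 for v in row]
flat_f1 = f1

# FlatAll: every slice ends up with the single merged dimension,
# so the dimensionless f1 is repeated along it.
flatall_f0 = flat_f0
flatall_f1 = [f1] * len(flat_f0)

print(flat_f0, flat_f1)  # [1, 2, 3, 4, 5, 6] a
print(flatall_f1)        # ['a', 'a', 'a', 'a', 'a', 'a']
```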
Get(*slices, **kwds)

Select slices in a new representor, combine with other slices.

Parameters:
  • slices

    Can be various formats:

    • str: selects slice with this name.
      Special symbols:
      • “*”: all slices
      • “~”: all slices with names not yet in previous get parameters are selected.
      • “#”: select first slice if all slices have a common dimension.
    • int: selects slice with this index
    • slice: selects slices with these indexes (note: a Python slice object, not an Ibidas slice)
    • representor: adds this representor slices
    • context: apply to self, adds resulting slices
    • tuple: applies get to elements in tuple, creates tuple slice from resulting slices (see .tuple() function)
    • list: selected slices within list are packed using array function
  • kwds – Same as slices, but also assigns slice names.

Examples:

  • str

    >>> a.Get("f0","f3")
    
  • int

    >>> a.Get(0, 3)
    
  • slice

    >>> a.Get(slice(0,3))
    
  • representor

    >>> a.Get(a.f0 + 3, a.f3 == "gene3",  Rep(3))
    
  • context

    >>> a.perform_some_operation().Get(_.f0 + 3, _.f3 == "gene3")
    
  • tuple

    >>> a.Get((_.f0, _.f1), _.f3)
    
  • list

    >>> a.Get([_.f0])
    
GetRepeatedSliceNames()
GroupBy(*args, **kwargs)

Groups data on the content of one or more slices.

Parameters:
  • flat – Allows one to indicate which slices should not be grouped.

Example:

>>> x = Rep(([1,1,2,2,3,3,4,4],[1,2,1,2,1,2,1,2],[1,2,3,4,1,2,3,4]))

Slices: | f0    | f1    | f2
-------------------------------
Type:   | int64 | int64 | int64
Dims:   | d1:8  | d1:8  | d1:8
Data:   |       |       |
        | 1     | 1     | 1
        | 1     | 2     | 2
        | 2     | 1     | 3
        | 2     | 2     | 4
        | 3     | 1     | 1
        | 3     | 2     | 2
        | 4     | 1     | 3
        | 4     | 2     | 4

Dim order: d1:8

>>> x.GroupBy(_.f0)

Slices: | f0    | f1          | f2
-------------------------------------------
Type:   | int64 | int64       | int64
Dims:   | gf0:* | gf0:*<gd1:~ | gf0:*<gd1:~
Data:   |       |             |
        | 1     | [1 2]       | [1 2]
        | 2     | [1 2]       | [3 4]
        | 3     | [1 2]       | [1 2]
        | 4     | [1 2]       | [3 4]

Dim order: gf0:*<gd1:~            

Note how slice f0 has now only unique values, and how slices f1 and f2 have now two dimensions, grouped per unique value in f0. One can also group on multiple slices at once:

>>> x.GroupBy((_.f1, _.f2))

Slices: | f0            | f1            | f2           
-------------------------------------------------------
Type:   | int64         | int64         | int64        
Dims:   | gdata:*<gd1:~ | gdata:*<gd1:~ | gdata:*<gd1:~
Data:   |               |               |              
        | [2 4]         | [2 2]         | [4 4]        
        | [2 4]         | [1 1]         | [3 3]        
        | [1 3]         | [1 1]         | [1 1]        
        | [1 3]         | [2 2]         | [2 2]        

Dim order: gdata:*<gd1:~

This groups the data such that the combination of f1 and f2 is unique. This is actually equivalent to:

>>> x.GroupBy(_.Get(_.f1, _.f2).Tuple())

That is, ‘(_.f1, _.f2)’ signifies that one wants to get the tuple of f1 and f2, which looks like this:

>>> x.Get((_.f1, _.f2))

Slices: | data                
------------------------------
Type:   | (f1=int64, f2=int64)
Dims:   | d1:8                
Data:   |                     
        | (1, 1)              
        | (2, 2)              
        | (1, 3)              
        | (2, 4)              
        | (1, 1)              
        | (2, 2)              
        | (1, 3)              
        | (2, 4)              

Dim order: d1:8

Note that, as one does not directly group on slices f1 or f2, these slices are also grouped.

Instead of grouping on combinations of slices, one can also group on multiple slices individually:

>>> x.GroupBy(_.f0, _.f1)

Slices: | f0    | f1    | f2               
-------------------------------------------
Type:   | int64 | int64 | int64            
Dims:   | gf0:* | gf1:* | gf0:*<gf1:*<gd1:~
Data:   |       |       |                  
        | 1     | 1     | [[1] [2]]        
        | 2     | 2     | [[3] [4]]        
        | 3     |       | [[1] [2]]        
        | 4     |       | [[3] [4]]        

Dim order: gf0:*<gf1:*<gd1:~

Note that f0 and f1 now have two separate dimensions, while f2 has both these dimensions, and an extra ‘group’ dimension (like before). In this case, the gd1 dimension is always of length 1, as there are only unique values in f2 for every pair of f0, f1.

Of course, one can remove such an extra dim using filtering, e.g.:

>>> x.GroupBy(_.f0, _.f1)[...,0]

Slices: | f0    | f1    | f2
-------------------------------------
Type:   | int64 | int64 | int64
Dims:   | gf0:* | gf1:* | gf0:*<gf1:*
Data:   |       |       |
        | 1     | 1     | [1 2]
        | 2     | 2     | [3 4]
        | 3     |       | [1 2]
        | 4     |       | [3 4]

Dim order: gf0:*<gf1:*

However, it is better to already indicate to groupby that some slices do not have to be grouped, using the ‘flat’ parameter. We already saw a case before where grouping of certain slices was not necessary, namely the one where we grouped on the tuple of f1 and f2:

>>> x.GroupBy((_.f1, _.f2))

We can prevent the grouping of f1 and f2 using flat:

>>> x.GroupBy((_.f1, _.f2),flat=['f1','f2'])

Slices: | f0            | f1      | f2     
-------------------------------------------
Type:   | int64         | int64   | int64  
Dims:   | gdata:*<gd1:~ | gdata:* | gdata:*
Data:   |               |         |        
        | [2 4]         | 2       | 4      
        | [2 4]         | 1       | 3      
        | [1 3]         | 1       | 1      
        | [1 3]         | 2       | 2      

In the case of the multi-dimensional group, we can do the same:

>>> x.GroupBy(_.f0, _.f1, flat='f2') 

Slices: | f0    | f1    | f2         
-------------------------------------
Type:   | int64 | int64 | int64      
Dims:   | gf0:* | gf1:* | gf0:*<gf1:*
Data:   |       |       |            
        | 1     | 1     | [1 2]      
        | 2     | 2     | [3 4]      
        | 3     |       | [1 2]      
        | 4     |       | [3 4]      

Dim order: gf0:*<gf1:*

Note that f2 is now non-unique along every dimension.

However, one might also have a case in which a slice is non-unique for just a single slice in a multi-dimensional group, e.g.:

>>> x.Get(_.f0, _.f1, _.f2, _.f1/'f3').GroupBy(_.f0, _.f1, flat=['f2','f3'])        

Here, we copied slice f1, calling it ‘f3’. Next, we specified that it should be flat. But note that this slice is still unique along dimension gf0... For these situations, a more advanced format for the flat parameter can be used, in which one can specify with respect to which slices a slice should be grouped:

>>> x.Get(_.f0, _.f1, _.f2, _.f1/'f3').GroupBy(_.f0, _.f1, flat={('f0','f1'):'f2','f1':'f3'})

Slices: | f0    | f1    | f2          | f3
---------------------------------------------
Type:   | int64 | int64 | int64       | int64
Dims:   | gf0:* | gf1:* | gf0:*<gf1:* | gf1:*
Data:   |       |       |             |
        | 1     | 1     | [1 2]       | 1
        | 2     | 2     | [3 4]       | 2
        | 3     |       | [1 2]       |
        | 4     |       | [3 4]       |

This specifies that f2 should be flat, while keeping the group for both grouping slices, and ‘f3’ should be flat, while keeping the group only for the second group slice. It is equivalent to:

>>> x.Get(_.f0, _.f1, _.f2, _.f1/'f3').GroupBy(_.f0, _.f1, flat={(0,1):'f2',1:'f3'})        
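
The basic grouping behaviour can be sketched with plain lists. `group_by` is a hypothetical helper, not an Ibidas function; it keeps groups in first-seen order, so the row order can differ from the printouts above, and it covers only grouping on a single key or on a tuple of keys (not the multi-dimensional case).

```python
f0 = [1, 1, 2, 2, 3, 3, 4, 4]
f1 = [1, 2, 1, 2, 1, 2, 1, 2]
f2 = [1, 2, 3, 4, 1, 2, 3, 4]


def group_by(keys, *columns):
    """Hypothetical helper: group columns by key, in first-seen key order."""
    groups = {}
    for i, key in enumerate(keys):
        groups.setdefault(key, []).append(i)
    unique_keys = list(groups)
    grouped = [
        [[col[i] for i in idx] for idx in groups.values()]
        for col in columns
    ]
    return unique_keys, grouped


# Like x.GroupBy(_.f0): f0 becomes unique, f1 and f2 gain a nested group dim.
keys, (g1, g2) = group_by(f0, f1, f2)

# Like grouping on the tuple (f1, f2) with f1 and f2 kept flat:
# only f0 is collected per unique (f1, f2) pair.
pair_keys, (g0,) = group_by(list(zip(f1, f2)), f0)

print(keys)       # [1, 2, 3, 4]
print(g2)         # [[1, 2], [3, 4], [1, 2], [3, 4]]
print(pair_keys)  # [(1, 1), (2, 2), (1, 3), (2, 4)]
print(g0)         # [[1, 3], [1, 3], [2, 4], [2, 4]]
```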
Harray(name=None)

Combines slices into array.

Example:

    >>> x = Rep([(1,2),(3,4),(5,6)])
    
    Slices: | f0    | f1
    -----------------------
    Type:   | int64 | int64
    Dims:   | d1:3  | d1:3
    Data:   |       |
            | 1     | 2
            | 3     | 4
            | 5     | 6

    Dim order: d1:3

    >>> x.HArray()
    
    Slices: | f0_f1
    -------------------
    Type:   | int64
    Dims:   | d1:3<d2:2
    Data:   |
            | [1 2]
            | [3 4]
            | [5 6]

    Dim order: d1:3<d2:2 

Which is equivalent to this:

    >>> x.Get(HArray(_.f0, _.f1)) 
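
In plain-Python terms, the example above amounts to zipping the two columns into rows. This is a sketch of the result only, with lists standing in for the slices f0 and f1.

```python
f0 = [1, 3, 5]
f1 = [2, 4, 6]

# Combine the two slices along a new innermost dimension (d2 of length 2).
f0_f1 = [list(pair) for pair in zip(f0, f1)]

print(f0_f1)  # [[1, 2], [3, 4], [5, 6]]
```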
HasPattern(value, ignore_case=False)
I
In(arrays)
IndexDict()

Combines slices into a tuple type

Info
Intersect(other, slices='COMMONPOS', dims=-1L, mode='dim')

Intersect compares dataset A and B, giving only rows from A that also occur in B.

Parameters:
  • other – Other dataset to compare with
  • slices – Specify on which slices an intersection should be performed. COMMON_POS (pair slices with common position), COMMON_NAME (pair slices with common names) or a tuple with for each source a (tuple of) slice name(s). Default: COMMON_POS.
  • dims – Specify across which dimensions an intersection should be performed. Default: last common dim (-1L) in both datasets. Can also be a tuple, allowing one to specify the dim for each source separately.
  • mode – ‘dim’ or ‘pos’. Type of broadcasting (dimension identity or positional) for the dimensions that are not used in the intersection. Default: ‘dim’
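
The row semantics of Intersect, Except and Difference, as described in their one-line summaries, can be sketched on flat lists. This is an assumption-laden illustration: it ignores the slices, dims and mode parameters, and how Ibidas treats duplicate rows or orders the result is not covered here.

```python
a = [1, 2, 3, 4]
b = [3, 4, 5]

# Intersect: rows from A that also occur in B.
intersect = [v for v in a if v in b]

# Except: rows from A that do not occur in B.
except_ = [v for v in a if v not in b]

# Difference: rows that do not occur in both (from either side).
difference = except_ + [v for v in b if v not in a]

print(intersect)   # [3, 4]
print(except_)     # [1, 2]
print(difference)  # [1, 2, 5]
```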
IsMissing()

Checks if elements in a slice are missing

Join(other, cond)

Join allows you to take the cartesian product of two dimensions, and filter them on some condition.

Example:

>>> x = Rep([1,2,3,4,5])
>>> y = Rep([3,4,5,6,7,8])

>>> x.Join(y, x >= y)

Slices: | data       | data      
---------------------------------
Type:   | int64      | int64     
Dims:   | d1a_fd1b:* | d1a_fd1b:*
Data:   |            |           
        | 3          | 3         
        | 4          | 3         
        | 4          | 4         
        | 5          | 3         
        | 5          | 4         
        | 5          | 5         

Dim order: d1a_fd1b:*

The join operation generates all possible pairs of values out of x and y, and then filters them on x >= y. Note that you can also use the following equivalent forms:

>>> Join(x, y, x >= y)

>>> x |Join(x >= y)| y

One can also make this condition more complex, e.g.:

>>> x = Rep([[1,2,3],[4,5,6]])

Slices: | data     
-------------------
Type:   | int64    
Dims:   | d1:2<d2:3
Data:   |          
        | [1 2 3]  
        | [4 5 6]  

Dim order: d1:2<d2:3

>>> x.Join(y, x <= y)

Slices: | data                                  | data                                 
---------------------------------------------------------------------------------------
Type:   | int64                                 | int64                                
Dims:   | d1:2<d2_fd1a:~                        | d1b:2<d2_fd1a:~                      
Data:   |                                       |                                      
        | [1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3] | [3 4 5 6 7 8 3 4 5 6 7 8 3 4 5 6 7 8]
        | [4 4 4 4 4 5 5 5 5 6 6 6]             | [4 5 6 7 8 5 6 7 8 6 7 8]            

Dim order: d1:2<d2_fd1a:~

>>> x.Join(y, x.Sum() <= y * y)

Slices: | data            | data
--------------------------------------
Type:   | int64           | int64
Dims:   | d1b_fd1a:*<d2:3 | d1b_fd1a:*
Data:   |                 |
        | [1 2 3]         | 3
        | [1 2 3]         | 4
        | [1 2 3]         | 5
        | [1 2 3]         | 6
        | [1 2 3]         | 7
        | [1 2 3]         | 8
        | [4 5 6]         | 4
        | [4 5 6]         | 5
        | [4 5 6]         | 6
        | [4 5 6]         | 7
        | [4 5 6]         | 8

>>> x.Join(y, y |In| x)

Slices: | data            | data      
--------------------------------------
Type:   | int64           | int64     
Dims:   | d1b_fd1a:*<d2:3 | d1b_fd1a:*
Data:   |                 |           
        | [1 2 3]         | 3         
        | [4 5 6]         | 4         
        | [4 5 6]         | 5         
        | [4 5 6]         | 6         

Dim order: d1b_fd1a:*<d2:3

The use of context operators is a bit more complex with Join operations, as the context can refer to both sources. The context operator therefore refers to the combination of both. If there is a conflict in slice names (like in the previous examples), one can refer to both slices using the ‘L’ and ‘R’ bookmark (see Combine operation):

>>> x.Join(y, _.L == _.R)

Slices: | data           | data           
------------------------------------------
Type:   | int64          | int64          
Dims:   | d1:2<d2_fd1a:~ | d1b:2<d2_fd1a:~
Data:   |                |                
        | [3]            | [3]            
        | [4 5 6]        | [4 5 6]        

Dim order: d1:2<d2_fd1a:~
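
The "cartesian product, then filter" description of Join can be sketched with itertools. This mirrors the first example (x >= y) and the last one (_.L == _.R) using plain lists; it is an illustration of the semantics, not of the Ibidas implementation.

```python
from itertools import product

x = [1, 2, 3, 4, 5]
y = [3, 4, 5, 6, 7, 8]

# All pairs of values out of x and y, filtered on the join condition.
pairs = [(a, b) for a, b in product(x, y) if a >= b]
print(pairs)  # [(3, 3), (4, 3), (4, 4), (5, 3), (5, 4), (5, 5)]

# With a nested left operand, the product is taken per row of the outer dim.
x2 = [[1, 2, 3], [4, 5, 6]]
equal_pairs = [[(a, b) for a, b in product(row, y) if a == b] for row in x2]
print(equal_pairs)  # [[(3, 3)], [(4, 4), (5, 5), (6, 6)]]
```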
Level(tolevel)

Bring all slices to the same dimension height through packing and broadcasting

Like(value, ignore_case=False)
Log()
Log10()
Log2()
Lower()
Match(other, lslice=None, rslice=None, jointype='inner', merge_same=False, mode='dim')

Match allows you to take the cartesian product of two dimensions, and filter them on an equality condition.

Parameters:
  • other – ibidas representor to match with
  • lslice – slice in self to perform equality condition on (see .Get for allowed parameter values). Default: use a slice which has same name in both sources (there should be only 1 slice pair with this property).
  • rslice – slice in other to perform equality condition on (see .Get for allowed parameter values). Default: use same name as lslice.
  • jointype – choose between ‘inner’, ‘left’, ‘right’, or ‘full’ equijoin. Default: inner
  • merge_same – False, ‘equi’ or ‘all’. Default: False
  • mode – Type of broadcasting, ‘dim’ or ‘pos’, i.e. respectively on identity or position. Default: ‘dim’.

Examples:

>>> x = Rep([('a',1), ('b', 2), ('c',3), ('c', 4)])

Slices: | f0       | f1
--------------------------
Type:   | bytes[1] | int64
Dims:   | d1:4     | d1:4
Data:   |          |
        | a        | 1
        | b        | 2
        | c        | 3
        | c        | 4

Dim order: d1:4

>>> y = Rep([('a','test1'),('d','test2'), ('c', 'test3')])

Slices: | f0       | f1
-----------------------------
Type:   | bytes[1] | bytes[5]
Dims:   | d1:3     | d1:3
Data:   |          |
        | a        | test1
        | d        | test2
        | c        | test3

Dim order: d1:3


>>> x.Match(y, _.f0, _.f0) 

Slices: | f0       | f1    | f1      
-------------------------------------
Type:   | bytes[1] | int64 | bytes[5]
Dims:   | d1:*     | d1:*  | d1:*    
Data:   |          |       |         
        | a        | 1     | test1   
        | c        | 3     | test3   
        | c        | 4     | test3   

Dim order: d1:*

The f0 slices of ‘x’ and ‘y’ have been collapsed into a single slice, as they had the same name and content (as imposed by the equality condition).

Note that this call is equivalent to:

>>> x |Match(_.f0, _.f0)| y

Or, because both slices are named similarly:

>>> x |Match(_.f0)| y

If there would have been only a single common named slice in x and y, one could also have used:

>>> x |Match| y

This is however not the case here, as also ‘f1’ is shared by both x and y.

To access similarly named slices from the left or right operand, use the bookmarks as defined by the Combine operation (see documentation there):

>>> (x |Match(_.f0)| y).R.f1

Slices: | f1      
------------------
Type:   | bytes[5]
Dims:   | d1:*    
Data:   |         
        | test1   
        | test3   
        | test3   

Dim order: d1:*  

The join type by default is ‘inner’, which means that only rows which are similar in both slices are kept. One can also use the ‘left’, ‘right’ or ‘full’ join types. In these cases, unmatched rows in respectively the left, right and both source(s) are also kept:

>>> x |Match(_.f0, jointype='left')| y

Slices: | f0       | f1    | f0        | f1       
--------------------------------------------------
Type:   | bytes[1] | int64 | bytes?[1] | bytes?[5]
Dims:   | d1:*     | d1:*  | d1:*      | d1:*     
Data:   |          |       |           |          
        | a        | 1     | a         | test1    
        | c        | 3     | c         | test3    
        | c        | 4     | c         | test3    
        | b        | 2     | --        | --       

Dim order: d1:*

>>> x |Match(_.f0, jointype='full')| y

Slices: | f0        | f1     | f0        | f1       
----------------------------------------------------
Type:   | bytes?[1] | int64? | bytes?[1] | bytes?[5]
Dims:   | d1:*      | d1:*   | d1:*      | d1:*     
Data:   |           |        |           |          
        | a         | 1      | a         | test1    
        | c         | 3      | c         | test3    
        | c         | 4      | c         | test3    
        | b         | 2      | --        | --       
        | --        | --     | d         | test2    

Dim order: d1:*

Sometimes in these cases, it is useful to merge the slices that have similar information, in this case both ‘f0’ slices. This can be accomplished using the ‘merge_same’ parameter:

>>> x |Match(_.f0, jointype='full', merge_same='equi')| y

Slices: | f0        | f1     | f1
----------------------------------------
Type:   | bytes?[1] | int64? | bytes?[5]
Dims:   | d1:*      | d1:*   | d1:*
Data:   |           |        |
        | a         | 1      | test1
        | c         | 3      | test3
        | c         | 4      | test3
        | b         | 2      | --
        | d         | --     | test2

Dim order: d1:*

The value ‘equi’ selects the slices used for the equality condition. An alternative is to call with a tuple of the slice names that should be merged:

>>> x |Match(_.f0, jointype='full', merge_same =('f0',))| y 

Another case is when one wants to merge slices with dissimilar names. This can be accomplished by using a nested tuple:

>>> x |Match(_.f0, jointype='full', merge_same =(('f0','f0'),))| y 

Finally, one can also merge on all slices with the same names, by setting merge_same to ‘all’ or True. For the current example, this would generate an error, because slices ‘f1’ and ‘f1’ have conflicting content for the same rows:

>>> x |Match(_.f0, jointype='full', merge_same=True)| y
RuntimeError: Found unequal values during merge: 1 != test1
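
The effect of merging the key slices in a full join can be illustrated outside Ibidas. The following is a minimal plain-Python sketch, not the Ibidas implementation: the helper name full_outer_merge is hypothetical, None stands in for the missing value shown as ‘--’, and the input rows are assumptions chosen to be consistent with the tables above.

```python
def full_outer_merge(left, right):
    """Full outer join of (key, value) rows with a single merged key column."""
    # Index the right-hand rows by key so duplicates are kept.
    right_by_key = {}
    for key, value in right:
        right_by_key.setdefault(key, []).append(value)

    rows, matched = [], set()
    for key, lvalue in left:
        if key in right_by_key:
            matched.add(key)
            rows.extend((key, lvalue, rvalue) for rvalue in right_by_key[key])
        else:
            rows.append((key, lvalue, None))  # no partner on the right
    # Right-hand rows whose key never occurred on the left.
    for key, rvalue in right:
        if key not in matched:
            rows.append((key, None, rvalue))
    return rows


# Assumed inputs, picked to match the f0/f1 tables shown above.
x_rows = [("a", 1), ("c", 3), ("c", 4), ("b", 2)]
y_rows = [("a", "test1"), ("d", "test2"), ("c", "test3")]
merged = full_outer_merge(x_rows, y_rows)
```

Because the key column is emitted once, the row for ‘d’ carries its key even though it has no partner on the left, which is the difference from the unmerged full join.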

The final parameter, ‘mode’, can only be illustrated with a slightly more complicated example, in which we have multiple dimensions:

>>> x = Rep([[1,2],[1,2,3]])    

Slices: | data     
-------------------
Type:   | int64    
Dims:   | d1:2<d2:~
Data:   |          
        | [1 2]    
        | [1 2 3]  

Dim order: d1:2<d2:~ 


>>> y = Rep([[2,3,4],[1,3,4]])

Slices: | data
-------------------
Type:   | int64
Dims:   | d1:2<d2:3
Data:   |
        | [2 3 4]
        | [1 3 4]

Dim order: d1:2<d2:3

Matching these datasets to each other will match them on dimension ‘d2’ of both datasets (these dimensions get renamed to d2a and d2b):

>>> x |Match| y
Slices: | data
-------------------------------
Type:   | int64
Dims:   | d1a:2<d1b:2<d2a_d2b:~
Data:   |
        | [[2] [1]]
        | [[2 3] [1 3]]

Dim order: d1a:2<d1b:2<d2a_d2b:~

Note that the result has three dimensions: a two-by-two matrix formed by the ‘d1’ dimensions of both datasets, in which each cell holds a nested list with the Match results for one pair of rows.

However, we may have intended dimensions ‘d1’ in both datasets to be matched to each other, although they do not have the same identity (meaning that, while similarly named, Ibidas assumes that they refer to different data). With ‘positional’ broadcasting, dimensions are matched on position.

In this case, we have dimensions ‘d2’ in both datasets used for the matching, so those need no broadcasting. Directly before these dimensions ‘d2’, we have in both datasets a dimension ‘d1’. When using positional broadcasting, these will be matched to each other (while with dimensional broadcasting they will only be matched if they have the same identity):

>>> x |Match(mode='pos')| y

Slices: | data
-------------------------
Type:   | int64
Dims:   | d1a:2<d2a_d2b:~
Data:   |
        | [2]
        | [1 3]

Dim order: d1a:2<d2a_d2b:~

Note that both d1 dimensions are now matched to each other, and a Match is done only between [1,2] and [2,3,4], and between [1,2,3] and [1,3,4], instead of between all possible pairs of rows.
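
The difference between the two modes can be mimicked with nested Python lists. This is a rough sketch under simplifying assumptions rather than Ibidas code: the helper match_rows is hypothetical and reduces the Match to "keep the values of the left row that also occur in the right row", which suffices here because the values within each row are unique.

```python
def match_rows(left, right):
    """Values of `left` that also occur in `right` (simplified inner match)."""
    return [value for value in left if value in right]


x = [[1, 2], [1, 2, 3]]
y = [[2, 3, 4], [1, 3, 4]]

# Default behaviour as described above: the two d1 dimensions are treated
# as distinct, so every row of x is matched with every row of y.
all_pairs = [[match_rows(xr, yr) for yr in y] for xr in x]

# Positional behaviour: rows are paired up by position before matching.
positional = [match_rows(xr, yr) for xr, yr in zip(x, y)]
```

Here all_pairs has the nested two-by-two structure of the first result, while positional has one matched list per row, as in the mode='pos' result.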

Max(dim=None)
Mean(dim=None)
Median(dim=None)
Merge(other)
Min(dim=None)
Names

Returns names of all slices

Pos(dim=None)
Prod(dim=None)
Rank(dim=None, descend=False)
Redim(*args, **kwds)

Assign new dimensions

example: .Redim('new_dimname', _.f0) Assign new dim with name ‘new_dimname’ to first dimension of slice f0

example: .Redim('new_dimname', f0=1) Assign new dim with name ‘new_dimname’ to second dimension of slice f0

example: .Redim('new_dimname', {_.Dd1.Without('f0'):1, 'f0':1}) Assign all slices with dimension d1 (except f0) as first dim a new dim with name ‘new_dimname’. Do the same to slice f0, but as second dimension.

Rename(*names, **kwds)

Rename slices.

Parameters:
  • names – names without keywords. Number of names should match number of slices.
  • kwds – names with keywords, e.g. f0="genes". Number does not have to match number of slices.

Examples:

>>> na = a.Rename("genes","scores")
>>> na = a.Rename(f0 = "genes", f1 = "scores")
Shortcut:

One can use the division operation to rename slices.

  • tuple:

    Number of names in tuple should match number of slices.

    >>> a/("genes", "scores")
    
  • dict:

    Similar to keywords.

    >>> a/{"f0": "genes"}
    
  • str:

    Can only be used if representor object consists of single slice.

    >>> a.f0/"genes"
    
Replace(slice, translator, fromslice=0, toslice=1)
ReplaceMissing(def_value='NOVALUE')
Set(dim=None)
Shape()

Returns shape of all dimensions as slices in a representor object.

Example:

>>> x = Rep([[1,2,3],[4,5,6]])

Slices: | data     
-------------------
Type:   | int64    
Dims:   | d1:2<d2:3
Data:   |          
        | [1 2 3]  
        | [4 5 6]  

Dim order: d1:2<d2:3

>>> x.Shape()

Slices: | d1    | d2   
-----------------------
Type:   | int64 | int64
Dims:   |       | d1:2 
Data:   |       |      
        | 2     | 3    
        |       | 3    

Dim order: d1:2
Show(table_length=100)

Prints table of contents.

Parameters: table_length – Number of rows to show for each dimension (default: 100)

Show can be used to view a larger part of the table than the default output (15 rows) you get in IPython/Ibidas or by using str().

Show returns its representor object, allowing you to include it at any point in a query to observe results, e.g.:

>>> x.Unique().Show() |Match| y 
Slices
Sort(*slices, **kwargs)

Performs sort on data.

Example:

  • Sort x on all slices. If there are multiple slices, they are combined into a tuple and the sort is performed on that tuple.

    >>> x.Sort()
    
  • Sort x on slice f1

    >>> x.Sort(_.f1)
    
  • Sort x on slices f1 and f3.

    >>> x.Sort(_.f1, _.f3) 
    

For other possible formats for selecting the slices to sort on, see the ‘Get’ function.

SplitDim(lshape, rshape, lname=None, rname=None, dimsel=None)

Splits dim into two dimensions.

Parameters:
  • lshape – Left shape (integer or array of lengths)
  • rshape – Right dim shape
  • lname – New name of left dimension (default: autogenerated).
  • rname – New name of right dimension (default: autogenerated).
  • dimsel – Dim to split (default: last common dimension).

Example:

>>> x = Rep([1,2,3,4,5,6])

Slices: | data
---------------
Type:   | int64
Dims:   | d1:6
Data:   |
        | 1
        | 2
        | 3
        | 4
        | 5
        | 6

Dim order: d1:6

>>> x.SplitDim(3,2)

Slices: | data
-----------------------
Type:   | int64
Dims:   | d2:3<d3:2
Data:   |
        | [1 2]
        | [3 4]
        | [5 6]

Dim order: d2:3<d3:2
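
For fixed integer shapes, the regrouping shown above corresponds to cutting a flat list into equal chunks. The following plain-Python sketch illustrates only that case; the helper split_dim is hypothetical, and it does not cover the variant in which lshape is an array of lengths.

```python
def split_dim(values, lshape, rshape):
    """Regroup a flat list into `lshape` rows of `rshape` elements each."""
    if lshape * rshape != len(values):
        raise ValueError("lshape * rshape must equal the number of elements")
    return [values[i * rshape:(i + 1) * rshape] for i in range(lshape)]


nested = split_dim([1, 2, 3, 4, 5, 6], 3, 2)
```
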
SplitOnPattern(value, max_splits=0, ignore_case=False)
Std(dim=None)
Sum(dim=None)
TakeFrom(other, allow_missing=False, keep_missing=False)
To(*slices, **kwargs)
ToPython(**args)

Converts data into a Python data structure

Transpose(permute_idxs=(1, 0))

Transposes the dimensions of slices. Can only be applied to the common dimensions of slices.

Parameters: permute_idxs – index order of new dims. By default, performs a matrix transpose of the first two dims, i.e. permute_idxs=(1,0)
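
As an illustration of the default permute_idxs=(1,0), the sketch below swaps the first two dimensions of a rectangular nested list in plain Python. It is not Ibidas code; the helper transpose_first_two is hypothetical and assumes that all rows have the same length.

```python
def transpose_first_two(rows):
    """Swap the first two dimensions of a rectangular nested list."""
    return [list(column) for column in zip(*rows)]


matrix = [[1, 2, 3], [4, 5, 6]]
transposed = transpose_first_two(matrix)
```
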
Tuple()

Combines slices into a tuple type

Type

Returns type of this object. If multiple slices, returns tuple type of slice types.

Union(other, slices='COMMONPOS', dims=-1L, mode='dim')

Union compares datasets A and B, giving all unique rows that occur in either or both datasets. For further documentation, see ‘Intersect’.
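
The row-level meaning of this description can be sketched in plain Python as follows. This is an illustration only: the helper union_rows is hypothetical, rows are modelled as tuples, and keeping first-occurrence order is an arbitrary choice of the sketch, not a statement about the order Ibidas returns.

```python
def union_rows(a, b):
    """Unique rows that occur in either or both of two lists of row tuples."""
    seen, result = set(), []
    for row in list(a) + list(b):
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


rows_a = [("a", 1), ("b", 2), ("b", 2)]
rows_b = [("b", 2), ("c", 3)]
combined = union_rows(rows_a, rows_b)
```
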

Unique(*slices, **kwargs)
Upper()
Without(*slices)
