Representor is the primary object in Ibidas. It represents a data set, accessible through slices.
Properties can be accessed to obtain information about this object:
- Names: Slice names
- Type: Data type
- Slices: List of slices
- I: Info on slices/types/dims without executing the query
- Depth: Maximum number of dimensions in slices
Slices can be accessed as attributes, e.g.: obj.slicename
Note that all slice names should follow the Python syntax rules for variable names AND use only lowercase letters (to distinguish them from method names, which all start with an uppercase letter).
Special attribute access can be obtained through so-called axis specifiers:
- Bookmarks: obj.Bbookmarkname accesses the set of slices with a certain bookmark (see the Bookmark method).
- Dimensions: obj.Ddimname accesses all slices with a certain dimension.
- Elements: obj.E[dimname] accesses elements of packed arrays. The optional dimname specifies which dimensions to unpack; slices without that dimension as outermost dimension are not unpacked.
- Fields: obj.Ffieldname accesses fields of packed tuples. obj should have only one slice.
- Left/Right: obj.L, obj.R are special nested bookmarks, set by e.g. the Match operation, that allow backtracking to the separate sources, e.g.:
>>> ((x |Match| y) |Match| z).LR
gives all slices of y (first go Left (get xy), then R (get y)).
Representor objects can be created from python data objects using the ‘Rep’ function, e.g.:
>>> Rep([('a',3),('b',4)])
Packages dimension into array type
Bookmarks slices with a name. Slices can later be accessed using attribute access, with axis indicator “B”.
>>> x = x.Bookmark("myslices")
>>> x.myslices
>>> x.Bmyslices #the 'B' prefix is only necessary if there is also a slice named 'myslices'
Cast data to new type.
Allowed formats:
Single type for all slices:
>>> x.Cast("int32")
Type for each slice:
>>> x.Cast(("int64","int8"))
Type for named slices:
>>> x.Cast(your_slice="int8")
Executes the current query.
Normally, query operations (e.g. obj + 3) are not executed immediately. Instead these operations are performed simultaneously when output is requested. This allows us to optimize these operations all together, or e.g. translate them into a SQL query.
However, this behaviour is not always what is needed, e.g:
>>> x = very expensive query
>>> print x[10:20]
>>> print x[10:30]
would execute the query stored in x twice. Because Ibidas is used from an interpreted language, it cannot analyze the whole script to determine that the output is required twice.
To prevent this, one can instead do:
>>> x = (very expensive query).Copy()
which executes the expensive part of the query only once.
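The effect of Copy can be sketched in plain Python. This is not Ibidas code: `LazyQuery`, its `copy` method and the call counter are hypothetical stand-ins that only model a deferred query which re-runs on every output request unless its result is materialized once.

```python
calls = {"expensive": 0}


def very_expensive_query():
    # Stand-in for a costly computation; counts how often it really runs.
    calls["expensive"] += 1
    return [i * i for i in range(100)]


class LazyQuery:
    """Hypothetical model of a deferred query (not the Ibidas Representor)."""

    def __init__(self, compute):
        self._compute = compute  # only run when output is requested

    def __getitem__(self, index):
        # Every output request executes the whole deferred computation.
        return self._compute()[index]

    def copy(self):
        # Execute once, then serve later requests from the stored result.
        result = self._compute()
        return LazyQuery(lambda: result)


x = LazyQuery(very_expensive_query)
first, second = x[10:20], x[10:30]
lazy_runs = calls["expensive"]  # 2: one run per output request

calls["expensive"] = 0
y = LazyQuery(very_expensive_query).copy()
first, second = y[10:20], y[10:30]
copied_runs = calls["expensive"]  # 1: only copy() ran the query
```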
Returns the maximum dimension depth (number of dimensions) of slices in this representor.
Detects types of slices, and casts result to this type
Combines slices into a tuple type
Difference compares datasets A and B, returning only rows that do not occur in both. For further documentation, see ‘Intersect’.
Rename dimensions. Similar to Rename for slices.
When using a tuple to supply names for the dimensions, the ordering follows the one given by DimsUnique. For clarity, however, it may be better to supply a dictionary with a name mapping.
Shortcut: use % operator
Returns dims, ordered according to the order shown below a dataset printout.
Note that dimensions that occur multiple times in the same slice will be repeated (if this is not what is needed, use DimsUnique).
Returns list of unique dims, ordered as used by DimRename.
Applies ‘eachfunc’ to each element in this representor.
Parameters:
- eachfunc – can be any (self-defined) Python (lambda) function, or a context operation (e.g. _ + 3).
- dtype – expected output type. If not given, this type is automatically detected if necessary for subsequent operations (which is slower).
Unpacks array type into dimension
Unpacks array type into dimension
Except compares datasets A and B, returning only rows from A that do not occur in B. For further documentation, see ‘Intersect’.
Unpacks tuple type into slices
Performs filtering on this dataset using condition.
How are dimensions in the condition that do not occur in the data source handled? Here Ibidas follows its default broadcasting rules:
- First, the dimension in the source that is going to be filtered is identified (see previous sections)
- Secondly, we match this dimension to the last dimension in the condition.
- All remaining dimensions are broadcasted against each other.
The examples use the following dataset:
>>> x = Rep([[1,2,3],[4,5,6]])
Slices: | data
-------------------
Type: | int64
Dims: | d1:2<d2:3
Data: |
| [1 2 3]
| [4 5 6]
Dim order: d1:2<d2:3
Example: integer filtering
Filtering the first element:
>>> x.Filter(0)
Slices: | data
---------------
Type: | int64
Dims: | d1:2
Data: |
| 1
| 4
Dim order: d1:2
This example matches the last common dimension (d2), and selects the first element. This collapses dimension d2.
Note that if no special keywords are required, one can also use brackets to specify the filter operation:
>>> x[0]
is equivalent to the previous filtering operation.
Using the Filter command, we can however also specify that we want to filter a specific dimension:
>>> x.Filter(0, dim='d1')
Slices: | data
---------------
Type: | int64
Dims: | d2:3
Data: |
| 1
| 2
| 3
Dim order: d2:3
One can also use positional indices for the dimension (according to Dim order in the printout):
>>> x.Filter(0, dim=0)
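As a plain-Python sketch of the two integer filters above, a nested list can stand in for the dimensions d1 and d2. The helper names are hypothetical and only mirror the values shown in the printouts; they are not part of Ibidas.

```python
x = [[1, 2, 3], [4, 5, 6]]  # dims d1:2 < d2:3


def filter_last_dim(data, index):
    # Like x.Filter(0): select one element along the innermost dim (d2);
    # d2 collapses and d1 remains.
    return [row[index] for row in data]


def filter_first_dim(data, index):
    # Like x.Filter(0, dim='d1'): select one row along d1; d1 collapses
    # and d2 remains.
    return data[index]


print(filter_last_dim(x, 0))   # [1, 4]
print(filter_first_dim(x, 0))  # [1, 2, 3]
```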
Example: Boolean filtering
Filtering on boolean constraints:
>>> x.Filter(_ > 2)
Slices: | data
--------------------
Type: | int64
Dims: | d1:2<fd2:~
Data: |
| [3]
| [4 5 6]
Dim order: d1:2<fd2:~
Here, the _ operator refers to the enclosing scope, i.e. ‘x’. Equivalent is:
>>> x[_ > 2]
Example: Slice filtering
One can also filter using Python slices:
>>> x.Filter(slice(0,2))
Slices: | data
--------------------
Type: | int64
Dims: | d1:2<fd2:*
Data: |
| [1 2]
| [4 5]
Dim order: d1:2<fd2:*
Note that this is equivalent to:
>>> x[0:2]
(Here we do not have to construct the slice object explicitly, as Python accepts the x:y syntax inside brackets. Unfortunately, this syntax is not allowed outside brackets.)
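For comparison, the boolean and slice filters above correspond roughly to the following list comprehensions. This is only a model of the resulting values on nested lists, not of how Ibidas evaluates the filter.

```python
x = [[1, 2, 3], [4, 5, 6]]  # dims d1:2 < d2:3

# x.Filter(_ > 2) / x[_ > 2]: keep matching elements in every row, so
# rows may end up with different lengths.
bool_filtered = [[v for v in row if v > 2] for row in x]

# x.Filter(slice(0, 2)) / x[0:2]: apply the same slice to every row.
slice_filtered = [row[0:2] for row in x]

print(bool_filtered)   # [[3], [4, 5, 6]]
print(slice_filtered)  # [[1, 2], [4, 5]]
```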
Example: Integer filtering (with arrays)
Filtering on array:
>>> x.Filter([0,1])
Slices: | data
---------------
Type: | int64
Dims: | d1:2
Data: |
| 1
| 5
Dim order: d1:2
This is perhaps not what most users would expect. Note that the filtering is applied on dimension ‘d2’, while the dimension of the [0,1] array is mapped to dimension ‘d1’. Thus, from the first position along ‘d1’ (first row), we select element 0 along dim d2, and from the second position along ‘d1’, we select element 1 along dim d2.
Here we used positional broadcasting, as the input was not an Ibidas object. That is, the dimension of [0,1] was mapped to dimension ‘d1’, even though these do not have the same identity. We can however also specify that we want to do identity-based broadcasting:
>>> x.Filter([0,1],mode='dim')
Slices: | data
--------------------------
Type: | int64
Dims: | d1:2<d3:2
Data: |
| [1 2]
| [4 5]
Dim order: d1:2<d3:2
This applies the [0,1] array as filter on the d2 dimension, transforming it into dimension d3.
What actually happens is slightly more complicated, however: the integers in the [0,1] list are mapped as filters to dimension ‘d2’. This filtering is done for each element in the [0,1] list, which has dimension ‘d3’. As this dimension is not equal to dimension ‘d1’, it is broadcast: virtually, the dataset is converted into one with dimensions d1:2<d3:2<d2:3. Upon applying the filter, dimension ‘d2’ collapses, resulting in a dataset with dimensions d1:2<d3:2. In practice, of course, this broadcasting step is optimized away.
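The difference between positional and identity-based broadcasting for the [0,1] filter can be modelled on nested lists as follows. This sketch reproduces the resulting values only, using `zip` to pair the index list with d1 by position; it does not reproduce Ibidas' dimension bookkeeping.

```python
x = [[1, 2, 3], [4, 5, 6]]  # dims d1:2 < d2:3
idx = [0, 1]                # a plain list; its single dim is new (d3)

# Positional broadcasting (default): idx's dim is matched to d1, so
# row i is filtered with idx[i]; d2 collapses.
positional = [row[i] for row, i in zip(x, idx)]

# mode='dim': idx's dim differs from d1, so it is broadcast and every
# row is filtered with the whole idx list (d1 < d3).
by_dim = [[row[i] for i in idx] for row in x]

print(positional)  # [1, 5]
print(by_dim)      # [[1, 2], [4, 5]]
```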
Such broadcasting can also happen when using position-based broadcasting, e.g.:
>>> x.Filter([[0,1],[0,2]])
Slices: | data
-------------------
Type: | int64
Dims: | d3:2<d1:2
Data: |
| [1 5]
| [1 6]
Dim order: d3:2<d1:2
First, we do the same positional broadcasting, filtering dimension ‘d2’ and mapping the second dimension of [[0,1],[0,2]] to dimension ‘d1’. But then we are left with the extra first dimension of [[0,1],[0,2]], which is called ‘d3’. This dimension is broadcast. As ‘d1’ is already mapped, the new dimension is placed in front of ‘d1’.
We can make this quite complicated, e.g.:
>>> x.Filter([[0,1],[0,2,1]],mode='dim')
Slices: | data
-------------------------
Type: | int64
Dims: | d1:2<d4:2<d3:~
Data: |
| [[1 2] [1 3 2]]
| [[4 5] [4 6 5]]
Dim order: d1:2<d4:2<d3:~
or:
>>> x.Filter([[0,1],[0,1,1]],mode='dim',dim='d1')
Slices: | data
---------------------------------------
Type: | int64
Dims: | d4:2<d3:~<d2:3
Data: |
| [[1 2 3]; [4 5 6]]
| [[1 2 3]; [4 5 6]; [4 5 6]]
Dim order: d4:2<d3:~<d2:3
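The three multi-dimensional filters above can be reproduced on nested lists with the following sketch. The index lists play the role of the filter argument; the variable names are hypothetical, and the code models only the resulting values and nesting, not the dimension names.

```python
x = [[1, 2, 3], [4, 5, 6]]  # dims d1:2 < d2:3

# x.Filter([[0,1],[0,2]]): the inner index dim is matched positionally
# to d1; the extra outer dim (d3) is broadcast in front of d1.
nested = [[row[i] for row, i in zip(x, idxs)] for idxs in [[0, 1], [0, 2]]]
# -> [[1, 5], [1, 6]]

# x.Filter([[0,1],[0,2,1]], mode='dim'): both index dims are new, so
# every row of x is filtered with every index list (d1 < d4 < d3).
ragged = [[[row[i] for i in idxs] for idxs in [[0, 1], [0, 2, 1]]] for row in x]
# -> [[[1, 2], [1, 3, 2]], [[4, 5], [4, 6, 5]]]

# x.Filter([[0,1],[0,1,1]], mode='dim', dim='d1'): the indices select
# whole rows along d1 instead of elements along d2.
rows = [[x[i] for i in idxs] for idxs in [[0, 1], [0, 1, 1]]]
# -> [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6], [4, 5, 6]]]
```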
Flattens (merges) a dimension with the previous (parent) dimension.
Flattens all dimensions into one dimension.
Parameters: name – name of the new merged dimension. By default, the merged names of all dimensions.
Note that this operation differs slightly from Flat, in that it converts all slices to have one dimension, even those which have zero dimensions.
Example:
>>> x = Rep(([[1,2,3],[4,5,6]],'a'))
Slices: | f0 | f1
------------------------------
Type: | int64 | bytes[1]
Dims: | d1:2<d2:3 |
Data: | |
| [1 2 3] | a
| [4 5 6] |
Dim order: d1:2<d2:3
>>> x.Flat()
Slices: | f0 | f1
----------------------------
Type: | int64 | bytes[1]
Dims: | d1_d2:6 |
Data: | |
| 1 | a
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
Dim order: d1_d2:6
>>> x.FlatAll()
Slices: | f0 | f1
----------------------------
Type: | int64 | bytes[1]
Dims: | d1_d2:6 | d1_d2:6
Data: | |
| 1 | a
| 2 | a
| 3 | a
| 4 | a
| 5 | a
| 6 | a
Dim order: d1_d2:6
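The difference between Flat and FlatAll in this example can be sketched with plain lists. This models only the resulting values, with a str standing in for the bytes slice f1.

```python
f0 = [[1, 2, 3], [4, 5, 6]]  # dims d1:2 < d2:3
f1 = "a"                     # no dimensions

# Flat(): merge d2 into its parent d1 (d1_d2:6); f1 is left untouched.
flat_f0 = [v for row in f0 for v in row]

# FlatAll(): every slice ends up with exactly one dim, so the
# dimensionless f1 is broadcast to the merged length.
flatall_f1 = [f1] * len(flat_f0)

print(flat_f0)     # [1, 2, 3, 4, 5, 6]
print(flatall_f1)  # ['a', 'a', 'a', 'a', 'a', 'a']
```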
Selects slices into a new representor, possibly combining them with other slices.
Examples:
str:
>>> a.Get("f0","f3")
int:
>>> a.Get(0, 3)
slice:
>>> a.Get(slice(0,3))
representor:
>>> a.Get(a.f0 + 3, a.f3 == "gene3", Rep(3))
context:
>>> a.perform_some_operation().Get(_.f0 + 3, _.f3 == "gene3")
tuple:
>>> a.Get((_.f0, _.f1), _.f3)
list:
>>> a.Get([_.f0])
Groups data on the content of one or more slices.
Parameters: flat – allows one to indicate which slices should not be grouped.
Example:
>>> x = Rep(([1,1,2,2,3,3,4,4],[1,2,1,2,1,2,1,2],[1,2,3,4,1,2,3,4]))
Slices: | f0 | f1 | f2
-------------------------------
Type: | int64 | int64 | int64
Dims: | d1:8 | d1:8 | d1:8
Data: | | |
| 1 | 1 | 1
| 1 | 2 | 2
| 2 | 1 | 3
| 2 | 2 | 4
| 3 | 1 | 1
| 3 | 2 | 2
| 4 | 1 | 3
| 4 | 2 | 4
Dim order: d1:8
>>> x.GroupBy(_.f0)
Slices: | f0 | f1 | f2
-------------------------------------------
Type: | int64 | int64 | int64
Dims: | gf0:* | gf0:*<gd1:~ | gf0:*<gd1:~
Data: | | |
| 1 | [1 2] | [1 2]
| 2 | [1 2] | [3 4]
| 3 | [1 2] | [1 2]
| 4 | [1 2] | [3 4]
Dim order: gf0:*<gd1:~
Note how slice f0 now has only unique values, and how slices f1 and f2 now have two dimensions, grouped per unique value in f0. One can also group on multiple slices at once:
>>> x.GroupBy((_.f1, _.f2))
Slices: | f0 | f1 | f2
-------------------------------------------------------
Type: | int64 | int64 | int64
Dims: | gdata:*<gd1:~ | gdata:*<gd1:~ | gdata:*<gd1:~
Data: | | |
| [2 4] | [2 2] | [4 4]
| [2 4] | [1 1] | [3 3]
| [1 3] | [1 1] | [1 1]
| [1 3] | [2 2] | [2 2]
Dim order: gdata:*<gd1:~
This groups the data such that the combination of f1 and f2 is unique. This is actually equivalent to:
>>> x.GroupBy(_.Get(_.f1, _.f2).Tuple())
That is, ‘(_.f1, _.f2)’ signifies that one wants to get the tuple of f1 and f2, which looks like this:
>>> x.Get((_.f1, _.f2))
Slices: | data
------------------------------
Type: | (f1=int64, f2=int64)
Dims: | d1:8
Data: |
| (1, 1)
| (2, 2)
| (1, 3)
| (2, 4)
| (1, 1)
| (2, 2)
| (1, 3)
| (2, 4)
Dim order: d1:8
Note that, as one does not group directly on slices f1 or f2 (only on their combination), these slices are also grouped.
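A rough plain-Python model of these two groupings maps each unique key to the grouped values of the other slices. `group_by` is a hypothetical helper, not the Ibidas implementation; it keeps keys in first-seen order, so the group order may differ from the Ibidas printouts.

```python
from collections import defaultdict

f0 = [1, 1, 2, 2, 3, 3, 4, 4]
f1 = [1, 2, 1, 2, 1, 2, 1, 2]
f2 = [1, 2, 3, 4, 1, 2, 3, 4]


def group_by(keys, *slices):
    # For every unique key, collect the values of each slice in that group.
    # Keys keep their first-seen order.
    groups = defaultdict(lambda: tuple([] for _ in slices))
    for i, key in enumerate(keys):
        for values, column in zip(groups[key], slices):
            values.append(column[i])
    return dict(groups)


# Like x.GroupBy(_.f0): f1 and f2 become lists per unique f0 value.
by_f0 = group_by(f0, f1, f2)
# {1: ([1, 2], [1, 2]), 2: ([1, 2], [3, 4]), 3: ([1, 2], [1, 2]), 4: ([1, 2], [3, 4])}

# Like x.GroupBy((_.f1, _.f2)): the key is the (f1, f2) tuple, and since
# f1 and f2 are not grouped on directly, they are grouped as well.
by_f1_f2 = group_by(list(zip(f1, f2)), f0, f1, f2)
# {(1, 1): ([1, 3], [1, 1], [1, 1]), (2, 2): ([1, 3], [2, 2], [2, 2]),
#  (1, 3): ([2, 4], [1, 1], [3, 3]), (2, 4): ([2, 4], [2, 2], [4, 4])}
```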
Instead of grouping on combinations of slices, one can also group on multiple slices individually:
>>> x.GroupBy(_.f0, _.f1)
Slices: | f0 | f1 | f2
-------------------------------------------
Type: | int64 | int64 | int64
Dims: | gf0:* | gf1:* | gf0:*<gf1:*<gd1:~
Data: | | |
| 1 | 1 | [[1] [2]]
| 2 | 2 | [[3] [4]]
| 3 | | [[1] [2]]
| 4 | | [[3] [4]]
Dim order: gf0:*<gf1:*<gd1:~
Note that f0 and f1 now have two separate dimensions, while f2 has both these dimensions and an extra ‘group’ dimension (like before). In this case, the gd1 dimension always has length 1, as every pair of f0, f1 values occurs in only one row.
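The two-dimensional grouping can be pictured as a grid indexed by the unique values of f0 and f1, where each cell holds the f2 values of that combination. The sketch below assumes this grid interpretation and is not Ibidas code.

```python
f0 = [1, 1, 2, 2, 3, 3, 4, 4]
f1 = [1, 2, 1, 2, 1, 2, 1, 2]
f2 = [1, 2, 3, 4, 1, 2, 3, 4]

# Unique values (first-seen order) give the group dims gf0 and gf1.
gf0 = list(dict.fromkeys(f0))  # [1, 2, 3, 4]
gf1 = list(dict.fromkeys(f1))  # [1, 2]

# Each cell collects the f2 values for one (f0, f1) pair; this inner
# list corresponds to the extra gd1 dim.
grid = [[[c for a, b, c in zip(f0, f1, f2) if a == u and b == v] for v in gf1]
        for u in gf0]
# -> [[[1], [2]], [[3], [4]], [[1], [2]], [[3], [4]]]

# Every cell has length 1 here, so taking element 0 removes that dim.
flat_grid = [[cell[0] for cell in row] for row in grid]
# -> [[1, 2], [3, 4], [1, 2], [3, 4]]
```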
Of course, one can remove such an extra dim using filtering, e.g.:
>>> x.GroupBy(_.f0, _.f1)[...,0]
Slices: | f0 | f1 | f2
-------------------------------------
Type: | int64 | int64 | int64
Dims: | gf0:* | gf1:* | gf0:*<gf1:*
Data: | | |
| 1 | 1 | [1 2]
| 2 | 2 | [3 4]
| 3 | | [1 2]
| 4 | | [3 4]
Dim order: gf0:*<gf1:*
However, it is better to indicate to GroupBy in advance that some slices do not have to be grouped, using the ‘flat’ parameter. We already saw a case where grouping of certain slices was not necessary, namely the one where we grouped on the tuple of f1 and f2:
>>> x.GroupBy((_.f1, _.f2))
We can prevent the grouping of f1 and f2 using flat:
>>> x.GroupBy((_.f1, _.f2),flat=['f1','f2'])
Slices: | f0 | f1 | f2
-------------------------------------------
Type: | int64 | int64 | int64
Dims: | gdata:*<gd1:~ | gdata:* | gdata:*
Data: | | |
| [2 4] | 2 | 4
| [2 4] | 1 | 3
| [1 3] | 1 | 1
| [1 3] | 2 | 2
In the case of the multi-dimensional group, we can do the same:
>>> x.GroupBy(_.f0, _.f1, flat='f2')
Slices: | f0 | f1 | f2
-------------------------------------
Type: | int64 | int64 | int64
Dims: | gf0:* | gf1:* | gf0:*<gf1:*
Data: | | |
| 1 | 1 | [1 2]
| 2 | 2 | [3 4]
| 3 | | [1 2]
| 4 | | [3 4]
Dim order: gf0:*<gf1:*
Note that f2 is now non-unique along every dimension.
However, one might also have a case in which a slice is non-unique with respect to just a single grouping slice in a multi-dimensional group, e.g.:
>>> x.Get(_.f0, _.f1, _.f2, _.f1/'f3').GroupBy(_.f0, _.f1, flat=['f2','f3'])
Here, we copied slice f1, calling it ‘f3’. Next, we specified that it should be flat. But note that this slice is still unique along dimension gf0. For these situations, a more advanced format for the flat parameter can be used, in which one can specify with respect to which slices a slice should be grouped:
>>> x.Get(_.f0, _.f1, _.f2, _.f1/'f3').GroupBy(_.f0, _.f1, flat={('f0','f1'):'f2','f1':'f3'})
Slices: | f0 | f1 | f2 | f3
---------------------------------------------
Type: | int64 | int64 | int64 | int64
Dims: | gf0:* | gf1:* | gf0:*<gf1:* | gf1:*
Data: | | | |
| 1 | 1 | [1 2] | 1
| 2 | 2 | [3 4] | 2
| 3 | | [1 2] |
| 4 | | [3 4] |
This specifies that f2 should be flat, while keeping the group for both grouping slices, and ‘f3’ should be flat, while keeping the group only for the second group slice. It is equivalent to:
>>> x.Get(_.f0, _.f1, _.f2, _.f1/'f3').GroupBy(_.f0, _.f1, flat={(0,1):'f2',1:'f3'})
Combines slices into array.
Example:
>>> x = Rep([(1,2),(3,4),(5,6)])
Slices: | f0 | f1
-----------------------
Type: | int64 | int64
Dims: | d1:3 | d1:3
Data: | |
| 1 | 2
| 3 | 4
| 5 | 6
Dim order: d1:3
>>> x.HArray()
Slices: | f0_f1
-------------------
Type: | int64
Dims: | d1:3<d2:2
Data: |
| [1 2]
| [3 4]
| [5 6]
Dim order: d1:3<d2:2
This is equivalent to:
>>> x.Get(HArray(_.f0, _.f1))
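Per row, HArray stacks the values of the selected slices into one array, adding an inner dimension. A plain-Python sketch of the result for this example (values only, no dimension names):

```python
f0 = [1, 3, 5]
f1 = [2, 4, 6]

# HArray(): combine the per-row values of f0 and f1 into one slice
# f0_f1 with a new inner dim (d1:3 < d2:2).
f0_f1 = [[a, b] for a, b in zip(f0, f1)]

print(f0_f1)  # [[1, 2], [3, 4], [5, 6]]
```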
Combines slices into a tuple type
Intersect compares datasets A and B, returning only rows from A that also occur in B.
Checks if elements in a slice are missing
Join allows you to take the cartesian product of two dimensions, and filter them on some condition.
Example:
>>> x = Rep([1,2,3,4,5])
>>> y = Rep([3,4,5,6,7,8])
>>> x.Join(y, x >= y)
Slices: | data | data
---------------------------------
Type: | int64 | int64
Dims: | d1a_fd1b:* | d1a_fd1b:*
Data: | |
| 3 | 3
| 4 | 3
| 4 | 4
| 5 | 3
| 5 | 4
| 5 | 5
Dim order: d1a_fd1b:*
The join operation generates all possible pairs of values out of x and y, and then filters them on x >= y. Note that you can also use the following equivalent forms:
>>> Join(x, y, x >= y)
>>> x |Join(x >= y)| y
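The pair generation and filtering described above can be modelled with `itertools.product`. This only reproduces the resulting pairs for this one-dimensional example; it does not model Ibidas' dimension naming (d1a_fd1b).

```python
from itertools import product

x = [1, 2, 3, 4, 5]
y = [3, 4, 5, 6, 7, 8]

# Join: build the cartesian product of both dims, then keep the pairs
# that satisfy the condition; the result has a single merged dim.
pairs = [(a, b) for a, b in product(x, y) if a >= b]

print(pairs)  # [(3, 3), (4, 3), (4, 4), (5, 3), (5, 4), (5, 5)]
```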
One can also make this condition more complex, e.g.:
>>> x = Rep([[1,2,3],[4,5,6]])
Slices: | data
-------------------
Type: | int64
Dims: | d1:2<d2:3
Data: |
| [1 2 3]
| [4 5 6]
Dim order: d1:2<d2:3
>>> x.Join(y, x <= y)
Slices: | data | data
---------------------------------------------------------------------------------------
Type: | int64 | int64
Dims: | d1:2<d2_fd1a:~ | d1b:2<d2_fd1a:~
Data: | |
| [1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3] | [3 4 5 6 7 8 3 4 5 6 7 8 3 4 5 6 7 8]
| [4 4 4 4 4 5 5 5 5 6 6 6] | [4 5 6 7 8 5 6 7 8 6 7 8]
Dim order: d1:2<d2_fd1a:~
>>> x.Join(y, x.Sum() <= y * y)
Slices: | data | data
--------------------------------------
Type: | int64 | int64
Dims: | d1b_fd1a:*<d2:3 | d1b_fd1a:*
Data: | |
| [1 2 3] | 3
| [1 2 3] | 4
| [1 2 3] | 5
| [1 2 3] | 6
| [1 2 3] | 7
| [1 2 3] | 8
| [4 5 6] | 4
| [4 5 6] | 5
| [4 5 6] | 6
| [4 5 6] | 7
| [4 5 6] | 8
>>> x.Join(y, y |In| x)
Slices: | data | data
--------------------------------------
Type: | int64 | int64
Dims: | d1b_fd1a:*<d2:3 | d1b_fd1a:*
Data: | |
| [1 2 3] | 3
| [4 5 6] | 4
| [4 5 6] | 5
| [4 5 6] | 6
Dim order: d1b_fd1a:*<d2:3
The use of context operators is a bit more complex with Join operations, as the context can refer to both sources. The context operator therefore refers to the combination of both. If there is a conflict in slice names (like in the previous examples), one can refer to both slices using the ‘L’ and ‘R’ bookmark (see Combine operation):
>>> x.Join(y, _.L == _.R)
Slices: | data | data
------------------------------------------
Type: | int64 | int64
Dims: | d1:2<d2_fd1a:~ | d1b:2<d2_fd1a:~
Data: | |
| [3] | [3]
| [4 5 6] | [4 5 6]
Dim order: d1:2<d2_fd1a:~
Brings all slices to the same dimension height through packing and broadcasting.
Match allows you to take the cartesian product of two dimensions, and filter them on an equality condition.
Examples:
>>> x = Rep([('a',1), ('b', 2), ('c',3), ('c', 4)])
Slices: | f0 | f1
--------------------------
Type: | bytes[1] | int64
Dims: | d1:4 | d1:4
Data: | |
| a | 1
| b | 2
| c | 3
| c | 4
Dim order: d1:4
>>> y = Rep([('a','test1'),('d','test2'), ('c', 'test3')])
Slices: | f0 | f1
-----------------------------
Type: | bytes[1] | bytes[5]
Dims: | d1:3 | d1:3
Data: | |
| a | test1
| d | test2
| c | test3
Dim order: d1:3
>>> x.Match(y, _.f0, _.f0)
Slices: | f0 | f1 | f1
-------------------------------------
Type: | bytes[1] | int64 | bytes[5]
Dims: | d1:* | d1:* | d1:*
Data: | | |
| a | 1 | test1
| c | 3 | test3
| c | 4 | test3
Dim order: d1:*
The f0 slices of ‘x’ and ‘y’ have been collapsed into a single slice, as they had the same name and content (as imposed by the equality condition).
Note that this call is equivalent to:
>>> x |Match(_.f0, _.f0)| y
Or, because both slices are named similarly:
>>> x |Match(_.f0)| y
If x and y had shared only a single slice name, one could also have used:
>>> x |Match| y
This is however not the case here, as also ‘f1’ is shared by both x and y.
To access similarly named slices from the left or right operand, use the bookmarks as defined by the Combine operation (see documentation there):
>>> (x |Match(_.f0)| y).R.f1
Slices: | f1
------------------
Type: | bytes[5]
Dims: | d1:*
Data: |
| test1
| test3
| test3
Dim order: d1:*
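The inner Match above behaves like an equi-join. The sketch below uses a dictionary index on the right-hand rows, a common way to implement such a join; it is an illustration of the result only, and Ibidas may evaluate Match differently. `inner_match` is a hypothetical helper.

```python
x = [("a", 1), ("b", 2), ("c", 3), ("c", 4)]
y = [("a", "test1"), ("d", "test2"), ("c", "test3")]


def inner_match(left, right):
    # Index the right rows by key, then emit one output row per
    # matching (left, right) pair; unmatched rows are dropped.
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    # The shared key appears only once, like the collapsed f0 slice.
    return [(key, lval, rval)
            for key, lval in left
            for rval in index.get(key, [])]


print(inner_match(x, y))
# [('a', 1, 'test1'), ('c', 3, 'test3'), ('c', 4, 'test3')]
```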
The join type by default is ‘inner’, which means that only rows that have a match in both sources are kept. One can also use the ‘left’, ‘right’ or ‘full’ join types. In these cases, unmatched rows in respectively the left source, the right source, or both sources are also kept:
>>> x |Match(_.f0, join_type='left')| y
Slices: | f0 | f1 | f0 | f1
--------------------------------------------------
Type: | bytes[1] | int64 | bytes?[1] | bytes?[5]
Dims: | d1:* | d1:* | d1:* | d1:*
Data: | | | |
| a | 1 | a | test1
| c | 3 | c | test3
| c | 4 | c | test3
| b | 2 | -- | --
Dim order: d1:*
>>> x |Match(_.f0, join_type='full')| y
Slices: | f0 | f1 | f0 | f1
----------------------------------------------------
Type: | bytes?[1] | int64? | bytes?[1] | bytes?[5]
Dims: | d1:* | d1:* | d1:* | d1:*
Data: | | | |
| a | 1 | a | test1
| c | 3 | c | test3
| c | 4 | c | test3
| b | 2 | -- | --
| -- | -- | d | test2
Dim order: d1:*
Sometimes in these cases, it is useful to merge the slices that have similar information, in this case both ‘f0’ slices. This can be accomplished using the ‘merge_same’ parameter:
>>> x |Match(_.f0, join_type='full', merge_same='equi')| y
Slices: | f0 | f1 | f1
----------------------------------------
Type: | bytes?[1] | int64? | bytes?[5]
Dims: | d1:* | d1:* | d1:*
Data: | | |
| a | 1 | test1
| c | 3 | test3
| c | 4 | test3
| b | 2 | --
| d | -- | test2
Dim order: d1:*
The value ‘equi’ selects the slices used for the equality condition. An alternative is to call with a tuple of the slice names that should be merged:
>>> x |Match(_.f0, jointype='full', merge_same =('f0',))| y
Another case is when one wants to merge slices with dissimilar names. This can be accomplished by using a nested tuple:
>>> x |Match(_.f0, jointype='full', merge_same =(('f0','f0'),))| y
Finally, one can also merge on all slices with the same names, by setting merge_same to ‘all’ or True. For the current example, this would generate an error, because slices ‘f1’ and ‘f1’ have conflicting content for the same rows:
>>> x |Match(_.f0, jointype='full', merge_same =True)| y
RuntimeError: Found unequal values during merge: 1 != test1
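The left and full join types, and the merge_same='equi' result, can be mimicked in plain Python as below. `outer_match` is a hypothetical helper; None stands in for the missing values shown as --, and the row order simply follows the printouts above (matches first, then unmatched rows).

```python
x = [("a", 1), ("b", 2), ("c", 3), ("c", 4)]
y = [("a", "test1"), ("d", "test2"), ("c", "test3")]


def outer_match(left, right, join_type="inner"):
    # Matched pairs first; None stands in for the missing values (--).
    rows = [(lk, lv, rk, rv) for lk, lv in left for rk, rv in right if lk == rk]
    if join_type in ("left", "full"):
        right_keys = {rk for rk, _ in right}
        rows += [(lk, lv, None, None) for lk, lv in left if lk not in right_keys]
    if join_type == "full":
        left_keys = {lk for lk, _ in left}
        rows += [(None, None, rk, rv) for rk, rv in right if rk not in left_keys]
    return rows


full = outer_match(x, y, "full")
# [('a', 1, 'a', 'test1'), ('c', 3, 'c', 'test3'), ('c', 4, 'c', 'test3'),
#  ('b', 2, None, None), (None, None, 'd', 'test2')]

# merge_same='equi': fold the two key columns into one, taking whichever
# side is present.
merged = [(lk if lk is not None else rk, lv, rv) for lk, lv, rk, rv in full]
```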
The final parameter, ‘mode’, can only be illustrated with a slightly more complicated example, in which we have multiple dimensions:
>>> x = Rep([[1,2],[1,2,3]])
Slices: | data
-------------------
Type: | int64
Dims: | d1:2<d2:~
Data: |
| [1 2]
| [1 2 3]
Dim order: d1:2<d2:~
>>> y = Rep([[2,3,4],[1,3,4]])
Slices: | data
-------------------
Type: | int64
Dims: | d1:2<d2:3
Data: |
| [2 3 4]
| [1 3 4]
Dim order: d1:2<d2:3
Matching these datasets to each other will match them on dimensions ‘d2’ in both datasets (which get renamed to d2a and d2b):
>>> x |Match| y
Slices: | data
-------------------------------
Type: | int64
Dims: | d1a:2<d1b:2<d2a_d2b:~
Data: |
| [[2] [1]]
| [[2 3] [1 3]]
Dim order: d1a:2<d1b:2<d2a_d2b:~
Note that the result has three dimensions: a two-by-two matrix of the ‘d1’ dimensions of both datasets, with nested lists containing the Match results for each pair of rows from both datasets.
But maybe we intended dimensions ‘d1’ in both datasets to be matched to each other, although they do not have the same identity (meaning that, while similarly named, Ibidas assumes that they refer to different data). With ‘positional’ broadcasting, we match dimensions on position.
In this case, we have dimensions ‘d2’ in both datasets used for the matching, so those need no broadcasting. Directly before these dimensions ‘d2’, we have in both datasets a dimension ‘d1’. When using positional broadcasting, these will be matched to each other (while with dimensional broadcasting they will only be matched if they have the same identity):
>>> x |Match(mode='pos')| y
Slices: | data
-------------------------
Type: | int64
Dims: | d1a:2<d2a_d2b:~
Data: |
| [2]
| [1 3]
Dim order: d1a:2<d2a_d2b:~
Note that both d1 dimensions are now matched to each other, and a Match is done between only [1,2] and [2,3,4], and [1,2,3] and [1,3,4], instead of all possible pairs of rows.
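The two broadcasting modes differ only in which row pairs get matched. The sketch below models this with a hypothetical helper `match_values` that performs an equality match of two one-dimensional lists; it reproduces the values in the printouts above, but none of the dimension naming.

```python
x = [[1, 2], [1, 2, 3]]
y = [[2, 3, 4], [1, 3, 4]]


def match_values(xs, ys):
    # Equality match of two 1-d lists (values kept in xs order).
    return [a for a in xs for b in ys if a == b]


# Default (d1 identities differ): every row of x is matched with every
# row of y, giving a 2 x 2 grid of results.
by_dim = [[match_values(xr, yr) for yr in y] for xr in x]

# mode='pos': the two d1 dims are paired by position, so only row i of
# x is matched with row i of y.
by_pos = [match_values(xr, yr) for xr, yr in zip(x, y)]

print(by_dim)  # [[[2], [1]], [[2, 3], [1, 3]]]
print(by_pos)  # [[2], [1, 3]]
```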
Returns names of all slices
Assign new dimensions
Example: .Redim('new_dimname', _.f0) assigns a new dim with name 'new_dimname' to the first dimension of slice f0.
Example: .Redim('new_dimname', f0=1) assigns a new dim with name 'new_dimname' to the second dimension of slice f0.
Example: .Redim('new_dimname', {_.Dd1.Without('f0'):1, 'f0':1}) assigns a new dim with name 'new_dimname' as first dim to all slices with dimension d1 (except f0), and does the same for slice f0, but as its second dimension.
Rename slices.
Parameters: |
|
---|
Examples:
>>> na = a.Rename("genes","scores")
>>> na = a.Rename(f0 = "genes", f1 = "scores")
One can also use the division operation to rename slices. With a tuple, the number of names should match the number of slices:
>>> a/("genes", "scores")
A dictionary works like the keyword form:
>>> a/{"f0": "genes"}
A single name can only be used if the representor object consists of a single slice:
>>> a.f0/"genes"
Returns shape of all dimensions as slices in a representor object.
Example:
>>> x = Rep([[1,2,3],[4,5,6]])
Slices: | data
-------------------
Type: | int64
Dims: | d1:2<d2:3
Data: |
| [1 2 3]
| [4 5 6]
Dim order: d1:2<d2:3
>>> x.Shape()
Slices: | d1 | d2
-----------------------
Type: | int64 | int64
Dims: | | d1:2
Data: | |
| 2 | 3
| | 3
Dim order: d1:2
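On nested lists, the values reported by Shape() for this example can be computed as below. In the printout, the d2 slice carries dimension d1, i.e. one length per row; the sketch mirrors that with a list of row lengths.

```python
x = [[1, 2, 3], [4, 5, 6]]  # dims d1:2 < d2:3

d1 = len(x)                   # one length for the outer dim
d2 = [len(row) for row in x]  # one d2 length per position along d1

print(d1, d2)  # 2 [3, 3]
```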
Prints table of contents.
Parameters: table_length – number of rows to show for each dimension (default: 100)
Show can be used to view a larger part of the table than the default output (15 rows) you get in IPython/Ibidas or by using str().
Show returns its representor object, allowing you to include it at any point in a query to observe results, e.g.:
>>> x.Unique().Show() |Match| y
Performs sort on data.
Example:
Sort x on all slices. If there are multiple slices, they are combined into a tuple, which is then sorted:
>>> x.Sort()
Sort x on slice f1
>>> x.Sort(_.f1)
Sort x on slice f1, f3.
>>> x.Sort(_.f1, _.f3)
For other possible formats for selecting the sort slices, see the Get function.
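Sorting on several slices amounts to sorting on a tuple key. The sketch below uses hypothetical rows with fields f0..f3 and Python's built-in `sorted`; it illustrates only the key construction, and Ibidas' tie-breaking behaviour is not specified here.

```python
# Hypothetical rows with fields (f0, f1, f2, f3).
rows = [(0, 2, "x", 9), (1, 1, "y", 5), (2, 2, "z", 3), (3, 1, "w", 7)]

by_all = sorted(rows)                                # like x.Sort(): whole-row tuples
by_f1 = sorted(rows, key=lambda r: r[1])             # like x.Sort(_.f1)
by_f1_f3 = sorted(rows, key=lambda r: (r[1], r[3]))  # like x.Sort(_.f1, _.f3)

print([r[0] for r in by_f1])     # [1, 3, 0, 2] (Python's sort is stable)
print([r[0] for r in by_f1_f3])  # [1, 3, 2, 0]
```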
Splits dim into two dimensions.
Example:
>>> x = Rep([1,2,3,4,5,6])
Slices: | data
---------------
Type: | int64
Dims: | d1:6
Data: |
| 1
| 2
| 3
| 4
| 5
| 6
Dim order: d1:6
>>> x.SplitDim(3,2)
Slices: | data
-----------------------
Type: | int64
Dims: | d2:3<d3:2
Data: |
| [1 2]
| [3 4]
| [5 6]
Dim order: d2:3<d3:2
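On a flat list, SplitDim(3,2) corresponds to cutting the list into 3 consecutive chunks of 2, as the printout above shows. The hypothetical `split_dim` below assumes the two sizes multiply to the original length and checks this itself; how Ibidas handles other sizes is not covered here.

```python
def split_dim(data, n_outer, n_inner):
    # Cut a flat list into n_outer consecutive chunks of n_inner items.
    # This sketch requires the sizes to multiply to the original length.
    if n_outer * n_inner != len(data):
        raise ValueError("n_outer * n_inner must equal len(data)")
    return [data[i * n_inner:(i + 1) * n_inner] for i in range(n_outer)]


print(split_dim([1, 2, 3, 4, 5, 6], 3, 2))  # [[1, 2], [3, 4], [5, 6]]
```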
Converts data into a Python data structure.
Transposes the dimensions of slices. Can only be applied to the common dimensions of slices.
Parameters: permute_idxs – index order of the new dims. By default, performs a matrix transpose of the first two dims, i.e. permute_idxs=(1,0).
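For a regular two-dimensional slice, the default transpose (permute_idxs=(1,0)) corresponds to swapping the two list levels, as in this sketch using `zip`. It does not cover other permutations or variable-length dimensions.

```python
x = [[1, 2, 3], [4, 5, 6]]  # dims d1:2 < d2:3

# Swap the first two dims (d2:3 < d1:2), like the default permutation.
transposed = [list(col) for col in zip(*x)]

print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```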
Combines slices into a tuple type
Returns type of this object. If multiple slices, returns tuple type of slice types.
Union compares datasets A and B, returning all unique rows that occur in either or both datasets. For further documentation, see ‘Intersect’.
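The four set operations (Intersect, Except, Difference and Union) can be summarized with a small plain-Python sketch in which each value stands for a row. This model assumes inputs without duplicate rows and keeps first-seen order; Ibidas may order or deduplicate results differently.

```python
A = [1, 2, 3, 4]
B = [3, 4, 5]

intersect = [r for r in A if r in B]                 # rows of A also in B
except_ = [r for r in A if r not in B]               # rows of A not in B
difference = except_ + [r for r in B if r not in A]  # rows not in both
union = list(dict.fromkeys(A + B))                   # unique rows in either

print(intersect, except_, difference, union)
# [3, 4] [1, 2] [1, 2, 5] [1, 2, 3, 4, 5]
```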