Create and modify a MODO
A MODO is a digital object to store, share and access omics data (genomics, transcriptomics, proteomics and metabolomics) and their metadata.
Each MODO consists of a unique id, a creation timestamp, an update timestamp and some further optional metadata. Elements such as data entities, samples, assays and reference genomes can be linked and added to a MODO. The full data model can be found at modos-schema.
Generate a MODO from scratch
Create the object
To create a new MODO you only need to specify the path where you want to generate the object. This will automatically generate a new zarr group at the specified path. If not specified explicitly, the MODO id will be set to the path name.
from modos.api import MODO
modo = MODO(path = "data/ex")
modo
# <modos.api.MODO object at 0x7df3131cb670>
From the command line:
modos create data/ex
Note
If the specified path refers to an existing MODO, the existing object will be loaded instead of being overwritten by a new one.
See Update or remove a MODO element for details on how to update the metadata of an existing MODO.
Warning
The specified path cannot point to an existing object other than a MODO, or the command will fail.
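The default id rule described above can be illustrated without the library. This is a minimal sketch, assuming that "path name" means the final component of the path (which matches the yaml example in this document, where the object at data/ex has the id ex). The helper default_modo_id is hypothetical and not part of the modos API.

```python
from pathlib import Path
from typing import Optional


def default_modo_id(path: str, explicit_id: Optional[str] = None) -> str:
    """Return the explicit id if one is given, else the last path component.

    Hypothetical helper: it only mirrors the rule stated in the text,
    it is not how the modos library is implemented.
    """
    if explicit_id is not None:
        return explicit_id
    return Path(path).name


implicit_id = default_modo_id("data/ex")
custom_id = default_modo_id("data/ex", explicit_id="my_modo")
```

Under this assumption, implicit_id is taken from the directory name, while custom_id keeps the value passed explicitly.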
Add elements to the object
To add omics data entities or further metadata to the object, you can add elements to the MODO.
Four different element types can be added:
sample
assay
data
reference
An element of the type data can be a DataEntity or be further specified as an AlignmentSet, an Array or a VariantSet.
from modos.api import MODO
import modos_schema.datamodel as model
# Load modo (see above)
modo = MODO(path = "data/ex")
# Generate a data element
data = model.DataEntity(
    id="genomics1",
    name="demo_genomics",
    description="A tiny cram file for demos",
    data_format="CRAM",
    data_path="/internal/path/to/store/cram_file",
)
# Add element to modo
modo.add_element(element = data, source_file="path/to/cram_file.cram")
From the command line:
modos add --source-file path/to/cram_file.cram data/ex data
Note
To specify a file that should be associated with this object, the source-file option can be used.
In addition, elements can be linked with each other, e.g. a VariantSet to a ReferenceGenome or a DataEntity to a Sample, by using the parent/part-of option.
Warning
Files associated through the source-file option will be copied into the MODO at the path specified in the data_path attribute. For large files this can take some time.
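To make the copy behaviour in the warning concrete, here is a minimal sketch using only the standard library. It assumes that data_path is interpreted relative to the MODO directory, as the relative paths in the yaml example suggest. The function copy_into_object is hypothetical and only stands in for whatever the library does internally.

```python
import shutil
import tempfile
from pathlib import Path


def copy_into_object(source_file: Path, object_dir: Path, data_path: str) -> Path:
    """Copy source_file to object_dir/data_path and return the new location.

    Hypothetical illustration of "copied into the MODO at data_path";
    the original file is left untouched.
    """
    target = object_dir / data_path
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source_file, target)  # cost grows with the file size
    return target


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    source = tmp_path / "cram_file.cram"
    source.write_bytes(b"placeholder bytes, not a real CRAM file")

    object_dir = tmp_path / "data" / "ex"
    copied = copy_into_object(source, object_dir, "demo1.cram")

    copied_exists = copied.exists()
    source_still_exists = source.exists()
    relative_location = copied.relative_to(object_dir).as_posix()
```

Because this is a copy rather than a move, the source file remains in place and the data exists twice on disk afterwards, which is worth keeping in mind for large files.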
Generate a MODO from (yaml-)file
Alternatively, a MODO and all associated elements can be specified in a yaml-file, such as the following example.yaml:
# An example yaml file to generate a MODO.
- element:
    id: ex
    "@type": MODO
    description: "Example modo for tests"
    creation_date: "2024-01-17T00:00:00"
    last_update_date: "2024-01-17T00:00:00"
    has_assay: assay1
- element:
    id: assay1
    "@type": Assay
    name: Assay 1
    description: Example assay for tests
    has_sample: sample1
    omics_type: GENOMICS
- element:
    id: demo1
    "@type": DataEntity
    name: Demo 1
    description: Demo CRAM file for tests.
    data_format: CRAM
    data_path: demo1.cram
    has_reference: reference1
  args:
    source_file: data/ex/demo1.cram
- element:
    id: reference1
    "@type": ReferenceGenome
    name: Reference 1
    data_path: reference1.fa
  args:
    source_file: data/ex/reference1.fa
- element:
    id: sample1
    "@type": Sample
    name: Sample 1
    description: An example sample for tests.
    collector: Foo university
    sex: Male
In this yaml file, each element is a separate list entry. Within each list entry, the element section specifies all relevant metadata as well as the element type (using "@type": ELEMENT_TYPE as syntax).
All valid element types, their fields and potential links can be found in the modos-schema.
The args section provides additional arguments that are valid for adding an element to a MODO (see Add elements to the object), e.g. args: source_file: provides the path to the file that should be added to the MODO.
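The structure described above can be explored with plain Python. The sketch below writes an abbreviated version of example.yaml as the list of dictionaries a yaml parser would return, then separates each entry into element type, metadata and args, and checks that every link points to a defined id. The helpers split_entry and dangling_links are hypothetical, and the assumption that link fields start with has_ is taken from the example only, not from the modos-schema.

```python
# Abbreviated example.yaml, as a yaml parser would load it.
entries = [
    {"element": {"id": "ex", "@type": "MODO", "has_assay": "assay1"}},
    {"element": {"id": "assay1", "@type": "Assay", "has_sample": "sample1"}},
    {
        "element": {
            "id": "demo1",
            "@type": "DataEntity",
            "data_path": "demo1.cram",
            "has_reference": "reference1",
        },
        "args": {"source_file": "data/ex/demo1.cram"},
    },
    {
        "element": {"id": "reference1", "@type": "ReferenceGenome"},
        "args": {"source_file": "data/ex/reference1.fa"},
    },
    {"element": {"id": "sample1", "@type": "Sample"}},
]


def split_entry(entry):
    """Return (element_type, metadata, args) for one list entry."""
    metadata = dict(entry["element"])
    element_type = metadata.pop("@type")
    args = dict(entry.get("args", {}))  # the args section is optional
    return element_type, metadata, args


def dangling_links(entries):
    """List (source_id, field, target_id) for links to undefined ids.

    Assumption: link fields are the ones prefixed with "has_".
    """
    known_ids = {entry["element"]["id"] for entry in entries}
    missing = []
    for entry in entries:
        element = entry["element"]
        for field, value in element.items():
            if not field.startswith("has_"):
                continue
            targets = value if isinstance(value, list) else [value]
            for target in targets:
                if target not in known_ids:
                    missing.append((element["id"], field, target))
    return missing


summary = [
    (element_type, metadata["id"], args.get("source_file"))
    for element_type, metadata, args in map(split_entry, entries)
]
unresolved = dangling_links(entries)
```

A check like this can catch a misspelled id in the yaml file before any files are copied.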
Using this example.yaml, a MODO and all specified associated elements can be generated in one command:
from modos.api import MODO
modo = MODO.from_file(path = "path/to/example.yaml", object_directory = "data/ex")
From the command line:
modos create --from-file "path/to/example.yaml" data/ex
Update or remove a MODO element
All elements of a MODO can be added (see Add elements to the object) or removed at any time using the element id:
# Remove an associated element
modo.remove_element("data/genomics1")
From the command line:
modos remove data/ex data/genomics1
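The element ids used in these calls appear to combine the element type and the element's own id, as in data/genomics1. The sketch below encodes that pattern. It is inferred from the examples in this document only, and element_key is a hypothetical helper rather than a modos function.

```python
# The four element types listed in "Add elements to the object".
ELEMENT_TYPES = ("sample", "assay", "data", "reference")


def element_key(element_type: str, element_id: str) -> str:
    """Build an id of the form "<element type>/<id>".

    Assumption: this format is inferred from "data/genomics1" in the
    examples, not taken from the library's documentation.
    """
    if element_type not in ELEMENT_TYPES:
        raise ValueError(f"unknown element type: {element_type!r}")
    return f"{element_type}/{element_id}"


key = element_key("data", "genomics1")
```

If the assumption holds, the resulting key is the string to pass to remove_element or update_element.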
To update an existing element, a new entity of the same type can be provided:
import modos_schema.datamodel as model
# Generate the data element from above with a change in name
# Fields that are not changed will be kept
data = model.DataEntity(
    id="genomics1",
    name="genomics_example",
    data_format="CRAM",
    data_path="/internal/path/to/store/cram_file",
)
# Update the element in the modo
modo.update_element(element_id = "data/genomics1", new = data)
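The comment "Fields that are not changed will be kept" describes a merge rather than a full replacement. The following sketch shows one way such a merge could behave, assuming that a field left unset in the new entity keeps its stored value. DataEntitySketch and merge_update are hypothetical stand-ins, not the modos-schema class or the library's update logic.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass
class DataEntitySketch:
    """Hypothetical stand-in for a data element with optional fields."""

    id: str
    name: Optional[str] = None
    description: Optional[str] = None
    data_format: Optional[str] = None
    data_path: Optional[str] = None


def merge_update(old: DataEntitySketch, new: DataEntitySketch) -> DataEntitySketch:
    """Return a copy of old where every field set in new overrides old."""
    changes = {
        field.name: getattr(new, field.name)
        for field in fields(new)
        if getattr(new, field.name) is not None
    }
    return replace(old, **changes)


stored = DataEntitySketch(
    id="genomics1",
    name="demo_genomics",
    description="A tiny cram file for demos",
    data_format="CRAM",
    data_path="/internal/path/to/store/cram_file",
)
update = DataEntitySketch(
    id="genomics1",
    name="genomics_example",
    data_format="CRAM",
    data_path="/internal/path/to/store/cram_file",
)
merged = merge_update(stored, update)
```

In this sketch the name changes while the description, which the update does not mention, is carried over from the stored element.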