Create and modify a MODO

A MODO is a digital object for storing, sharing and accessing omics data (genomics, transcriptomics, proteomics and metabolomics) together with their metadata. Each MODO has a unique id, a creation timestamp, an update timestamp and optional further metadata. Elements such as data entities, samples, assays and reference genomes can be added to a MODO and linked to one another. The full data model is described in modos-schema.

Generate a MODO from scratch

Create the object

To create a new MODO, you only need to specify the path where the object should be generated. This automatically creates a new Zarr group at that path. If no id is specified explicitly, the MODO id is set to the path name.

from modos.api import MODO

modo = MODO(path="data/ex")
modo
# <modos.api.MODO object at 0x7df3131cb670>

From the command line:

modos create data/ex

Note

If the specified path refers to an existing MODO, that object is loaded rather than overwritten by a new one. See Update or remove a MODO element for details on how to update the metadata of an existing MODO.

Warning

The specified path must not point to an existing object other than a MODO; otherwise the command will fail.
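The three outcomes described above (create, load, fail) can be sketched with the standard library alone. This is only an illustration, not the modos implementation: plan_open is a hypothetical helper, the default id is assumed to be the last path component, and an existing MODO is assumed to be recognisable by a Zarr v2 .zgroup marker file in the target directory.

```python
import tempfile
from pathlib import Path


def plan_open(path, modo_id=None, marker=".zgroup"):
    """Decide what opening `path` would do (hypothetical helper, not part of modos).

    Returns a tuple (action, id) where action is "create" or "load".
    Raises ValueError if the path exists but does not look like a MODO.
    """
    target = Path(path)
    # Assumption: without an explicit id, the last path component is used.
    resolved_id = modo_id if modo_id is not None else target.name
    if not target.exists():
        return "create", resolved_id
    # Assumption: an existing MODO is a directory holding a Zarr group marker.
    if target.is_dir() and (target / marker).is_file():
        return "load", resolved_id
    raise ValueError(f"{target} exists but is not a MODO")


with tempfile.TemporaryDirectory() as tmp:
    print(plan_open(Path(tmp) / "ex"))  # ('create', 'ex')
```

Calling the helper again after the directory and its marker file exist would return "load" instead, which mirrors the note above: an existing object is reused, never overwritten.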

Add elements to the object

Omics data entities and further metadata are attached to a MODO as elements. Four element types can be added:

  • sample

  • assay

  • data

  • reference

An element of type data can be a generic DataEntity or, more specifically, an AlignmentSet, an Array or a VariantSet.

from modos.api import MODO
import modos_schema.datamodel as model

# Load modo (see above)
modo = MODO(path="data/ex")

# Generate a data element
data = model.DataEntity(
    id="genomics1",
    name="demo_genomics",
    description="A tiny cram file for demos",
    data_format="CRAM",
    data_path="/internal/path/to/store/cram_file",
)

# Add the element to the modo and copy the source file into it
modo.add_element(element=data, source_file="path/to/cram_file.cram")

From the command line:

modos add --source-file path/to/cram_file.cram data/ex data

Note

To associate a file with an element, use the source-file option. In addition, elements can be linked to each other, e.g. a VariantSet to a ReferenceGenome or a DataEntity to a Sample, by using the parent/part-of option.

Warning

Files associated through the source-file option will be copied into the MODO at the path specified in the data_path attribute. For large files this can take some time.
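To make the copy step concrete, here is a minimal sketch of what placing a source file at data_path inside the object directory could look like. It is not the modos implementation: copy_into_object is a hypothetical helper, and treating a leading slash in data_path as relative to the object directory is an assumption.

```python
import shutil
import tempfile
from pathlib import Path


def copy_into_object(source_file, object_dir, data_path):
    """Copy source_file to object_dir/data_path and return the destination.

    Hypothetical helper for illustration only.
    """
    # Assumption: data_path is interpreted relative to the object directory,
    # even when it is written with a leading slash.
    destination = Path(object_dir) / str(data_path).lstrip("/")
    destination.parent.mkdir(parents=True, exist_ok=True)
    # The whole file is copied, so the cost grows with the file size.
    shutil.copy2(source_file, destination)
    return destination


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "cram_file.cram"
    source.write_bytes(b"placeholder content")
    copied = copy_into_object(
        source, Path(tmp) / "ex", "/internal/path/to/store/cram_file"
    )
    print(copied.relative_to(tmp))
```

Because the file is duplicated rather than referenced, the original at source_file stays untouched and the object needs roughly the same amount of additional disk space.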

Generate a MODO from a (YAML) file

Alternatively, a MODO and all associated elements can be specified in a YAML file, such as the following example.yaml:

# An example yaml file to generate a MODO.

- element:
    id: ex
    "@type": MODO
    description: "Example modo for tests"
    creation_date: "2024-01-17T00:00:00"
    last_update_date: "2024-01-17T00:00:00"
    has_assay: assay1

- element:
    id: assay1
    "@type": Assay
    name: Assay 1
    description: Example assay for tests
    has_sample: sample1
    omics_type: GENOMICS

- element:
    id: demo1
    "@type": DataEntity
    name: Demo 1
    description: Demo CRAM file for tests.
    data_format: CRAM
    data_path: demo1.cram
    has_reference: reference1
  args:
    source_file: data/ex/demo1.cram

- element:
    id: reference1
    "@type": ReferenceGenome
    name: Reference 1
    data_path: reference1.fa
  args:
    source_file: data/ex/reference1.fa


- element:
    id: sample1
    "@type": Sample
    name: Sample 1
    description: An example sample for tests.
    collector: Foo university
    sex: Male

In this YAML file, each element is a separate list entry. Within each entry, the element section specifies all relevant metadata as well as the element type (using the syntax "@type": ELEMENT_TYPE). All valid element types, their fields and their possible links can be found in the modos-schema. The args section provides additional arguments that are accepted when adding an element to a MODO (see Add elements to the object); for example, source_file under args gives the path to the file that should be copied into the MODO.
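The entry layout described above can be explored with plain Python. The sketch below hard-codes a reduced version of example.yaml in the shape a YAML parser would return (a list of mappings) so that it runs without extra dependencies. split_entry and unresolved_links are hypothetical helpers, and treating every has_ field as a link to a single element id is an assumption based on this example, not on the schema.

```python
# Reduced example.yaml, as a YAML parser would load it: a list of mappings.
entries = [
    {"element": {"id": "ex", "@type": "MODO", "has_assay": "assay1"}},
    {"element": {"id": "assay1", "@type": "Assay", "has_sample": "sample1"}},
    {
        "element": {
            "id": "demo1",
            "@type": "DataEntity",
            "data_path": "demo1.cram",
            "has_reference": "reference1",
        },
        "args": {"source_file": "data/ex/demo1.cram"},
    },
    {
        "element": {
            "id": "reference1",
            "@type": "ReferenceGenome",
            "data_path": "reference1.fa",
        },
        "args": {"source_file": "data/ex/reference1.fa"},
    },
    {"element": {"id": "sample1", "@type": "Sample"}},
]


def split_entry(entry):
    """Return (element_type, metadata, args) for one list entry."""
    metadata = dict(entry["element"])
    element_type = metadata.pop("@type")
    # The args section is optional; default to no extra arguments.
    return element_type, metadata, entry.get("args", {})


def unresolved_links(entries):
    """List (element id, field, target) for has_ links that point nowhere."""
    known_ids = {entry["element"]["id"] for entry in entries}
    missing = []
    for entry in entries:
        element = entry["element"]
        for field, target in element.items():
            # Assumption: has_ fields hold the id of exactly one other element.
            if field.startswith("has_") and target not in known_ids:
                missing.append((element["id"], field, target))
    return missing


for entry in entries:
    element_type, metadata, args = split_entry(entry)
    print(element_type, metadata["id"], args)

print(unresolved_links(entries))  # []
```

A check like unresolved_links is a cheap way to catch a mistyped id in the file before any data is copied.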

Using this example.yaml, a MODO and all of its specified elements can be generated with a single command:

from modos.api import MODO

modo = MODO.from_file(path="path/to/example.yaml", object_directory="data/ex")

From the command line:

modos create --from-file "path/to/example.yaml" data/ex

Update or remove a MODO element

Elements can be added to a MODO (see Add elements to the object) or removed from it at any time using the element id:

# Remove an associated element
modo.remove_element("data/genomics1")

From the command line:

modos remove data/ex data/genomics1

To update an existing element, provide a new entity of the same type:

import modos_schema.datamodel as model

# Generate the data element from above with a change in name
# Fields that are not changed will be kept
data = model.DataEntity(
    id="genomics1",
    name="genomics_example",
    data_format="CRAM",
    data_path="/internal/path/to/store/cram_file",
)

# Update the element in the modo
modo.update_element(element_id="data/genomics1", new=data)
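The "fields that are not changed will be kept" behaviour can be pictured as a merge of two records. The sketch below models the elements as plain dictionaries and is not how update_element is implemented. merge_fields is a hypothetical helper, and representing an unset field as None is an assumption.

```python
def merge_fields(existing, new):
    """Overlay the fields set on `new` onto `existing` (hypothetical helper)."""
    merged = dict(existing)
    for field, value in new.items():
        # Assumption: None means "not provided", so the old value survives.
        if value is not None:
            merged[field] = value
    return merged


existing = {
    "id": "genomics1",
    "name": "demo_genomics",
    "description": "A tiny cram file for demos",
    "data_format": "CRAM",
    "data_path": "/internal/path/to/store/cram_file",
}
new = {
    "id": "genomics1",
    "name": "genomics_example",
    "description": None,  # not set on the new entity
    "data_format": "CRAM",
    "data_path": "/internal/path/to/store/cram_file",
}

merged = merge_fields(existing, new)
print(merged["name"])         # genomics_example
print(merged["description"])  # A tiny cram file for demos
```

Under this model an update can change or add values but cannot clear one, since an empty field is indistinguishable from an omitted one.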