Create and modify a MODO#
A MODO is a digital object to store, share and access omics data (genomics, transcriptomics, proteomics and metabolomics) and their metadata.
Each MODO consists of a unique id, a creation and an update timestamp and some further optional metadata. Elements such as data entities, samples, assays and reference genomes can be linked and added to a MODO. The full data model can be found at modos-schema.
Generate a MODO from scratch#
Create the object#
To create a new MODO, you only need to specify the path where the object should be generated. This automatically creates a new zarr group at that path. If no id is specified explicitly, the MODO id is set to the path name.
from modos.api import MODO
modo = MODO(path = "data/ex")
modo
# <modos.api.MODO object at 0x7df3131cb670>
From the command line:
modos create data/ex
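As a rough illustration of the default id: assuming "path name" means the final component of the path (which matches the example.yaml used later in this guide, where the object at data/ex has the id ex), the fallback could be sketched as follows. The helper `default_modo_id` is hypothetical and not part of modos.

```python
from pathlib import PurePosixPath


def default_modo_id(path, explicit_id=None):
    """Hypothetical helper: use the last path component when no id is given."""
    if explicit_id is not None:
        return explicit_id
    # Assumption: "path name" refers to the final component of the path.
    return PurePosixPath(path).name


print(default_modo_id("data/ex"))
print(default_modo_id("data/ex", explicit_id="my_modo"))
```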
Note
If the specified path refers to an existing MODO, the existing object is loaded rather than overwritten by a new one.
See Update or remove a MODO element for details on how to update the metadata of an existing MODO.
Warning
The specified path must not point to an existing object other than a MODO; otherwise the command will fail.
Add elements to the object#
To add omics data entities or further metadata to the object, you can add elements to the MODO.
Four element types can be added:
sample
assay
data
reference
An element of the type data can be a DataEntity or be further specified as an AlignmentSet, an Array or a VariantSet.
from modos.api import MODO
import modos_schema.datamodel as model
# Load modo (see above)
modo = MODO(path = "data/ex")
# Generate a data element
data = model.DataEntity(
    id="genomics1",
    name="demo_genomics",
    description="A tiny cram file for demos",
    data_format="CRAM",
    has_sample="sample/sample1",
    data_path="/internal/path/to/store/cram_file",
)
# Add element to modo
modo.add_element(element = data, source_file="path/to/cram_file.cram")
From the command line:
modos add --source-file path/to/cram_file.cram data/ex data
Note
To associate a file with this object, use the source-file option.
In addition, elements can be linked to each other, e.g. a VariantSet to a ReferenceGenome or a DataEntity to a Sample, by using the parent/part-of option.
Warning
Files associated through the source-file option will be copied into the MODO at the path specified in the data_path attribute. For large files this can take some time.
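To make the copy step in the warning above concrete, here is a minimal sketch using only the Python standard library. It assumes that data_path is resolved relative to the root directory of the object; the helper `copy_into_object` is hypothetical and does not reflect how modos stores files internally.

```python
import shutil
import tempfile
from pathlib import Path


def copy_into_object(source_file, object_root, data_path):
    """Copy source_file to object_root/data_path and return the target path."""
    # Assumption: data_path is interpreted relative to the object's root.
    target = Path(object_root) / data_path.lstrip("/")
    target.parent.mkdir(parents=True, exist_ok=True)
    # A full copy of the file contents, so the cost grows with file size.
    shutil.copy2(source_file, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "demo1.cram"
    source.write_bytes(b"placeholder bytes, not a real CRAM file")
    target = copy_into_object(source, Path(tmp) / "ex", "demo1.cram")
    print(target.relative_to(tmp))
```

Because the whole file is duplicated, the source file can be moved or deleted afterwards without affecting the copy inside the object.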
Generate a MODO from (yaml-)file#
Alternatively, a MODO and all associated elements can be specified in a YAML file, such as the following example.yaml:
# An example yaml file to generate a MODO.
- element:
    id: ex
    "@type": MODO
    description: "Example modo for tests"
    creation_date: "2024-01-17T00:00:00"
    last_update_date: "2024-01-17T00:00:00"
    has_assay: assay1
- element:
    id: assay1
    "@type": Assay
    name: Assay 1
    description: Example assay for tests
    has_data: demo1
    omics_type: GENOMICS
- element:
    id: demo1
    "@type": DataEntity
    name: Demo 1
    description: Demo CRAM file for tests.
    data_format: CRAM
    data_path: demo1.cram
    has_sample: sample1
    has_reference: reference1
  args:
    source_file: data/ex/demo1.cram
- element:
    id: reference1
    "@type": ReferenceGenome
    name: Reference 1
    data_path: reference1.fa
  args:
    source_file: data/ex/reference1.fa
- element:
    id: sample1
    "@type": Sample
    name: Sample 1
    description: An example sample for tests.
    collector: Foo university
    sex: Male
In this YAML file, each element is a separate list entry. Within each list entry, the element section specifies all relevant metadata as well as the element type (using "@type": ELEMENT_TYPE as syntax).
All valid element types, their fields and potential links can be found in the modos-schema.
The args section provides additional arguments that are valid when adding an element to a MODO (see Add elements to the object), e.g. args: source_file: provides the path to the file that should be added to the MODO.
Using this example.yaml, a MODO and all specified associated elements can be generated in one command:
from modos.api import MODO
modo = MODO.from_file(config_path="path/to/example.yaml", object_path="data/ex")
From the command line:
modos create --from-file "path/to/example.yaml" data/ex
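Before generating an object from such a file, it can help to check that every entry has an element section with an id and "@type", and that every link points to an id defined in the same file. The sketch below works on the parsed form of a shortened example.yaml (the list of dictionaries a YAML loader would return), so it needs no third-party packages. The function `check_config` and the `LINK_FIELDS` tuple are hypothetical; only the link fields that appear in the example are covered, and this is no substitute for the validation done by modos and the modos-schema.

```python
# Parsed form of a shortened example.yaml, as a YAML loader would return it.
config = [
    {"element": {"id": "ex", "@type": "MODO", "has_assay": "assay1"}},
    {"element": {"id": "assay1", "@type": "Assay", "has_data": "demo1"}},
    {
        "element": {
            "id": "demo1",
            "@type": "DataEntity",
            "has_sample": "sample1",
            "has_reference": "reference1",
        },
        "args": {"source_file": "data/ex/demo1.cram"},
    },
    {
        "element": {"id": "reference1", "@type": "ReferenceGenome"},
        "args": {"source_file": "data/ex/reference1.fa"},
    },
    {"element": {"id": "sample1", "@type": "Sample"}},
]

# Assumption: only the link fields used in the example are checked.
LINK_FIELDS = ("has_assay", "has_data", "has_sample", "has_reference")


def check_config(entries):
    """Return a list of problems found in a parsed config (empty if none)."""
    problems = []
    ids = set()
    # First pass: structural checks and collection of all defined ids.
    for i, entry in enumerate(entries):
        element = entry.get("element")
        if element is None:
            problems.append(f"entry {i}: missing 'element' section")
            continue
        for key in ("id", "@type"):
            if key not in element:
                problems.append(f"entry {i}: element is missing {key!r}")
        if "id" in element:
            ids.add(element["id"])
    # Second pass: every link must point to an id defined in the file.
    for entry in entries:
        element = entry.get("element") or {}
        for field in LINK_FIELDS:
            targets = element.get(field, [])
            if isinstance(targets, str):
                targets = [targets]  # a single link may be given as a scalar
            for target in targets:
                if target not in ids:
                    problems.append(
                        f"{element.get('id')}: {field} points to unknown id {target!r}"
                    )
    return problems


print(check_config(config))
```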
Advanced example#
# A more advanced example with multiple assays, samples and files.
- element:
    id: f2a991_full_mouse
    "@type": MODO
    description: "Example complex modo describing multiple assays on fictional mouse f2a991."
    creation_date: "2024-01-17T00:00:00"
    last_update_date: "2024-01-17T00:00:00"
    has_assay:
      - 001_rnaseq
      - 001_wgs
- element:
    id: 001_rnaseq
    "@type": Assay
    name: Tissue specific RNA-seq.
    description: Example RNA-seq assay with multiple samples.
    sample_processing:
      - http://www.ebi.ac.uk/efo/EFO_0001461
      - http://www.ebi.ac.uk/efo/EFO_0008567
      - http://www.ebi.ac.uk/efo/EFO_0009653
    has_data:
      - 001_brain_rna_aligned
      - 001_liver_rna_aligned
    omics_type: TRANSCRIPTOMICS
- element:
    id: 001_wgs
    "@type": Assay
    name: Mouse WGS
    description: Example whole genome sequencing assay with one sample.
    sample_processing:
      - http://www.ebi.ac.uk/efo/EFO_0001461
      - http://www.ebi.ac.uk/efo/EFO_0010172
      - http://www.ebi.ac.uk/efo/EFO_0008631
    has_data: 001_wgs_aligned
    omics_type: GENOMICS
- element:
    id: 001_wgs_aligned
    "@type": DataEntity
    name: Mouse WGS.
    description: Mouse WGS alignment, Illumina sequencing.
    data_format: CRAM
    data_path: 001_wgs_aligned.cram
    has_reference: mm39
    has_sample: 001_tail_dna
  args:
    source_file: data/tail.cram
- element:
    id: 001_brain_rna_aligned
    "@type": DataEntity
    name: Mouse brain RNAseq.
    description: Mouse brain RNAseq alignment, Illumina sequencing.
    data_format: CRAM
    data_path: 001_brain_rna_aligned.cram
    has_reference: mm39
    has_sample: 001_brain_rna
  args:
    source_file: data/brain.cram
- element:
    id: 001_liver_rna_aligned
    "@type": DataEntity
    name: Mouse liver RNAseq.
    description: Mouse liver RNAseq alignment, Illumina sequencing.
    data_format: CRAM
    data_path: 001_liver_rna_aligned.cram
    has_reference: mm39
    has_sample: 001_liver_rna
  args:
    source_file: data/liver.cram
- element:
    id: mm39
    "@type": ReferenceGenome
    name: Genome assembly mm39
    description: Genome assembly mm39 of M. musculus strain C57BL/6J.
    data_path: mm39.fa
    version: mm39
    source_uri: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001635.27
    taxon_id: http://purl.obolibrary.org/obo/NCBITaxon_39442
  args:
    source_file: data/reference1.fa
- element:
    id: 001_liver_rna
    "@type": Sample
    name: Liver RNA library
    description: RNA library from liver of mouse f2a991.
    collector: Foo university
    taxon_id: http://purl.obolibrary.org/obo/NCBITaxon_39442
    source_material: http://purl.obolibrary.org/obo/UBERON_0002107
    sex: Male
- element:
    id: 001_brain_rna
    "@type": Sample
    name: Brain ventricles RNA library
    description: RNA library from brain ventricles of mouse f2a991.
    collector: Foo university
    taxon_id: http://purl.obolibrary.org/obo/NCBITaxon_39442
    source_material: http://purl.obolibrary.org/obo/UBERON_0004086
    sex: Male
- element:
    id: 001_tail_dna
    "@type": Sample
    name: Tail snip DNA library
    description: WGS library from tail snip of mouse f2a991.
    collector: Foo university
    taxon_id: http://purl.obolibrary.org/obo/NCBITaxon_39442
    source_material: http://purl.obolibrary.org/obo/UBERON_0002415
    sex: Male
$ modos create -f advanced.yaml advanced
$ modos show advanced --zarr
INFO | Using local storage for advanced
/
├── assay
│ ├── 001_rnaseq
│ └── 001_wgs
├── data
│ ├── 001_brain_rna_aligned
│ ├── 001_liver_rna_aligned
│ └── 001_wgs_aligned
├── reference
│ └── mm39
├── sample
│ ├── 001_brain_rna
│ ├── 001_liver_rna
│ └── 001_tail_dna
└── sequence
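The groups in the listing above line up with the "@type" values in advanced.yaml. The sketch below reproduces that grouping from the parsed entries (ids and types only). The `TYPE_TO_GROUP` mapping is inferred from this single example output and is an assumption; types that do not occur in the example, such as AlignmentSet or VariantSet, are deliberately left out, and the MODO entry itself is treated as the root rather than as a member of a group.

```python
from collections import defaultdict

# Assumption: mapping inferred from the `modos show` output for this example.
TYPE_TO_GROUP = {
    "Assay": "assay",
    "DataEntity": "data",
    "ReferenceGenome": "reference",
    "Sample": "sample",
}

# Ids and types taken from advanced.yaml, all other fields omitted.
entries = [
    {"element": {"id": "f2a991_full_mouse", "@type": "MODO"}},
    {"element": {"id": "001_rnaseq", "@type": "Assay"}},
    {"element": {"id": "001_wgs", "@type": "Assay"}},
    {"element": {"id": "001_wgs_aligned", "@type": "DataEntity"}},
    {"element": {"id": "001_brain_rna_aligned", "@type": "DataEntity"}},
    {"element": {"id": "001_liver_rna_aligned", "@type": "DataEntity"}},
    {"element": {"id": "mm39", "@type": "ReferenceGenome"}},
    {"element": {"id": "001_liver_rna", "@type": "Sample"}},
    {"element": {"id": "001_brain_rna", "@type": "Sample"}},
    {"element": {"id": "001_tail_dna", "@type": "Sample"}},
]


def group_elements(parsed_entries):
    """Group element ids by their assumed group name, sorted alphabetically."""
    groups = defaultdict(list)
    for entry in parsed_entries:
        element = entry["element"]
        group = TYPE_TO_GROUP.get(element["@type"])
        if group is not None:  # the MODO root entry has no group of its own
            groups[group].append(element["id"])
    return {group: sorted(ids) for group, ids in sorted(groups.items())}


for group, ids in group_elements(entries).items():
    print(group, ids)
```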
Update or remove a MODO element#
All elements of a MODO can be added (see Add elements to the object) or removed at any time using the element id:
# Remove an associated element
modo.remove_element("data/genomics1")
From the command line:
modos remove data/ex data/genomics1
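The element ids used in these calls combine the element type with the element's own id, as in data/genomics1 or sample/sample1. A small sketch of building and splitting such ids, restricted to the four element types listed earlier, could look like this. Both helpers are hypothetical and not part of the modos API.

```python
ELEMENT_TYPES = ("sample", "assay", "data", "reference")


def make_element_id(element_type, name):
    """Hypothetical helper: build an id of the form '<type>/<name>'."""
    if element_type not in ELEMENT_TYPES:
        raise ValueError(f"unknown element type: {element_type!r}")
    return f"{element_type}/{name}"


def split_element_id(element_id):
    """Hypothetical helper: split '<type>/<name>' into its two parts."""
    element_type, sep, name = element_id.partition("/")
    if not sep or not name or element_type not in ELEMENT_TYPES:
        raise ValueError(f"not a valid element id: {element_id!r}")
    return element_type, name


print(make_element_id("data", "genomics1"))
print(split_element_id("sample/sample1"))
```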
To update an existing element, provide a new entity of the same type:
import modos_schema.datamodel as model
# Recreate the data element from above with a changed name.
# Fields that are not changed will be kept.
data = model.DataEntity(id="genomics1", name="genomics_example", data_format="CRAM", data_path="/internal/path/to/store/cram_file")
# Update the element in the MODO
modo.update_element(element_id="data/genomics1", new=data)
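The comment "Fields that are not changed will be kept" describes merge behaviour. One way to picture it, assuming that fields left unset (None) on the new entity count as unchanged, is the dictionary merge below. It illustrates the semantics only; `merge_fields` is a hypothetical helper and not how update_element is implemented.

```python
def merge_fields(existing, new):
    """Return a copy of `existing` in which fields set in `new` take precedence."""
    merged = dict(existing)
    for key, value in new.items():
        # Assumption: None means "not provided", so the old value is kept.
        if value is not None:
            merged[key] = value
    return merged


existing = {
    "id": "genomics1",
    "name": "demo_genomics",
    "description": "A tiny cram file for demos",
    "data_format": "CRAM",
}
new = {
    "id": "genomics1",
    "name": "genomics_example",
    "description": None,
    "data_format": "CRAM",
}

merged = merge_fields(existing, new)
print(merged["name"])         # genomics_example
print(merged["description"])  # A tiny cram file for demos
```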