# Create and modify a MODO

A `MODO` is a digital object to store, share and access omics data (genomics, transcriptomics, proteomics and metabolomics) and their metadata. Each `MODO` consists of a unique id, a creation timestamp, an update timestamp and further optional metadata. Elements such as __data entities__, __samples__, __assays__ and __reference genomes__ can be linked and added to a `MODO`. The full data model can be found at modos-schema.

(scratch)=
## Generate a MODO from scratch

(Create_scratch)=
### Create the object

To create a new `MODO`, you only need to specify the `path` where the object should be generated. This automatically creates a new zarr group at the specified `path`. If not specified explicitly, the `MODO` id is set to the `path` name.

::::{tab-set}

:::{tab-item} python
:sync: python
```{code-block} python
from modos.api import MODO

# Create a new MODO at the given path (or load it if it already exists)
modo = MODO(path="data/ex")
modo
```
:::

:::{tab-item} cli
:sync: cli
```{code-block} console
modos create data/ex
```
:::

::::

:::{note}
If the specified path refers to an existing `MODO`, that object is loaded instead of being overwritten by a new one. Check [update](update) for details on how to update the metadata of an existing `MODO`.
:::

:::{warning}
The specified `path` must not point to an existing object other than a `MODO`, otherwise the command will fail.
:::

(add_scratch)=
### Add elements to the object

To add omics data entities or further metadata to the object, you can add elements to the `MODO`. Four element types can be added:

- sample
- assay
- data
- reference

An element of type data can be a generic `DataEntity` or be further specified as an `AlignmentSet`, an `Array` or a `VariantSet`.
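The examples on this page address elements as `<type>/<id>` (for instance `data/genomics1` or `sample/sample1`). The following is a minimal sketch of that naming scheme, assuming the four element types listed above. The lookup table and the `element_path` helper are hypothetical and only serve as illustration; they are not part of the `modos` API.

```python
# Hypothetical lookup table derived from the element types listed above.
# It is NOT part of the modos API.
ELEMENT_TYPE_BY_CLASS = {
    "Sample": "sample",
    "Assay": "assay",
    "ReferenceGenome": "reference",
    # All data flavours share the "data" element type.
    "DataEntity": "data",
    "AlignmentSet": "data",
    "Array": "data",
    "VariantSet": "data",
}


def element_path(class_name: str, element_id: str) -> str:
    """Build a '<type>/<id>' path such as 'data/genomics1' (illustrative helper)."""
    try:
        element_type = ELEMENT_TYPE_BY_CLASS[class_name]
    except KeyError:
        raise ValueError(f"unknown element class: {class_name!r}") from None
    return f"{element_type}/{element_id}"


print(element_path("DataEntity", "genomics1"))  # data/genomics1
print(element_path("Sample", "sample1"))        # sample/sample1
```

Under this assumption, specialised data classes end up next to plain `DataEntity` elements, which matches the single `data` group of a `MODO`.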

::::{tab-set}

:::{tab-item} python
:sync: python
```{code-block} python
from modos.api import MODO
import modos_schema.datamodel as model

# Load modo (see above)
modo = MODO(path="data/ex")

# Generate a data element
data = model.DataEntity(
    id="genomics1",
    name="demo_genomics",
    description="A tiny cram file for demos",
    data_format="CRAM",
    has_sample="sample/sample1",
    data_path="/internal/path/to/store/cram_file",
)

# Add element to modo
modo.add_element(element=data, source_file="path/to/cram_file.cram")
```
:::

:::{tab-item} cli
:sync: cli
```{code-block} console
modos add --source-file path/to/cram_file.cram data/ex data
```
:::

::::

:::{note}
Use the `source-file` option to specify a file that should be associated with this object. In addition, elements can be linked with each other, e.g. a `VariantSet` to a `ReferenceGenome` or a `DataEntity` to a `Sample`, by using the `parent`/`part-of` option.
:::

:::{warning}
Files associated through the `source-file` option are copied into the `MODO` at the path specified in the `data_path` attribute. For large files this can take some time.
:::

(file)=
## Generate a MODO from (yaml-)file

Alternatively, a `MODO` and all associated elements can be specified in a `yaml-file`, such as the following `example.yaml`:

```{code-block} yaml
# An example yaml file to generate a MODO.
- element:
    id: ex
    "@type": MODO
    description: "Example modo for tests"
    creation_date: "2024-01-17T00:00:00"
    last_update_date: "2024-01-17T00:00:00"
    has_assay: assay1

- element:
    id: assay1
    "@type": Assay
    name: Assay 1
    description: Example assay for tests
    has_data: demo1
    omics_type: GENOMICS

- element:
    id: demo1
    "@type": DataEntity
    name: Demo 1
    description: Demo CRAM file for tests.
    data_format: CRAM
    data_path: demo1.cram
    has_sample: sample1
    has_reference: reference1
  args:
    source_file: data/ex/demo1.cram

- element:
    id: reference1
    "@type": ReferenceGenome
    name: Reference 1
    data_path: reference1.fa
  args:
    source_file: data/ex/reference1.fa

- element:
    id: sample1
    "@type": Sample
    name: Sample 1
    description: An example sample for tests.
    collector: Foo university
    sex: Male
```

In this yaml file, each element is a separate list entry. Within each list entry, the `element` section specifies all relevant metadata as well as the `element type` (using the syntax `@type: ELEMENT_TYPE`). All valid element types, their fields and potential links can be found in the modos-schema. The `args` section provides additional arguments that are valid when adding an element to a `MODO` (see [Add elements to the object](add_scratch)), e.g. `args: source_file:` provides the path to the file that should be added to the `MODO`.

Using this `example.yaml`, a `MODO` and all specified elements can be generated with a single command:

::::{tab-set}

:::{tab-item} python
:sync: python
```{code-block} python
from modos.api import MODO

# Build the MODO and all of its elements from the yaml description
modo = MODO.from_file(config_path="path/to/example.yaml", object_path="data/ex")
```
:::

:::{tab-item} cli
:sync: cli
```{code-block} console
modos create --from-file "path/to/example.yaml" data/ex
```
:::

::::

### Advanced example

```{code-block} yaml
# A more advanced example with multiple assays, samples and files.
- element:
    id: f2a991_full_mouse
    "@type": MODO
    description: "Example complex modo describing multiple assays on fictional mouse f2a991."
    creation_date: "2024-01-17T00:00:00"
    last_update_date: "2024-01-17T00:00:00"
    has_assay:
      - 001_rnaseq
      - 001_wgs

- element:
    id: 001_rnaseq
    "@type": Assay
    name: Tissue specific RNA-seq.
    description: Example RNA-seq assay with multiple samples.
    sample_processing:
      - http://www.ebi.ac.uk/efo/EFO_0001461
      - http://www.ebi.ac.uk/efo/EFO_0008567
      - http://www.ebi.ac.uk/efo/EFO_0009653
    has_data:
      - 001_brain_rna_aligned
      - 001_liver_rna_aligned
    omics_type: TRANSCRIPTOMICS

- element:
    id: 001_wgs
    "@type": Assay
    name: Mouse WGS
    description: Example whole genome sequencing assay with one sample.
    sample_processing:
      - http://www.ebi.ac.uk/efo/EFO_0001461
      - http://www.ebi.ac.uk/efo/EFO_0010172
      - http://www.ebi.ac.uk/efo/EFO_0008631
    has_data: 001_wgs_aligned
    omics_type: GENOMICS

- element:
    id: 001_wgs_aligned
    "@type": DataEntity
    name: Mouse WGS.
    description: Mouse WGS alignment, Illumina sequencing.
    data_format: CRAM
    data_path: 001_wgs_aligned.cram
    has_reference: mm39
    has_sample: 001_tail_dna
  args:
    source_file: data/tail.cram

- element:
    id: 001_brain_rna_aligned
    "@type": DataEntity
    name: Mouse brain RNAseq.
    description: Mouse brain RNAseq alignment, Illumina sequencing.
    data_format: CRAM
    data_path: 001_brain_rna_aligned.cram
    has_reference: mm39
    has_sample: 001_brain_rna
  args:
    source_file: data/brain.cram

- element:
    id: 001_liver_rna_aligned
    "@type": DataEntity
    name: Mouse liver RNAseq.
    description: Mouse liver RNAseq alignment, Illumina sequencing.
    data_format: CRAM
    data_path: 001_liver_rna_aligned.cram
    has_reference: mm39
    has_sample: 001_liver_rna
  args:
    source_file: data/liver.cram

- element:
    id: mm39
    "@type": ReferenceGenome
    name: Genome assembly mm39
    description: Genome assembly mm39 of M. musculus strain C57BL/6J.
    data_path: mm39.fa
    version: mm39
    source_uri: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001635.27
    taxon_id: http://purl.obolibrary.org/obo/NCBITaxon_39442
  args:
    source_file: data/reference1.fa

- element:
    id: 001_liver_rna
    "@type": Sample
    name: Liver RNA library
    description: RNA library from liver of mouse f2a991.
    collector: Foo university
    taxon_id: http://purl.obolibrary.org/obo/NCBITaxon_39442
    source_material: http://purl.obolibrary.org/obo/UBERON_0002107
    sex: Male

- element:
    id: 001_brain_rna
    "@type": Sample
    name: Brain ventricles RNA library
    description: RNA library from brain ventricles of mouse f2a991.
    collector: Foo university
    taxon_id: http://purl.obolibrary.org/obo/NCBITaxon_39442
    source_material: http://purl.obolibrary.org/obo/UBERON_0004086
    sex: Male

- element:
    id: 001_tail_dna
    "@type": Sample
    name: Tail snip DNA library
    description: WGS library from tail snip of mouse f2a991.
    collector: Foo university
    taxon_id: http://purl.obolibrary.org/obo/NCBITaxon_39442
    source_material: http://purl.obolibrary.org/obo/UBERON_0002415
    sex: Male
```

```{code-block} console
$ modos create -f advanced.yaml advanced
$ modos show advanced --zarr
INFO | Using local storage for advanced
/
├── assay
│   ├── 001_rnaseq
│   └── 001_wgs
├── data
│   ├── 001_brain_rna_aligned
│   ├── 001_liver_rna_aligned
│   └── 001_wgs_aligned
├── reference
│   └── mm39
├── sample
│   ├── 001_brain_rna
│   ├── 001_liver_rna
│   └── 001_tail_dna
└── sequence
```

(update)=
## Update or remove a MODO element

Elements can be added to a `MODO` (see [Add elements to the object](add_scratch)) or removed from it at any time using the `element id`:

::::{tab-set}

:::{tab-item} python
:sync: python
```{code-block} python
# Remove an associated element
modo.remove_element("data/genomics1")
```
:::

:::{tab-item} cli
:sync: cli
```{code-block} console
modos remove data/ex data/genomics1
```
:::

::::

To update an existing element, provide a new entity of the same type:

```{code-block} python
import modos_schema.datamodel as model

# Generate the data element from above with a change in name
# Fields that are not changed will be kept
data = model.DataEntity(
    id="genomics1",
    name="genomics_example",
    data_format="CRAM",
    data_path="/internal/path/to/store/cram_file",
)

# Update the element in the modo
modo.update_element(element_id="data/genomics1", new=data)
```