modos.api
Classes
MODO – Multi-Omics Digital Object
Module Contents
- class modos.api.MODO(path, id=None, name=None, description=None, creation_date=date.today(), last_update_date=date.today(), has_assay=[], source_uri=None, endpoint=None, s3_kwargs=None, services=None)
Multi-Omics Digital Object. A digital archive containing multi-omics data and records connected by zarr-backed metadata.
- Parameters:
path (Union[pathlib.Path, str]) – Path to the archive directory.
id (Optional[str]) – MODO identifier. Defaults to the directory name.
name (Optional[str]) – Human-readable name.
description (Optional[str]) – Human-readable description.
creation_date (datetime.date) – When the MODO was created.
last_update_date (datetime.date) – When the MODO was last updated.
has_assay (List) – Existing assay identifiers to attach to MODO.
source_uri (Optional[str]) – URI of the source data.
endpoint (Optional[pydantic.HttpUrl]) – URL to the modos server.
s3_kwargs (Optional[dict[str, Any]]) – Keyword arguments for the S3 storage.
services (Optional[dict[str, pydantic.HttpUrl]]) – Optional dictionary of service endpoints.
Examples
>>> demo = MODO("data/ex")
# List identifiers of samples in the archive
>>> demo.list_samples()
['sample/sample1']
# List files in the archive
>>> files = sorted(demo.list_files())
>>> assert Path('data/ex/demo1.cram') in files
>>> assert Path('data/ex/reference1.fa') in files
- property zarr: zarr.hierarchy.Group
- Return type:
zarr.hierarchy.Group
- property path: pathlib.Path
- Return type:
pathlib.Path
- knowledge_graph(uri_prefix=None)
Return an RDF graph of the metadata. All identifiers are converted to valid URIs if needed.
- Parameters:
uri_prefix (Optional[str])
- Return type:
rdflib.Graph
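The conversion of identifiers to URIs can be pictured with a small standalone sketch. It is not the library's implementation: the helper name `to_uri`, the rule that an identifier with both a scheme and a host already counts as a valid URI, and the `http://example.org/` prefix are all assumptions made for illustration.

```python
from urllib.parse import quote, urlparse


def to_uri(identifier: str, uri_prefix: str) -> str:
    """Return identifier unchanged if it already looks like a URI, else prefix it.

    Hypothetical helper: modos may apply different rules.
    """
    parsed = urlparse(identifier)
    if parsed.scheme and parsed.netloc:
        return identifier
    # Percent-encode characters that are not allowed in a URI path,
    # keeping "/" so type-scoped ids such as "sample/sample1" stay readable.
    return uri_prefix + quote(identifier, safe="/")


local_uri = to_uri("sample/sample1", "http://example.org/")
spaced_uri = to_uri("sample/my sample", "http://example.org/")
kept_uri = to_uri("http://example.org/ex/sample/sample1", "http://example.org/")
```

In this sketch only identifiers that are not yet URIs receive the prefix, which mirrors the "if needed" wording above.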
- list_files()
Lists files in the archive recursively (except for the zarr file).
- Return type:
List[pathlib.Path]
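The behaviour described for list_files() (recursive listing that skips the zarr store) can be approximated with the standard library alone. The sketch below is an illustration rather than the modos implementation; the function name `list_archive_files` and the store name `data.zarr` are assumptions.

```python
import tempfile
from pathlib import Path
from typing import List


def list_archive_files(root: Path, zarr_name: str = "data.zarr") -> List[Path]:
    """Recursively list files under root, skipping the zarr store.

    zarr_name is an assumed store name, not taken from the modos docs.
    """
    files = []
    for candidate in root.rglob("*"):
        # Skip anything located inside the zarr store directory.
        inside_zarr = zarr_name in candidate.relative_to(root).parts
        if candidate.is_file() and not inside_zarr:
            files.append(candidate)
    return sorted(files)


# Build a throwaway archive layout to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "demo1.cram").touch()
    (root / "reference1.fa").touch()
    (root / "data.zarr").mkdir()
    (root / "data.zarr" / ".zgroup").touch()
    listed_names = [p.name for p in list_archive_files(root)]
```

Here `listed_names` contains only the two data files; the `.zgroup` file inside the store is left out.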
- list_arrays(element=None)
Views arrays in the archive recursively.
- Parameters:
element (Optional[str]) – Element, or group of elements (e.g. data or data/element_id) to show. If not provided, shows the metadata of the entire MODO.
- Return type:
zarr.hierarchy.TreeViewer
- update_date(date=date.today())
Update the last_update_date attribute.
- Parameters:
date (datetime.date) – Date to store as last_update_date.
- remove_element(element_id)
Remove an element from the archive, along with any files directly attached to it and links from other elements to it.
- Parameters:
element_id (str)
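To make the cascading removal concrete, here is a toy model that uses a plain dictionary in place of the zarr-backed metadata. It only illustrates the idea of dropping an element together with its attached files and the links pointing to it; the `links` and `files` keys and the function `remove_from_metadata` are hypothetical and do not reflect the modos schema.

```python
from typing import Dict, List


def remove_from_metadata(metadata: Dict[str, dict], element_id: str) -> List[str]:
    """Remove element_id from a toy metadata mapping.

    Returns the file names that were attached to the removed element,
    so the caller can delete them from storage.
    """
    removed = metadata.pop(element_id)  # raises KeyError for unknown ids
    # Drop links that other elements still hold to the removed element.
    for attrs in metadata.values():
        attrs["links"] = [
            link for link in attrs.get("links", []) if link != element_id
        ]
    return removed.get("files", [])


# Hypothetical archive: one sample linked to one data element.
archive = {
    "sample/sample1": {"links": ["data/demo1"], "files": []},
    "data/demo1": {"links": [], "files": ["demo1.cram"]},
}
orphaned_files = remove_from_metadata(archive, "data/demo1")
```

After the call, `archive` no longer contains `data/demo1`, and the sample no longer links to it.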
- add_element(element, source_file=None, part_of=None)
Add an element to the archive. If a data file is provided, it will be added to the archive. If the element is part of another element, the parent metadata will be updated.
- Parameters:
element (modos_schema.datamodel.DataEntity | modos_schema.datamodel.Sample | modos_schema.datamodel.Assay | modos_schema.datamodel.ReferenceGenome) – Element to add to the archive.
source_file (Optional[pathlib.Path]) – File to associate with the element.
part_of (Optional[str]) – Id of the parent element. It must be scoped to the type. For example “sample/foo”.
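Since part_of must be scoped to the element type (as in "sample/foo"), a caller may want to validate the value before passing it on. The helper below is a minimal sketch of such a check; `split_scoped_id` is a hypothetical name and is not part of modos.

```python
from typing import Tuple


def split_scoped_id(element_id: str) -> Tuple[str, str]:
    """Split a type-scoped id such as "sample/foo" into (type, local id)."""
    element_type, sep, local_id = element_id.partition("/")
    if not sep or not element_type or not local_id:
        raise ValueError(f"expected '<type>/<id>', got {element_id!r}")
    return element_type, local_id


parent_type, parent_id = split_scoped_id("sample/foo")
```

An unscoped value such as "foo" raises ValueError in this sketch, which surfaces the mistake before any metadata is touched.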
- _add_any_element(element, source_file=None, part_of=None)
Add an element of any type to the storage.
- Parameters:
element (modos_schema.datamodel.DataEntity | modos_schema.datamodel.Sample | modos_schema.datamodel.Assay | modos_schema.datamodel.ReferenceSequence | modos_schema.datamodel.ReferenceGenome)
source_file (Optional[pathlib.Path])
part_of (Optional[str])
- update_element(element_id, new)
Update element metadata in place by adding new values from model object.
- Parameters:
element_id (str) – Full id path in the zarr store.
new (modos_schema.datamodel.DataEntity | modos_schema.datamodel.Sample | modos_schema.datamodel.Assay | modos_schema.datamodel.MODO) – Element containing the enriched metadata.
- stream_genomics(file_path, region=None, reference_filename=None)
Slices local or remote CRAM, VCF (.vcf.gz), and BCF files, returning an iterator over records.