modos.api

Classes

MODO

Multi-Omics Digital Object

Module Contents

class modos.api.MODO(path, id=None, name=None, description=None, creation_date=date.today(), last_update_date=date.today(), has_assay=[], source_uri=None, endpoint=None, s3_kwargs=None, services=None)

Multi-Omics Digital Object. A digital archive containing several multi-omics datasets and records, connected by zarr-backed metadata.

Parameters:
  • path (Union[pathlib.Path, str]) – Path to the archive directory.

  • id (Optional[str]) – MODO identifier. Defaults to the directory name.

  • name (Optional[str]) – Human-readable name.

  • description (Optional[str]) – Human-readable description.

  • creation_date (datetime.date) – When the MODO was created.

  • last_update_date (datetime.date) – When the MODO was last updated.

  • has_assay (List) – Existing assay identifiers to attach to the MODO.

  • source_uri (Optional[str]) – URI of the source data.

  • endpoint (Optional[pydantic.HttpUrl]) – URL to the modos server.

  • s3_kwargs (Optional[dict[str, Any]]) – Keyword arguments for the S3 storage.

  • services (Optional[dict[str, pydantic.HttpUrl]]) – Optional dictionary of service endpoints.

storage

Storage backend for the archive.

Type:

Storage

endpoint

Server endpoint manager.

Type:

EndpointManager

Examples

>>> demo = MODO("data/ex")

# List identifiers of samples in the archive
>>> demo.list_samples()
['sample/sample1']

# List files in the archive
>>> files = [str(x) for x in demo.list_files()]
>>> assert 'data/ex/demo1.cram' in files
>>> assert 'data/ex/reference.fa' in files

property zarr: zarr.hierarchy.Group
Return type:

zarr.hierarchy.Group

property path: pathlib.Path
Return type:

pathlib.Path

property metadata: dict
Return type:

dict

knowledge_graph(uri_prefix=None)

Return an RDF graph of the metadata. All identifiers are converted to valid URIs if needed.

Parameters:

uri_prefix (Optional[str])

Return type:

rdflib.Graph
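The identifier-to-URI conversion mentioned above can be pictured with a small standard-library sketch. The helper to_uri, its rule for detecting an identifier that is already a URI (scheme plus network location), and the fallback prefix http://example.org/ are all assumptions for illustration, not the modos implementation.

```python
from typing import Optional
from urllib.parse import quote, urlparse


def to_uri(identifier: str, uri_prefix: Optional[str] = None) -> str:
    """Hypothetical helper: return identifier unchanged if it already looks
    like an absolute URI, otherwise percent-encode it and prepend a prefix."""
    parsed = urlparse(identifier)
    # Treat strings with both a scheme and a host (e.g. https://...) as URIs
    if parsed.scheme and parsed.netloc:
        return identifier
    # Assumed fallback prefix when uri_prefix is not given
    prefix = uri_prefix or "http://example.org/"
    # Keep "/" unescaped so type-scoped ids like "sample/sample1" stay readable
    return prefix + quote(identifier, safe="/")
```

For example, to_uri("sample/sample1", "http://example.org/modo/") yields "http://example.org/modo/sample/sample1", while an identifier such as "https://x.org/a" passes through untouched.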

show_contents(element=None)

Produces a YAML document of the object’s contents.

Parameters:

element (Optional[str]) – Element, or group of elements (e.g. data or data/element_id) to show. If not provided, shows the metadata of the entire MODO.

Return type:

str
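Selecting an element or group by a slash-separated path such as data or data/element_id can be sketched as a walk through a nested mapping. The nested layout and the select_element helper are assumptions; show_contents itself additionally renders the selected part as YAML, which this stdlib-only sketch leaves out.

```python
from typing import Any, Optional


def select_element(metadata: dict[str, Any], element: Optional[str] = None) -> Any:
    """Hypothetical helper: return the part of a nested metadata mapping
    addressed by a path such as "data" or "data/demo1"."""
    if element is None:
        return metadata  # no element given: the whole object
    node: Any = metadata
    for key in element.strip("/").split("/"):
        node = node[key]  # raises KeyError if the path does not exist
    return node
```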

list_files()

Lists files in the archive recursively (except for the zarr file).

Return type:

List[pathlib.Path]
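A recursive listing that skips the zarr metadata store can be sketched with pathlib. This assumes, for illustration only, that the store is a directory whose name ends in .zarr inside the archive directory; the real archive layout may differ.

```python
from pathlib import Path


def list_files(archive: Path) -> list[Path]:
    """Recursively list regular files under archive, skipping anything
    inside a directory ending in ".zarr" (assumed metadata store layout)."""
    return sorted(
        p
        for p in archive.rglob("*")
        if p.is_file()
        and not any(part.endswith(".zarr") for part in p.relative_to(archive).parts)
    )
```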

list_arrays(element=None)

Show a recursive tree view of the arrays in the archive.

Parameters:

element (Optional[str]) – Element, or group of elements (e.g. data or data/element_id) to show. If not provided, shows the arrays of the entire MODO.

Return type:

zarr.hierarchy.TreeViewer

query(query)

Query the metadata graph with SPARQL.

Parameters:

query (str) – SPARQL query string.

list_samples()

Lists samples in the archive.
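The example at the top of this class shows sample identifiers scoped by type, as in 'sample/sample1'. Assuming a flat mapping from such scoped ids to metadata (a hypothetical layout, not necessarily how modos stores it), listing samples reduces to a prefix filter:

```python
from typing import Any


def list_samples(metadata: dict[str, dict[str, Any]]) -> list[str]:
    """Return sample ids, assuming ids are scoped by type ("sample/...")."""
    return sorted(element_id for element_id in metadata if element_id.startswith("sample/"))
```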

update_date(date=date.today())

Update the last_update_date attribute.

Parameters:

date (datetime.date) – New value for last_update_date.
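Python evaluates default argument values once, when the def statement runs, so a default written as date=date.today() holds the date on which the function was defined (typically module import time). In a long-running process that date can lag behind the real one; passing date explicitly sidesteps this. The sketch below shows the common None-sentinel idiom on a hypothetical Record class; it is an illustration of the language behaviour, not the modos code.

```python
from datetime import date
from typing import Optional


class Record:
    """Hypothetical stand-in for an object with a last_update_date field."""

    def __init__(self) -> None:
        self.last_update_date = date.today()

    def update_date(self, new_date: Optional[date] = None) -> None:
        # Resolve the default at call time rather than at definition time
        self.last_update_date = new_date if new_date is not None else date.today()
```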

remove_element(element_id)

Remove an element from the archive, along with any files directly attached to it and links from other elements to it.

Parameters:

element_id (str) – Full id path of the element in the zarr store, for example “sample/foo”.
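The link cleanup described for remove_element can be sketched on a plain dictionary. The flat id-to-attributes layout and the has_sample attribute used in the test are hypothetical, and deleting the attached data files is left out.

```python
from typing import Any


def remove_element(metadata: dict[str, dict[str, Any]], element_id: str) -> None:
    """Delete element_id and strip every reference to it from other elements.

    Hypothetical flat layout: {"sample/s1": {"name": ..., "has_sample": [...]}}.
    """
    del metadata[element_id]
    for attrs in metadata.values():
        # Iterate over a copy so keys can be deleted safely
        for key, value in list(attrs.items()):
            if isinstance(value, list) and element_id in value:
                # Drop the dangling link but keep the other targets
                attrs[key] = [v for v in value if v != element_id]
            elif value == element_id:
                del attrs[key]
```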

remove_object()

Remove the complete MODO object.

add_element(element, source_file=None, part_of=None)

Add an element to the archive. If a data file is provided, it will be added to the archive. If the element is part of another element, the parent metadata will be updated.

Parameters:
  • element (modos_schema.datamodel.DataEntity | modos_schema.datamodel.Sample | modos_schema.datamodel.Assay | modos_schema.datamodel.ReferenceGenome) – Element to add to the archive.

  • source_file (Optional[pathlib.Path]) – File to associate with the element.

  • part_of (Optional[str]) – ID of the parent element, scoped by its type, for example “sample/foo”.
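How a type-scoped part_of id might update the parent's metadata can be sketched on a plain dictionary. The flat layout, the add_element helper, and the link attribute name has_part are assumptions for illustration; copying source_file into the archive is not shown.

```python
from typing import Any, Optional


def add_element(
    metadata: dict[str, dict[str, Any]],
    element_id: str,
    attrs: dict[str, Any],
    part_of: Optional[str] = None,
    link_attr: str = "has_part",  # hypothetical link attribute name
) -> None:
    """Register element_id and, if part_of is given, link it from the parent."""
    if element_id in metadata:
        raise ValueError(f"{element_id} already exists")
    if part_of is not None and part_of not in metadata:
        # Parent ids are type-scoped, e.g. "sample/foo", not just "foo"
        raise KeyError(f"parent {part_of} not found")
    metadata[element_id] = dict(attrs)
    if part_of is not None:
        metadata[part_of].setdefault(link_attr, []).append(element_id)
```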

_add_any_element(element, source_file=None, part_of=None, allowed_elements=ElementType)

Add an element of any type to the storage. This is meant to be called internally to add elements automatically.

Parameters:
  • element (modos_schema.datamodel.DataEntity | modos_schema.datamodel.Sample | modos_schema.datamodel.Assay | modos_schema.datamodel.ReferenceSequence | modos_schema.datamodel.ReferenceGenome)

  • source_file (Optional[pathlib.Path])

  • part_of (Optional[str])

  • allowed_elements (type)

update_element(element_id, new)

Update element metadata in place by adding new values from a model object.

Parameters:
  • element_id (str) – Full id path in the zarr store.

  • new (modos_schema.datamodel.DataEntity | modos_schema.datamodel.Sample | modos_schema.datamodel.Assay | modos_schema.datamodel.MODO) – Element containing the enriched metadata.
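An in-place merge that only adds information can be sketched on plain dictionaries. The merge rules here (ignore None, extend lists without duplicates, overwrite other values) are assumptions chosen for illustration and may not match what update_element actually does.

```python
from typing import Any


def update_element(current: dict[str, Any], new: dict[str, Any]) -> None:
    """Merge new into current in place, under hypothetical rules:
    None values are ignored, lists gain missing items, other values overwrite."""
    for key, value in new.items():
        if value is None:
            continue  # unset fields in the model object never erase data
        old = current.get(key)
        if isinstance(old, list) and isinstance(value, list):
            for item in value:
                if item not in old:
                    old.append(item)
        else:
            current[key] = value
```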

enrich_metadata()

Enrich MODO metadata in place using content from associated data files.

stream_genomics(file_path, region=None, reference_filename=None)

Slice local or remote CRAM, VCF (.vcf.gz), and BCF files, returning an iterator over records.

Parameters:
  • file_path (str)

  • region (Optional[str])

  • reference_filename (Optional[str])

Return type:

Iterator[pysam.AlignedSegment | pysam.VariantRecord]
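Tools built on htslib and pysam commonly accept samtools-style region strings ("chr1", "chr1:100", "chr1:100-200", 1-based and inclusive), while pysam's fetch(contig, start, stop) takes 0-based, half-open coordinates. Assuming region follows that convention (this page does not say so), the sketch below shows how such a string maps to contig, start and stop. The parse_region helper is hypothetical, and contig names containing ":" are not handled.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    contig: str
    start: Optional[int]  # 0-based, inclusive
    stop: Optional[int]  # 0-based, exclusive


_REGION = re.compile(r"^(?P<contig>[^:]+)(?::(?P<beg>\d+)(?:-(?P<end>\d+))?)?$")


def parse_region(region: str) -> Region:
    """Convert a 1-based, inclusive samtools-style region to 0-based, half-open."""
    m = _REGION.match(region)
    if m is None:
        raise ValueError(f"invalid region: {region!r}")
    beg, end = m.group("beg"), m.group("end")
    start = int(beg) - 1 if beg is not None else None
    stop = int(end) if end is not None else None
    if start is not None and start < 0:
        raise ValueError("region coordinates are 1-based")
    if start is not None and stop is not None and stop <= start:
        raise ValueError(f"empty region: {region!r}")
    return Region(m.group("contig"), start, stop)
```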

classmethod from_file(config_path, object_path, endpoint=None, s3_kwargs=None, services=None, no_remove=False)

Build a MODO from a YAML or JSON file.

Parameters:
  • config_path (pathlib.Path)

  • object_path (str)

  • endpoint (Optional[pydantic.HttpUrl])

  • s3_kwargs (Optional[dict])

  • services (Optional[dict[str, pydantic.HttpUrl]])

  • no_remove (bool)

Return type:

MODO
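Since from_file accepts either YAML or JSON, its first step is parsing the configuration. A sketch of that step is shown below; the load_config helper and the choice of parser by file suffix are assumptions, YAML support relies on the third-party PyYAML package, and turning the parsed mapping into MODO elements is omitted because the expected config schema is not documented here.

```python
import json
from pathlib import Path
from typing import Any


def load_config(config_path: Path) -> dict[str, Any]:
    """Hypothetical helper: parse a JSON or YAML config, chosen by suffix."""
    suffix = config_path.suffix.lower()
    text = config_path.read_text(encoding="utf-8")
    if suffix == ".json":
        return json.loads(text)
    if suffix in {".yaml", ".yml"}:
        import yaml  # PyYAML, imported lazily so JSON-only use has no extra dependency

        return yaml.safe_load(text)
    raise ValueError(f"unsupported config format: {suffix!r}")
```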