modos.api
Classes
- LocalStorage: Storage backend for an archive stored on the local filesystem.
- S3Storage: Storage backend for an archive stored on the S3 endpoint given by s3_endpoint.
- ElementType: Enumeration of all element types.
- UserElementType: Enumeration of element types exposed to the user.
- GenomicFileSuffix: Enumeration of all supported genomic file suffixes.
- HtsgetConnection: Connection to an htsget resource.
- Region: Genomic region consisting of a chromosome (aka reference) name and a 0-indexed, half-open coordinate interval.
- EndpointManager: Handle modos server endpoints.
- MODO: Multi-Omics Digital Object.
Functions
- attrs_to_graph: Convert an attribute dictionary to an RDF graph of metadata.
- add_metadata_group: Add an input metadata dictionary to an existing zarr group.
- list_zarr_items: Recursively list all zarr groups and arrays.
- dict_to_instance
- set_data_path: Set the data_path attribute, if it is not specified, to the MODO root.
- set_haspart_relationship: Add an element to the hasPart attribute of a parent zarr group.
- update_haspart_id: Update the id of the has_part property of an element to use the full id, including its type.
- read_pysam: Automatically instantiate a pysam file object from the input path and pass any additional keyword arguments to it.
- extract_metadata: Extract metadata from files associated with a model instance.
- parse_attributes: Load a model specification from a file into a list of dictionaries. Model types must be specified as @type.
- Check if a path is an S3 path.
Module Contents
- modos.api.attrs_to_graph(meta, uri_prefix)
Convert an attribute dictionary to an RDF graph of metadata.
- modos.api.add_metadata_group(parent_group, metadata)
Add an input metadata dictionary to an existing zarr group.
- Parameters:
parent_group (zarr.hierarchy.Group)
metadata (dict)
- Return type:
None
- modos.api.list_zarr_items(group)
Recursively list all zarr groups and arrays.
- Parameters:
group (zarr.hierarchy.Group)
- Return type:
- class modos.api.LocalStorage(path)
Bases: Storage
Storage backend for an archive stored on the local filesystem.
- Parameters:
path (pathlib.Path)
- property zarr: zarr.hierarchy.Group
- Return type:
zarr.hierarchy.Group
- property path: pathlib.Path
- Return type:
pathlib.Path
- exists(target)
- Parameters:
target (pathlib.Path)
- Return type:
- list(target=None)
- Parameters:
target (Optional[pathlib.Path])
- remove(target)
- Parameters:
target (pathlib.Path)
- put(source, target)
- Parameters:
source (pathlib.Path)
target (pathlib.Path)
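LocalStorage and S3Storage both expose the same four operations: exists, list, remove and put. The sketch below shows that interface on the local filesystem. It assumes that targets are resolved relative to the storage root and that put copies a file into the archive. StorageSketch and LocalStorageSketch are hypothetical names, and this is not the modos implementation.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterator, Optional


class StorageSketch(ABC):
    """Hypothetical abstract interface mirroring the documented methods."""

    @abstractmethod
    def exists(self, target: Path) -> bool: ...

    @abstractmethod
    def list(self, target: Optional[Path] = None) -> Iterator[Path]: ...

    @abstractmethod
    def remove(self, target: Path) -> None: ...

    @abstractmethod
    def put(self, source: Path, target: Path) -> None: ...


class LocalStorageSketch(StorageSketch):
    """Assumption: every target is resolved relative to a root directory."""

    def __init__(self, path: Path):
        self.path = Path(path)
        self.path.mkdir(parents=True, exist_ok=True)

    def exists(self, target: Path) -> bool:
        return (self.path / target).exists()

    def list(self, target: Optional[Path] = None) -> Iterator[Path]:
        # Yield every regular file below the root (or below `target`), recursively.
        base = self.path / target if target else self.path
        return (p for p in base.rglob("*") if p.is_file())

    def remove(self, target: Path) -> None:
        # Delete a single file, or a whole directory tree.
        full = self.path / target
        if full.is_dir():
            shutil.rmtree(full)
        else:
            full.unlink()

    def put(self, source: Path, target: Path) -> None:
        # Copy a local file into the archive, creating parent folders as needed.
        dest = self.path / target
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, dest)
```

Because the interface is abstract, an S3 variant only needs to implement the same four methods against object keys instead of paths on disk.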
- class modos.api.S3Storage(path, s3_endpoint, s3_kwargs=None)
Bases: Storage
Storage backend for an archive stored on the S3 endpoint given by s3_endpoint.
- property path: pathlib.Path
- Return type:
pathlib.Path
- property zarr: zarr.hierarchy.Group
- Return type:
zarr.hierarchy.Group
- exists(target=ZARR_ROOT)
- Parameters:
target (pathlib.Path)
- Return type:
- list(target=None)
- Parameters:
target (Optional[pathlib.Path])
- Return type:
Generator[pathlib.Path, None, None]
- remove(target)
- Parameters:
target (pathlib.Path)
- put(source, target)
- Parameters:
source (pathlib.Path)
target (pathlib.Path)
- modos.api.dict_to_instance(element)
- Parameters:
element (Mapping[str, Any])
- Return type:
Any
- class modos.api.ElementType
Enumeration of all element types.
- SAMPLE = 'sample'
- ASSAY = 'assay'
- DATA_ENTITY = 'data'
- REFERENCE_GENOME = 'reference'
- REFERENCE_SEQUENCE = 'sequence'
- modos.api.set_data_path(element, source_file=None)
Set the data_path attribute, if it is not specified, to the MODO root.
- Parameters:
element (dict)
source_file (Optional[Union[pathlib.Path, str]])
- Return type:
- modos.api.set_haspart_relationship(child_class, child_path, parent_group)
Add an element to the hasPart attribute of a parent zarr group.
- Parameters:
child_class (str)
child_path (str)
parent_group (zarr.hierarchy.Group)
- class modos.api.UserElementType
Enumeration of element types exposed to the user.
- SAMPLE = 'sample'
- ASSAY = 'assay'
- DATA_ENTITY = 'data'
- REFERENCE_GENOME = 'reference'
- modos.api.update_haspart_id(element)
Update the id of the has_part property of an element to use the full id, including its type.
- Parameters:
element (modos_schema.datamodel.DataEntity | modos_schema.datamodel.Sample | modos_schema.datamodel.Assay | modos_schema.datamodel.ReferenceGenome | modos_schema.datamodel.MODO)
- class modos.api.GenomicFileSuffix
Enumeration of all supported genomic file suffixes.
- CRAM = ('.cram',)
- BAM = ('.bam',)
- SAM = ('.sam',)
- VCF = ('.vcf', '.vcf.gz')
- BCF = ('.bcf',)
- FASTA = ('.fasta', '.fa')
- FASTQ = ('.fastq', '.fq')
- classmethod from_path(path)
- Parameters:
path (pathlib.Path)
- Return type:
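Each GenomicFileSuffix member holds a tuple of suffixes, and VCF includes the double suffix .vcf.gz. A suffix lookup therefore has to look at the end of the file name, not only at Path.suffix, which would return just ".gz". The sketch below uses a hypothetical FileSuffixSketch that copies the values listed above. The matching rule and the ValueError for unknown files are assumptions, not the documented behavior of from_path.

```python
from enum import Enum
from pathlib import Path


class FileSuffixSketch(Enum):
    """Hypothetical mirror of GenomicFileSuffix; values copied from the list above."""

    CRAM = (".cram",)
    BAM = (".bam",)
    SAM = (".sam",)
    VCF = (".vcf", ".vcf.gz")
    BCF = (".bcf",)
    FASTA = (".fasta", ".fa")
    FASTQ = (".fastq", ".fq")

    @classmethod
    def from_path(cls, path: Path) -> "FileSuffixSketch":
        # Match on the end of the file name so that ".vcf.gz" is recognised
        # and dots elsewhere in the name (e.g. "sample.v2.bam") do not matter.
        name = Path(path).name
        for member in cls:
            # str.endswith accepts a tuple and is True if any suffix matches.
            if name.endswith(member.value):
                return member
        raise ValueError(f"Unsupported genomic file: {name!r}")
```

A plain Enum is used on purpose. Mixing in tuple would make Enum unpack each value into tuple.__new__ and mangle the suffix tuples.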
- modos.api.read_pysam(path, region=None, **kwargs)
Automatically instantiate a pysam file object from the input path and pass any additional keyword arguments to it.
- Parameters:
path (pathlib.Path)
region (Optional[modos.genomics.region.Region])
- Return type:
Iterator[pysam.AlignedSegment | pysam.VariantRecord]
- class modos.api.HtsgetConnection
Connection to an htsget resource. It can open a stream to the resource and lazily fetch data from it.
- host: pydantic.HttpUrl
- path: pathlib.Path
- region: modos.genomics.region.Region | None
- to_file(path)
Save all data from the stream to a file.
- Parameters:
path (pathlib.Path)
- class modos.api.Region
Genomic region consisting of a chromosome (aka reference) name and a 0-indexed, half-open coordinate interval. The end may be left unspecified, in which case it is set to math.inf.
- to_htsget_query()
Serializes the region into an htsget URL query.
Example
>>> Region(chrom='chr1', start=0, end=100).to_htsget_query()
'referenceName=chr1&start=0&end=100'
- classmethod from_htsget_query(url)
Instantiate from an htsget URL query.
Example
>>> Region.from_htsget_query(
...     "http://localhost/htsget/reads/ex/demo1?format=CRAM&referenceName=chr1&start=0"
... )
Region(chrom='chr1', start=0, end=inf)
- Parameters:
url (str)
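htsget requests carry a region as the referenceName, start and end query parameters, using the same 0-based, half-open convention as Region. The sketch below reproduces the two examples above with urllib.parse. It assumes that an open-ended region simply omits end and that unrelated parameters such as format are ignored. RegionSketch is a hypothetical stand-in, not the modos class.

```python
import math
from dataclasses import dataclass
from urllib.parse import parse_qs, urlencode, urlparse


@dataclass
class RegionSketch:
    """Hypothetical stand-in for Region: 0-based, half-open [start, end)."""

    chrom: str
    start: int = 0
    end: float = math.inf

    def to_htsget_query(self) -> str:
        params = {"referenceName": self.chrom, "start": self.start}
        # An open-ended region (end = inf) omits the end parameter entirely.
        if self.end != math.inf:
            params["end"] = int(self.end)
        return urlencode(params)

    @classmethod
    def from_htsget_query(cls, url: str) -> "RegionSketch":
        # parse_qs maps each key to a list of values; keep the first one.
        query = parse_qs(urlparse(url).query)
        end = query.get("end")
        return cls(
            chrom=query["referenceName"][0],
            start=int(query.get("start", ["0"])[0]),
            end=int(end[0]) if end else math.inf,
        )
```

urlparse only finds a query after a "?", so a bare query string must be prefixed with "?" before it is parsed back.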
- classmethod from_ucsc(ucsc)
Instantiate from a UCSC-formatted region string.
Example
>>> Region.from_ucsc('chr-1ba:10-320')
Region(chrom='chr-1ba', start=10, end=320)
>>> Region.from_ucsc('chr1:-320')
Region(chrom='chr1', start=0, end=320)
>>> Region.from_ucsc('chr1:10-')
Region(chrom='chr1', start=10, end=inf)
>>> Region.from_ucsc('chr1:10')
Region(chrom='chr1', start=10, end=inf)
Note
For more information about the UCSC coordinate system, see: http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
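The from_ucsc examples cover four shapes of region string: a full range, a missing start, a trailing dash, and a bare start. The sketch below is a hypothetical parse_ucsc helper that returns the same (chrom, start, end) values as those doctests. Like the doctests, it keeps the numbers exactly as written and applies no 1-based to 0-based shift. It splits on the last colon so that a dash inside the chromosome name, as in 'chr-1ba', is kept.

```python
import math
from typing import Tuple


def parse_ucsc(ucsc: str) -> Tuple[str, int, float]:
    """Hypothetical parser reproducing the from_ucsc doctest results."""
    chrom, _, span = ucsc.rpartition(":")
    if not chrom:
        # No colon at all: the whole string is a chromosome name.
        return span, 0, math.inf
    start_text, _, end_text = span.partition("-")
    # A missing start ("chr1:-320") defaults to 0.
    start = int(start_text) if start_text else 0
    # A missing end ("chr1:10-" or "chr1:10") leaves the region open-ended.
    end = int(end_text) if end_text else math.inf
    return chrom, start, end
```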
- classmethod from_pysam(record)
- Parameters:
record (pysam.VariantRecord | pysam.AlignedSegment)
- Return type:
- modos.api.extract_metadata(instance, base_path)
Extract metadata from files associated with a model instance.
- Parameters:
base_path (pathlib.Path)
- Return type:
List
- modos.api.parse_attributes(path)
Load a model specification from a file into a list of dictionaries. Model types must be specified as @type.
- Parameters:
path (pathlib.Path)
- Return type:
List[dict]
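parse_attributes returns a list of dictionaries, each of which must name its model type under the @type key. The on-disk format is not stated here, so the sketch below assumes a JSON file purely for illustration. It also accepts a single object as a one-element list, which is another assumption. load_specs is a hypothetical name.

```python
import json
from pathlib import Path
from typing import List


def load_specs(path: Path) -> List[dict]:
    """Hypothetical loader: a JSON list of objects, each tagged with "@type"."""
    specs = json.loads(Path(path).read_text())
    if isinstance(specs, dict):
        specs = [specs]  # accept a single object as a one-element list
    for i, spec in enumerate(specs):
        # Reject elements whose model type cannot be determined.
        if "@type" not in spec:
            raise ValueError(f"Element {i} has no '@type' key")
    return specs
```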
- class modos.api.EndpointManager
Handle modos server endpoints. If a modos server URL is provided, it is used to detect available service URLs. Alternatively, service URLs can be provided explicitly if no modos server is available.
- Parameters:
modos – URL to the modos server.
services – Mapping of services to their URLs.
Examples
>>> ex = EndpointManager(modos="http://modos.example.org")
>>> ex.list()
{'s3': Url('http://s3.example.org/'), 'htsget': Url('http://htsget.example.org/')}
>>> ex.htsget
Url('http://htsget.example.org/')
>>> ex = EndpointManager(services={"s3": "http://s3.example.org"})
>>> ex.s3
Url('http://s3.example.org/')
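When no modos server is available, the services mapping is the only source of URLs. The sketch below covers that explicit branch only; discovery through a server is not modeled. It uses a hypothetical EndpointSketch class with plain strings instead of pydantic.HttpUrl. It assumes that unknown services resolve to None, and it adds the trailing "/" that the Url(...) values in the examples show.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    # Give a bare host an explicit "/" path, as in Url('http://s3.example.org/').
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=parts.path or "/"))


class EndpointSketch:
    """Hypothetical: resolves services only from an explicit mapping."""

    def __init__(self, services: Optional[Dict[str, str]] = None):
        self._services = {k: normalize_url(v) for k, v in (services or {}).items()}

    def list(self) -> Dict[str, str]:
        return dict(self._services)

    def __getattr__(self, name: str) -> Optional[str]:
        # Python calls this only when normal lookup fails, i.e. for service names.
        # Private names raise AttributeError to avoid recursion during setup.
        if name.startswith("_"):
            raise AttributeError(name)
        return self._services.get(name)
```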
- class modos.api.MODO(path, id=None, name=None, description=None, creation_date=date.today(), last_update_date=date.today(), has_assay=[], source_uri=None, endpoint=None, s3_kwargs=None, services=None)
Multi-Omics Digital Object. A digital archive containing multi-omics data and records, connected by zarr-backed metadata.
- Parameters:
path (Union[pathlib.Path, str]) – Path to the archive directory.
id (Optional[str]) – MODO identifier. Defaults to the directory name.
name (Optional[str]) – Human-readable name.
description (Optional[str]) – Human-readable description.
creation_date (datetime.date) – When the MODO was created.
last_update_date (datetime.date) – When the MODO was last updated.
has_assay (List) – Existing assay identifiers to attach to MODO.
source_uri (Optional[str]) – URI of the source data.
endpoint (Optional[pydantic.HttpUrl]) – URL to the modos server.
s3_kwargs (Optional[dict[str, Any]]) – Keyword arguments for the S3 storage.
services (Optional[dict[str, pydantic.HttpUrl]]) – Optional dictionary of service endpoints.
- endpoint
Server endpoint manager.
- Type:
EndpointManager
Examples
>>> demo = MODO("data/ex")
# List identifiers of samples in the archive
>>> demo.list_samples()
['sample/sample1']
# List files in the archive
>>> files = sorted(demo.list_files())
>>> assert Path('data/ex/demo1.cram') in files
>>> assert Path('data/ex/reference1.fa') in files
- property zarr: zarr.hierarchy.Group
- Return type:
zarr.hierarchy.Group
- property path: pathlib.Path
- Return type:
pathlib.Path
- knowledge_graph(uri_prefix=None)
Return an RDF graph of the metadata. All identifiers are converted to valid URIs if needed.
- Parameters:
uri_prefix (Optional[str])
- Return type:
rdflib.Graph
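knowledge_graph converts identifiers to valid URIs "if needed". One common way to decide is to keep identifiers that already have a URI scheme and to prefix everything else. The sketch below does this with a hypothetical to_uri helper. The example.org default prefix and the percent-encoding rule are assumptions, not what modos does when uri_prefix is None.

```python
from urllib.parse import quote, urlparse


def to_uri(identifier: str, uri_prefix: str = "http://example.org/") -> str:
    """Hypothetical helper: keep absolute URIs, prefix everything else."""
    if urlparse(identifier).scheme:
        return identifier
    # Percent-encode characters not allowed in a URI path, but keep "/"
    # so scoped ids such as "sample/sample1" stay readable.
    return uri_prefix + quote(identifier, safe="/")
```

A caveat of the scheme check: an identifier containing a colon, such as "sample:1", parses with a scheme and would be left unchanged.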
- list_files()
Lists files in the archive recursively (except for the zarr file).
- Return type:
List[pathlib.Path]
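list_files walks the archive directory but skips the zarr metadata store. The sketch below shows that filtering with pathlib, assuming the store is a directory called data.zarr at the archive root; that name and the list_archive_files helper are both hypothetical. In the test, the file names match the MODO example above.

```python
from pathlib import Path
from typing import List


def list_archive_files(root: Path, zarr_name: str = "data.zarr") -> List[Path]:
    """Hypothetical: list files under root, skipping the zarr metadata store."""
    store = root / zarr_name
    return [
        p
        for p in root.rglob("*")
        # Keep regular files that are neither the store nor inside it.
        if p.is_file() and p != store and store not in p.parents
    ]
```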
- list_arrays(element=None)
Views arrays in the archive recursively.
- Parameters:
element (Optional[str]) – Element, or group of elements (e.g. data or data/element_id) to show. If not provided, shows the metadata of the entire MODO.
- Return type:
zarr.hierarchy.TreeViewer
- update_date(date=date.today())
Update the last_update_date attribute.
- Parameters:
date (datetime.date)
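The signatures above declare date=date.today() for update_date, and creation_date=date.today(), last_update_date=date.today() and has_assay=[] for MODO. If these defaults appear in the source exactly as rendered, standard Python semantics apply. A default expression is evaluated once, when the def statement runs. A long-running process would therefore keep the date from import time, and a list default would be shared between calls. The hypothetical functions below show both effects and the usual None-sentinel alternative. They are not modos code.

```python
from datetime import date
from typing import List, Optional


def stamp_eager(d: date = date.today()) -> date:
    # date.today() ran once, when this def statement was executed.
    return d


def stamp_lazy(d: Optional[date] = None) -> date:
    # Resolving the default inside the body picks up the current date per call.
    return d if d is not None else date.today()


def add_assay(assay: str, assays: List[str] = []) -> List[str]:
    # The same list object is reused by every call that omits `assays`.
    assays.append(assay)
    return assays
```

Whatever the implementation does, passing the value explicitly, for example update_date(date.today()), makes the call independent of when the module was imported.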
- remove_element(element_id)
Remove an element from the archive, along with any files directly attached to it and links from other elements to it.
- Parameters:
element_id (str)
- add_element(element, source_file=None, part_of=None)
Add an element to the archive. If a data file is provided, it will be added to the archive. If the element is part of another element, the parent metadata will be updated.
- Parameters:
element (modos_schema.datamodel.DataEntity | modos_schema.datamodel.Sample | modos_schema.datamodel.Assay | modos_schema.datamodel.ReferenceGenome) – Element to add to the archive.
source_file (Optional[pathlib.Path]) – File to associate with the element.
part_of (Optional[str]) – Id of the parent element. It must be scoped to the type. For example “sample/foo”.
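The part_of argument must be scoped to the element type, as in "sample/foo". The sketch below is a hypothetical split_scoped_id helper that validates such an id against the UserElementType values listed earlier ('sample', 'assay', 'data', 'reference'). Because 'sequence' is only in ElementType, it is rejected. UserTypeSketch copies those values and is not the modos enum.

```python
from enum import Enum
from typing import Tuple


class UserTypeSketch(str, Enum):
    """Hypothetical mirror of the UserElementType values listed above."""

    SAMPLE = "sample"
    ASSAY = "assay"
    DATA_ENTITY = "data"
    REFERENCE_GENOME = "reference"


def split_scoped_id(scoped_id: str) -> Tuple[UserTypeSketch, str]:
    """Split an id such as "sample/foo" into its type and bare id."""
    prefix, sep, bare_id = scoped_id.partition("/")
    if not sep or not bare_id:
        raise ValueError(f"Expected '<type>/<id>', got {scoped_id!r}")
    # Enum lookup by value raises ValueError for unknown type prefixes.
    return UserTypeSketch(prefix), bare_id
```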
- _add_any_element(element, source_file=None, part_of=None)
Add an element of any type to the storage.
- Parameters:
element (modos_schema.datamodel.DataEntity | modos_schema.datamodel.Sample | modos_schema.datamodel.Assay | modos_schema.datamodel.ReferenceSequence | modos_schema.datamodel.ReferenceGenome)
source_file (Optional[pathlib.Path])
part_of (Optional[str])
- update_element(element_id, new)
Update element metadata in place by adding new values from model object.
- Parameters:
element_id (str) – Full id path in the zarr store.
new (modos_schema.datamodel.DataEntity | modos_schema.datamodel.Sample | modos_schema.datamodel.Assay | modos_schema.datamodel.MODO) – Element containing the enriched metadata.
- stream_genomics(file_path, region=None, reference_filename=None)
Slices local and remote CRAM, VCF (.vcf.gz), and BCF files, returning an iterator over records.