modos.genomics.cram

Utilities to interact with genomic intervals in CRAM files.

Classes

Region

Genomic region consisting of a chromosome (aka reference) name and a 0-indexed half-open coordinate interval.

Functions

extract_cram_metadata(cram)

Extract metadata from the CRAM file header and convert specific attributes according to the modo schema.

validate_cram_files(cram_path)

Validate CRAM files using pysam.

create_sequence_id(name, sequence_md5)

Helper function to create a unique id from a sequence name and md5 hash.

Module Contents

class modos.genomics.cram.Region

Genomic region consisting of a chromosome (aka reference) name and a 0-indexed half-open coordinate interval. If the end is not specified, it is set to math.inf.

chrom: str
start: int
end: int | float

__post_init__()

to_htsget_query()

Serializes the region into an htsget URL query.

Example

>>> Region(chrom='chr1', start=0, end=100).to_htsget_query()
'referenceName=chr1&start=0&end=100'

to_tuple()

Return the region as a simple tuple.

Return type:

tuple[str, Optional[int], Optional[int]]

classmethod from_htsget_query(url)

Instantiate from an htsget URL query.

Example

>>> Region.from_htsget_query(
...   "http://localhost/htsget/reads/ex/demo1?format=CRAM&referenceName=chr1&start=0"
... )
Region(chrom='chr1', start=0, end=inf)

Parameters:

url (str)
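
The round trip between a region and an htsget query string can be sketched with the standard library alone. The helpers region_to_query and region_from_query below are hypothetical stand-ins, not the methods of Region. In particular, leaving out the end parameter for an open-ended region is an assumption of this sketch.

```python
import math
from urllib.parse import parse_qs, urlencode, urlparse


def region_to_query(chrom: str, start: int, end: float = math.inf) -> str:
    """Serialize a region into htsget-style query parameters."""
    params = {"referenceName": chrom, "start": start}
    # Assumption: an open-ended region simply leaves out the `end` parameter.
    if end != math.inf:
        params["end"] = int(end)
    return urlencode(params)


def region_from_query(url: str) -> tuple[str, int, float]:
    """Read referenceName, start and end from the query part of a URL."""
    query = parse_qs(urlparse(url).query)
    chrom = query["referenceName"][0]
    # Missing bounds fall back to the start of the reference and to infinity.
    start = int(query["start"][0]) if "start" in query else 0
    end = int(query["end"][0]) if "end" in query else math.inf
    return chrom, start, end
```

Unrelated parameters such as format=CRAM are ignored by region_from_query, which matches the from_htsget_query example above.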

classmethod from_ucsc(ucsc)

Instantiate from a UCSC-formatted region string.

Example

>>> Region.from_ucsc('chr-1ba:10-320')
Region(chrom='chr-1ba', start=10, end=320)
>>> Region.from_ucsc('chr1:-320')
Region(chrom='chr1', start=0, end=320)
>>> Region.from_ucsc('chr1:10-')
Region(chrom='chr1', start=10, end=inf)
>>> Region.from_ucsc('chr1:10')
Region(chrom='chr1', start=10, end=inf)

Note

For more information about the UCSC coordinate system, see: http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms

Parameters:

ucsc (str)

Return type:

Region
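
The parsing behaviour shown in the examples above can be approximated as follows. This is a minimal sketch, not the implementation of from_ucsc: parse_ucsc is a hypothetical helper that returns a plain tuple, and rejecting strings without a colon is a choice made only for this sketch.

```python
import math


def parse_ucsc(ucsc: str) -> tuple[str, int, float]:
    """Split 'chrom:start-end' into (chrom, start, end)."""
    # Split on the last colon so the chromosome name may contain '-' characters.
    chrom, sep, interval = ucsc.rpartition(":")
    if not sep:
        # Sketch-only choice: reject strings without a coordinate part.
        raise ValueError(f"expected 'chrom:start-end', got {ucsc!r}")
    start_text, _, end_text = interval.partition("-")
    # A missing bound defaults to 0 (start) or infinity (end).
    start = int(start_text) if start_text else 0
    end = int(end_text) if end_text else math.inf
    return chrom, start, end
```

Splitting on the last colon rather than the first hyphen is what lets a name such as 'chr-1ba' pass through unchanged.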

classmethod from_pysam(record)

Parameters:

record (pysam.VariantRecord | pysam.AlignedSegment)

Return type:

Region

overlaps(other)

Checks whether any portion of other overlaps with self.

Parameters:

other (Region)

Return type:

bool
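
For 0-indexed half-open intervals, two regions overlap exactly when each one starts before the other ends. The sketch below illustrates that test on plain (chrom, start, end) tuples. intervals_overlap is a hypothetical helper, and treating regions on different chromosomes as never overlapping is an assumption.

```python
import math

Interval = tuple[str, int, float]  # (chrom, start, end), end is exclusive


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if the half-open intervals a and b share at least one position."""
    # Assumption: intervals on different chromosomes never overlap.
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]
```

Because the end is exclusive, ("chr1", 0, 100) and ("chr1", 100, 200) are adjacent but do not overlap.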

contains(other)

Checks if other is fully contained in self.

Parameters:

other (Region)

Return type:

bool
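
Containment for half-open intervals can be sketched in the same spirit: the inner interval must start no earlier and end no later than the outer one. interval_contains is a hypothetical helper working on (chrom, start, end) tuples, and requiring matching chromosome names is an assumption.

```python
import math

Interval = tuple[str, int, float]  # (chrom, start, end), end is exclusive


def interval_contains(outer: Interval, inner: Interval) -> bool:
    """True if every position of inner also lies within outer."""
    # Assumption: containment requires both intervals to share a chromosome.
    return outer[0] == inner[0] and outer[1] <= inner[1] and inner[2] <= outer[2]
```

An open-ended outer interval, with end set to math.inf, contains any interval on the same chromosome that starts at or after its start.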

modos.genomics.cram.extract_cram_metadata(cram)

Extract metadata from the CRAM file header and convert specific attributes according to the modo schema.

Parameters:

cram (pysam.AlignmentFile)

Return type:

List

modos.genomics.cram.validate_cram_files(cram_path)

Validate CRAM files using pysam. Checks if the file is sorted and has an index.

Parameters:

cram_path (str)

modos.genomics.cram.create_sequence_id(name, sequence_md5)

Helper function to create a unique id from a sequence name and md5 hash.

Parameters:
  • name (str)

  • sequence_md5 (str)

Return type:

str
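
The id format that create_sequence_id actually produces is not documented here, so the following is an illustration of the idea only. It assumes the MD5 is taken over the uppercased sequence with whitespace removed, and that name and hash are joined with an underscore. Both sequence_md5_of and make_sequence_id are hypothetical helpers.

```python
import hashlib


def sequence_md5_of(sequence: str) -> str:
    """Hex MD5 digest of a sequence, ignoring case and whitespace."""
    # Assumption: normalise by removing whitespace and uppercasing before hashing.
    cleaned = "".join(sequence.split()).upper()
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()


def make_sequence_id(name: str, sequence_md5: str) -> str:
    """Combine a sequence name and its MD5 into a single identifier."""
    # Assumption: the underscore separator is chosen for this sketch only.
    return f"{name}_{sequence_md5}"
```

Including the hash in the id means that two references sharing a name but differing in sequence content receive different ids.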