modos.genomics.formats

Classes

Region

Genomic region consisting of a chromosome (aka reference) name and a 0-indexed half-open coordinate interval.

GenomicFileSuffix

Enumeration of all supported genomic file suffixes.

Functions

read_pysam(path[, region])

Automatically instantiate a pysam file object from the input path and pass any additional keyword arguments to it.

Module Contents

class modos.genomics.formats.Region

Genomic region consisting of a chromosome (aka reference) name and a 0-indexed half-open coordinate interval. The end may be left unspecified, in which case it is set to math.inf.

chrom: str
start: int
end: int | float
__post_init__()
to_htsget_query()

Serializes the region into an htsget URL query.

Example

>>> Region(chrom='chr1', start=0, end=100).to_htsget_query()
'referenceName=chr1&start=0&end=100'
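A minimal sketch of how such a query string can be built with the standard library's urllib.parse.urlencode. The function name region_to_htsget_query is hypothetical, and dropping the end parameter when end is math.inf is an assumption, not confirmed modos behaviour.

```python
import math
from urllib.parse import urlencode


def region_to_htsget_query(chrom: str, start: int, end: float = math.inf) -> str:
    """Hypothetical helper: serialize a region into an htsget query string."""
    params = {"referenceName": chrom, "start": start}
    # Assumption: an open-ended region (end == inf) omits the end parameter.
    if end != math.inf:
        params["end"] = int(end)
    # urlencode keeps dict insertion order, so the parameters appear as listed.
    return urlencode(params)


print(region_to_htsget_query("chr1", 0, 100))
# referenceName=chr1&start=0&end=100
```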
to_tuple()

Return the region as a simple tuple.

Return type:

tuple[str, Optional[int], Optional[int]]

classmethod from_htsget_query(url)

Instantiate from an htsget URL query.

Example

>>> Region.from_htsget_query(
...   "http://localhost/htsget/reads/ex/demo1?format=CRAM&referenceName=chr1&start=0"
... )
Region(chrom='chr1', start=0, end=inf)
Parameters:

url (str)
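The parsing step can be sketched with urllib.parse.urlparse and parse_qs. The helper parse_htsget_query is hypothetical and returns a plain (chrom, start, end) tuple rather than a Region; treating a missing start as 0 is an assumption, while a missing end mapping to math.inf follows the example above.

```python
import math
from urllib.parse import parse_qs, urlparse


def parse_htsget_query(url: str) -> tuple[str, int, float]:
    """Hypothetical helper: extract (chrom, start, end) from an htsget URL."""
    query = parse_qs(urlparse(url).query)  # values are lists of strings
    chrom = query["referenceName"][0]
    # Assumption: a missing start means 0; a missing end means open-ended.
    start = int(query.get("start", ["0"])[0])
    end = int(query["end"][0]) if "end" in query else math.inf
    return chrom, start, end


print(parse_htsget_query(
    "http://localhost/htsget/reads/ex/demo1?format=CRAM&referenceName=chr1&start=0"
))
# ('chr1', 0, inf)
```

Unrelated parameters such as format are simply ignored by this sketch.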

classmethod from_ucsc(ucsc)

Instantiate from a UCSC-formatted region string.

Example

>>> Region.from_ucsc('chr-1ba:10-320')
Region(chrom='chr-1ba', start=10, end=320)
>>> Region.from_ucsc('chr1:-320')
Region(chrom='chr1', start=0, end=320)
>>> Region.from_ucsc('chr1:10-')
Region(chrom='chr1', start=10, end=inf)
>>> Region.from_ucsc('chr1:10')
Region(chrom='chr1', start=10, end=inf)

Note

For more information about the UCSC coordinate system, see: http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms

Parameters:

ucsc (str)

Return type:

Region
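A sketch of a parser that reproduces the examples above. The helper parse_ucsc is hypothetical. It splits on the last ":" so that chromosome names containing "-" (or even ":") survive, and it copies the numbers through unchanged, as the examples do. Raising ValueError for strings without a ":" is an assumption.

```python
import math
import re

# Coordinate part after the last ":"; either bound may be omitted.
_COORDS = re.compile(r"(?P<start>\d*)(?:-(?P<end>\d*))?")


def parse_ucsc(ucsc: str) -> tuple[str, int, float]:
    """Hypothetical helper: split 'chrom:start-end' into (chrom, start, end)."""
    chrom, sep, coords = ucsc.rpartition(":")
    match = _COORDS.fullmatch(coords)
    if not sep or not chrom or match is None:
        raise ValueError(f"Not a UCSC region string: {ucsc!r}")
    start = int(match["start"]) if match["start"] else 0
    # A missing end group (None) or an empty end ("10-") both mean open-ended.
    end = int(match["end"]) if match["end"] else math.inf
    return chrom, start, end


print(parse_ucsc("chr-1ba:10-320"))  # ('chr-1ba', 10, 320)
print(parse_ucsc("chr1:10"))         # ('chr1', 10, inf)
```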

classmethod from_pysam(record)

Instantiate from a pysam variant record or aligned segment.

Parameters:

record (pysam.VariantRecord | pysam.AlignedSegment)

Return type:

Region
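The two pysam record types expose coordinates under different attribute names: AlignedSegment has reference_name, reference_start and reference_end, while VariantRecord has chrom, start and stop, all 0-based with an exclusive end. The hypothetical helper below shows one way to read them by duck typing; whether modos dispatches the same way is an assumption. SimpleNamespace stand-ins let the demo run without pysam installed.

```python
from types import SimpleNamespace
from typing import Optional, Tuple


def record_coordinates(record) -> Tuple[str, int, Optional[int]]:
    """Hypothetical helper: read (chrom, start, end) from a pysam record."""
    if hasattr(record, "reference_name"):
        # pysam.AlignedSegment: reference_end is None for unmapped reads.
        return record.reference_name, record.reference_start, record.reference_end
    # pysam.VariantRecord: 0-based start and exclusive stop.
    return record.chrom, record.start, record.stop


# Stand-in objects with the same attribute names as the real pysam records.
read = SimpleNamespace(reference_name="chr1", reference_start=99, reference_end=250)
variant = SimpleNamespace(chrom="chr2", start=1000, stop=1001)
print(record_coordinates(read))     # ('chr1', 99, 250)
print(record_coordinates(variant))  # ('chr2', 1000, 1001)
```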

overlaps(other)

Check whether other overlaps self, i.e. whether any portion of other lies within self.

Parameters:

other (Region)

Return type:

bool
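For 0-based half-open intervals, two regions share at least one position exactly when each starts before the other ends. A sketch using a hypothetical Interval stand-in for Region; requiring equal chromosome names is an assumption about how modos treats regions on different references.

```python
import math
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical stand-in for Region: chrom plus a half-open [start, end)."""
    chrom: str
    start: int
    end: float = math.inf


def overlaps(a: Interval, b: Interval) -> bool:
    # [a.start, a.end) and [b.start, b.end) intersect iff each starts before the other ends.
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


print(overlaps(Interval("chr1", 0, 100), Interval("chr1", 99, 200)))   # True
print(overlaps(Interval("chr1", 0, 100), Interval("chr1", 100, 200)))  # False: only adjacent
```

Because end defaults to math.inf, open-ended regions need no special case in the comparison.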

contains(other)

Checks if other is fully contained in self.

Parameters:

other (Region)

Return type:

bool
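Full containment of half-open intervals reduces to two bound comparisons. A minimal sketch on plain (chrom, start, end) tuples; the helper name contains_region is hypothetical, and math.inf compares correctly with itself, so open-ended regions work unchanged.

```python
import math


def contains_region(outer: tuple, inner: tuple) -> bool:
    """Hypothetical helper: True if half-open `inner` lies entirely within `outer`."""
    o_chrom, o_start, o_end = outer
    i_chrom, i_start, i_end = inner
    return o_chrom == i_chrom and o_start <= i_start and i_end <= o_end


print(contains_region(("chr1", 0, 100), ("chr1", 10, 100)))            # True
print(contains_region(("chr1", 0, 100), ("chr1", 50, math.inf)))       # False
print(contains_region(("chr1", 0, math.inf), ("chr1", 50, math.inf)))  # True
```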

class modos.genomics.formats.GenomicFileSuffix

Bases: tuple, enum.Enum

Enumeration of all supported genomic file suffixes.

CRAM = ('.cram',)
BAM = ('.bam',)
SAM = ('.sam',)
VCF = ('.vcf', '.vcf.gz')
BCF = ('.bcf',)
FASTA = ('.fasta', '.fa')
FASTQ = ('.fastq', '.fq')
classmethod from_path(path)

Return the file type matching the suffix of path.

Parameters:

path (pathlib.Path)

Return type:

GenomicFileSuffix
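Because VCF accepts the two-part suffix '.vcf.gz', Path.suffix alone (which yields only '.gz') is not enough to identify the type. The sketch below compares whole-name endings instead. Suffix is a hypothetical, reduced copy of the enumeration using the values listed above, defined as a plain Enum rather than a tuple mixin, and raising ValueError for unknown suffixes is an assumption.

```python
from enum import Enum
from pathlib import Path


class Suffix(Enum):
    """Hypothetical subset of GenomicFileSuffix with the values listed above."""
    CRAM = (".cram",)
    BAM = (".bam",)
    VCF = (".vcf", ".vcf.gz")
    FASTA = (".fasta", ".fa")

    @classmethod
    def from_path(cls, path: Path) -> "Suffix":
        # Compare full name endings so "sample.vcf.gz" matches ".vcf.gz".
        for member in cls:
            if any(path.name.endswith(ext) for ext in member.value):
                return member
        raise ValueError(f"Unsupported genomic file suffix: {path.name}")


print(Suffix.from_path(Path("sample.vcf.gz")))  # Suffix.VCF
```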

get_index_suffix()

Return the index file suffix supported for this genomic file type.

Return type:

str
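The conventional htslib/samtools index extensions are .crai for CRAM, .bai for BAM, .tbi (tabix) for bgzip-compressed VCF, .csi for BCF and .fai for FASTA. Which of these get_index_suffix actually returns is an assumption; the lookup below is a hypothetical sketch keyed by type name.

```python
# Conventional index extensions; the exact mapping used by modos is assumed.
INDEX_SUFFIXES = {
    "CRAM": ".crai",
    "BAM": ".bai",
    "VCF": ".tbi",  # tabix index, valid for bgzip-compressed VCF only
    "BCF": ".csi",
    "FASTA": ".fai",
}


def index_suffix(file_type: str) -> str:
    """Hypothetical helper: look up the index suffix for a file type name."""
    try:
        return INDEX_SUFFIXES[file_type]
    except KeyError:
        raise ValueError(f"No index suffix known for {file_type}") from None


print("sample.cram" + index_suffix("CRAM"))  # sample.cram.crai
```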

to_htsget_endpoint()

Return the htsget endpoint for this genomic file type.

Return type:

str
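The htsget protocol serves alignment files under a reads endpoint and variant files under a variants endpoint; the from_htsget_query example above uses a .../htsget/reads/... URL for CRAM. A hypothetical sketch of that mapping follows; the exact string returned by to_htsget_endpoint (with or without slashes) is an assumption.

```python
READ_TYPES = {"CRAM", "BAM", "SAM"}
VARIANT_TYPES = {"VCF", "BCF"}


def htsget_endpoint(file_type: str) -> str:
    """Hypothetical helper: map a file type name to its htsget endpoint."""
    if file_type in READ_TYPES:
        return "reads"
    if file_type in VARIANT_TYPES:
        return "variants"
    # Assumption: FASTA/FASTQ have no htsget endpoint.
    raise ValueError(f"{file_type} is not served over htsget")


print(f"http://localhost/htsget/{htsget_endpoint('CRAM')}/ex/demo1")
# http://localhost/htsget/reads/ex/demo1
```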

modos.genomics.formats.read_pysam(path, region=None, **kwargs)

Automatically instantiate a pysam file object from the input path and pass any additional keyword arguments to it.

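Parameters:

path – path to the genomic file.
region – optional genomic region to restrict the returned records to.
kwargs – additional keyword arguments passed to the pysam file object.

Return type: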

Iterator[pysam.AlignedSegment | pysam.VariantRecord]
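A sketch of how such a reader can dispatch on the file suffix, using the real pysam.AlignmentFile and pysam.VariantFile classes and their fetch(contig, start, stop) method. The names pysam_opener_name and iter_records are hypothetical, the region is a plain (chrom, start, end) tuple here, and the suffix lists are assumptions. pysam is imported lazily, so the dispatch helper runs without it.

```python
import math
from pathlib import Path
from typing import Iterator, Optional, Tuple

ALIGNMENT_SUFFIXES = (".cram", ".bam", ".sam")
VARIANT_SUFFIXES = (".vcf", ".vcf.gz", ".bcf")


def pysam_opener_name(path: Path) -> str:
    """Hypothetical helper: name of the pysam class that can read `path`."""
    if path.name.endswith(ALIGNMENT_SUFFIXES):
        return "AlignmentFile"
    if path.name.endswith(VARIANT_SUFFIXES):
        return "VariantFile"
    raise ValueError(f"Unsupported file type: {path.name}")


def iter_records(
    path: Path, region: Optional[Tuple[str, int, float]] = None, **kwargs
) -> Iterator:
    """Hypothetical stand-in for read_pysam: yield records, optionally in a region."""
    import pysam  # lazy import: only needed once records are actually read

    opener = getattr(pysam, pysam_opener_name(path))
    # kwargs go straight to pysam, e.g. reference_filename=... for CRAM.
    with opener(str(path), **kwargs) as handle:
        if region is None:
            yield from handle
        else:
            chrom, start, end = region
            # fetch() uses stop=None to mean "until the end of the contig".
            stop = None if end == math.inf else int(end)
            yield from handle.fetch(chrom, start, stop)
```

Fetching a region requires an index next to the file (for example .bai or .tbi); iterating without a region does not.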