gimie.parsers package

Submodules

gimie.parsers.abstract module

class gimie.parsers.abstract.Parser

Bases: ABC

Parser is an abstract base class (ABC). It only defines the standard interface shared by all parsers.

All subclasses must implement parse(), which converts raw bytes into a set of predicate-object tuples.
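The contract can be illustrated with a minimal stand-alone sketch. It re-creates an abstract Parser with the standard library only and uses plain strings instead of rdflib URIRef/Literal objects; KeywordsParser, IncompleteParser and the http://schema.org/ namespace IRI are hypothetical and not part of gimie.

```python
from abc import ABC, abstractmethod
from typing import Set, Tuple

# Plain strings stand in for rdflib's URIRef and Literal (assumption for this sketch).
Property = Tuple[str, str]
SCHEMA = "http://schema.org/"  # assumed namespace IRI, for illustration only


class Parser(ABC):
    """Stand-alone re-creation of the documented interface, not gimie's code."""

    @abstractmethod
    def parse(self, data: bytes) -> Set[Property]:
        """Extract predicate-object tuples from raw bytes."""


class KeywordsParser(Parser):
    """Hypothetical parser: one schema:keywords tuple per comma-separated word."""

    def parse(self, data: bytes) -> Set[Property]:
        words = (word.strip() for word in data.decode("utf-8").split(","))
        return {(SCHEMA + "keywords", word) for word in words if word}


class IncompleteParser(Parser):
    """Does not implement parse(), so it stays abstract."""


print(KeywordsParser().parse(b"rdf, metadata, rdf"))  # duplicate words collapse into one tuple

try:
    IncompleteParser()
except TypeError as err:
    # ABCMeta refuses to instantiate a class with unimplemented abstract methods.
    print("TypeError:", err)
```

Because parse() is marked with abstractmethod, the check happens when a subclass is instantiated, not when it is defined.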

abstract parse(data: bytes) → Set[Tuple[URIRef, URIRef | Literal]]

Extract predicate-object tuples from a source.

parse_all(docs: Iterable[bytes]) → Set[Tuple[URIRef, URIRef | Literal]]

Parse multiple sources and return the union of predicate-object tuples.
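Because the result is a set union, tuples extracted from several documents are merged and duplicates collapse. A minimal sketch of that behaviour, written as a free function over any parse callable rather than as gimie's actual method; split_words and the string-based tuples are hypothetical.

```python
from typing import Callable, Iterable, Set, Tuple

Property = Tuple[str, str]  # plain strings stand in for rdflib terms


def parse_all(parse: Callable[[bytes], Set[Property]], docs: Iterable[bytes]) -> Set[Property]:
    """Apply parse() to every document and return the union of the results."""
    properties: Set[Property] = set()
    for data in docs:
        properties |= parse(data)  # set union: a tuple seen twice is kept once
    return properties


def split_words(data: bytes) -> Set[Property]:
    """Hypothetical parser mapping each whitespace-separated word to schema:keywords."""
    return {("http://schema.org/keywords", word) for word in data.decode("utf-8").split()}


merged = parse_all(split_words, [b"rdf git", b"git metadata"])
print(len(merged))  # 3: "git" occurs in both documents but appears once in the union
```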

gimie.parsers.cff module

class gimie.parsers.cff.CffParser

Bases: Parser

Parse the DOI from CITATION.cff into a schema:citation <doi> tuple.

parse(data: bytes) → Set[Tuple[URIRef, URIRef | Literal]]

Extracts a DOI link from a CFF file and returns a set containing the single tuple (schema:citation, <doi>). If no DOI is found, returns an empty set.
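The return shape (one schema:citation tuple, or an empty set) can be sketched as follows. This is an illustration, not gimie's implementation: it uses a regular expression instead of a YAML parser, only recognises a bare DOI under a top-level doi: key (such as 10.5281/zenodo.1234), and assumes http://schema.org/citation as the predicate IRI; CffCitationSketch is a hypothetical name.

```python
import re
from typing import Set, Tuple

SCHEMA_CITATION = "http://schema.org/citation"  # assumed IRI of schema:citation

# Top-level "doi:" key followed by a bare DOI (optionally quoted); indented keys are ignored.
_BARE_DOI = re.compile(r"^doi:[ \t]*['\"]?(10\.[^\s'\"]+)", re.MULTILINE)


class CffCitationSketch:
    """Hypothetical stand-in mirroring the documented CffParser.parse behaviour."""

    def parse(self, data: bytes) -> Set[Tuple[str, str]]:
        match = _BARE_DOI.search(data.decode("utf-8"))
        if match is None:
            return set()  # no DOI found: empty set
        return {(SCHEMA_CITATION, "https://doi.org/" + match.group(1))}


cff = b'cff-version: 1.2.0\ntitle: demo\ndoi: "10.5281/zenodo.1234"\n'
print(CffCitationSketch().parse(cff))
```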

gimie.parsers.cff.doi_to_url(doi: str) → str

Formats a DOI as an HTTPS URL on doi.org.

Parameters:

doi – A DOI, where the scheme (e.g. https://) and hostname (e.g. doi.org) may be missing.

Returns:

The DOI formatted as a valid URL. The base URL is set to https://doi.org when missing.

Return type:

str

Examples

>>> doi_to_url("10.0000/example.abcd")
'https://doi.org/10.0000/example.abcd'
>>> doi_to_url("doi.org/10.0000/example.abcd")
'https://doi.org/10.0000/example.abcd'
>>> doi_to_url("https://doi.org/10.0000/example.abcd")
'https://doi.org/10.0000/example.abcd'
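One way to obtain the behaviour shown in these examples is to strip an optional scheme and doi.org host, then re-attach a canonical prefix. This is a hedged sketch, not gimie's code; it does not check that the remainder is a well-formed DOI.

```python
import re

# Optional "http://" or "https://" scheme followed by the doi.org host, at the start.
_DOI_PREFIX = re.compile(r"^(?:https?://)?doi\.org/", re.IGNORECASE)


def doi_to_url(doi: str) -> str:
    """Return the DOI as https://doi.org/<doi>, adding scheme and host when missing."""
    bare = _DOI_PREFIX.sub("", doi.strip())
    return "https://doi.org/" + bare


examples = [
    "10.0000/example.abcd",
    "doi.org/10.0000/example.abcd",
    "https://doi.org/10.0000/example.abcd",
]
for raw in examples:
    print(doi_to_url(raw))  # https://doi.org/10.0000/example.abcd each time
```

Stripping and re-adding the prefix makes the function idempotent: passing its own output back in returns the same URL.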

gimie.parsers.cff.get_cff_doi(data: bytes) → str | None

Given a CFF file, returns the DOI, if any.

Parameters:

data – The CFF file body as bytes.

Returns:

The DOI formatted as a valid URL, or None if the file contains no DOI.

Return type:

str, optional

Examples

>>> get_cff_doi(bytes("doi:   10.5281/zenodo.1234", encoding="utf8"))
'https://doi.org/10.5281/zenodo.1234'
>>> get_cff_doi(bytes("abc: def", encoding="utf8"))
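A standard-library sketch that reproduces the two examples above. It scans for a top-level doi: key with a regular expression (an assumption; a YAML parser would be more robust, and nested keys such as those under preferred-citation are ignored here), removes YAML quotes, and normalises the value into an https://doi.org URL.

```python
import re
from typing import Optional

# Top-level "doi:" key; the value may be quoted. Indented (nested) keys are not matched.
_DOI_LINE = re.compile(r"^doi:[ \t]*(\S+)", re.MULTILINE)
_DOI_PREFIX = re.compile(r"^(?:https?://)?doi\.org/", re.IGNORECASE)


def get_cff_doi(data: bytes) -> Optional[str]:
    """Return the DOI of a CFF file as an https://doi.org URL, or None if absent."""
    match = _DOI_LINE.search(data.decode("utf-8"))
    if match is None:
        return None
    value = match.group(1).strip("'\"")  # drop YAML quotes, if any
    return "https://doi.org/" + _DOI_PREFIX.sub("", value)


print(get_cff_doi(b"doi:   10.5281/zenodo.1234"))  # https://doi.org/10.5281/zenodo.1234
print(get_cff_doi(b"abc: def"))  # None
```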

Module contents

Files which can be parsed by gimie.

class gimie.parsers.ParserInfo(default, type)

Bases: NamedTuple

default: bool

Alias for field number 0

type: Type[Parser]

Alias for field number 1
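ParserInfo is a NamedTuple, so each field can be read either by name or by position, which is what the "Alias for field number" notes refer to. A stand-alone illustration, with a placeholder class (DummyParser, hypothetical) in place of a real Parser subclass:

```python
from typing import NamedTuple, Type


class DummyParser:
    """Placeholder for a Parser subclass (hypothetical)."""


class ParserInfo(NamedTuple):
    default: bool
    type: Type[DummyParser]


info = ParserInfo(default=True, type=DummyParser)
print(info.default is info[0], info.type is info[1])  # True True
default, parser_cls = info  # unpacks like a regular tuple
```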

gimie.parsers.get_parser(name: str) → Type[Parser]

Get a parser by name.

gimie.parsers.list_default_parsers() → Set[str]

List the names of all default parsers.

gimie.parsers.list_parsers() → Set[str]

List the names of all parsers.
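These three functions suggest a registry that maps parser names to ParserInfo entries. A minimal sketch under that assumption; the registry contents, the names "cff" and "license", the non-default status of "license", and the KeyError on unknown names are all hypothetical and may differ from gimie.

```python
from typing import Dict, NamedTuple, Set, Type


class CffParser:  # placeholder for a real Parser subclass
    pass


class LicenseParser:  # placeholder for a real Parser subclass
    pass


class ParserInfo(NamedTuple):
    default: bool
    type: Type[object]


# Hypothetical registry: parser name -> (enabled by default?, parser class).
PARSERS: Dict[str, ParserInfo] = {
    "cff": ParserInfo(default=True, type=CffParser),
    "license": ParserInfo(default=False, type=LicenseParser),
}


def get_parser(name: str) -> Type[object]:
    """Get a parser class by name; raises KeyError for unknown names (assumption)."""
    return PARSERS[name].type


def list_parsers() -> Set[str]:
    """Names of all registered parsers."""
    return set(PARSERS)


def list_default_parsers() -> Set[str]:
    """Names of the parsers flagged as default."""
    return {name for name, info in PARSERS.items() if info.default}


print(sorted(list_parsers()), sorted(list_default_parsers()))  # ['cff', 'license'] ['cff']
```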

gimie.parsers.parse_files(files: Iterable[Resource], parsers: Set[str] | None = None) → Set[Tuple[URIRef, URIRef | Literal]]

For each input file, select the appropriate parser from a collection and parse the file's contents. Return the union of all parsed properties; files for which no parser is found are skipped.

Parameters:
  • files – A collection of file-like objects.

  • parsers – A set of parser names. If None, use the default collection.
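The select, parse and union loop can be sketched as below. This is a simplified illustration, not gimie's code: FakeResource, parse_keywords and the file-name lookup are hypothetical stand-ins for gimie's Resource objects and parser selection, and the parsers filter is omitted.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, Iterable, Set, Tuple

Property = Tuple[str, str]  # plain strings stand in for rdflib terms
ParseFn = Callable[[bytes], Set[Property]]


@dataclass
class FakeResource:
    """Hypothetical file-like object: a path plus its raw content."""

    path: Path
    content: bytes


def parse_keywords(data: bytes) -> Set[Property]:
    """Hypothetical parser: one schema:keywords tuple per non-empty line."""
    lines = data.decode("utf-8").splitlines()
    return {("http://schema.org/keywords", line) for line in lines if line}


# Hypothetical lookup from file name to parse function.
PARSERS_BY_FILENAME: Dict[str, ParseFn] = {"KEYWORDS.txt": parse_keywords}


def parse_files(files: Iterable[FakeResource]) -> Set[Property]:
    """Parse each file that has a matching parser; return the union of all properties."""
    properties: Set[Property] = set()
    for file in files:
        parse = PARSERS_BY_FILENAME.get(file.path.name)
        if parse is None:
            continue  # no parser for this file: skip it
        properties |= parse(file.content)
    return properties


files = [
    FakeResource(Path("KEYWORDS.txt"), b"rdf\ngit\n"),
    FakeResource(Path("README.md"), b"# demo\n"),
]
print(sorted(parse_files(files)))
```

README.md has no registered parser, so it contributes nothing to the result rather than raising an error.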

gimie.parsers.select_parser(path: Path, parsers: Set[str] | None = None) → Type[Parser] | None

Select the appropriate parser from a collection based on a file path. If no parser is found, return None.

Parameters:
  • path – The path of the file to parse.

  • parsers – A set of parser names. If None, use the default collection.
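A sketch of selection with and without an explicit parsers set, assuming a hypothetical registry in which each parser is matched on an exact file name. The Entry tuple, its filename field, the "cff"/"license" names and the placeholder classes are illustrative assumptions; gimie's own ParserInfo has only the default and type fields and its matching rule may differ.

```python
from pathlib import Path
from typing import Dict, NamedTuple, Optional, Set, Type


class CffParser:  # placeholder for a real Parser subclass
    pass


class LicenseParser:  # placeholder for a real Parser subclass
    pass


class Entry(NamedTuple):
    """Hypothetical registry entry; gimie's ParserInfo has only default and type."""

    default: bool
    filename: str  # assumption: parsers are matched on the exact file name
    type: Type[object]


REGISTRY: Dict[str, Entry] = {
    "cff": Entry(default=True, filename="CITATION.cff", type=CffParser),
    "license": Entry(default=False, filename="LICENSE", type=LicenseParser),
}


def select_parser(path: Path, parsers: Optional[Set[str]] = None) -> Optional[Type[object]]:
    """Return the parser class for this file among the allowed names, or None."""
    if parsers is None:
        # No explicit selection: fall back to the default collection.
        parsers = {name for name, entry in REGISTRY.items() if entry.default}
    for name in sorted(parsers):  # sorted for a deterministic search order
        entry = REGISTRY.get(name)
        if entry is not None and entry.filename == path.name:
            return entry.type
    return None


print(select_parser(Path("repo/CITATION.cff")))  # the CffParser placeholder
print(select_parser(Path("LICENSE")))  # None: "license" is not a default parser here
print(select_parser(Path("LICENSE"), {"license"}))  # LicenseParser once explicitly allowed
```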