gimie.parsers package

Submodules

gimie.parsers.abstract module

class gimie.parsers.abstract.Parser(subject: str)[source]

Bases: ABC

Parser is an Abstract Base Class. It is only meant to define a standard interface for all parsers.

All subclasses must implement parse(). A parser parses bytes data into a set of predicate-object tuples.

Parameters:

subject – The subject of the triples (subject - predicate - object) to which parsed properties are attached.

abstract parse(data: bytes) Graph[source]

Extract an RDF graph from a source.

parse_all(docs: Iterable[bytes]) Graph[source]

Parse multiple sources and return the union of triples.
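
The interface described above can be sketched without any dependency on gimie or rdflib. This is only an illustration of the contract, not gimie's implementation: SketchParser and TitleParser are hypothetical names, and a plain set of (subject, predicate, object) tuples stands in for the Graph return type.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Set, Tuple

# Assumption: a set of string triples replaces rdflib's Graph in this sketch.
Triple = Tuple[str, str, str]


class SketchParser(ABC):
    """Stand-in for the Parser interface (hypothetical name)."""

    def __init__(self, subject: str):
        self.subject = subject

    @abstractmethod
    def parse(self, data: bytes) -> Set[Triple]:
        """Extract triples from a single source."""

    def parse_all(self, docs: Iterable[bytes]) -> Set[Triple]:
        """Parse every source and return the union of the resulting triples."""
        triples: Set[Triple] = set()
        for doc in docs:
            triples |= self.parse(doc)
        return triples


class TitleParser(SketchParser):
    """Hypothetical subclass: treats the first line of a file as its name."""

    def parse(self, data: bytes) -> Set[Triple]:
        lines = data.decode("utf-8").splitlines()
        if not lines:
            # Nothing to extract: return an empty result.
            return set()
        return {(self.subject, "schema:name", lines[0].strip())}


parser = TitleParser("https://example.org/repo")
merged = parser.parse_all([b"First title\nbody", b"Second title", b""])
```

Only parse() has to be written by the subclass; parse_all() is inherited and merges the per-source results.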

gimie.parsers.cff module

class gimie.parsers.cff.CffParser(subject: str)[source]

Bases: Parser

Parse DOI and authors from CITATION.cff.

parse(data: bytes) Graph[source]

Extracts DOIs and the list of authors from a CFF file and returns a graph with triples <subject> <schema:citation> <doi> and one author object per author, each with <schema:name> and <md4i:orcid> values. DOIs or authors that are not found are omitted from the graph. If neither authors nor DOIs are found, an empty graph is returned.

gimie.parsers.cff.doi_to_url(doi: str) str[source]

Formats a DOI as an https URL on doi.org.

Parameters:

doi – doi where the scheme (e.g. https://) and hostname (e.g. doi.org) may be missing.

Returns:

The DOI formatted as a valid URL. The base URL is set to https://doi.org when missing.

Return type:

str
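
One way such a normalization could work is sketched here. This is an assumption about the behaviour described above, not the code shipped in gimie; doi_to_url_sketch is a hypothetical name and only the three documented input forms are handled.

```python
def doi_to_url_sketch(doi: str) -> str:
    """Prefix a DOI with https://doi.org when scheme or hostname is missing."""
    doi = doi.strip()
    # Drop an existing scheme first, then an existing hostname, so that every
    # accepted input form reduces to the bare "10.xxxx/suffix" identifier.
    for scheme in ("https://", "http://"):
        if doi.startswith(scheme):
            doi = doi[len(scheme):]
            break
    if doi.startswith("doi.org/"):
        doi = doi[len("doi.org/"):]
    # Rebuild the URL on the canonical base.
    return f"https://doi.org/{doi}"
```

Stripping and then rebuilding keeps the function idempotent: passing an already formatted URL returns it unchanged.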

Examples

>>> doi_to_url("10.0000/example.abcd")
'https://doi.org/10.0000/example.abcd'
>>> doi_to_url("doi.org/10.0000/example.abcd")
'https://doi.org/10.0000/example.abcd'
>>> doi_to_url("https://doi.org/10.0000/example.abcd")
'https://doi.org/10.0000/example.abcd'

gimie.parsers.cff.get_cff_authors(data: bytes) List[dict[str, str]] | None[source]

Given a CFF file, returns a list of dictionaries containing the ORCID, affiliation, and first and last names of each author, if any.

Parameters:

data – The cff file body as bytes.

Returns:

One dictionary per author, holding the ORCID and name strings.

Return type:

list(dict), optional

gimie.parsers.cff.get_cff_doi(data: bytes) list[str] | None[source]

Given a CFF file, returns a list of DOIs, if any.

Parameters:

data – The cff file body as bytes.

Returns:

DOIs formatted as valid URLs

Return type:

list of str, optional

Examples

>>> get_cff_doi(bytes("identifiers:\n    - type: doi\n      value: 10.5281/zenodo.1234\n    - type: doi\n      value: 10.5281/zenodo.5678", encoding="utf8"))
['https://doi.org/10.5281/zenodo.1234', 'https://doi.org/10.5281/zenodo.5678']
>>> get_cff_doi(bytes("identifiers:\n    - type: doi\n      value: 10.5281/zenodo.9012", encoding="utf8"))
['https://doi.org/10.5281/zenodo.9012']
>>> get_cff_doi(bytes("abc: def", encoding="utf8"))

Module contents

Files which can be parsed by gimie.

class gimie.parsers.ParserInfo(default, type)[source]

Bases: NamedTuple

default: bool

Alias for field number 0

type: Type[Parser]

Alias for field number 1

gimie.parsers.get_parser(name: str) Type[Parser][source]

Get a parser by name.

gimie.parsers.list_default_parsers() Set[str][source]

List the names of all default parsers.

gimie.parsers.list_parsers() Set[str][source]

List the names of all parsers.
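
The relationship between ParserInfo and the three lookup functions can be pictured as a name-to-info registry. The sketch below is an assumption about that structure, not gimie's source: every name ending in Sketch or _sketch, the REGISTRY dictionary, and the "license" entry are hypothetical.

```python
from typing import Dict, NamedTuple, Set, Type


class SketchParser:
    """Placeholder for the Parser base class."""


class SketchCffParser(SketchParser):
    """Placeholder for a concrete parser."""


class SketchLicenseParser(SketchParser):
    """Hypothetical second parser, present only to show a non-default entry."""


class SketchParserInfo(NamedTuple):
    default: bool              # field number 0
    type: Type[SketchParser]   # field number 1


# Assumption: parsers are registered under short string names.
REGISTRY: Dict[str, SketchParserInfo] = {
    "cff": SketchParserInfo(default=True, type=SketchCffParser),
    "license": SketchParserInfo(default=False, type=SketchLicenseParser),
}


def list_parsers_sketch() -> Set[str]:
    """Names of all registered parsers."""
    return set(REGISTRY)


def list_default_parsers_sketch() -> Set[str]:
    """Names of the parsers flagged as default."""
    return {name for name, info in REGISTRY.items() if info.default}


def get_parser_sketch(name: str) -> Type[SketchParser]:
    """Look up a parser class by name.

    The behaviour for unknown names is not documented above; this sketch
    simply lets the KeyError propagate.
    """
    return REGISTRY[name].type
```

Note that the lookup returns the parser class, not an instance, which matches the Type[Parser] return annotation.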

gimie.parsers.parse_files(subject: str, files: Iterable[Resource], parsers: Set[str] | None = None) Graph[source]

For each input file, select appropriate parser among a collection and parse its contents. Return the union of all parsed properties in the form of triples. If no parser is found for a given file, skip it.

Parameters:
  • subject – The subject URI of the repository.

  • files – A collection of file-like objects.

  • parsers – A set of parser names. If None, use the default collection.
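
The select, parse, and merge loop described above might look roughly like the following. All names are hypothetical: SketchResource stands in for Resource (assumed here to expose a path and its raw bytes), parse_cff_stub stands in for a real parser, and a set of tuples replaces Graph.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
ParseFn = Callable[[str, bytes], Set[Triple]]


@dataclass
class SketchResource:
    """Hypothetical file-like object: a path plus its raw contents."""
    path: Path
    content: bytes


def parse_cff_stub(subject: str, data: bytes) -> Set[Triple]:
    """Hypothetical stand-in parser: records only the size of the file."""
    return {(subject, "ex:byteSize", str(len(data)))}


# Assumption: a parser is chosen by matching the file name.
PARSERS_BY_FILE_NAME: Dict[str, ParseFn] = {"CITATION.cff": parse_cff_stub}


def parse_files_sketch(subject: str, files: Iterable[SketchResource]) -> Set[Triple]:
    """Parse each file that has a matching parser and merge the triples."""
    triples: Set[Triple] = set()
    for resource in files:
        parse = PARSERS_BY_FILE_NAME.get(resource.path.name)
        if parse is None:
            # No parser for this file: skip it.
            continue
        triples |= parse(subject, resource.content)
    return triples


files = [
    SketchResource(Path("CITATION.cff"), b"abc"),
    SketchResource(Path("README.md"), b"ignored"),
]
result = parse_files_sketch("https://example.org/repo", files)
```

Files without a matching parser contribute nothing to the result rather than raising an error.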

gimie.parsers.select_parser(path: Path, parsers: Set[str] | None = None) Type[Parser] | None[source]

Select the appropriate parser from a collection based on a file path. If no parser is found, return None.

Parameters:
  • path – The path of the file to parse.

  • parsers – A set of parser names. If None, use the default collection.
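
A path-based selection with a fallback to the default collection could be sketched as follows. The matching rule (exact file name) and the names FILE_NAMES, DEFAULT_PARSERS, and select_parser_name are assumptions for illustration; the sketch returns a parser name rather than a parser class to stay dependency-free.

```python
from pathlib import Path
from typing import Dict, Optional, Set

# Hypothetical mapping from parser name to the file name it handles.
FILE_NAMES: Dict[str, str] = {"cff": "CITATION.cff"}
# Hypothetical default collection, used when no parser names are given.
DEFAULT_PARSERS: Set[str] = {"cff"}


def select_parser_name(path: Path, parsers: Optional[Set[str]] = None) -> Optional[str]:
    """Return the name of the first candidate parser matching the file name."""
    candidates = DEFAULT_PARSERS if parsers is None else parsers
    # Sort for a deterministic choice if several candidates match.
    for name in sorted(candidates):
        if FILE_NAMES.get(name) == path.name:
            return name
    # No candidate handles this path.
    return None
```

Passing an empty set is different from passing None: the former disables every parser, while the latter falls back to the defaults.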