gimie.parsers package
Subpackages
Submodules
gimie.parsers.abstract module
- class gimie.parsers.abstract.Parser(subject: str)
Bases: ABC
Parser is an abstract base class. It only defines a standard interface shared by all parsers.
All subclasses must implement parse(), which turns raw bytes into a set of predicate-object tuples.
- Parameters:
subject – The subject of the triples (subject, predicate, object) to which parsed properties are attached.
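A minimal sketch of what implementing this interface could look like. It uses a local stand-in for Parser (assumed to store subject and declare an abstract parse()) so it runs without gimie installed, and it follows the predicate-object tuple description above. Note that CffParser.parse is documented to return a Graph instead. KeywordsParser and its one-keyword-per-line format are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Set, Tuple

PredicateObject = Tuple[str, str]


class Parser(ABC):
    """Local stand-in for gimie.parsers.abstract.Parser (assumed shape)."""

    def __init__(self, subject: str):
        self.subject = subject

    @abstractmethod
    def parse(self, data: bytes) -> Set[PredicateObject]:
        """Turn raw file bytes into (predicate, object) pairs about self.subject."""


class KeywordsParser(Parser):
    """Hypothetical parser: each non-empty line becomes a schema:keywords value."""

    def parse(self, data: bytes) -> Set[PredicateObject]:
        lines = data.decode("utf-8").splitlines()
        return {("http://schema.org/keywords", line.strip()) for line in lines if line.strip()}
```

Because parse() is abstract, Parser("...") itself raises TypeError. Only concrete subclasses such as KeywordsParser("https://example.org/repo") can be instantiated.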
gimie.parsers.cff module
- class gimie.parsers.cff.CffParser(subject: str)
Bases: Parser
Parse DOIs and authors from a CITATION.cff file.
- parse(data: bytes) → Graph
Extracts DOIs and the list of authors from a CFF file. Returns a graph containing the triples <subject> <schema:citation> <doi>, plus one author object per author with <schema:name> and <md4i:orcid> values. DOIs and authors are added only when present. If neither is found, an empty graph is returned.
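The triple layout described above can be sketched with plain tuples instead of an rdflib Graph, keeping the example dependency-free. Several details are assumptions: the md4i namespace URI, the author dictionary keys (CFF's own given-names, family-names, orcid), and the node naming. This docstring also does not say how author objects link to the subject, so the sketch omits that link. cff_triples is a hypothetical helper, not gimie's code.

```python
from typing import Dict, List, Optional, Set, Tuple

SCHEMA = "http://schema.org/"
# Assumed namespace URI for the md4i (metadata4ing) prefix.
MD4I = "http://w3id.org/nfdi4ing/metadata4ing#"

Triple = Tuple[str, str, str]


def cff_triples(
    subject: str,
    dois: Optional[List[str]],
    authors: Optional[List[Dict[str, str]]],
) -> Set[Triple]:
    """Hypothetical helper mirroring the triples CffParser.parse is documented to emit."""
    triples: Set[Triple] = set()
    # One <subject> <schema:citation> <doi> triple per DOI.
    for doi in dois or []:
        triples.add((subject, SCHEMA + "citation", doi))
    # One node per author carrying schema:name and, when known, md4i:orcid.
    for index, author in enumerate(authors or []):
        # Hypothetical node naming: the ORCID if present, else a fragment on the subject.
        node = author.get("orcid") or f"{subject}#author-{index}"
        parts = (author.get("given-names"), author.get("family-names"))
        name = " ".join(part for part in parts if part)
        if name:
            triples.add((node, SCHEMA + "name", name))
        if author.get("orcid"):
            triples.add((node, MD4I + "orcid", author["orcid"]))
    return triples
```

As documented for the real parser, missing DOIs or authors simply contribute nothing, so cff_triples(subject, None, None) yields an empty set.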
- gimie.parsers.cff.doi_to_url(doi: str) → str
Formats a DOI as an HTTPS URL on doi.org.
- Parameters:
doi – A DOI, possibly missing the scheme (e.g. https://) and the hostname (e.g. doi.org).
- Returns:
The DOI formatted as a valid URL. The base URL https://doi.org is added when missing.
- Return type:
str
Examples
>>> doi_to_url("10.0000/example.abcd")
'https://doi.org/10.0000/example.abcd'
>>> doi_to_url("doi.org/10.0000/example.abcd")
'https://doi.org/10.0000/example.abcd'
>>> doi_to_url("https://doi.org/10.0000/example.abcd")
'https://doi.org/10.0000/example.abcd'
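One way to get the behaviour shown in these examples is to strip an optional scheme and an optional doi.org host, then prepend https://doi.org/ again. This regex-based sketch reproduces the three examples. It is not necessarily gimie's implementation. Accepting the legacy dx.doi.org host is an assumption added for illustration.

```python
import re

# Optional http(s) scheme, then an optional (dx.)doi.org/ host, anchored at the start.
_PREFIX = re.compile(r"^(?:https?://)?(?:(?:dx\.)?doi\.org/)?")


def doi_to_url(doi: str) -> str:
    """Normalise a bare or partial DOI to https://doi.org/<suffix>."""
    suffix = _PREFIX.sub("", doi.strip(), count=1)
    return f"https://doi.org/{suffix}"
```

Stripping first and then rebuilding means bare, host-only, and full-URL inputs all take the same path, so there is no need to branch on the input form.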
- gimie.parsers.cff.get_cff_authors(data: bytes) → List[dict[str, str]] | None
Given the contents of a CFF file, returns a list of dictionaries, one per author, containing the author's ORCID, affiliation, and first and last names, or None if no authors are found.
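A sketch of the author extraction, assuming the CFF bytes have already been parsed into a mapping (for instance with PyYAML's yaml.safe_load(data)) so the example needs no third-party packages. The helper name authors_from_cff_dict is hypothetical. The output keys reuse CFF's own author fields (given-names, family-names, orcid, affiliation), which is an assumption about what get_cff_authors returns.

```python
from typing import Any, Dict, List, Optional

# CFF person fields kept for each author (assumed to match get_cff_authors' output).
AUTHOR_FIELDS = ("given-names", "family-names", "orcid", "affiliation")


def authors_from_cff_dict(cff: Dict[str, Any]) -> Optional[List[Dict[str, str]]]:
    """Hypothetical helper: pull author details out of an already-parsed CFF mapping."""
    result: List[Dict[str, str]] = []
    for author in cff.get("authors") or []:
        if not isinstance(author, dict):
            continue  # skip malformed entries
        entry = {key: str(author[key]) for key in AUTHOR_FIELDS if key in author}
        if entry:
            result.append(entry)
    # Return None rather than an empty list when no authors were found.
    return result or None
```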
- gimie.parsers.cff.get_cff_doi(data: bytes) → list[str] | None
Given the contents of a CFF file, returns the list of DOIs it declares, or None if there are none.
- Parameters:
data – The CFF file body as bytes.
- Returns:
DOIs formatted as valid URLs
- Return type:
list[str] | None
Examples
>>> get_cff_doi(bytes("identifiers:\n - type: doi\n   value: 10.5281/zenodo.1234\n - type: doi\n   value: 10.5281/zenodo.5678", encoding="utf8"))
['https://doi.org/10.5281/zenodo.1234', 'https://doi.org/10.5281/zenodo.5678']
>>> get_cff_doi(bytes("identifiers:\n - type: doi\n   value: 10.5281/zenodo.9012", encoding="utf8"))
['https://doi.org/10.5281/zenodo.9012']
>>> get_cff_doi(bytes("abc: def", encoding="utf8"))
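The filtering these examples imply can be sketched on an already-parsed CFF mapping. The sketch keeps identifiers entries whose type is doi, normalises each value to a doi.org URL, and returns None when nothing matches. dois_from_cff_dict and _doi_to_url are hypothetical names, and the YAML parsing step is assumed to have happened beforehand.

```python
import re
from typing import Any, Dict, List, Optional

_PREFIX = re.compile(r"^(?:https?://)?(?:(?:dx\.)?doi\.org/)?")


def _doi_to_url(doi: str) -> str:
    """Normalise a DOI to https://doi.org/<suffix> (drop any scheme/host first)."""
    return "https://doi.org/" + _PREFIX.sub("", doi.strip(), count=1)


def dois_from_cff_dict(cff: Dict[str, Any]) -> Optional[List[str]]:
    """Hypothetical helper: collect DOI identifiers from a parsed CFF mapping."""
    identifiers = cff.get("identifiers") or []
    dois = [
        _doi_to_url(str(ident["value"]))
        for ident in identifiers
        if isinstance(ident, dict) and ident.get("type") == "doi" and "value" in ident
    ]
    # Mirror the documented behaviour: None rather than an empty list.
    return dois or None
```

The mapping used in the first example above corresponds to {"identifiers": [{"type": "doi", "value": "10.5281/zenodo.1234"}, {"type": "doi", "value": "10.5281/zenodo.5678"}]}.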
Module contents
Files which can be parsed by gimie.
- class gimie.parsers.ParserInfo(default, type)
Bases: NamedTuple
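ParserInfo has two fields, default and type. Judging only by their names, a plausible reading is that type holds the parser class and default marks membership in the default collection used when parsers is None. The sketch below illustrates that reading with a hypothetical registry. The field meanings, the registry, and ExperimentalParser are all assumptions. CffParser here is an empty placeholder.

```python
from typing import Dict, NamedTuple, Set, Type


class ParserInfo(NamedTuple):
    default: bool       # assumed: include this parser when no explicit set is given
    type: Type[object]  # assumed: the Parser subclass to instantiate


class CffParser:  # placeholder standing in for gimie.parsers.cff.CffParser
    pass


class ExperimentalParser:  # hypothetical opt-in parser
    pass


# Hypothetical registry keyed by parser name.
PARSERS: Dict[str, ParserInfo] = {
    "cff": ParserInfo(default=True, type=CffParser),
    "experimental": ParserInfo(default=False, type=ExperimentalParser),
}


def default_parser_names(registry: Dict[str, ParserInfo]) -> Set[str]:
    """Names of the parsers that make up the default collection."""
    return {name for name, info in registry.items() if info.default}
```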
- gimie.parsers.parse_files(subject: str, files: Iterable[Resource], parsers: Set[str] | None = None) → Graph
For each input file, select the appropriate parser from a collection and parse the file's contents. Return the union of all parsed properties as triples. Files for which no parser is found are skipped.
- Parameters:
subject – The subject URI of the repository.
files – A collection of file-like objects.
parsers – A set of parser names. If None, use the default collection.
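The select, parse, and union loop described above can be sketched as follows. To stay self-contained, the sketch makes several substitutions, all hypothetical: FakeResource stands in for Resource (a path plus bytes), LineCountParser is a toy parser, and REGISTRY maps parser names to the exact file name each one handles. It unions plain sets of triples where the real function returns an rdflib Graph.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, Optional, Set, Tuple, Type

Triple = Tuple[str, str, str]


@dataclass
class FakeResource:
    """Hypothetical stand-in for gimie's Resource: a path plus the file's bytes."""
    path: Path
    data: bytes


class LineCountParser:
    """Hypothetical parser emitting a single line-count triple."""

    def __init__(self, subject: str):
        self.subject = subject

    def parse(self, data: bytes) -> Set[Triple]:
        return {(self.subject, "http://example.org/lineCount", str(len(data.splitlines())))}


# Hypothetical registry: parser name -> (file name it handles, parser class).
REGISTRY: Dict[str, Tuple[str, Type[LineCountParser]]] = {"lines": ("NOTES.txt", LineCountParser)}


def parse_files(subject: str, files: Iterable[FakeResource],
                parsers: Optional[Set[str]] = None) -> Set[Triple]:
    names = set(REGISTRY) if parsers is None else parsers  # None -> default collection
    triples: Set[Triple] = set()
    for file in files:
        # First enabled parser whose file name matches, or None.
        parser_type = next(
            (cls for name, (filename, cls) in REGISTRY.items()
             if name in names and file.path.name == filename),
            None,
        )
        if parser_type is None:
            continue  # no parser for this file: skip it
        triples |= parser_type(subject).parse(file.data)  # union of all parsed triples
    return triples
```

Passing an empty set for parsers disables every parser, which differs from passing None. The sketch keeps that distinction explicit with `is None`.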
- gimie.parsers.select_parser(path: Path, parsers: Set[str] | None = None) → Type[Parser] | None
Select the appropriate parser from a collection based on a file path. If no parser is found, return None.
- Parameters:
path – The path of the file to parse.
parsers – A set of parser names. If None, use the default collection.
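A sketch of path-based selection, assuming each parser name is associated with a glob pattern for the file names it handles. That pattern table, DEFAULT_PARSERS, and ExperimentalParser are hypothetical; the documented ParserInfo fields do not include a pattern. It relies on pathlib's Path.match, which compares a relative pattern against the right-hand end of the path, so repo/CITATION.cff matches CITATION.cff.

```python
from pathlib import Path
from typing import Dict, Optional, Set, Tuple


class CffParser:  # placeholder standing in for gimie.parsers.cff.CffParser
    pass


class ExperimentalParser:  # hypothetical opt-in parser
    pass


# Hypothetical registry: parser name -> (glob pattern for the file name, parser class).
PARSERS: Dict[str, Tuple[str, type]] = {
    "cff": ("CITATION.cff", CffParser),
    "experimental": ("*.experimental", ExperimentalParser),
}
DEFAULT_PARSERS: Set[str] = {"cff"}  # hypothetical default collection


def select_parser(path: Path, parsers: Optional[Set[str]] = None) -> Optional[type]:
    names = DEFAULT_PARSERS if parsers is None else parsers
    for name in sorted(names):  # sorted so the choice is deterministic
        pattern, parser_type = PARSERS[name]
        # Path.match checks the pattern against the end of the path.
        if path.match(pattern):
            return parser_type
    return None
```

With this design, an opt-in parser is only considered when its name is passed explicitly, for example select_parser(Path("notes.experimental"), {"experimental"}).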