gimie package

Subpackages

Submodules

gimie.cli module

Command line interface to the gimie package.

class gimie.cli.RDFFormatChoice(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

jsonld = 'json-ld'
nt = 'nt'
ttl = 'ttl'
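
Because the class derives from both str and Enum, its members behave like plain strings. The sketch below rebuilds an equivalent enum under the hypothetical name RDFFormatChoiceSketch to show what that implies; it is an illustration based on the bases and values listed above, not a copy of gimie's source.

```python
from enum import Enum


class RDFFormatChoiceSketch(str, Enum):
    """Hypothetical stand-in mirroring the three documented members."""

    jsonld = "json-ld"
    nt = "nt"
    ttl = "ttl"


# Look a member up by its value, as a CLI layer would do with user input.
fmt = RDFFormatChoiceSketch("json-ld")

# The str mixin makes members compare equal to their raw string value.
is_plain_string = isinstance(fmt, str)
matches_value = fmt == "json-ld"

# Enumerate the accepted serialization names.
choices = [member.value for member in RDFFormatChoiceSketch]
```

An unknown value such as "xml" raises ValueError at lookup time, so an invalid format can be rejected before any extraction work starts.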
gimie.cli.advice(url: str)[source]

Show a metadata completion report for a Git repository at the target URL.

NOTE: Not implemented yet

gimie.cli.callback(version: bool | None = <typer.models.OptionInfo object>)[source]

gimie digs Git repositories for metadata.

gimie.cli.data(url: str, format: ~gimie.cli.RDFFormatChoice = <typer.models.OptionInfo object>, base_url: str | None = <typer.models.OptionInfo object>, include_parser: ~typing.List[str] | None = <typer.models.OptionInfo object>, exclude_parser: ~typing.List[str] | None = <typer.models.OptionInfo object>, version: bool | None = <typer.models.OptionInfo object>)[source]

Extract linked metadata from a Git repository at the target URL.

The output is sent to stdout, and turtle is used as the default serialization format.

gimie.cli.parsers(verbose: bool = <typer.models.OptionInfo object>)[source]

List available parsers, specifying which are default. If --verbose is used, show parser description.

gimie.cli.version_callback(value: bool)[source]

gimie.io module

Standard input interfaces to local or remote resources for gimie.

class gimie.io.IterStream(iterator: Iterator[bytes])[source]

Bases: RawIOBase

Wraps an iterator under a file-like interface. Empty elements in the iterator are ignored.

Parameters:

iterator – An iterator yielding bytes.

Examples

>>> stream = IterStream(iter([b"Hello ", b"", b"World"]))
>>> stream.read()
b'Hello World'
readable()[source]

Return whether object was opened for reading.

If False, read() will raise OSError.

readinto(b)[source]
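
One way such a wrapper can be built is to subclass io.RawIOBase and implement only readable() and readinto(); the base class then derives read() from them. The class below, under the hypothetical name IterStreamSketch, is a minimal sketch of that approach and is not gimie's actual implementation.

```python
import io
from typing import Iterator


class IterStreamSketch(io.RawIOBase):
    """Hypothetical re-implementation for illustration only."""

    def __init__(self, iterator: Iterator[bytes]):
        self.leftover = b""
        # Drop empty chunks up front so they are never mistaken for EOF.
        self.iterator = (chunk for chunk in iterator if chunk)

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        try:
            # Serve bytes left over from the previous call first.
            chunk = self.leftover or next(self.iterator)
        except StopIteration:
            return 0  # zero bytes read signals end of stream
        size = len(b)
        output, self.leftover = chunk[:size], chunk[size:]
        b[: len(output)] = output
        return len(output)


stream = IterStreamSketch(iter([b"Hello ", b"", b"World"]))
data = stream.read()

# A small explicit read size exercises the leftover buffer.
partial = IterStreamSketch(iter([b"abcdef"])).read(4)
```

Filtering empty chunks matters because readinto() returning 0 means end of stream; an empty element passed through unchanged would cut the read short.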
class gimie.io.LocalResource(path: str | PathLike)[source]

Bases: Resource

Provides read-only access to local data via a file-like interface.

Examples

>>> resource = LocalResource("README.md")
open() → RawIOBase[source]
class gimie.io.RemoteResource(path: str, url: str, headers: dict | None = None)[source]

Bases: Resource

Provides read-only access to remote data via a file-like interface.

Parameters:
  • url – The URL where the resource can be downloaded from.

  • headers – Optional headers to pass to the request.

Examples

>>> url = "https://raw.githubusercontent.com/sdsc-ordes/gimie/main/README.md"
>>> content = RemoteResource("README.md", url).open().read()
>>> assert isinstance(content, bytes)
open() → RawIOBase[source]
class gimie.io.Resource[source]

Bases: object

Abstract class for read-only access to local or remote resources via a file-like interface.

Parameters:

path (pathlib.Path) – The local relative path to the resource.

open() → RawIOBase[source]
path: Path
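
The contract described above (a path attribute plus an open() method returning a raw binary stream) can be expressed with an abstract base class. The following is a minimal sketch under that assumption; ResourceSketch and LocalResourceSketch are hypothetical names and the bodies are not taken from gimie.

```python
import io
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ResourceSketch(ABC):
    """Hypothetical abstract base: a path plus a read-only open()."""

    path: Path

    @abstractmethod
    def open(self) -> io.RawIOBase:
        ...


class LocalResourceSketch(ResourceSketch):
    def __init__(self, path):
        self.path = Path(path)

    def open(self) -> io.RawIOBase:
        # buffering=0 in binary mode returns a raw FileIO object,
        # which is a RawIOBase subclass.
        return open(self.path, "rb", buffering=0)


# Demonstrate on a throwaway file so the example has no external dependencies.
with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    readme.write_bytes(b"# demo\n")
    with LocalResourceSketch(readme).open() as handle:
        is_raw = isinstance(handle, io.RawIOBase)
        content = handle.read()
```

Code that consumes resources can then depend on open() alone, without caring whether the bytes come from disk or from a remote server.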

gimie.models module

Data models to represent nodes in the graph generated by gimie.

class gimie.models.Organization(_id: str, name: str, legal_name: str | None = None, email: List[str] | None = None, description: str | None = None, logo: str | None = None)[source]

Bases: object

See http://schema.org/Organization

description: str | None = None
email: List[str] | None = None
legal_name: str | None = None
name: str
class gimie.models.OrganizationSchema(*args, only=None, exclude=(), many=False, context=None, load_only=(), dump_only=(), partial=False, unknown=None, flattened=False, lazy=False, _all_objects=None, _visited=None, _top_level=True)[source]

Bases: JsonLDSchema

class Meta[source]

Bases: object

model

alias of Organization

rdf_type = rdflib.term.URIRef('http://schema.org/Organization')
opts: SchemaOpts = <calamus.schema.JsonLDSchemaOpts object>
class gimie.models.Person(_id: str, identifier: str, name: str | None = None, email: str | None = None, affiliations: List[Organization] | None = None)[source]

Bases: object

See http://schema.org/Person

affiliations: List[Organization] | None = None
email: str | None = None
identifier: str
name: str | None = None
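
The field lists above suggest simple record types. As a rough sketch, assuming the models behave like dataclasses (the documentation only states that they derive from object), the two classes and their relationship could look as follows. The names OrganizationSketch and PersonSketch and all example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OrganizationSketch:
    # Field names and defaults mirror the documented signature.
    _id: str
    name: str
    legal_name: Optional[str] = None
    email: Optional[List[str]] = None  # a list of addresses
    description: Optional[str] = None
    logo: Optional[str] = None


@dataclass
class PersonSketch:
    _id: str
    identifier: str
    name: Optional[str] = None
    email: Optional[str] = None  # a single address, unlike OrganizationSketch
    affiliations: Optional[List[OrganizationSketch]] = None


# Placeholder identifiers; real values would come from a git provider.
acme = OrganizationSketch(_id="https://example.org/acme", name="acme")
jane = PersonSketch(
    _id="https://example.org/jdoe",
    identifier="jdoe",
    affiliations=[acme],
)
```

Only _id and the first positional field are required; every other field defaults to None, which lets an extractor fill in whatever metadata the provider exposes.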
class gimie.models.PersonSchema(*args, only=None, exclude=(), many=False, context=None, load_only=(), dump_only=(), partial=False, unknown=None, flattened=False, lazy=False, _all_objects=None, _visited=None, _top_level=True)[source]

Bases: JsonLDSchema

class Meta[source]

Bases: object

model

alias of Person

rdf_type = rdflib.term.URIRef('http://schema.org/Person')
opts: SchemaOpts = <calamus.schema.JsonLDSchemaOpts object>
class gimie.models.Release(tag: str, date: datetime, commit_hash: str)[source]

Bases: object

This class represents a release of a repository.

Parameters:
  • tag (str) – The tag of the release.

  • date (datetime.datetime) – The date of the release.

  • commit_hash (str) – The commit hash of the release.

commit_hash: str
date: datetime
tag: str
class gimie.models.Repository(url: str, name: str, authors: ~typing.List[~gimie.models.Organization | ~gimie.models.Person] | None = None, contributors: ~typing.List[~gimie.models.Person] | None = None, date_created: datetime | None = None, date_modified: datetime | None = None, date_published: datetime | None = None, description: str | None = None, download_url: str | None = None, identifier: str | None = None, keywords: ~typing.List[str] | None = None, licenses: ~typing.List[str] | None = None, parent_repository: str | None = None, prog_langs: ~typing.List[str] | None = None, version: str | None = None)[source]

Bases: object

This class represents a git repository. It does not contain any information about the content of the repository. See https://schema.org/SoftwareSourceCode

authors: List[Organization | Person] | None = None
contributors: List[Person] | None = None
date_created: datetime | None = None
date_modified: datetime | None = None
date_published: datetime | None = None
description: str | None = None
download_url: str | None = None
identifier: str | None = None
jsonld() → str[source]

Alias for jsonld serialization.

keywords: List[str] | None = None
licenses: List[str] | None = None
name: str
parent_repository: str | None = None
prog_langs: List[str] | None = None
serialize(format: str = 'ttl', **kwargs) → str[source]

Serialize the RDF graph representing the instance.

to_graph() → Graph[source]

Convert repository to RDF graph.

url: str
version: str | None = None
class gimie.models.RepositorySchema(*args, only=None, exclude=(), many=False, context=None, load_only=(), dump_only=(), partial=False, unknown=None, flattened=False, lazy=False, _all_objects=None, _visited=None, _top_level=True)[source]

Bases: JsonLDSchema

This defines the schema used for json-ld serialization.

class Meta[source]

Bases: object

add_value_types = False
model

alias of Repository

rdf_type = rdflib.term.URIRef('http://schema.org/SoftwareSourceCode')
opts: SchemaOpts = <calamus.schema.JsonLDSchemaOpts object>

gimie.project module

Orchestration of multiple extractors for a given project. This is the main entry point for end-to-end analysis.

class gimie.project.Project(path: str, base_url: str | None = None, git_provider: str | None = None, parser_names: Iterable[str] | None = None)[source]

Bases: object

A class to represent a project’s git repository.

Parameters:
  • path – The full path (URL) of the repository.

  • base_url – The base URL of the git remote. Can be used to specify delimitation between base URL and project name.

  • git_provider – The name of the git provider to extract metadata from. (‘git’, ‘github’, ‘gitlab’)

  • parser_names – Names of file parsers to use. (‘license’). If None, default parsers are used (see gimie.parsers.PARSERS).

Examples

>>> proj = Project("https://github.com/sdsc-ordes/gimie")
>>> assert isinstance(proj.extract(), Graph)
extract() → Graph[source]

Extract repository metadata from git provider to RDF graph and enrich with metadata parsed from file contents.

gimie.project.split_git_url(url: str) → Tuple[str, str][source]

Split a git URL into base URL and project path.

Examples

>>> split_git_url("https://gitlab.com/foo/bar")
('https://gitlab.com', 'foo/bar')
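
The split shown in the example can be approximated with the standard library. The sketch below is one possible approach and not necessarily how gimie does it: split_git_url_sketch treats scheme plus host as the base URL, and the hypothetical helper split_with_base illustrates how an explicit base URL could move the delimitation for instances hosted under a sub-path.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def split_git_url_sketch(url: str) -> Tuple[str, str]:
    """Hypothetical helper: base URL is scheme plus host, the rest is the project."""
    parsed = urlparse(url)
    base_url = f"{parsed.scheme}://{parsed.netloc}"
    project = parsed.path.strip("/")
    return base_url, project


def split_with_base(url: str, base_url: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical helper: honour an explicit base URL when one is given."""
    if base_url is None:
        return split_git_url_sketch(url)
    base = base_url.rstrip("/")
    if not url.startswith(base):
        raise ValueError(f"{url!r} does not start with {base!r}")
    return base, url[len(base):].strip("/")


default_split = split_git_url_sketch("https://gitlab.com/foo/bar")

# Assumed scenario: a self-hosted instance served under the /git sub-path.
custom_split = split_with_base(
    "https://git.example.org/git/group/project",
    base_url="https://git.example.org/git",
)
```

Without the explicit base URL, the second address would be split after the host name, and "git" would wrongly become part of the project path.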

Module contents