gimie.extractors package

Submodules

gimie.extractors.abstract module

Abstract base class for Git repository extractors.

class gimie.extractors.abstract.Extractor(url: str, base_url: str | None = None, local_path: str | None = None)

Bases: ABC

Extractor is an Abstract Base Class. It is only meant to define a standard interface for all git repository extractors.

Subclasses for different git providers must implement the extract() and list_files() methods.

property base: str

Base URL of the remote.

abstract extract() → Repository

Extract metadata from the git provider into a Repository object.

abstract list_files() → List[Resource]

List all files in the repository HEAD.

property path: str

Path to the repository without the base URL.
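
The interface described above can be illustrated with a self-contained sketch. It does not import gimie: ExtractorSketch is a hypothetical stand-in that mirrors the documented constructor, properties, and abstract methods, and InMemoryExtractor is an invented subclass used only for demonstration. How base and path are derived from the URL is an assumption, and plain dicts and strings stand in for the Repository and Resource types.

```python
from abc import ABC, abstractmethod
from typing import List, Optional
from urllib.parse import urlparse


class ExtractorSketch(ABC):
    """Hypothetical stand-in mirroring the documented Extractor interface."""

    def __init__(self, url: str, base_url: Optional[str] = None,
                 local_path: Optional[str] = None):
        self.url = url
        self.base_url = base_url
        self.local_path = local_path

    @property
    def base(self) -> str:
        # Assumption: fall back to scheme + host when base_url is not given.
        if self.base_url is not None:
            return self.base_url
        parsed = urlparse(self.url)
        return f"{parsed.scheme}://{parsed.netloc}"

    @property
    def path(self) -> str:
        # Assumption: the path is the URL minus the base, without slashes
        # at either end.
        rest = self.url[len(self.base):] if self.url.startswith(self.base) else self.url
        return rest.strip("/")

    @abstractmethod
    def extract(self) -> dict:
        """Return repository metadata (a dict stands in for Repository)."""

    @abstractmethod
    def list_files(self) -> List[str]:
        """Return file names (strings stand in for Resource objects)."""


class InMemoryExtractor(ExtractorSketch):
    """Invented provider that serves a fixed list of file names."""

    def __init__(self, url: str, files: List[str], **kwargs):
        super().__init__(url, **kwargs)
        self._files = list(files)

    def extract(self) -> dict:
        return {"url": self.url, "name": self.path.split("/")[-1]}

    def list_files(self) -> List[str]:
        return sorted(self._files)


demo = InMemoryExtractor(
    "https://github.com/sdsc-ordes/gimie", ["README.md", "LICENSE"]
)
print(demo.base, demo.path, demo.list_files())
```

Because both methods are declared abstract, instantiating ExtractorSketch directly raises TypeError; only subclasses that define extract() and list_files() can be created.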

gimie.extractors.git module

Extractor which uses a locally available (usually cloned) repository.

class gimie.extractors.git.GitExtractor(url: str, base_url: str | None = None, local_path: str | None = None, _cloned: bool = False)

Bases: Extractor

This class is responsible for extracting metadata from a git repository.

Parameters:
  • url (str) – The url of the git repository.

  • base_url (Optional[str]) – The base url of the git remote.

  • local_path (Optional[str]) – The local path where the cloned git repository is located.

uri

The URI to assign the repository in RDF.

Type:

Optional[str]

repository

The repository we are extracting metadata from.

Type:

Repository

base_url: str | None = None
extract() → Repository

Extract metadata from the git provider into a Repository object.

list_files() → List[LocalResource]

List all files in the repository HEAD.

local_path: str | None = None
url: str

gimie.extractors.github module

class gimie.extractors.github.GithubExtractor(url: str, base_url: str | None = None, local_path: str | None = None, token: str | None = None)

Bases: Extractor

Extractor for GitHub repositories. Uses the GitHub GraphQL API to extract metadata into linked data.

Parameters:
  • url (str) – The url of the git repository.

  • base_url (Optional[str]) – The base url of the git remote.

base_url: str | None = None
extract() → Repository

Extract metadata from the target GitHub repository.

list_files() → List[RemoteResource]

Take the root repository folder and return the list of files it contains.

local_path: str | None = None
token: str | None = None
url: str
gimie.extractors.github.query_contributors(url: str, headers: Dict[str, str]) → List[Dict[str, Any]]

Query the list of contributors of the target repository using GitHub’s REST and GraphQL APIs. Returns a list of GraphQL User nodes. NOTE: This is a workaround for the lack of a contributors field in the GraphQL API.
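
A two-step workaround of this kind could look roughly like the sketch below. It is not gimie's implementation: the helper names are hypothetical, the selected User fields are arbitrary, and no network request is made. It relies only on GitHub's REST contributors endpoint, whose items carry a node_id, and on the GraphQL nodes(ids: ...) lookup.

```python
from typing import Any, Dict, List


def contributors_rest_url(owner: str, repo: str) -> str:
    """Build the REST URL that lists a repository's contributors."""
    return f"https://api.github.com/repos/{owner}/{repo}/contributors"


def users_graphql_payload(node_ids: List[str]) -> Dict[str, Any]:
    """Build a GraphQL request body resolving node IDs to User nodes."""
    query = """
    query($ids: [ID!]!) {
      nodes(ids: $ids) {
        ... on User { login name }
      }
    }
    """
    return {"query": query, "variables": {"ids": node_ids}}


# Placeholder items shaped like REST contributor entries; the values are
# made up and only the "node_id" key matters for the second step.
rest_items = [
    {"login": "alice", "node_id": "NODE_ID_1"},
    {"login": "bob", "node_id": "NODE_ID_2"},
]

payload = users_graphql_payload([item["node_id"] for item in rest_items])
print(contributors_rest_url("sdsc-ordes", "gimie"))
print(payload["variables"])
```

In a real client, the first URL would be fetched with the authentication headers and the payload would then be POSTed to the GraphQL endpoint. Both steps are left out here so that the sketch runs offline.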

gimie.extractors.gitlab module

class gimie.extractors.gitlab.GitlabExtractor(url: str, base_url: str | None = None, local_path: str | None = None, token: str | None = None)

Bases: Extractor

Extractor for GitLab repositories. Uses the GitLab GraphQL API to extract metadata into linked data.

Parameters:
  • url (str) – The url of the git repository.

  • base_url (Optional[str]) – The base url of the git remote.

base_url: str | None = None
extract() → Repository

Extract metadata from the target GitLab repository.

property graphql_endpoint: str
list_files() → List[RemoteResource]

Take the root repository folder and return the list of files it contains.

local_path: str | None = None
property rest_endpoint: str
token: str | None = None
url: str
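
The graphql_endpoint and rest_endpoint properties are not described here. As a rough sketch, and assuming they are derived from the base URL of the instance, the conventional GitLab API paths (/api/graphql and /api/v4) could be built as follows. The function name gitlab_endpoints is hypothetical.

```python
from typing import Dict


def gitlab_endpoints(base: str) -> Dict[str, str]:
    """Derive GitLab API endpoints from an instance base URL."""
    root = base.rstrip("/")  # tolerate a trailing slash in the base URL
    return {
        "graphql": f"{root}/api/graphql",
        "rest": f"{root}/api/v4",
    }


print(gitlab_endpoints("https://gitlab.com"))
```

Deriving both endpoints from the base rather than hard-coding gitlab.com is what would let the same logic serve self-hosted instances.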

Module contents

Git providers from which metadata can be extracted by gimie.

gimie.extractors.get_extractor(url: str, source: str, base_url: str | None = None, local_path: str | None = None) → Extractor

Instantiate the correct extractor for a given source.

Parameters:
  • url – Where the repository metadata is extracted from.

  • source – The source of the repository (git, gitlab, github, …).

  • base_url – The base URL of the git remote.

  • local_path – If applicable, the path to the directory where the repository is located.

Examples

>>> extractor = get_extractor(
...     "https://github.com/sdsc-ordes/gimie",
...     "github"
... )
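
One common way to implement this kind of dispatch is a registry that maps source names to classes. The sketch below is an assumption about the pattern, not gimie's code: the Stub* classes and make_extractor are hypothetical stand-ins so that the example runs without gimie installed.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Type


@dataclass
class StubExtractor:
    """Hypothetical stand-in sharing the documented constructor arguments."""
    url: str
    base_url: Optional[str] = None
    local_path: Optional[str] = None


class StubGit(StubExtractor): ...
class StubGithub(StubExtractor): ...
class StubGitlab(StubExtractor): ...


# Map each supported source name to the class that handles it.
REGISTRY: Dict[str, Type[StubExtractor]] = {
    "git": StubGit,
    "github": StubGithub,
    "gitlab": StubGitlab,
}


def make_extractor(url: str, source: str, base_url: Optional[str] = None,
                   local_path: Optional[str] = None) -> StubExtractor:
    """Instantiate the stand-in class registered for `source`."""
    try:
        cls = REGISTRY[source]
    except KeyError:
        raise ValueError(
            f"Unsupported source: {source!r}. Expected one of {sorted(REGISTRY)}"
        ) from None
    return cls(url, base_url=base_url, local_path=local_path)


extractor = make_extractor("https://github.com/sdsc-ordes/gimie", "github")
print(type(extractor).__name__)
```

With a registry, supporting a new provider only requires adding one entry, and an unknown source fails early with an explicit error.
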
gimie.extractors.infer_git_provider(url: str) → str

Given a git repository URL, return the corresponding git provider. Local paths and unsupported git providers return “git”.

Examples

>>> infer_git_provider("https://gitlab.com/foo/bar")
'gitlab'
>>> infer_git_provider("/foo/bar")
'git'
>>> infer_git_provider("https://codeberg.org/dnkl/foot")
'git'
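
The behaviour shown in these examples can be reproduced with a host lookup, sketched below. This is an illustrative assumption, not the library's implementation: guess_provider and KNOWN_HOSTS are hypothetical names, and only the two public hosts are recognised.

```python
from urllib.parse import urlparse

# Assumption: providers are recognised by their public host name only.
KNOWN_HOSTS = {"github.com": "github", "gitlab.com": "gitlab"}


def guess_provider(url: str) -> str:
    """Return the provider for a URL, or "git" when it is not recognised."""
    # Local paths have no host, so urlparse yields None for hostname.
    host = urlparse(url).hostname or ""
    return KNOWN_HOSTS.get(host, "git")


for candidate in (
    "https://gitlab.com/foo/bar",
    "/foo/bar",
    "https://codeberg.org/dnkl/foot",
):
    print(candidate, guess_provider(candidate))
```

A lookup like this cannot recognise self-hosted instances, which fall back to "git" unless the caller names the source explicitly.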