gimie.extractors package¶
Submodules¶
gimie.extractors.abstract module¶
Abstract base class for Git repository extractors.
- class gimie.extractors.abstract.Extractor(url: str, base_url: str | None = None, local_path: str | None = None)[source]¶
Bases: ABC
Extractor is an abstract base class. It only defines a standard interface for all git repository extractors.
Subclasses for different git providers must implement the extract() and list_files() methods.
- abstract extract() → Repository [source]¶
Extract metadata from the git provider into a Repository object.
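The subclassing contract described above can be illustrated with a minimal sketch. The `Extractor` and `Repository` classes below are simplified stand-ins written for this example, not gimie's actual classes, and `DummyExtractor` is a hypothetical provider. File entries are plain strings here rather than gimie resource objects.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Repository:
    """Simplified stand-in for gimie's Repository model (assumption)."""
    url: str
    name: str


class Extractor(ABC):
    """Stand-in mirroring the documented interface, not gimie's class."""

    def __init__(self, url: str, base_url: Optional[str] = None,
                 local_path: Optional[str] = None):
        self.url = url
        self.base_url = base_url
        self.local_path = local_path

    @abstractmethod
    def extract(self) -> Repository:
        """Return repository metadata."""

    @abstractmethod
    def list_files(self) -> List[str]:
        """Return the files in the repository."""


class DummyExtractor(Extractor):
    """Hypothetical provider that derives everything from the URL."""

    def extract(self) -> Repository:
        # Use the last path segment of the URL as the repository name.
        name = self.url.rstrip("/").split("/")[-1]
        return Repository(url=self.url, name=name)

    def list_files(self) -> List[str]:
        # A real provider would query an API or walk a checkout.
        return []
```

Because both methods are abstract, instantiating the base class directly raises a `TypeError`, while a subclass that implements both can be used wherever an extractor is expected.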
gimie.extractors.git module¶
Extractor which uses a locally available (usually cloned) repository.
- class gimie.extractors.git.GitExtractor(url: str, base_url: str | None = None, local_path: str | None = None, _cloned: bool = False)[source]¶
Bases: Extractor
This class is responsible for extracting metadata from a git repository.
- Parameters:
- repository¶
The repository we are extracting metadata from.
- Type:
- extract() → Repository [source]¶
Extract metadata from the git provider into a Repository object.
- list_files() → List[LocalResource] [source]¶
List all files in the repository HEAD.
gimie.extractors.github module¶
- class gimie.extractors.github.GithubExtractor(url: str, base_url: str | None = None, local_path: str | None = None, token: str | None = None)[source]¶
Bases: Extractor
Extractor for GitHub repositories. Uses the GitHub GraphQL API to extract metadata into linked data.
- url: str
The URL of the git repository.
- base_url: Optional[str]
The base URL of the git remote.
- extract() → Repository [source]¶
Extract metadata from the target GitHub repository.
- list_files() → List[RemoteResource] [source]¶
Takes the root repository folder and returns the list of files it contains.
- gimie.extractors.github.query_contributors(url: str, headers: Dict[str, str]) → List[Dict[str, Any]] [source]¶
Queries the list of contributors of the target repository using GitHub’s REST and GraphQL APIs. Returns a list of GraphQL User nodes. NOTE: This is a workaround for the lack of a contributors field in the GraphQL API.
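The two-step workaround can be sketched as follows. This is not gimie's implementation. It assumes the contributor records come from GitHub's REST contributors endpoint, which includes `node_id` and `type` fields, and it only builds the GraphQL request body without sending anything. The function name `build_user_nodes_payload` and the selected `login` and `name` fields are chosen for illustration.

```python
from typing import Any, Dict, List

# GraphQL query resolving a batch of node IDs to User nodes.
USER_NODES_QUERY = """
query($ids: [ID!]!) {
  nodes(ids: $ids) {
    ... on User { login name }
  }
}
"""


def build_user_nodes_payload(
    contributors: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Build a GraphQL request body from REST contributor records.

    Hypothetical helper: keeps only records of type "User" and passes
    their node IDs as the `ids` variable of the query.
    """
    ids = [
        c["node_id"]
        for c in contributors
        if c.get("type") == "User" and "node_id" in c
    ]
    return {"query": USER_NODES_QUERY, "variables": {"ids": ids}}
```

The resulting dictionary could then be sent as the JSON body of a POST request to the GraphQL endpoint, using the same authorization headers as the REST call.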
gimie.extractors.gitlab module¶
- class gimie.extractors.gitlab.GitlabExtractor(url: str, base_url: str | None = None, local_path: str | None = None, token: str | None = None)[source]¶
Bases: Extractor
Extractor for GitLab repositories. Uses the GitLab GraphQL API to extract metadata into linked data.
- url: str
The URL of the git repository.
- base_url: Optional[str]
The base URL of the git remote.
- extract() → Repository [source]¶
Extract metadata from the target GitLab repository.
- list_files() → List[RemoteResource] [source]¶
Takes the root repository folder and returns the list of files it contains.
Module contents¶
Git providers from which metadata can be extracted by gimie.
- gimie.extractors.get_extractor(url: str, source: str, base_url: str | None = None, local_path: str | None = None) → Extractor [source]¶
Instantiate the correct extractor for a given source.
- Parameters:
url – Where the repository metadata is extracted from.
source – The source of the repository (git, gitlab, github, …).
base_url – The base URL of the git remote.
local_path – If applicable, the path to the directory where the repository is located.
Examples
>>> extractor = get_extractor(
...     "https://github.com/sdsc-ordes/gimie",
...     "github"
... )
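One way to picture this dispatch is a lookup table from source name to extractor class. The sketch below uses empty stand-in classes and a hypothetical `make_extractor` function. How gimie selects the class internally, and how it handles an unknown source, are assumptions here.

```python
from typing import Dict, Optional, Type


class Extractor:
    """Stand-in base class; gimie's real Extractor is an ABC."""

    def __init__(self, url: str, base_url: Optional[str] = None,
                 local_path: Optional[str] = None):
        self.url = url
        self.base_url = base_url
        self.local_path = local_path


class GitExtractor(Extractor): ...
class GithubExtractor(Extractor): ...
class GitlabExtractor(Extractor): ...


# Map each documented source name to a stand-in class.
EXTRACTORS: Dict[str, Type[Extractor]] = {
    "git": GitExtractor,
    "github": GithubExtractor,
    "gitlab": GitlabExtractor,
}


def make_extractor(url: str, source: str,
                   base_url: Optional[str] = None,
                   local_path: Optional[str] = None) -> Extractor:
    """Hypothetical counterpart of get_extractor using a lookup table."""
    try:
        cls = EXTRACTORS[source]
    except KeyError:
        # Assumption: unknown sources are rejected with a ValueError.
        raise ValueError(f"Unknown source: {source!r}") from None
    return cls(url, base_url=base_url, local_path=local_path)
```

A table keeps the set of supported sources in one place, so adding a provider means adding one entry rather than another branch.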
- gimie.extractors.infer_git_provider(url: str) → str [source]¶
Given a git repository URL, return the corresponding git provider. Local paths and unsupported git providers return “git”.
Examples
>>> infer_git_provider("https://gitlab.com/foo/bar")
'gitlab'
>>> infer_git_provider("/foo/bar")
'git'
>>> infer_git_provider("https://codeberg.org/dnkl/foot")
'git'
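The documented behaviour can be reproduced with a small hostname check. The function below is a hypothetical re-implementation named `guess_provider`, and gimie's own matching rules may differ. It assumes that a provider is recognised when its name appears as one of the dot-separated labels of the hostname.

```python
from urllib.parse import urlparse


def guess_provider(url: str) -> str:
    """Hypothetical sketch: map a URL or local path to a provider name."""
    # Local paths have no hostname, so urlparse yields None here.
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    for provider in ("github", "gitlab"):
        # Matches e.g. "github.com" or "gitlab.example.org".
        if provider in labels:
            return provider
    # Fallback for local paths and unsupported providers.
    return "git"
```

Matching on hostname labels also covers self-hosted instances whose hostname contains the provider name, which is an assumption of this sketch rather than documented behaviour.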