pyfuzon
Python bindings for the fuzon library.
Installation
Pyfuzon is distributed on PyPI and can be installed with:
pip install pyfuzon
Usage
TermMatcher
is the central object in pyfuzon. It can be built from a list of RDF files, either locally or from URLs. It exposes methods to rank, score or search top N terms for their similarity with input text.
from pyfuzon.matcher import TermMatcher
matcher = TermMatcher.from_files(["https://example.org/onto1.ttl", "/data/onto2.ttl"])
matcher.terms #accesses the list of terms loaded from input files
matcher.score("query") # returns the match score of each term for the input query.
matcher.rank("query") # returns the list of terms sorted by similarity with the query.
matcher.top("query", 5) # shows top 5 most similar results (sorted).
Fuzon's caching mechanism is also available from python via the pyfuzon.cache
.
from pathlib import Path
from pyfuzon import cache
sources = ["data/onto1.ttl", "data/onto2.ttl"]
# This initializes the fuzon cache dir, `~/.cache/fuzon` on linux.
Path(cache.get_cache_path(sources)).parent.mkdir(parents=True, exist_ok=True)
There are two way to use caching.
By source, where each ontology is cached/loaded indepdently:
[!TIP] This is preferred if mix and matching many ontologies, as this reduces duplication in the caching folder.
# each source cached under a separate key.
cache.cache_by_source(sources)
# multiple entries merged them into a matcher
matcher = cache.load_by_source(sources)
Or by matcher, where multiple ontologies are combined into a single cache entry:
[!TIP] This is preferred if always reusing the same combination(s) of ontologies, as the loading process is faster.
# Generate a single cache key from multiple ontologies
cache_path = cache.get_cache_path(sources)
# Dump the combined matcher to a file
matcher.dump(cache_path)
# Load combined matcher in one go
matcher = TermMatcher.load(cache_path)