Working with remote objects#
Remote storage can be key to sharing and collaborating on multiomics data. MODOS integrates with S3 object storage and htsget to allow remote storage, access and real-time secure streaming of genomic data.
Most of the MODOS-api’s functionality works with remotely stored objects in the same way as with local objects. The user only has to specify the s3_endpoint of the remote object store.
List remotely available MODOs#
Listing all available MODOs at a specific S3 endpoint (in this tutorial we use http://localhost as an example) shows the MODOs in all buckets at that endpoint:
import modos.remote as remo
# Show all remote modos
remo.list_remote_items("http://localhost")
# ['modos-demo/GIAB', 'modos-demo/ex']
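Each returned entry has the form `bucket/modo_name`. As a minimal sketch, assuming output shaped like the sample above, the entries can be grouped per bucket. The helper `group_by_bucket` is hypothetical and not part of modos:

```python
from collections import defaultdict


def group_by_bucket(items):
    """Group 'bucket/modo_name' entries by bucket (hypothetical helper, not part of modos)."""
    grouped = defaultdict(list)
    for item in items:
        # Everything before the first "/" is the bucket, the rest is the MODO name
        bucket, _, name = item.partition("/")
        grouped[bucket].append(name)
    return dict(grouped)


# Sample output copied from the listing above
items = ["modos-demo/GIAB", "modos-demo/ex"]
print(group_by_bucket(items))
# {'modos-demo': ['GIAB', 'ex']}
```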
Show metadata of a remote MODO#
Metadata can be displayed directly, either for all MODOs at an endpoint or for a specific MODO:
import modos.remote as remo
# Get metadata of all MODOs at endpoint "http://localhost"
remo.get_metadata_from_remote("http://localhost")
# Get metadata of MODO with id ex
remo.get_metadata_from_remote("http://localhost", modo_id="ex")
Find a specific MODO and get its S3 path#
A specific MODO, together with the bucket it is stored in, can be queried in two ways: fuzzy search or exact string matching:
import modos.remote as remo
# Query all MODOs with sequence similar to "ex"
remo.get_s3_path("http://localhost", query="ex")
# [{'http://localhost/s3/modos-demo/ex': {'s3_endpoint': 'http://localhost/s3', 'modo_path': 'modos-demo/ex'}}]
# Query all MODOs exactly matching "ex"
remo.get_s3_path("http://localhost", query="ex", exact_match=True)
# []
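The query result can be turned into the `s3://` paths used in the rest of this tutorial. The following is a sketch only: it assumes the result always has the structure of the sample output above (a list of dicts whose values carry a `modo_path` key), and the helper `to_s3_paths` is hypothetical:

```python
def to_s3_paths(results):
    """Build 's3://bucket/modo_name' paths from get_s3_path-style results.

    Hypothetical helper; assumes each value holds a 'modo_path' key,
    as in the sample output shown above.
    """
    paths = []
    for entry in results:
        for info in entry.values():
            paths.append(f"s3://{info['modo_path']}")
    return paths


# Sample result copied from the fuzzy query above
results = [
    {
        "http://localhost/s3/modos-demo/ex": {
            "s3_endpoint": "http://localhost/s3",
            "modo_path": "modos-demo/ex",
        }
    }
]
print(to_s3_paths(results))
# ['s3://modos-demo/ex']
```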
Instantiate a remote MODO locally#
Remotely stored MODOs can be instantiated by specifying their remote endpoint and then worked with as if they were stored locally.
from modos.api import MODO
# Load MODO from remote storage
modo = MODO(path='s3://modos-demo/ex', endpoint='http://localhost')
# All operations can be applied as if locally
modo.metadata
# {'ex': {'@type': 'MODO', 'creation_date': '2024-02-19T00:00:00', 'description': 'Dummy modo for tests.', 'has_assay': ..}}
# Interact with a remotely stored MODO (command line)
modos --endpoint http://localhost show s3://modos-demo/ex
# ex:
# '@type': MODO
# creation_date: '2024-02-19T00:00:00'
# description: Dummy modo for tests.
# has_assay:
Warning
The bucket name and the endpoint URL are specified separately. The bucket name is part of the object_path and must be included in the S3 path, followed by the MODO’s name (e.g. s3://bucket_name/modo_name), while the endpoint URL is passed on its own. Only paths that follow the s3 scheme are treated as remote, regardless of whether --endpoint is specified.
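The path convention can be illustrated with the standard library. This sketch only mirrors the rule stated in the warning and is not how modos parses paths internally. The helpers `is_remote` and `split_s3_path` are hypothetical:

```python
from urllib.parse import urlparse


def is_remote(path):
    """Return True only for paths that use the s3 scheme (hypothetical helper)."""
    return urlparse(str(path)).scheme == "s3"


def split_s3_path(path):
    """Split 's3://bucket_name/modo_name' into (bucket_name, modo_name)."""
    parsed = urlparse(path)
    if parsed.scheme != "s3":
        raise ValueError(f"Not an s3 path: {path}")
    # The bucket sits in the netloc position; the MODO name is the remaining path
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_path("s3://bucket_name/modo_name"))
# ('bucket_name', 'modo_name')
print(is_remote("data/ex"))
# False
```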
Note
To avoid repetition, the endpoint can also be read from the MODOS_ENDPOINT environment variable. The syntax is then the same as for local objects, except that the object_path must use the s3 scheme:
export MODOS_ENDPOINT='http://localhost'
modos create s3://bucket/object1
modos show s3://bucket/object1
modos delete s3://bucket/object1
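In a Python script the same variable can be read with `os.environ`. The sketch below uses a hypothetical helper, `resolve_endpoint`, and assumes that an explicitly passed endpoint should take precedence over the environment variable. That precedence is an assumption, not documented modos behaviour:

```python
import os


def resolve_endpoint(endpoint=None):
    """Return the explicit endpoint if given, else MODOS_ENDPOINT, else None.

    Hypothetical helper; the precedence order is an assumption.
    """
    return endpoint or os.environ.get("MODOS_ENDPOINT")


os.environ["MODOS_ENDPOINT"] = "http://localhost"
print(resolve_endpoint())
# http://localhost
print(resolve_endpoint("http://other-host"))
# http://other-host
```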
Generate and modify a MODO at a remote object store#
A MODO can be generated from scratch or from a file in the same way as locally, by specifying the remote endpoint’s URL or setting MODOS_ENDPOINT:
from modos.api import MODO
from pathlib import Path
# yaml file with MODO specifications
config_ex = Path("path/to/ex.yaml")
# Create a modo remotely
modo = MODO.from_file(config_ex, "s3://modos-demo/ex", endpoint="http://localhost")
# Create a modo from file remotely
modos create --endpoint "http://localhost" --from-file "path/to/ex.yaml" s3://modos-demo/ex3
Note
Similar to MODO creation, any other modifying functionality of the modos-api (e.g. modos add, modos remove or MODO.add_element(), MODO.remove_element()) can be performed on remotely stored objects by specifying the endpoint and giving the object path in the s3 scheme, including the bucket name (s3://bucket_name/modo_name).