Working with remote objects

Remote storage can be key to sharing and collaborating on multiomics data. MODOS integrates with S3 object storage and htsget to allow remote storage, access and real-time secure streaming of genomic data. Most of the MODOS-api’s functionality works with remotely stored objects in the same way as with local objects. The user only has to specify the s3_endpoint of the remote object store.

List remotely available MODOs

Listing the MODOs available at a specific S3 endpoint (this tutorial uses http://localhost as an example) shows the MODOs in all buckets at that endpoint:

import modos.remote as remo

# Show all remote modos
remo.list_remote_items("http://localhost")
# ['modos-demo/GIAB', 'modos-demo/ex']
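
The listing above returns each MODO as a `bucket/name` string. When an endpoint hosts several buckets, it can help to group the names by bucket. The following is a minimal sketch, assuming the string format shown in the example output; `group_by_bucket` is a hypothetical helper and not part of the modos package.

```python
from collections import defaultdict


def group_by_bucket(items):
    """Group 'bucket/name' strings by bucket.

    Hypothetical helper, not part of modos. Assumes each item has the
    'bucket/name' form shown in the listing output.
    """
    grouped = defaultdict(list)
    for item in items:
        # Split on the first "/" only: the bucket comes first, the rest is the name
        bucket, _, name = item.partition("/")
        grouped[bucket].append(name)
    return dict(grouped)


# Sample listing copied from the example output above
listing = ["modos-demo/GIAB", "modos-demo/ex"]
print(group_by_bucket(listing))
# {'modos-demo': ['GIAB', 'ex']}
```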

Show metadata of a remote MODO

Metadata can be displayed directly, either for all MODOs at an endpoint or for a specific one:

import modos.remote as remo

# Get metadata of all MODOs at endpoint "http://localhost"
remo.get_metadata_from_remote("http://localhost")

# Get metadata of MODO with id ex
remo.get_metadata_from_remote("http://localhost", modo_id="ex")

Find a specific MODO and get its S3 path

There are two ways to query a specific MODO and the bucket to load it from, fuzzy search or exact string matching:

import modos.remote as remo

# Query all MODOs with sequence similar to "ex"
remo.get_s3_path("http://localhost", query="ex")
# [{'http://localhost/s3/modos-demo/ex': {'s3_endpoint': 'http://localhost/s3', 'modo_path': 'modos-demo/ex'}}]

# Query all MODOs exactly matching "ex"
remo.get_s3_path("http://localhost", query="ex", exact_match=True)
# []
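
The query result shown above is a list of single-entry dictionaries, each mapping a full URL to its `s3_endpoint` and `modo_path`. A minimal sketch for pulling out the paths, assuming exactly that structure, could look like the following; `modo_paths` is a hypothetical helper, and the sample data is copied from the example output rather than fetched from a server.

```python
def modo_paths(results):
    """Collect the 'modo_path' of every hit.

    Hypothetical helper, not part of modos. Assumes the result structure
    shown in the example output: a list of {url: {...}} dictionaries.
    """
    return [info["modo_path"] for hit in results for info in hit.values()]


# Sample result copied from the fuzzy-search output above
results = [
    {
        "http://localhost/s3/modos-demo/ex": {
            "s3_endpoint": "http://localhost/s3",
            "modo_path": "modos-demo/ex",
        }
    }
]

print(modo_paths(results))
# ['modos-demo/ex']
print(modo_paths([]))  # an empty result yields an empty list
# []
```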

Instantiate a remote MODO locally

Remotely stored MODOs can be instantiated by specifying their remote endpoint and then worked with as if they were stored locally.

from modos.api import MODO

# Load MODO from remote storage
modo = MODO(path='s3://modos-demo/ex', endpoint='http://localhost')

# All operations can be applied as if locally
modo.metadata
# {'ex': {'@type': 'MODO', 'creation_date': '2024-02-19T00:00:00', 'description': 'Dummy modo for tests.', 'has_assay': ..}}

# Interact with a remotely stored MODO from the command line
modos --endpoint http://localhost show s3://modos-demo/ex
# ex:
#   '@type': MODO
#   creation_date: '2024-02-19T00:00:00'
#   description: Dummy modo for tests.
#   has_assay:

Warning

The bucket name and the endpoint URL are specified separately. The bucket name is part of the object_path and must be included in the S3 path, followed by the MODO’s name (e.g. s3://bucket_name/modo_name). Only paths that follow the s3 scheme are treated as remote, regardless of whether --endpoint is specified.
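
To make the path layout concrete, the sketch below splits an S3 path into its bucket and MODO name using only the Python standard library. It is an illustration of the `s3://bucket_name/modo_name` convention described in the warning, not how modos parses paths internally; `split_s3_path` is a hypothetical helper.

```python
from urllib.parse import urlparse


def split_s3_path(path):
    """Split 's3://bucket/name' into (bucket, name).

    Hypothetical helper, not part of modos. Paths without the s3 scheme
    are rejected, mirroring the rule that only s3 paths count as remote.
    """
    parsed = urlparse(path)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3 path: {path!r}")
    # The bucket is the URL's network location; the rest is the MODO name
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_path("s3://modos-demo/ex"))
# ('modos-demo', 'ex')
```

Note that the endpoint URL does not appear anywhere in the path; it is always supplied separately.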

Note

To avoid repetition, the endpoint can also be read from the MODOS_ENDPOINT environment variable. The syntax is then the same as for local objects, except that the object_path must use the s3 scheme:

export MODOS_ENDPOINT='http://localhost'
modos create s3://bucket/object1
modos show   s3://bucket/object1
modos delete s3://bucket/object1
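
The same variable can be read from Python scripts. The sketch below shows one way to fall back on MODOS_ENDPOINT when no endpoint is passed explicitly. `resolve_endpoint` is a hypothetical helper, `http://example.org` is a placeholder URL, and the precedence (explicit argument first, then environment) is an assumption rather than documented modos behaviour.

```python
import os


def resolve_endpoint(explicit=None):
    """Return the explicit endpoint if given, else MODOS_ENDPOINT, else None.

    Hypothetical helper, not part of modos. The precedence order is an
    assumption made for this sketch.
    """
    return explicit or os.environ.get("MODOS_ENDPOINT")


# Equivalent to `export MODOS_ENDPOINT='http://localhost'` in the shell
os.environ["MODOS_ENDPOINT"] = "http://localhost"

print(resolve_endpoint())
# http://localhost
print(resolve_endpoint("http://example.org"))  # placeholder URL
# http://example.org
```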

Generate and modify a MODO at a remote object store

A MODO can be generated from scratch or from a file in the same way as locally, by specifying the remote endpoint’s URL or setting MODOS_ENDPOINT:

from modos.api import MODO
from pathlib import Path

# yaml file with MODO specifications
config_ex = Path("path/to/ex.yaml")

# Create a modo remotely from the yaml file
modo = MODO.from_file(config_ex, "s3://modos-demo/ex", endpoint="http://localhost")

# Create a modo from file remotely (command line)
modos create --endpoint "http://localhost" --from-file "path/to/ex.yaml" s3://modos-demo/ex3

Note

Similar to MODO creation, any other modifying functionality of the modos-api (e.g. modos add, modos remove or MODO.add_element(), MODO.remove_element()) can be performed on remotely stored objects by specifying the endpoint and giving the object path in the s3 scheme, including the bucket name.