Working with remote objects#

Remote storage can be key to sharing and collaborating on multiomics data. MODOS integrates with S3 object storage and htsget to allow remote storage, access and real-time secure streaming of genomic data. Most of the MODOS API’s functionality works with remotely stored objects in the same way as with local objects. The user only has to specify the s3_endpoint of the remote object store.

List remotely available MODOs#

Listing all available MODOs at a specific S3 endpoint (this tutorial uses http://localhost as an example) shows the MODOs in all buckets at that endpoint:

import modos.remote as remo

# Show all remote modos
remo.list_remote_items("http://localhost")
# ['modos-demo/GIAB', 'modos-demo/ex']
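The returned entries follow a bucket/name pattern, so they can be grouped by bucket with plain Python. The following is a minimal sketch that assumes the list shape shown in the comment above; group_by_bucket is a hypothetical helper for illustration and not part of modos:

```python
from collections import defaultdict


def group_by_bucket(items):
    """Group 'bucket/name' strings by bucket (hypothetical helper, not part of modos)."""
    grouped = defaultdict(list)
    for item in items:
        # Split on the first slash only: everything before it is the bucket.
        bucket, _, name = item.partition("/")
        grouped[bucket].append(name)
    return dict(grouped)


# Sample values copied from the listing output shown above
items = ["modos-demo/GIAB", "modos-demo/ex"]
by_bucket = group_by_bucket(items)
print(by_bucket)
```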

Show metadata of a remote MODO#

Metadata can be displayed directly, either for all MODOs at an endpoint or for a specific MODO:

import modos.remote as remo

# Get metadata of all MODOs at endpoint "http://localhost"
remo.get_metadata_from_remote("http://localhost")

# Get metadata of MODO with id ex
remo.get_metadata_from_remote("http://localhost", modo_id="ex")

Find a specific MODO and get its S3 path#

A specific MODO, and the bucket to load it from, can be queried either by fuzzy search or by exact string matching:

import modos.remote as remo

# Query all MODOs with a name similar to "ex"
remo.get_s3_path("http://localhost", query="ex")
# [{'http://localhost/s3/modos-demo/ex': {'s3_endpoint': 'http://localhost/s3', 'modo_path': 'modos-demo/ex'}}]

# Query all MODOs exactly matching "ex"
remo.get_s3_path("http://localhost", query="ex", exact_match=True)
# []
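Each match maps the full URL of a MODO to the s3_endpoint and modo_path needed to load it. The following is a minimal sketch, assuming the result shape shown in the fuzzy-search comment above, that flattens such a result into (endpoint, path) pairs; extract_locations is a hypothetical helper and not part of modos:

```python
def extract_locations(matches):
    """Flatten get_s3_path-style results into (s3_endpoint, modo_path) tuples.

    Hypothetical helper: it only assumes the list-of-dicts shape shown above.
    """
    locations = []
    for match in matches:
        # Each match is keyed by the full URL; the value holds the two parts.
        for info in match.values():
            locations.append((info["s3_endpoint"], info["modo_path"]))
    return locations


# Sample value copied from the fuzzy-search output shown above
matches = [
    {
        "http://localhost/s3/modos-demo/ex": {
            "s3_endpoint": "http://localhost/s3",
            "modo_path": "modos-demo/ex",
        }
    }
]
locations = extract_locations(matches)
print(locations)
```

Each pair can then be supplied as the s3_endpoint and path arguments when loading the MODO.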

Instantiate a remote MODO locally#

Remotely stored MODOs can be instantiated by specifying their remote endpoint and then worked with as if they were stored locally.

from modos.api import MODO

# Load MODO from remote storage
modo = MODO(path="modos-demo/ex", s3_endpoint="http://localhost/s3")

# All operations can be applied as if locally
modo.metadata
# {'ex': {'@type': 'MODO', 'creation_date': '2024-02-19T00:00:00', 'description': 'Dummy modo for tests.', 'has_assay': ..}}

# Interact with a remotely stored MODO (command line)
modos show -s3 "http://localhost/s3" modos-demo/ex
# ex:
#   '@type': MODO
#   creation_date: '2024-02-19T00:00:00'
#   description: Dummy modo for tests.
#   has_assay:

Note

The bucket name and the S3 endpoint URL are specified separately. The bucket name is prepended to the MODO’s name in the same way as a local path, while the S3 endpoint URL must be passed explicitly.
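To illustrate how the two parts relate, the following minimal sketch splits a full MODO URL into the endpoint and the bucket-prefixed path. It assumes the full URL is simply the endpoint followed by the path, as in the query output shown earlier; split_remote_url is a hypothetical helper and not part of modos:

```python
def split_remote_url(full_url, s3_endpoint):
    """Split a full MODO URL into (s3_endpoint, 'bucket/name').

    Hypothetical helper: assumes full_url == s3_endpoint + "/" + "bucket/name".
    """
    endpoint = s3_endpoint.rstrip("/")
    prefix = endpoint + "/"
    if not full_url.startswith(prefix):
        raise ValueError(f"{full_url!r} is not located under {s3_endpoint!r}")
    # Whatever follows the endpoint is the bucket name plus the MODO name.
    return endpoint, full_url[len(prefix):]


endpoint, modo_path = split_remote_url(
    "http://localhost/s3/modos-demo/ex", "http://localhost/s3"
)
print(endpoint, modo_path)
```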

Generate and modify a MODO at a remote object store#

A MODO can be generated from scratch or from a file in the same way as locally, by specifying the remote endpoint’s URL:

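from modos.api import MODO
# Assumption: build_modo_from_file is importable from modos.io; adjust to your installed modos version
from modos.io import build_modo_from_file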
from pathlib import Path

# yaml file with MODO specifications
config_ex = Path("path/to/ex.yaml")

# Create a modo remotely
modo = build_modo_from_file(config_ex, "modos-demo/ex", s3_endpoint="http://localhost/s3")

# Create a modo from file remotely (command line)
modos create -s3 "http://localhost/s3" --from-file "path/to/ex.yaml" modos-demo/ex3

Note

As with MODO creation, any other modifying functionality of the modos-api (e.g. modos add, modos remove or MODO.add_element(), MODO.remove_element()) can be performed on remotely stored objects by specifying the S3 endpoint and using the bucket name as part of the path.