Working with remote objects#
Remote storage can be key to share and collaborate on multiomics data. MODOS
integrates with S3 object storage and htsget to allow remote storage, access and real-time secure streaming of genomic data.
Most of the MODOS-api
’s functionalities work with remotely stored objects in the same way as with local objects. The user only as to specify the s3_endpoint
of the remote object store.
List remotely available MODO’s#
Listing all available MODOs
at a specific S3 endpoint (in this tutorial we will use http://localhost as example) will show MODOs
in all buckets at that endpoint:
import modos.remote as remo
# Show all remote modos
remo.list_remote_items("http://localhost")
# ['modos-demo/GIAB', 'modos-demo/ex']
Show metadata of a remote MODO#
For all or a specific MODO
metadata can directly be displayed:
import modos.remote as remo
# Get metadata of all MODOs at endpoint "http://localhost"
remo.get_metadata_from_remote("http://localhost")
# Get metadata of MODO with id ex
remo.get_metadata_from_remote("http://localhost", modo_id = "ex")
Find a specific MODO and get it’s S3 path#
There are different options to query a specific MODO
and the bucket name to load it from - fuzzy search or exact string matching:
import modos.remote as remo
# Query all MODOs with sequence similar to "ex"
remo.get_s3_path("http://localhost", query="ex")
# [{'http://localhost/s3/modos-demo/ex': {'s3_endpoint': 'http://localhost/s3', 'modo_path': 'modos-demo/ex'}}]
# Query all MODOs exactly matching "ex"
remo.get_s3_path("http://localhost", query="ex", exact_match = True)
# []
Intiantiate a remote MODO locally#
Remotely stored MODOs
can be intiantiated by specifiying their remote endpoint and then and worked with as if they were stored locally.
from modos.api import MODO
# Load MODO from remote storage
modo=MODO(path= 'modos-demo/ex', s3_endpoint = 'http://localhost/s3')
# All operations can be applied as if locally
modo.metadata
# {'ex': {'@type': 'MODO', 'creation_date': '2024-02-19T00:00:00', 'description': 'Dummy modo for tests.', 'has_assay': ..}}
# Interact with remotly stored MODO
modos show -s3 "http://localhost/s3" modos-demo/ex
# ex:
# '@type': MODO
# creation_date: '2024-02-19T00:00:00'
# description: Dummy modo for tests.
# has_assay:
Note
The bucket name and the S3 endpoint url are specified separatly. The bucket name is prepended to the MODO
’s name in the same way as a local path, while the S3 endpoint url needs to be specified specifically.
Generate and modify a MODO at a remote object store#
A MODO
can be generated from scratch or from file in the same way as locally, by specifying the remote endpoint’s url:
from modos.api import MODO
from pathlib import Path
# yaml file with MODO specifications
config_ex = Path("path/to/ex.yaml")
# Create a modo remotely
modo = build_modo_from_file(config_ex, "modos-demo/ex", s3_endpoint= "http://localhost/s3")
# Create a modo from file remotely
modos create -s3 "http://localhost/s3" ----from-file "path/to/ex.yaml" modos-demo/ex3
Note
Similar to MODO
creation, any other modifying functionality of the modos-api
, (e.g. modos add
, modos remove
or MODO.add_element()
, MODO.remove_element()
)can be performed on remotely stored objects by specifying the S3 endpoint and bucket name as path.