Working with remote objects#
Remote storage can be key to sharing and collaborating on multiomics data. MODOS integrates with S3 object storage and htsget to allow remote storage, access and real-time secure streaming of genomic data.
Most of the MODOS-api’s functionalities work with remotely stored objects in the same way as with local objects. The user only has to specify the s3_endpoint of the remote object store.
List remotely available MODOs#
Listing all available MODOs at a specific S3 endpoint (this tutorial uses http://localhost as an example) shows the MODOs in all buckets at that endpoint:
import modos.remote as remo
# Show all remote modos
remo.list_remote_items("http://localhost")
# ['modos-demo/GIAB', 'modos-demo/ex']
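The listing returns flat 'bucket/name' strings. As a minimal sketch (the helper group_by_bucket is hypothetical and not part of the modos-api), the items can be grouped per bucket with plain Python, assuming each item has the form shown in the example output above:

```python
from collections import defaultdict


def group_by_bucket(items):
    """Group 'bucket/modo_name' strings by bucket (hypothetical helper)."""
    grouped = defaultdict(list)
    for item in items:
        # Split on the first "/" only: the bucket comes first, the rest is the MODO name
        bucket, _, name = item.partition("/")
        grouped[bucket].append(name)
    return dict(grouped)


# Example items copied from the listing output above
items = ["modos-demo/GIAB", "modos-demo/ex"]
print(group_by_bucket(items))
# {'modos-demo': ['GIAB', 'ex']}
```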
Show metadata of a remote MODO#
Metadata can be displayed directly, either for all MODOs at an endpoint or for a specific MODO:
import modos.remote as remo
# Get metadata of all MODOs at endpoint "http://localhost"
remo.get_metadata_from_remote("http://localhost")
# Get metadata of MODO with id ex
remo.get_metadata_from_remote("http://localhost", modo_id="ex")
Find a specific MODO and get its S3 path#
A specific MODO, and the bucket to load it from, can be looked up either by fuzzy search or by exact string matching:
import modos.remote as remo
# Query all MODOs whose name is similar to "ex" (fuzzy search)
remo.get_s3_path("http://localhost", query="ex")
# [{'http://localhost/s3/modos-demo/ex': {'s3_endpoint': 'http://localhost/s3', 'modo_path': 'modos-demo/ex'}}]
# Query all MODOs exactly matching "ex"
remo.get_s3_path("http://localhost", query="ex", exact_match=True)
# []
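The query result can be turned into paths that are ready to pass to MODO(). The sketch below assumes the result has the shape shown in the example output above (a list of single-entry dictionaries carrying a 'modo_path' key); the helper s3_paths is hypothetical and not part of the modos-api:

```python
def s3_paths(hits):
    """Build s3:// paths from a get_s3_path-style result (hypothetical helper)."""
    paths = []
    for hit in hits:
        # Each hit maps a URL to a dict holding 's3_endpoint' and 'modo_path'
        for info in hit.values():
            paths.append(f"s3://{info['modo_path']}")
    return paths


# Result copied from the fuzzy query example above
hits = [
    {
        "http://localhost/s3/modos-demo/ex": {
            "s3_endpoint": "http://localhost/s3",
            "modo_path": "modos-demo/ex",
        }
    }
]
print(s3_paths(hits))
# ['s3://modos-demo/ex']
```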
Instantiate a remote MODO locally#
Remotely stored MODOs can be instantiated by specifying their remote endpoint and then worked with as if they were stored locally.
The example below assumes a public S3 bucket that can be accessed anonymously (without credentials).
from modos.api import MODO
# Load MODO from remote storage
modo = MODO(path="s3://modos-demo/ex", endpoint="http://localhost", s3_kwargs={"anon": True})
# All operations can be applied as if locally
modo.metadata
# {'ex': {'@type': 'MODO', 'creation_date': '2024-02-19T00:00:00', 'description': 'Dummy modo for tests.', 'has_assay': ..}}
# Command line: show a remotely stored MODO
modos --anon --endpoint http://localhost show s3://modos-demo/ex
# ex:
# '@type': MODO
# creation_date: '2024-02-19T00:00:00'
# description: Dummy modo for tests.
# has_assay:
Warning
The bucket name and the endpoint URL are specified separately. The bucket name is part of the object_path and must be included in the S3 path, followed by the MODO’s name (e.g. s3://bucket_name/modo_name), while the endpoint URL is passed on its own. Only paths that follow the s3 scheme are treated as remote, regardless of whether --endpoint is specified.
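To illustrate how such a path breaks down, here is a minimal sketch using only the Python standard library. The helper split_s3_path is hypothetical and only mirrors the rule described in the warning; it is not how the modos-api necessarily parses paths internally:

```python
from urllib.parse import urlparse


def split_s3_path(path):
    """Return (bucket, modo_name) for an s3:// path, or None for any other path."""
    parsed = urlparse(path)
    if parsed.scheme != "s3":
        # Not in the s3 scheme: treated as a local path in this sketch
        return None
    # The bucket sits where a URL has its host; the MODO name is the remainder
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_path("s3://modos-demo/ex"))
# ('modos-demo', 'ex')
print(split_s3_path("data/ex"))
# None
```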
Note
To avoid repetition, the endpoint and anon values can also be read from environment variables.
The syntax is then the same as for local objects, except that the object_path must be given in the s3 scheme:
export MODOS_ENDPOINT='http://localhost'
export MODOS_ANON=true
modos create s3://bucket/object1
modos show s3://bucket/object1
modos delete s3://bucket/object1
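A script can pick up the same variables and pass them on explicitly. The sketch below is an assumption-laden illustration: the helper remote_settings is hypothetical, and the set of strings it accepts as "true" for MODOS_ANON is a guess rather than documented modos behaviour. The returned keys match the endpoint and s3_kwargs arguments used in the MODO() example earlier in this section.

```python
import os


def remote_settings(env=None):
    """Collect endpoint and anon settings from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    endpoint = env.get("MODOS_ENDPOINT")
    # Assumption: "1", "true" and "yes" (any case) count as true
    anon = env.get("MODOS_ANON", "").strip().lower() in {"1", "true", "yes"}
    return {"endpoint": endpoint, "s3_kwargs": {"anon": anon}}


# Pass a plain dict instead of os.environ to try it out
settings = remote_settings({"MODOS_ENDPOINT": "http://localhost", "MODOS_ANON": "true"})
print(settings)
# {'endpoint': 'http://localhost', 's3_kwargs': {'anon': True}}
```

The result could then be unpacked into the constructor, e.g. MODO(path="s3://bucket/object1", **remote_settings()).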
Use authenticated buckets#
Most use cases require authentication to access the S3 bucket, usually in the form of an access key and a secret key. MODOS can read these keys from the standard AWS environment variables:
export AWS_ACCESS_KEY_ID=<id>
export AWS_SECRET_ACCESS_KEY=<secret>
export MODOS_ENDPOINT='http://modos.example.org'
modos show s3://protected-bucket/example
However, it is strongly recommended to avoid entering secrets in the terminal and instead store them in encrypted .env files. Tools like sops make this easy:
# create public/secret key pair
age-keygen -o keypair.txt
# create encrypted env file
sops --age <public-key> .enc.env
# Values decrypted in memory and injected in the modos process
sops exec-env .enc.env 'modos show s3://protected-bucket/example'
# If sops cannot locate the age key on its own, point it to the key file explicitly
SOPS_AGE_KEY_FILE=keypair.txt sops exec-env .enc.env \
'modos show s3://protected-bucket/example'
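When access to a protected bucket fails, a frequent cause is that one of the AWS variables never reached the process. A minimal sketch for checking this up front follows; the helper missing_credentials is hypothetical and only inspects the two variable names mentioned in this section:

```python
import os

# The two standard AWS variables named in this section
REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


# Example with only the access key set
print(missing_credentials({"AWS_ACCESS_KEY_ID": "id"}))
# ['AWS_SECRET_ACCESS_KEY']
```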
Generate and modify a MODO at a remote object store#
A MODO can be generated from scratch or from a file in the same way as locally, by specifying the remote endpoint’s URL or setting MODOS_ENDPOINT:
from modos.api import MODO
from pathlib import Path
# yaml file with MODO specifications
config_ex = Path("path/to/ex.yaml")
# Create a modo remotely
modo = MODO.from_file(config_ex, "s3://modos-demo/ex", endpoint="http://localhost")
# Create a modo from file remotely
modos --endpoint "http://localhost" create --from-file "path/to/ex.yaml" s3://modos-demo/ex3
Note
Similar to MODO creation, any other modifying functionality of the modos-api (e.g. modos add, modos remove or MODO.add_element(), MODO.remove_element()) can be performed on remotely stored objects by specifying the endpoint and giving the object path in the s3 scheme, including the bucket name.
Download and upload a MODO#
A MODO can be downloaded directly from a remote endpoint.
from modos.api import MODO
# Load MODO from remote storage
modo = MODO(path="s3://modos-demo/ex", endpoint="http://localhost")
# Download MODO to local path "data/ex"
modo.download("data/ex")
# Download a remote modo from "modos-demo/ex" to local path "data/ex"
modos --endpoint http://localhost remote download --target data/ex s3://modos-demo/ex
A local MODO can be uploaded to a remote endpoint.
from modos.api import MODO
# Load MODO from local storage
modo = MODO(path="data/ex")
# Upload MODO to remote path "modos-demo/ex"
modo.upload("s3://modos-demo/ex", s3_endpoint='http://localhost')
# Upload a local modo from "data/ex" to remote path "modos-demo/ex"
modos --endpoint http://localhost remote upload --target s3://modos-demo/ex data/ex