# Getting Started
This tutorial walks through the core workflows of the `mava-exchange` library: writing a `.mediapkg` file from DataFrames, reading one back, validating it, and inspecting it from the command line.
## Installation

```bash
pip install mava-exchange
# or with uv:
uv add mava-exchange
```
## Concepts

A `.mediapkg` file is a ZIP archive containing annotation data for one or more videos. Each video has one or more tracks — Parquet files containing the actual data.
There are three kinds of tracks:

- **ObservationSeries** — a dense time-series of numeric values sampled at regular intervals. Each row is one point in time with one or more numeric dimensions. Use this for ML model outputs like emotion scores, audio volume, or any score sampled at a fixed rate.
- **AnnotationSeries** — sparse interval annotations. Each row covers a time span (`start_seconds` → `end_seconds`) with a string value. Use this for transcripts, shot boundaries, or any labeled segment.
- **AnnotationListSeries** — sparse interval annotations with multiple labels per segment. Each row covers a time span with a list of string values. Use this for multi-label classifications, keyword tags, or any annotation where multiple values apply simultaneously.
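Put together, a package written by the examples below unzips to a layout roughly like this (illustrative; the exact paths appear in the `mediapkg-inspect` output in section 4):

```text
corpus.mediapkg
├── manifest.json                # metadata, JSON-LD context, file inventory
├── video_001/
│   ├── emotions.parquet         # one Parquet file per track
│   └── transcript.parquet
└── video_002/
    ├── rms_volume.parquet
    └── transcript.parquet
```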
## 1. Writing a .mediapkg

### 1.1 Define your tracks
First, describe what your data means using `ObservationSeries`, `AnnotationSeries`, or `AnnotationListSeries`. This is the semantic layer — it tells consumers what each column measures.
```python
from mava_exchange import ObservationSeries, AnnotationSeries, AnnotationListSeries, DimensionSpec

# A time-series track: one numeric value per dimension per timestep
emotion_track = ObservationSeries(
    name="emotions",
    description="Face emotion probability scores from DeepFace model",
    sampling_interval=0.5,  # seconds between samples
    dimensions=[
        DimensionSpec("angry", "Anger probability", "[0,1]"),
        DimensionSpec("happy", "Happiness probability", "[0,1]"),
        DimensionSpec("neutral", "Neutral expression", "[0,1]"),
    ],
)

# An interval annotation track: start, end, and a string label per row
transcript_track = AnnotationSeries(
    name="transcript",
    description="Speech-to-text segments from Whisper",
)

# A multi-label annotation track: start, end, and a list of labels per row
scene_tags_track = AnnotationListSeries(
    name="scene_tags",
    description="Scene classification tags from Places365 model",
)
```
You can define any dimensions you need — the library is not tied to emotion scores. For example, a different tool might declare:
```python
explosion_track = ObservationSeries(
    name="explosion_detection",
    description="Explosion probability from audio model, sampled every 0.1s",
    sampling_interval=0.1,
    dimensions=[
        DimensionSpec("explosion", "Explosion probability", "[0,1]"),
    ],
)
```
### 1.2 Prepare your DataFrames
Each track expects a DataFrame with the columns declared in its definition. For an `ObservationSeries`, the required columns are `start_seconds` plus one column per dimension:
```python
import pandas as pd
import numpy as np

n = 100
emotions_df = pd.DataFrame({
    "start_seconds": np.arange(n) * 0.5,
    "angry": np.random.uniform(0, 0.3, n),
    "happy": np.random.uniform(0, 0.8, n),
    "neutral": np.random.uniform(0, 0.5, n),
})
```
For an `AnnotationSeries`, the required columns are `start_seconds`, `end_seconds`, and `annotations`:
```python
transcript_df = pd.DataFrame({
    "start_seconds": [0.0, 12.5, 30.1],
    "end_seconds": [12.3, 29.8, 45.0],
    "annotations": [
        "Welcome to the conference.",
        "Today we discuss video annotation.",
        "Thank you for joining us.",
    ],
})
```
For an `AnnotationListSeries`, the required columns are `start_seconds`, `end_seconds`, and `annotations` — but here `annotations` contains lists of strings:
```python
scene_tags_df = pd.DataFrame({
    "start_seconds": [0.0, 45.2, 78.5],
    "end_seconds": [45.2, 78.5, 120.0],
    "annotations": [
        ["outdoor", "natural"],
        ["indoor"],
        ["outdoor", "man-made"],
    ],
})
```
### 1.3 Write the package
Use `MediaPackageWriter` as a context manager. Call `add_video()` first, then `add_track()` for each track. The file is written when the `with` block exits.
```python
from mava_exchange import MediaPackageWriter

with MediaPackageWriter("corpus.mediapkg", description="My annotation corpus") as writer:
    writer.add_video(
        video_id="video_001",
        src="https://example.org/videos/talk.mp4",
    )
    writer.add_track("video_001", emotion_track, emotions_df)
    writer.add_track("video_001", transcript_track, transcript_df)
    writer.add_track("video_001", scene_tags_track, scene_tags_df)
```
### 1.4 Multiple videos
Add as many videos as you need before the `with` block exits. Videos can have different track sets — a track name shared across videos must have an identical definition:
```python
rms_track = ObservationSeries(
    name="rms_volume",
    description="RMS audio volume",
    sampling_interval=0.064,
    dimensions=[DimensionSpec("rms", "Root mean square audio volume", ">=0")],
)

rms_df = pd.DataFrame({
    "start_seconds": np.arange(200) * 0.064,
    "rms": np.abs(np.random.normal(0.1, 0.02, 200)),
})
```
```python
with MediaPackageWriter("corpus.mediapkg", description="Two-video corpus") as writer:
    # video_001: emotions + transcript
    writer.add_video("video_001", "https://example.org/videos/talk_001.mp4")
    writer.add_track("video_001", emotion_track, emotions_df)
    writer.add_track("video_001", transcript_track, transcript_df)

    # video_002: rms volume + transcript (different track set)
    writer.add_video("video_002", "https://example.org/videos/talk_002.mp4")
    writer.add_track("video_002", rms_track, rms_df)
    writer.add_track("video_002", transcript_track, transcript_df)
```
## 2. Reading a .mediapkg

Use `MediaPackageReader` to read a package. Use it as a context manager to ensure the file is closed properly.
```python
from mava_exchange import MediaPackageReader

with MediaPackageReader("corpus.mediapkg") as reader:
    # What's in this package?
    print(reader.video_ids)    # ["video_001", "video_002"]
    print(reader.track_names)  # ["emotions", "transcript", "rms_volume"]

    # Which tracks does a specific video have?
    print(reader.tracks_for_video("video_001"))  # ["emotions", "transcript"]
    print(reader.tracks_for_video("video_002"))  # ["rms_volume", "transcript"]

    # Read a track into a DataFrame
    df = reader.read_track("video_001", "emotions")
    print(df.head())
    #    start_seconds    angry    happy  neutral
    # 0            0.0  0.12451  0.64231  0.23318
    # 1            0.5  0.08734  0.71204  0.20062

    # Read all tracks for a video at once
    tracks = reader.read_video("video_001")
    # tracks == {"emotions": <DataFrame>, "transcript": <DataFrame>}

    # Get a track definition (reconstructed as a typed object)
    track = reader.track_def("emotions")
    print(track.sampling_interval)             # 0.5
    print([d.name for d in track.dimensions])  # ["angry", "happy", "neutral"]

    # Get video metadata
    meta = reader.video_meta("video_001")
    print(meta["src"])  # "https://example.org/videos/talk_001.mp4"
```
### Quick file stats without loading data

```python
with MediaPackageReader("corpus.mediapkg") as reader:
    for stat in reader.file_stats():
        ratio = (1 - stat["compressed_bytes"] / stat["size_bytes"]) * 100
        print(f"{stat['path']:<40} {stat['rows']:>6} rows {ratio:.0f}% compressed")
```
## 3. Validating a .mediapkg

### From Python
```python
from mava_exchange.validate import validate_mediapkg

result = validate_mediapkg("corpus.mediapkg")
if result.valid:
    print("Package is valid.")
else:
    print(result.summary())
```
The validator checks:

- manifest structure and required fields
- every file referenced in the manifest exists in the archive
- every referenced track is defined
- `start_seconds` is non-null, non-negative, and ordered
- `end_seconds > start_seconds` for all `AnnotationSeries` rows
- dimension columns are numeric and non-null for `ObservationSeries`
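For example, an interval that ends before it starts should trip the `end_seconds > start_seconds` check. A minimal sketch using only the API shown above (assuming the writer itself does not already reject the bad row at write time):

```python
import pandas as pd
from mava_exchange import MediaPackageWriter, AnnotationSeries
from mava_exchange.validate import validate_mediapkg

bad_track = AnnotationSeries(name="bad", description="Deliberately invalid intervals")
bad_df = pd.DataFrame({
    "start_seconds": [10.0],
    "end_seconds": [5.0],  # violates end_seconds > start_seconds
    "annotations": ["oops"],
})

with MediaPackageWriter("bad.mediapkg", description="Invalid example") as writer:
    writer.add_video("video_001", "https://example.org/videos/talk.mp4")
    writer.add_track("video_001", bad_track, bad_df)

result = validate_mediapkg("bad.mediapkg")
print(result.valid)      # expected: False
print(result.summary())  # details on the failing check
```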
Pass `strict=True` to also warn about recommended but optional fields:
```python
result = validate_mediapkg("corpus.mediapkg", strict=True)
print(result.summary())
```
### From the command line

```bash
mediapkg-validate corpus.mediapkg
mediapkg-validate corpus.mediapkg --strict
```
Exit code is 0 for valid and 1 for invalid — works in CI pipelines:

```bash
mediapkg-validate corpus.mediapkg || exit 1
```
## 4. Inspecting from the CLI

The `mediapkg-inspect` command gives a human-readable summary without writing any code.
Corpus overview:
```bash
mediapkg-inspect corpus.mediapkg
```

```text
════════════════════════════════════════════════════════════
corpus.mediapkg
════════════════════════════════════════════════════════════
Version:     0.1
Created:     2025-08-12T10:00:00+00:00
Ontology:    http://example.org/mava/ontology#
Description: Two-video corpus
Videos:      2

Tracks:
  emotions      mava:ObservationSeries  @0.5s    [angry, happy, neutral]
  transcript    mava:AnnotationSeries
  rms_volume    mava:ObservationSeries  @0.064s  [rms]

Videos:
  video_001
    src: https://example.org/videos/talk_001.mp4
    tracks: emotions, transcript
  video_002
    src: https://example.org/videos/talk_002.mp4
    tracks: rms_volume, transcript

Files:
  Path                                           Rows    Raw Compressed Saved
  -------------------------------------------- ------ ------ ---------- -----
  video_001/emotions.parquet                      100  8.2KB      3.1KB   62%
  video_001/transcript.parquet                      3  2.1KB      1.4KB   33%
  video_002/rms_volume.parquet                    200  6.4KB      2.8KB   56%
  video_002/transcript.parquet                      3  2.1KB      1.4KB   33%
```
Drill into a specific track:
```bash
mediapkg-inspect corpus.mediapkg --track emotions --video video_001 --head 3
```

```text
Track: emotions (mava:ObservationSeries)
Video: video_001
Desc:  Face emotion probability scores from DeepFace model
Rows:  100

Columns:
  start_seconds  double[pyarrow]
  angry          double[pyarrow]
  happy          double[pyarrow]
  neutral        double[pyarrow]

First 3 rows:
  start_seconds    angry    happy  neutral
            0.0  0.12451  0.64231  0.23318
            0.5  0.08734  0.71204  0.20062
            1.0  0.21003  0.55891  0.23106

Dimensions:
  angry    Anger probability      [0,1]
  happy    Happiness probability  [0,1]
  neutral  Neutral expression     [0,1]
```
## 5. The .mediapkg format at a glance

A `.mediapkg` is a ZIP archive. You can always unzip it manually to inspect:
```bash
unzip -l corpus.mediapkg
# or
unzip corpus.mediapkg -d corpus_contents/
cat corpus_contents/manifest.json
```
The `manifest.json` is human-readable JSON containing all metadata, the JSON-LD context mapping column names to the MAVA ontology, and the file inventory. See `spec/SPEC.md` for the full format specification.
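Because the container is plain ZIP, you can also peek at the manifest from Python using nothing but the standard library (a quick sketch; the manifest's exact keys are defined by the spec, so print them rather than assuming names):

```python
import json
import zipfile

with zipfile.ZipFile("corpus.mediapkg") as zf:
    manifest = json.loads(zf.read("manifest.json"))

print(sorted(manifest))  # top-level manifest keys, per spec/SPEC.md
```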
## Next steps

- See `examples/tsv_to_mediapkg.py` for a complete example converting real TSV annotation files from two different tools into a corpus package.
- See `spec/SPEC.md` for the full format specification.
- See `spec/mava.ttl` for the MAVA ontology and SHACL validation shapes.