mava_exchange.reader.MediaPackageReader
- class mava_exchange.reader.MediaPackageReader(path: str | Path)
Read .mediapkg archive files.
Use as a context manager or call open()/close() manually.
Example:
    with MediaPackageReader("corpus.mediapkg") as r:
        print(r.video_ids)
        print(r.track_names)
        df = r.read_track("v001", "emotions")
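For the manual route mentioned above, a minimal sketch might pair open() with close() in a try/finally so the file is released even if a read fails. The helper name list_video_ids is hypothetical; the sketch assumes only the open(), close(), and video_ids members documented on this page.

```python
def list_video_ids(reader):
    """Return the reader's video IDs, opening and closing it manually.

    `reader` is assumed to be an unopened MediaPackageReader-like object.
    This helper is illustrative and not part of mava_exchange.
    """
    reader.open()
    try:
        # Copy the list so the result stays usable after close().
        return list(reader.video_ids)
    finally:
        # Runs on success and on error, so the package is always closed.
        reader.close()
```

A call would look like list_video_ids(MediaPackageReader("corpus.mediapkg")).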
- __init__(path: str | Path)
Initialize reader.
- Parameters:
path (str or Path) – Path to .mediapkg file
Methods
__init__(path) – Initialize reader.
close() – Close the package file.
export_manifest_as_rdf([format, base_uri]) – Export manifest as RDF.
file_stats() – Get size and row count for each Parquet file.
open() – Open the package for reading.
read_track(video_id, track_name) – Read a track's data into a DataFrame.
read_video(video_id) – Read all tracks for a video.
track_def(track_name) – Get track definition object.
tracks_for_video(video_id) – List track names available for a video.
video_meta(video_id) – Get video metadata.
Attributes
description – Package description.
manifest – The parsed manifest.json dictionary.
ontology – Ontology URI.
track_names – List of all track names across all videos.
version – Format version from manifest.
video_ids – List of video IDs in the package.
- open() → Self
Open the package for reading.
- close()
Close the package file.
- property manifest: dict
The parsed manifest.json dictionary.
- property version: str
Format version from manifest.
- property description: str
Package description.
- property ontology: str
Ontology URI.
- property video_ids: list[str]
List of video IDs in the package.
- property track_names: list[str]
List of all track names across all videos.
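As an illustration of how these properties fit together, the sketch below builds a one-line package summary. The function name describe_package is hypothetical, and the reader is assumed to be already open.

```python
def describe_package(reader):
    """Build a one-line summary from the manifest-backed properties.

    Illustrative helper, not part of mava_exchange. `reader` is assumed
    to be an open MediaPackageReader-like object.
    """
    return (
        f"format {reader.version}: {reader.description} "
        f"({len(reader.video_ids)} videos, "
        f"{len(reader.track_names)} tracks, "
        f"ontology {reader.ontology})"
    )
```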
- video_meta(video_id: str) → dict
Get video metadata.
Returns fields such as src, title, and duration; file paths are excluded.
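A short sketch of gathering metadata for every video in a package follows. The helper collect_video_meta is hypothetical, and the added "video_id" key is an assumption of this example rather than something video_meta() is documented to return.

```python
def collect_video_meta(reader):
    """Return one metadata dict per video, tagged with its video ID.

    Illustrative helper, not part of mava_exchange. `reader` is assumed
    to be an open MediaPackageReader-like object.
    """
    rows = []
    for video_id in reader.video_ids:
        # Copy so the dict returned by the reader is not mutated.
        meta = dict(reader.video_meta(video_id))
        # "video_id" is a key added by this sketch for convenience.
        meta["video_id"] = video_id
        rows.append(meta)
    return rows
```

The resulting list of dicts can be passed to pd.DataFrame(...) if a tabular view is wanted.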
- track_def(track_name: str) → ObservationSeries | AnnotationSeries | AnnotationListSeries
Get track definition object.
- Returns:
ObservationSeries, AnnotationSeries, or AnnotationListSeries
- Return type:
Track
- tracks_for_video(video_id: str) → list[str]
List track names available for a video.
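Because track_names spans all videos while tracks_for_video() is per video, an inverted index can show which videos carry a given track. The sketch below is one way to build it; videos_by_track is a hypothetical helper name.

```python
from collections import defaultdict


def videos_by_track(reader):
    """Map each track name to the list of video IDs that contain it.

    Illustrative helper, not part of mava_exchange. `reader` is assumed
    to be an open MediaPackageReader-like object.
    """
    index = defaultdict(list)
    for video_id in reader.video_ids:
        for track_name in reader.tracks_for_video(video_id):
            index[track_name].append(video_id)
    # Return a plain dict so missing keys raise KeyError as usual.
    return dict(index)
```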
- read_track(video_id: str, track_name: str) → DataFrame
Read a track’s data into a DataFrame.
- Parameters:
video_id (str) – Video identifier
track_name (str) – Track name
- Returns:
Track data with columns matching the track definition
- Return type:
pd.DataFrame
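This page does not say how read_track() behaves when a video lacks the requested track. A defensive sketch can check tracks_for_video() first; read_track_or_none is a hypothetical helper, and returning None is a choice made for this example.

```python
def read_track_or_none(reader, video_id, track_name):
    """Read a track if the video has it, otherwise return None.

    Illustrative helper, not part of mava_exchange. `reader` is assumed
    to be an open MediaPackageReader-like object.
    """
    if track_name not in reader.tracks_for_video(video_id):
        # Avoid calling read_track() for a track the video does not list.
        return None
    return reader.read_track(video_id, track_name)
```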
- read_video(video_id: str) → dict[str, DataFrame]
Read all tracks for a video.
- Returns:
Mapping of track_name → DataFrame
- Return type:
dict[str, pd.DataFrame]
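As a small illustration of working with the returned mapping, the sketch below counts rows per track for one video. The name rows_per_track is hypothetical. Note that it loads every track into memory, unlike file_stats().

```python
def rows_per_track(reader, video_id):
    """Return {track_name: row_count} for all tracks of one video.

    Illustrative helper, not part of mava_exchange. `reader` is assumed
    to be an open MediaPackageReader-like object.
    """
    tracks = reader.read_video(video_id)
    # len() of a pandas DataFrame is its number of rows.
    return {name: len(frame) for name, frame in tracks.items()}
```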
- file_stats() → list[dict]
Get size and row count for each Parquet file.
Reads metadata only; does not load row data.
- Returns:
List of dicts, each with the keys path, rows, size_bytes, and compressed_bytes
- Return type:
list[dict]
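The per-file entries can be rolled up into package-level totals. The sketch below uses the four keys listed above; summarize_file_stats is a hypothetical helper, and reading size_bytes as the uncompressed size (so that size_bytes / compressed_bytes is a compression ratio) is an assumption this page does not confirm.

```python
def summarize_file_stats(stats):
    """Aggregate a file_stats()-style list of dicts into totals.

    Illustrative helper, not part of mava_exchange.
    """
    total_rows = sum(entry["rows"] for entry in stats)
    total_size = sum(entry["size_bytes"] for entry in stats)
    total_compressed = sum(entry["compressed_bytes"] for entry in stats)
    # Assumption: size_bytes is the uncompressed size. Guard against
    # division by zero for empty packages.
    ratio = total_size / total_compressed if total_compressed else None
    return {
        "files": len(stats),
        "rows": total_rows,
        "size_bytes": total_size,
        "compressed_bytes": total_compressed,
        "ratio": ratio,
    }
```

With an open reader r, this would be called as summarize_file_stats(r.file_stats()).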
- export_manifest_as_rdf(format: str = 'turtle', base_uri: str = 'http://example.org/data/') → str
Export manifest as RDF.
Exports package structure only, not row data.
- Parameters:
format (str) – "turtle" or "json-ld"
base_uri (str) – Base URI for generated identifiers
- Returns:
RDF serialization
- Return type:
str
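Since the method returns the serialization as a string, saving it is left to the caller. A minimal sketch follows; write_manifest_rdf, the output file name manifest.*, and the extension mapping (.ttl and .jsonld, the conventional extensions for Turtle and JSON-LD) are choices made for this example, not part of the library.

```python
from pathlib import Path

# Conventional file extensions for the two documented formats.
_EXTENSIONS = {"turtle": ".ttl", "json-ld": ".jsonld"}


def write_manifest_rdf(reader, out_dir, fmt="turtle",
                       base_uri="http://example.org/data/"):
    """Export the manifest as RDF and write it into `out_dir`.

    Illustrative helper, not part of mava_exchange. `reader` is assumed
    to be an open MediaPackageReader-like object. Returns the output path.
    """
    if fmt not in _EXTENSIONS:
        raise ValueError(f"unsupported format: {fmt!r}")
    text = reader.export_manifest_as_rdf(format=fmt, base_uri=base_uri)
    out_path = Path(out_dir) / f"manifest{_EXTENSIONS[fmt]}"
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

Replacing the placeholder default base_uri with a namespace you control is usually advisable before publishing the output.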