mava_exchange.reader.MediaPackageReader

class mava_exchange.reader.MediaPackageReader(path: str | Path)

Read .mediapkg archive files.

Use as a context manager or call open()/close() manually.

Example:

from mava_exchange.reader import MediaPackageReader

with MediaPackageReader("corpus.mediapkg") as r:
    print(r.video_ids)
    print(r.track_names)
    df = r.read_track("v001", "emotions")

__init__(path: str | Path)

Initialize the reader.

Parameters:

path (str or Path) – Path to the .mediapkg file

Methods

__init__(path) – Initialize the reader.
close() – Close the package file.
export_manifest_as_rdf([format, base_uri]) – Export manifest as RDF.
file_stats() – Get size and row count for each Parquet file.
open() – Open the package for reading.
read_track(video_id, track_name) – Read a track’s data into a DataFrame.
read_video(video_id) – Read all tracks for a video.
track_def(track_name) – Get track definition object.
tracks_for_video(video_id) – List track names available for a video.
video_meta(video_id) – Get video metadata.

Attributes

description – Package description.
manifest – The parsed manifest.json dictionary.
ontology – Ontology URI.
track_names – List of all track names across all videos.
version – Format version from manifest.
video_ids – List of video IDs in the package.

open() → Self

Open the package for reading. Returns the reader itself, so the call can be chained onto the constructor.

close()

Close the package file.
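Because open() returns the reader, manual use can chain open() onto the constructor and put close() in a finally clause so the file is released even if a read fails. The sketch below uses a hypothetical StubReader stand-in with the same open()/close() shape so it runs without a .mediapkg file. It also assumes that the real class's __enter__/__exit__ call open() and close(); the class description implies this but does not spell it out.

```python
class StubReader:
    """Hypothetical stand-in for MediaPackageReader: only open()/close() are modelled."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.is_open = False

    def open(self) -> "StubReader":
        self.is_open = True
        return self  # returning self is what makes chaining possible

    def close(self) -> None:
        self.is_open = False

    # Assumed context-manager behaviour: entering opens, exiting always closes.
    def __enter__(self) -> "StubReader":
        return self.open()

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # do not swallow exceptions


# Manual style: chain open() onto the constructor, close in `finally`.
reader = StubReader("corpus.mediapkg").open()
try:
    was_open = reader.is_open
finally:
    reader.close()

# Context-manager style: equivalent, with close() handled automatically.
with StubReader("corpus.mediapkg") as r:
    was_open_in_with = r.is_open
```

With the real class, replace StubReader with MediaPackageReader; the manual form is useful when the reader must outlive a single with block.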

property manifest: dict

The parsed manifest.json dictionary.

property version: str

Format version from manifest.

property description: str

Package description.

property ontology: str

Ontology URI.

property video_ids: list[str]

List of video IDs in the package.

property track_names: list[str]

List of all track names across all videos.

video_meta(video_id: str) → dict

Get video metadata.

Returns fields such as src, title, and duration; file paths are excluded.
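A common use of video_meta() is to build a per-video metadata table and aggregate over it. This sketch uses only the documented video_ids and video_meta() members. It assumes that "duration" is numeric; its unit is not stated here. A SimpleNamespace with made-up values stands in for an opened reader so the example runs on its own.

```python
from types import SimpleNamespace


def metadata_table(reader) -> dict[str, dict]:
    """Map every video ID to its metadata dict."""
    return {vid: reader.video_meta(vid) for vid in reader.video_ids}


def total_duration(reader) -> float:
    """Sum the 'duration' field across videos, skipping videos that lack it.

    Assumes 'duration' is numeric; the unit is whatever the package stores.
    """
    return sum(
        meta["duration"]
        for meta in metadata_table(reader).values()
        if meta.get("duration") is not None
    )


# Hypothetical stand-in with invented values, in place of an opened reader.
_META = {
    "v001": {"src": "v001.mp4", "title": "Interview A", "duration": 120.0},
    "v002": {"src": "v002.mp4", "title": "Interview B", "duration": 95.5},
}
stub = SimpleNamespace(video_ids=list(_META), video_meta=lambda vid: _META[vid])

table = metadata_table(stub)
total = total_duration(stub)
```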

track_def(track_name: str) → ObservationSeries | AnnotationSeries | AnnotationListSeries

Get track definition object.

Returns:

ObservationSeries, AnnotationSeries, or AnnotationListSeries

Return type:

Track
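Since track_def() can return three different series types, it can be handy to group the package's tracks by kind. The sketch below dispatches on the class name so it does not have to guess which module the series classes are imported from. The three empty classes and the track names "speaker", "topics", and "gaze" are hypothetical stand-ins, not the library's definitions.

```python
from collections import defaultdict
from types import SimpleNamespace


def tracks_by_kind(reader) -> dict[str, list[str]]:
    """Group track names by the class name of their definition object."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for name in reader.track_names:
        groups[type(reader.track_def(name)).__name__].append(name)
    return dict(groups)


# Hypothetical stand-ins that only share the real classes' names.
class ObservationSeries: ...
class AnnotationSeries: ...
class AnnotationListSeries: ...


_DEFS = {
    "emotions": ObservationSeries(),
    "speaker": AnnotationSeries(),
    "topics": AnnotationListSeries(),
    "gaze": ObservationSeries(),
}
stub = SimpleNamespace(track_names=list(_DEFS), track_def=lambda name: _DEFS[name])

groups = tracks_by_kind(stub)
```

With the real package classes available, isinstance checks or a match statement on class patterns would be the more robust choice.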

tracks_for_video(video_id: str) → list[str]

List track names available for a video.
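Because track_names spans all videos while tracks_for_video() is per video, comparing the two shows which videos are missing which tracks. This is a sketch built only on the documented members. A SimpleNamespace with invented track lists stands in for an opened reader; the "speaker" track name is hypothetical.

```python
from types import SimpleNamespace


def missing_tracks(reader) -> dict[str, list[str]]:
    """For each video, list package-wide track names that it does not have."""
    all_tracks = reader.track_names  # union across all videos
    result: dict[str, list[str]] = {}
    for vid in reader.video_ids:
        present = set(reader.tracks_for_video(vid))
        gaps = [name for name in all_tracks if name not in present]
        if gaps:  # only report videos with at least one missing track
            result[vid] = gaps
    return result


# Hypothetical stand-in for an opened reader.
_TRACKS = {"v001": ["emotions", "speaker"], "v002": ["emotions"]}
stub = SimpleNamespace(
    video_ids=list(_TRACKS),
    track_names=["emotions", "speaker"],
    tracks_for_video=lambda vid: _TRACKS[vid],
)

gaps = missing_tracks(stub)
```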

read_track(video_id: str, track_name: str) → DataFrame

Read a track’s data into a DataFrame.

Parameters:
  • video_id (str) – Video identifier

  • track_name (str) – Track name

Returns:

Track data with columns matching the track definition

Return type:

pd.DataFrame
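To load one track across the whole package, call read_track() once per video. This sketch checks tracks_for_video() first, because these docs do not say what read_track() does when a video lacks the track. The stub's read_track returns placeholder strings instead of DataFrames so the example runs without pandas or a package file; the "speaker" and "v003" names are hypothetical.

```python
from types import SimpleNamespace


def read_track_everywhere(reader, track_name: str) -> dict:
    """Read one track for every video that has it, keyed by video ID."""
    return {
        vid: reader.read_track(vid, track_name)
        for vid in reader.video_ids
        if track_name in reader.tracks_for_video(vid)
    }


# Hypothetical stand-in; the real read_track returns a pandas DataFrame.
_TRACKS = {"v001": ["emotions"], "v002": ["speaker"], "v003": ["emotions", "speaker"]}
stub = SimpleNamespace(
    video_ids=list(_TRACKS),
    tracks_for_video=lambda vid: _TRACKS[vid],
    read_track=lambda vid, name: f"<{name} data for {vid}>",
)

frames = read_track_everywhere(stub, "emotions")
```

With the real reader the values are DataFrames, so pd.concat(frames) would stack them into one frame whose outer index level is the video ID.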

read_video(video_id: str) → dict[str, DataFrame]

Read all tracks for a video.

Returns:

Mapping of track_name → DataFrame

Return type:

dict[str, pd.DataFrame]
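Since read_video() returns a plain mapping of track name to DataFrame, ordinary dict iteration works on its result. The sketch computes rows per track with len(), which works on DataFrames. Lists stand in for DataFrames in the hypothetical stub. Note that read_video() loads every track into memory; when only row counts are needed, file_stats() reads Parquet metadata instead.

```python
from types import SimpleNamespace


def row_counts(reader, video_id: str) -> dict[str, int]:
    """Number of rows in each track of one video."""
    return {name: len(df) for name, df in reader.read_video(video_id).items()}


# Hypothetical stand-in: lists play the role of DataFrames here.
_VIDEO = {"emotions": [1, 2, 3], "speaker": [1]}
stub = SimpleNamespace(read_video=lambda vid: _VIDEO)

counts = row_counts(stub, "v001")
```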

file_stats() → list[dict]

Get size and row count for each Parquet file.

Reads Parquet metadata only; row data is not loaded.

Returns:

List of dicts, one per file, with keys path, rows, size_bytes, and compressed_bytes

Return type:

list[dict]
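The list returned by file_stats() can be aggregated directly to get a package-wide overview. The sketch sums each documented key and ranks files by size_bytes. It does not interpret size_bytes versus compressed_bytes beyond what the keys say. The sample entries, including their path format, are hypothetical; real values come from reader.file_stats().

```python
def summarize_stats(stats: list[dict], top: int = 3) -> dict:
    """Aggregate file_stats() output: totals plus the largest files by size_bytes."""
    ranked = sorted(stats, key=lambda s: s["size_bytes"], reverse=True)
    return {
        "files": len(stats),
        "rows": sum(s["rows"] for s in stats),
        "size_bytes": sum(s["size_bytes"] for s in stats),
        "compressed_bytes": sum(s["compressed_bytes"] for s in stats),
        "largest": [s["path"] for s in ranked[:top]],
    }


# Hypothetical sample in the documented shape.
sample = [
    {"path": "v001/emotions.parquet", "rows": 1200, "size_bytes": 48_000, "compressed_bytes": 12_000},
    {"path": "v001/speaker.parquet", "rows": 300, "size_bytes": 9_000, "compressed_bytes": 3_000},
    {"path": "v002/emotions.parquet", "rows": 800, "size_bytes": 32_000, "compressed_bytes": 8_000},
]

summary = summarize_stats(sample, top=2)
```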

export_manifest_as_rdf(format: str = 'turtle', base_uri: str = 'http://example.org/data/') → str

Export manifest as RDF.

Exports package structure only, not row data.

Parameters:
  • format (str) – “turtle” or “json-ld”

  • base_uri (str) – Base URI for generated identifiers

Returns:

The manifest serialized in the requested format (Turtle or JSON-LD)

Return type:

str
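The default base_uri is an example.org placeholder, so passing your own namespace keeps the generated identifiers meaningful. JSON-LD output is JSON text, so the standard json module can parse it without an RDF library. In this sketch the stub's export method returns an invented JSON-LD string, not what the library actually emits, and the https://data.example.com/corpus/ URI is hypothetical.

```python
import json
from types import SimpleNamespace


def manifest_jsonld(reader, base_uri: str) -> object:
    """Export the manifest as JSON-LD under a custom base URI and parse it."""
    text = reader.export_manifest_as_rdf(format="json-ld", base_uri=base_uri)
    return json.loads(text)  # JSON-LD is valid JSON


# Hypothetical stand-in returning placeholder JSON-LD.
stub = SimpleNamespace(
    export_manifest_as_rdf=lambda format="turtle", base_uri="http://example.org/data/": json.dumps(
        {"@id": base_uri + "package", "@type": "Package"}
    )
)

doc = manifest_jsonld(stub, "https://data.example.com/corpus/")
```

For Turtle, the returned string can be written straight to a .ttl file, for example with Path("manifest.ttl").write_text(reader.export_manifest_as_rdf()).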