Validation

Check whether .mediapkg files conform to the MAVA specification.

Basic Usage

from mava_exchange.validate import validate_mediapkg

result = validate_mediapkg("corpus.mediapkg")

if result.valid:
    print("✓ Package is valid")
else:
    print(result.summary())

What Gets Validated

The validator checks:

Package Structure

  • ZIP archive is readable

  • manifest.json exists at root

  • All referenced Parquet files exist

Manifest Contents

  • Required fields present (version, created, ontology, tracks, videos)

  • Track definitions are complete

  • Video IDs are unique

  • File paths match actual files in archive
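The structure and manifest checks listed so far can be approximated with the standard library alone. The following is a minimal sketch, not the mava_exchange implementation: the helper name check_package_structure is hypothetical, and it only covers archive readability, the presence of manifest.json at the root, and the required top-level fields named above.

```python
import json
import zipfile

# Field names come from the list above; everything else in this sketch is an
# illustrative assumption, not the library's actual code.
REQUIRED_FIELDS = ("version", "created", "ontology", "tracks", "videos")


def check_package_structure(path):
    """Return a list of human-readable problems (empty if none were found)."""
    try:
        archive = zipfile.ZipFile(path)
    except (zipfile.BadZipFile, OSError) as exc:
        return [f"cannot read ZIP archive: {exc}"]

    with archive:
        if "manifest.json" not in archive.namelist():
            return ["manifest.json missing at archive root"]
        try:
            manifest = json.loads(archive.read("manifest.json"))
        except ValueError as exc:  # covers invalid JSON and bad encodings
            return [f"manifest.json is not valid JSON: {exc}"]

    if not isinstance(manifest, dict):
        return ["manifest.json must contain a JSON object"]

    # Report every missing required field rather than stopping at the first.
    return [
        f"manifest missing required field: {field}"
        for field in REQUIRED_FIELDS
        if field not in manifest
    ]
```

Collecting all missing fields in one pass mirrors the way the validator output reports several errors at once instead of failing on the first.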

Parquet Data

  • Columns match track definitions

  • start_seconds is non-negative and rows are ordered by start_seconds

  • For annotations: end_seconds > start_seconds

  • For observations: numeric dimensions with no nulls

  • For list annotations: every value is a list of strings
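To make the timing rules concrete, here is a small sketch of the checks on annotation rows, written in plain Python so it runs without Parquet tooling. The function name check_annotation_rows is hypothetical, the rows are assumed to be (start_seconds, end_seconds) pairs in file order, and treating equal start times as correctly ordered is an assumption; the real validator may be stricter.

```python
def check_annotation_rows(rows):
    """Check (start_seconds, end_seconds) pairs given in file order.

    Returns a list of error strings; an empty list means all checks passed.
    """
    errors = []
    starts = [start for start, _ in rows]

    # Timestamps must not be negative.
    if any(start < 0 for start in starts):
        errors.append("start_seconds has negative values")

    # Each row must start no earlier than the row before it
    # (ties are allowed here, which is an assumption).
    if any(later < earlier for earlier, later in zip(starts, starts[1:])):
        errors.append("rows not ordered by start_seconds")

    # Annotation intervals must have positive length.
    bad = sum(1 for start, end in rows if end <= start)
    if bad:
        errors.append(f"{bad} row(s) where end_seconds <= start_seconds")

    return errors
```

For example, check_annotation_rows([(0.0, 1.0), (1.0, 2.5)]) returns an empty list, while a row with end_seconds equal to start_seconds is reported as an invalid interval.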

Common Validation Tasks

Check if package is valid

from mava_exchange.validate import validate_mediapkg

result = validate_mediapkg("pkg.mediapkg")

if result.valid:
    print(f"✓ Package passed {result.checks} checks")
else:
    print(f"✗ Found {len(result.errors)} errors")
    for error in result.errors:
        print(error)

Use strict mode

Strict mode additionally emits warnings for recommended-but-optional fields and for unexpected extra columns:

result = validate_mediapkg("pkg.mediapkg", strict=True)

# Shows warnings for:
# - Missing track descriptions
# - Missing sampling_interval for ObservationSeries
# - Unexpected extra columns

print(result.summary())

Command-line validation

# Basic validation
mediapkg-validate corpus.mediapkg

# Strict mode
mediapkg-validate corpus.mediapkg --strict

# Exit code: 0 if valid, 1 if invalid
mediapkg-validate corpus.mediapkg && echo "Valid!"
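Because the command signals validity through its exit code, it can also be driven from a script. The sketch below wraps the CLI with subprocess; the helper name package_is_valid and its command parameter are hypothetical conveniences (the parameter exists only so the executable can be swapped out, for example in tests), and only the --strict flag and the 0/1 exit codes shown above are assumed.

```python
import subprocess


def package_is_valid(path, strict=False, command=("mediapkg-validate",)):
    """Run the validator CLI and return True if it exits with code 0."""
    args = [*command, str(path)]
    if strict:
        args.append("--strict")
    # Capture output so the caller decides whether to display the report.
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode == 0
```

A call such as package_is_valid("corpus.mediapkg", strict=True) then behaves like the shell one-liner above, but returns a boolean that a build script can act on.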

Validation Output

Valid package

═══════════════════════════════════════════════
  Validating: corpus.mediapkg
═══════════════════════════════════════════════

Manifest:
  Top-level fields...
  Tracks...
  Videos...

Parquet files:
  video_001/emotions.parquet...
  video_001/transcript.parquet...

✓ VALID  —  45 checks, 0 errors, 0 warnings

Invalid package

═══════════════════════════════════════════════
  Validating: bad.mediapkg
═══════════════════════════════════════════════

Errors:
  ✗ video_001/emotions.parquet: start_seconds has negative values
  ✗ video_002/transcript.parquet: 3 row(s) where end_seconds <= start_seconds

✗ INVALID  —  42 checks, 2 errors, 0 warnings

Common Errors

Error                                  Cause                               Fix
-------------------------------------  ----------------------------------  --------------------------------------
start_seconds has negative values      Time values < 0                     Ensure all timestamps ≥ 0
rows not ordered by start_seconds      Data not sorted                     Sort DataFrame before writing
end_seconds <= start_seconds           Invalid intervals                   Check interval logic
missing columns                        DataFrame missing required columns  Add all columns from track definition
annotations column must contain lists  Used strings instead of lists       Use AnnotationSeries for single labels

Function and Class References

For complete function and class documentation, see: