Validation
Check whether `.mediapkg` files conform to the MAVA specification.
Basic Usage
```python
from mava_exchange.validate import validate_mediapkg

result = validate_mediapkg("corpus.mediapkg")
if result.valid:
    print("✓ Package is valid")
else:
    print(result.summary())
```
What Gets Validated
The validator checks:
Package Structure
- ZIP archive is readable
- `manifest.json` exists at the archive root
- All referenced Parquet files exist
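As a rough illustration of the structural checks, the sketch below uses only the standard library `zipfile` module. It is not the `mava_exchange` implementation. `check_structure` is a hypothetical helper. This page does not describe how the manifest references files, so the caller passes in the list of referenced Parquet paths.

```python
import zipfile


def check_structure(path, referenced_files):
    """Return error strings for the package at `path` (hypothetical helper).

    `referenced_files` lists the Parquet paths the manifest points to; the
    caller supplies it because the manifest layout is assumed, not known.
    """
    try:
        archive = zipfile.ZipFile(path)
    except (zipfile.BadZipFile, OSError) as exc:
        # Nothing else can be checked if the archive cannot be opened.
        return [f"ZIP archive is not readable: {exc}"]

    errors = []
    with archive:
        names = set(archive.namelist())
        # manifest.json must sit at the archive root, not in a subfolder.
        if "manifest.json" not in names:
            errors.append("manifest.json missing at archive root")
        # Every referenced Parquet file must be present in the archive.
        for ref in referenced_files:
            if ref not in names:
                errors.append(f"referenced file missing: {ref}")
    return errors
```

A call such as `check_structure("corpus.mediapkg", ["video_001/emotions.parquet"])` returns an empty list when all three checks pass.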
Manifest Contents
- Required fields are present (`version`, `created`, `ontology`, `tracks`, `videos`)
- Track definitions are complete
- Video IDs are unique
- File paths match actual files in the archive
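The required-field and unique-ID rules can be sketched over a manifest that has already been parsed with `json.load`. `check_manifest` is hypothetical. It also assumes that `videos` is a list of objects, each carrying an `id` key; this page does not confirm that layout.

```python
REQUIRED_FIELDS = ("version", "created", "ontology", "tracks", "videos")


def check_manifest(manifest):
    """Return error strings for a parsed manifest dict (hypothetical helper)."""
    # Report every missing top-level field, in a fixed order.
    errors = [f"missing required field: {field}"
              for field in REQUIRED_FIELDS if field not in manifest]

    # Assumption: each entry in "videos" is a dict with an "id" key.
    seen = set()
    for video in manifest.get("videos", []):
        vid = video.get("id")
        if vid in seen:
            errors.append(f"duplicate video ID: {vid}")
        seen.add(vid)
    return errors
```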
Parquet Data
- Columns match track definitions
- `start_seconds` is non-negative and sorted
- For annotations: `end_seconds > start_seconds`
- For observations: numeric dimensions contain no nulls
- For list annotations: values are lists of strings
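Here is a minimal sketch of the timing rules. It works on plain Python lists, so it runs without a Parquet reader. `check_times` is hypothetical, and two of its messages are modeled on the sample output further down this page. It assumes "sorted" means non-decreasing, so equal start times are allowed.

```python
def check_times(start, end=None):
    """Check time columns given as plain lists (hypothetical helper).

    `start` holds start_seconds; `end` holds end_seconds for annotation
    tracks and is omitted for observation tracks.
    """
    errors = []
    if any(t < 0 for t in start):
        errors.append("start_seconds has negative values")
    # Non-decreasing order: each value must be <= the next one.
    if any(a > b for a, b in zip(start, start[1:])):
        errors.append("start_seconds is not sorted")
    if end is not None:
        bad = sum(1 for s, e in zip(start, end) if e <= s)
        if bad:
            errors.append(f"{bad} row(s) where end_seconds <= start_seconds")
    return errors
```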
Common Validation Tasks
Check whether a package is valid
```python
from mava_exchange.validate import validate_mediapkg

result = validate_mediapkg("pkg.mediapkg")
if result.valid:
    print(f"✓ Package passed {result.checks} checks")
else:
    print(f"✗ Found {len(result.errors)} errors")
    for error in result.errors:
        print(error)
```
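In scripts or CI jobs, it can be convenient to turn a failed result into an exception. The hypothetical `require_valid` helper below relies only on the `valid` and `errors` attributes used in the example above. It assumes nothing else about `ValidationResult`.

```python
def require_valid(result, name="package"):
    """Raise ValueError if a validation result is not valid (hypothetical).

    Uses only `result.valid` and `result.errors`; returns the result
    unchanged when it is valid so calls can be chained.
    """
    if not result.valid:
        details = "\n".join(str(error) for error in result.errors)
        raise ValueError(
            f"{name} failed validation with "
            f"{len(result.errors)} error(s):\n{details}"
        )
    return result
```

For example: `require_valid(validate_mediapkg("pkg.mediapkg"), "pkg.mediapkg")`.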
Use strict mode
Strict mode additionally reports warnings for recommended but optional fields and for unexpected extra data:
```python
from mava_exchange.validate import validate_mediapkg

result = validate_mediapkg("pkg.mediapkg", strict=True)
# Strict mode adds warnings for:
# - Missing track descriptions
# - Missing sampling_interval for ObservationSeries tracks
# - Unexpected extra columns
print(result.summary())
```
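To make the strict-mode rules concrete, here is a rough approximation of those three warnings. The track shape it expects (`name`, `type`, `columns`, and optional `description` and `sampling_interval`) is an assumption for illustration, not the documented manifest schema. `strict_warnings` is a hypothetical helper, not part of `mava_exchange`.

```python
def strict_warnings(tracks, columns_by_track):
    """Approximate strict-mode warnings (hypothetical helper).

    Assumed track shape: {"name", "type", "columns",
    optional "description", optional "sampling_interval"}.
    `columns_by_track` maps a track name to the columns actually
    found in its Parquet file.
    """
    warnings = []
    for track in tracks:
        name = track["name"]
        if not track.get("description"):
            warnings.append(f"{name}: missing description")
        if (track["type"] == "ObservationSeries"
                and "sampling_interval" not in track):
            warnings.append(f"{name}: missing sampling_interval")
        # Columns present in the data but absent from the definition.
        extra = set(columns_by_track.get(name, [])) - set(track["columns"])
        for column in sorted(extra):
            warnings.append(f"{name}: unexpected extra column {column}")
    return warnings
```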
Command-line validation
```bash
# Basic validation
mediapkg-validate corpus.mediapkg

# Strict mode
mediapkg-validate corpus.mediapkg --strict

# Exit code: 0 if valid, 1 if invalid
mediapkg-validate corpus.mediapkg && echo 'Valid!'
```
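The command exits with 0 for a valid package and 1 for an invalid one, so it can be driven from Python when checking many packages. `failing_packages` is a hypothetical helper. The only CLI behavior it relies on is the exit code and the `--strict` flag shown above.

```python
import subprocess


def failing_packages(paths, strict=False, command=("mediapkg-validate",)):
    """Run the validator CLI on each path; return the paths that fail.

    Hypothetical helper: any non-zero exit status counts as a failure.
    """
    failed = []
    for path in paths:
        args = [*command, path]
        if strict:
            args.append("--strict")
        # The CLI exits 0 for a valid package and non-zero otherwise.
        if subprocess.run(args).returncode != 0:
            failed.append(path)
    return failed
```

The `command` parameter lets the helper point at a different executable, such as the copy of `mediapkg-validate` inside a particular virtual environment.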
Validation Output
Valid package
```text
═══════════════════════════════════════════════
Validating: corpus.mediapkg
═══════════════════════════════════════════════
Manifest:
Top-level fields...
Tracks...
Videos...
Parquet files:
video_001/emotions.parquet...
video_001/transcript.parquet...
✓ VALID — 45 checks, 0 errors, 0 warnings
```
Invalid package
```text
═══════════════════════════════════════════════
Validating: bad.mediapkg
═══════════════════════════════════════════════
Errors:
✗ video_001/emotions.parquet: start_seconds has negative values
✗ video_002/transcript.parquet: 3 row(s) where end_seconds <= start_seconds
✗ INVALID — 42 checks, 2 errors, 0 warnings
```
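Common Errors
| Error | Cause | Fix |
|---|---|---|
| | Negative time values | Ensure all timestamps are ≥ 0 |
| | Rows not sorted by `start_seconds` | Sort the DataFrame by `start_seconds` before writing |
| | Invalid intervals (`end_seconds <= start_seconds`) | Ensure every annotation ends after it starts |
| | DataFrame missing required columns | Add all columns from the track definition |
| | List annotation values stored as strings instead of lists | Store values as lists of strings, or use AnnotationSeries for single labels |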
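Several of the fixes above can be applied just before writing. The sketch below works on rows held as plain dicts; with pandas, the sorting step would be `df.sort_values("start_seconds")`. `prepare_rows` and its `list_columns` parameter are hypothetical. Wrapping a bare string in a one-element list is only one way to fix the string-versus-list error; switching the track to AnnotationSeries is the other.

```python
def prepare_rows(rows, list_columns=()):
    """Return a sorted, cleaned copy of `rows` (hypothetical helper).

    Sorts rows by start_seconds and wraps bare strings in the given
    list-annotation columns into one-element lists. Negative times and
    bad intervals are left unchanged for the validator to report.
    """
    prepared = []
    for row in sorted(rows, key=lambda r: r["start_seconds"]):
        row = dict(row)  # copy so the caller's rows are not modified
        for column in list_columns:
            if isinstance(row.get(column), str):
                row[column] = [row[column]]
        prepared.append(row)
    return prepared
```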
Function and Class References
For complete function and class documentation, see:
- `validate_mediapkg()`: validation function
- `ValidationResult`: result object