Application Architecture

The Political Debates application acts as a comprehensive wrapper and orchestration layer around advanced AI components. Its primary goal is to transform raw media into searchable, speaker-aware knowledge.

Overview

Core Concept: An orchestration wrapper around Hugging Face AI components.
Primary Output: "Speaker Statements"—identifying exactly who said what and when.
Human-in-the-Loop: A dedicated interface for Editors to add missing metadata and correct AI inaccuracies.
Search & Discovery: A Solr-backed engine to filter statements by speaker, topic, or keyword.

System Context Diagram

This diagram illustrates how the application "wraps" the raw AI processing to add value through storage, search, and human editing.

graph TD
    %% Nodes
    subgraph Wrapper ["Application Wrapper"]
        Logic[<b>Orchestration Logic</b><br>Video Conversion & API Management]
        UI[<b>User Interface</b><br>Search & Media Player]
    end

    subgraph AI ["AI Component (Hugging Face)"]
        Whisper[<b>WhisperX</b><br>Diarization & Transcription]
    end

    subgraph Human ["Human-in-the-Loop"]
        Editor[<b>Editor</b><br>Metadata & Corrections]
    end

    subgraph Data ["Search Index"]
        Solr[<b>Solr</b><br>Statement Search]
    end

    %% Relationships
    Input([Raw Video]) --> Logic
    Logic -->|1. Extract Audio| Whisper
    Whisper -->|2. Diarized Text| Logic
    Logic -->|3. Index Statements| Solr

    %% Human Interaction
    Editor -->|4. Add Metadata / Correct Text| UI
    UI -->|Updates| Solr

    %% Styling
    classDef wrap fill:#e3f2fd,stroke:#1565c0,stroke-width:2px;
    classDef ai fill:#f3e5f5,stroke:#4a148c,stroke-width:2px;
    classDef human fill:#fff9c4,stroke:#fbc02d,stroke-width:2px,stroke-dasharray: 5 5;

    class Logic,UI wrap;
    class Whisper ai;
    class Editor human;

Core Functionality

1. The Wrapper Concept

The application does not perform the heavy machine learning itself. Instead, it serves as a bridge (wrapper) to Hugging Face:

It handles video input by first converting it to audio.
It sends this audio to the AI component (WhisperX).
It retrieves and stores the results.
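The three steps above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: it assumes ffmpeg is the conversion tool (the source does not name one), and `build_audio_extract_command`, `process_video`, and the injected `run_command`, `transcribe`, and `store` callables are hypothetical names standing in for the real conversion, Hugging Face call, and storage layer.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Callable


def build_audio_extract_command(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg command that extracts a mono 16 kHz audio track.

    Assumption: ffmpeg performs the conversion; the source does not say so.
    """
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", video_path,
        "-vn",           # drop the video stream, keep audio only
        "-ac", "1",      # downmix to a single channel
        "-ar", "16000",  # resample to 16 kHz
        audio_path,
    ]


def process_video(
    video_path: str,
    run_command: Callable[[list[str]], Any],
    transcribe: Callable[[str], Any],
    store: Callable[[Any], Any],
) -> Any:
    """Run the wrapper steps: convert to audio, call the AI, store the result."""
    audio_path = str(Path(video_path).with_suffix(".wav"))
    run_command(build_audio_extract_command(video_path, audio_path))  # step 1
    result = transcribe(audio_path)                                   # step 2
    store(result)                                                     # step 3
    return result
```

Passing the three operations in as callables keeps the sketch independent of any particular API client; in practice `run_command` could be `subprocess.run` with `check=True`.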

2. Speaker Statements

The main value proposition of the system is the derivation of Speaker Statements. Unlike simple subtitles, the analysis breaks content down by speaker. This structure is what enables the central query: "Who said what, and when?"
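One way to picture a Speaker Statement is as a run of consecutive transcript segments from the same speaker. The sketch below is illustrative only: the `SpeakerStatement` class and `to_statements` function are hypothetical, and the segment dictionaries with `start`, `end`, `text`, and `speaker` keys are an assumed shape for diarized output, not a confirmed schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SpeakerStatement:
    speaker: str   # who
    start: float   # when (seconds from the start of the media)
    end: float
    text: str      # what


def to_statements(segments: list[dict]) -> list[SpeakerStatement]:
    """Merge consecutive segments from the same speaker into one statement."""
    statements: list[SpeakerStatement] = []
    for seg in segments:
        speaker = seg.get("speaker", "UNKNOWN")  # diarization may miss a label
        text = seg["text"].strip()
        if statements and statements[-1].speaker == speaker:
            # Same speaker keeps talking: extend the current statement.
            last = statements[-1]
            last.end = seg["end"]
            last.text = f"{last.text} {text}"
        else:
            # Speaker changed: start a new statement.
            statements.append(
                SpeakerStatement(speaker, seg["start"], seg["end"], text)
            )
    return statements
```

Each resulting record answers the central query directly: `speaker` is the who, `text` is the what, and `start` and `end` are the when.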

3. Human-in-the-Loop

While the AI provides the text, it cannot derive context (e.g., the specific name of a politician or the date of the debate) from audio alone.

Metadata Injection: Editors manually provide context that the audio lacks.
Correction: Editors use the interface to fix transcription errors or misattributed speakers.
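Both kinds of editor input can be thought of as an overlay on the AI output. The following is a minimal sketch under stated assumptions: `apply_editor_input` is a hypothetical function, statements are assumed to be plain dictionaries, and the generic labels such as `SPEAKER_00` are an assumed naming scheme for diarized speakers.

```python
from __future__ import annotations


def apply_editor_input(
    statements: list[dict],
    speaker_names: dict[str, str],
    text_fixes: dict[int, str],
    metadata: dict[str, str],
) -> list[dict]:
    """Return new statement dicts with editor input applied.

    speaker_names maps generic diarization labels to real names,
    text_fixes maps a statement index to its corrected text, and
    metadata holds debate-level context that the audio lacks.
    """
    updated = []
    for index, stmt in enumerate(statements):
        # Metadata injection: statement fields win if a key collides.
        new = {**metadata, **stmt}
        # Correction: replace the generic label when the editor named it.
        new["speaker"] = speaker_names.get(stmt["speaker"], stmt["speaker"])
        # Correction: replace the transcript text when the editor fixed it.
        if index in text_fixes:
            new["text"] = text_fixes[index]
        updated.append(new)
    return updated
```

Returning new dictionaries rather than mutating the input leaves the original AI output available for comparison.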

Future Roadmap

In later versions, this manual step could be further automated. For example, Face Recognition could identify speakers visually, or scrapers could pull metadata directly from official debate websites.

User Interface Components

To make this data accessible, the application provides two key views:

Search Page: Powered by Solr, this allows users to filter statements by metadata tags (e.g., Topic, Date) or search for specific text phrases.
Media Player: A specialized tool for playback and editing. It allows the Editor to watch the video while simultaneously correcting the transcript and metadata in real time.
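A Search Page request of this kind could translate into a Solr query along the following lines. This is a sketch, not the application's query code: `q`, `fq`, and `rows` are standard Solr request parameters, but the field names (`text`, `topic`), the core name `statements`, and both function names are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode


def build_solr_params(
    text: str | None = None,
    filters: dict[str, str] | None = None,
    rows: int = 20,
) -> list[tuple[str, str]]:
    """Build Solr select parameters for a phrase search plus metadata filters."""

    def quote(value: str) -> str:
        # Escape backslashes and double quotes so the value stays one phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    # Phrase search on the statement text, or match everything.
    q = f"text:{quote(text)}" if text else "*:*"
    params = [("q", q), ("rows", str(rows))]
    # One filter query per metadata tag (e.g. topic, date).
    for field, value in (filters or {}).items():
        params.append(("fq", f"{field}:{quote(value)}"))
    return params


def solr_select_url(base_url: str, params: list[tuple[str, str]]) -> str:
    """Append URL-encoded parameters to a Solr select endpoint."""
    return f"{base_url}?{urlencode(params)}"
```

For example, `solr_select_url("http://localhost:8983/solr/statements/select", build_solr_params("tax reform", {"topic": "Economy"}))` would search for the phrase within statements tagged with that topic. Putting metadata tags in `fq` rather than `q` keeps them out of relevance scoring.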

Techstack

For the exact tech stack, see Installations: Techstack.

Components and Design Principles

Application: The application follows an API-first pattern: the frontend contains no processing logic, and the API is the interface to processing.
Data Storage: A dual-layer storage system separates long-term storage from short-term storage that can be quickly adapted to application needs.
Whisper: An external AI model hosted on Hugging Face is accessed via API and does the heavy lifting of extracting speaker statements, languages, and translations from the media inputs.
Processing Pipeline: Processing is keyed by a unique media_id, which makes it possible to track processing, data, and metadata across the various components of the system.
Presigned URLs: The frontend has no credentials for the primary data storage on S3 and uses presigned URLs to upload, stream, and download from there.
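The presigned URL and media_id principles can be combined in a short sketch. It assumes the backend holds a boto3 S3 client (for example `boto3.client("s3")`) and calls its `generate_presigned_url` method; the `media/<media_id>/<filename>` key layout, the 15-minute default expiry, and the function names are hypothetical choices for illustration.

```python
from __future__ import annotations


def object_key(media_id: str, filename: str) -> str:
    """Derive the S3 object key from the media_id (hypothetical layout)."""
    return f"media/{media_id}/{filename}"


def presign(
    s3_client,
    bucket: str,
    media_id: str,
    filename: str,
    action: str = "get_object",
    expires_in: int = 900,
) -> str:
    """Create a time-limited URL so the frontend needs no S3 credentials.

    Use action="put_object" for uploads and "get_object" for streaming
    or downloads. s3_client is expected to be a boto3 S3 client.
    """
    if action not in {"get_object", "put_object"}:
        raise ValueError(f"unsupported action: {action}")
    return s3_client.generate_presigned_url(
        ClientMethod=action,
        Params={"Bucket": bucket, "Key": object_key(media_id, filename)},
        ExpiresIn=expires_in,  # lifetime of the URL in seconds
    )
```

Because every key is derived from the media_id, the same identifier can tie the stored object to its processing state and indexed statements.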