Media Stack
Shared Media Runtime (core-x/media_runtime/)
A Python package that provides unified interfaces to the media services. Both Narrative MCP and Stories MCP import it.
| Module | Wraps | Purpose |
|---|---|---|
| TtsRuntime | mlx-audio (:8093), Voicebox (fallback) | Text-to-speech synthesis with voice cloning |
| FrameRuntime | mflux (:8083) | Still image / styleframe generation |
| VideoRuntime | video (:8084) / LTX-2 | Motion / video sequence generation |
| VoiceResolver | avatar-labs manifests, model-zoo voice samples, data-pool | Character code → voice profile mapping |
| MediaJobStore | filesystem | Persistent async job tracking (queued/running/done/error) |
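
The job-tracking row above can be illustrated with a minimal sketch of a filesystem-backed store. This is not the actual MediaJobStore API: the class name `FileJobStore`, its methods, and the one-JSON-file-per-job layout are assumptions made for illustration. Only the four states (queued/running/done/error) come from the table.

```python
import json
import uuid
from pathlib import Path

# The four job states listed in the module table.
STATES = ("queued", "running", "done", "error")


class FileJobStore:
    """Hypothetical stand-in for MediaJobStore: one JSON file per job."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, job_id):
        return self.root / f"{job_id}.json"

    def _write(self, job):
        # Write to a temp file, then rename, so a reader never sees a
        # half-written job record.
        final = self._path(job["id"])
        tmp = final.with_suffix(".tmp")
        tmp.write_text(json.dumps(job))
        tmp.replace(final)

    def create(self, kind, params):
        """Register a new job in the 'queued' state and return its id."""
        job_id = uuid.uuid4().hex
        self._write({"id": job_id, "kind": kind, "params": params, "state": "queued"})
        return job_id

    def get(self, job_id):
        """Load the job record from disk."""
        return json.loads(self._path(job_id).read_text())

    def set_state(self, job_id, state, **extra):
        """Move a job to a new state, optionally attaching extra fields."""
        if state not in STATES:
            raise ValueError(f"unknown state: {state!r}")
        job = self.get(job_id)
        job.update(extra, state=state)
        self._write(job)
        return job
```

Because every record lives on disk, a second process (or a restarted one) pointed at the same directory sees the same jobs, which is the property "persistent" implies in the table.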
Media Pipeline Split
The media stack is intentionally split by concern:
| Concern | Engine | Service | Role |
|---|---|---|---|
| Still frames / styleframes | mflux (Flux Schnell/Dev/Kontext) | :8083 | Generate first/mid/last anchor frames |
| Motion / interpolation | LTX-2 MLX | :8084 | Generate video from anchor frames or text prompts |
| Speech / audio | mlx-audio (Kokoro TTS) | :8093 | TTS synthesis, default runtime |
| Voice cloning | Qwen (via mlx-audio) | :8093 | Default cloning path for character voices |
| Voice identity | avatar-labs registry | — | Primary source for voice profiles and character mapping |
| Supplemental audio | data-pool | — | Reference audio files for cloning |
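
As a rough illustration of the split, the table can be expressed as a lookup from concern to engine and port. The engines and ports are taken from the table. The key names, the `Lane` type, the `base_url` helper, and the `http://localhost` default are assumptions, not part of the package.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lane:
    engine: str
    port: Optional[int]  # None for registry/data sources with no service


# Engines and ports mirror the table; the dictionary keys are hypothetical.
LANES = {
    "still_frames": Lane("mflux", 8083),
    "motion": Lane("LTX-2 MLX", 8084),
    "speech": Lane("mlx-audio (Kokoro TTS)", 8093),
    "voice_cloning": Lane("Qwen (via mlx-audio)", 8093),
    "voice_identity": Lane("avatar-labs registry", None),
    "supplemental_audio": Lane("data-pool", None),
}


def base_url(concern, host="localhost"):
    """Return the service URL for a concern (host and scheme are assumed)."""
    lane = LANES[concern]
    if lane.port is None:
        raise ValueError(f"{concern} is not backed by a network service")
    return f"http://{host}:{lane.port}"
```

Note that speech and voice cloning share port 8093, since both run through mlx-audio.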
Video Presets (core-x/config/presets/video-ltx2.json)
Three LTX-2 presets for different quality/speed tradeoffs:
| Preset | Still Anchors | Pipeline | Resolution | Steps | Use Case |
|---|---|---|---|---|---|
| ltx_fast_distilled | first, last | distilled | 384x256 | 8 | Fast preview iteration |
| ltx_keyframe_interp | first, mid, last | keyframe_interpolation | 384x256 | 8 | Explicit anchor-driven motion |
| ltx_quality_twostage | first, mid, last | two_stage | 704x480 | 15 | Higher quality final render |
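
The sketch below mirrors the preset table in Python to show how a caller might check that the required anchor frames exist before submitting a video job. The values come from the table. The `VideoPreset` type, its field names, and `missing_anchors` are illustrative assumptions and do not reflect the actual schema of video-ltx2.json.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VideoPreset:
    anchors: Tuple[str, ...]  # still frames the preset expects
    pipeline: str
    width: int
    height: int
    steps: int


# Values copied from the preset table; the structure itself is hypothetical.
PRESETS = {
    "ltx_fast_distilled": VideoPreset(("first", "last"), "distilled", 384, 256, 8),
    "ltx_keyframe_interp": VideoPreset(
        ("first", "mid", "last"), "keyframe_interpolation", 384, 256, 8
    ),
    "ltx_quality_twostage": VideoPreset(
        ("first", "mid", "last"), "two_stage", 704, 480, 15
    ),
}


def missing_anchors(preset_name, available):
    """List the anchors the preset needs that are not in `available`."""
    preset = PRESETS[preset_name]
    return [name for name in preset.anchors if name not in available]
```

For example, with only `first` and `last` frames rendered, `ltx_fast_distilled` can run immediately, while the two three-anchor presets still need a `mid` frame.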
Draw Things Parity (Research Only)
Draw Things added LTX-2 video generation in v1.20260303.0. Core-X matches the workflow shape but does not depend on Draw Things at runtime. The mapping is documented in core-x/docs/research/2026-03-06-drawthings-ltx2-parity.md. Key difference: Core-X keeps still frames, motion, and audio as separate lanes, so each lane can be retried independently and carries its own provenance.
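
To make the separate-lanes idea concrete, here is a minimal sketch of per-lane retries with a provenance record. The function names, the record fields, and the retry count are assumptions for illustration. Core-X's actual orchestration may differ.

```python
def run_lane(name, task, max_attempts=3):
    """Run one lane's task, retrying only this lane on failure.

    Returns a provenance record: which lane ran, how many attempts it
    took, what it produced, and the errors seen along the way.
    """
    errors = []
    for attempt in range(1, max_attempts + 1):
        try:
            output = task()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            errors.append(repr(exc))
            continue
        return {
            "lane": name,
            "status": "done",
            "attempts": attempt,
            "output": output,
            "errors": errors,
        }
    return {
        "lane": name,
        "status": "error",
        "attempts": max_attempts,
        "output": None,
        "errors": errors,
    }


def run_lanes(tasks, max_attempts=3):
    """Run each lane independently; one lane's failure never reruns another."""
    return {name: run_lane(name, task, max_attempts) for name, task in tasks.items()}
```

With this shape, a transient failure in the motion lane costs one extra motion attempt, while already-generated still frames and audio are left untouched.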