Media Stack

Shared Media Runtime (core-x/media_runtime/)

A Python package providing unified interfaces to media services. Imported by both Narrative MCP and Stories MCP.

| Module | Wraps | Purpose |
| --- | --- | --- |
| TtsRuntime | mlx-audio (:8093), Voicebox (fallback) | Text-to-speech synthesis with voice cloning |
| FrameRuntime | mflux (:8083) | Still image / styleframe generation |
| VideoRuntime | video (:8084) / LTX-2 | Motion / video sequence generation |
| VoiceResolver | avatar-labs manifests, model-zoo voice samples, data-pool | Character code → voice profile mapping |
| MediaJobStore | filesystem | Persistent async job tracking (queued/running/done/error) |
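
As a rough illustration of filesystem-backed job tracking with the four states listed for MediaJobStore, the sketch below stores one JSON file per job. The class name `JobStoreSketch`, its methods, and the file layout are all hypothetical; the real MediaJobStore API is not shown in this document and may differ.

```python
import json
import tempfile
import uuid
from pathlib import Path

# Hypothetical sketch: class name, methods, and file layout are assumptions,
# not the real MediaJobStore API. Only the four states come from the docs.
VALID_STATES = {"queued", "running", "done", "error"}


class JobStoreSketch:
    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, job_id: str) -> Path:
        return self.root / f"{job_id}.json"

    def _write(self, job_id: str, record: dict) -> None:
        # Write to a temp file and rename it, so readers never see a partial record.
        tmp = self._path(job_id).with_suffix(".tmp")
        tmp.write_text(json.dumps(record))
        tmp.replace(self._path(job_id))

    def create(self, kind: str, payload: dict) -> str:
        job_id = uuid.uuid4().hex
        self._write(job_id, {"id": job_id, "kind": kind, "state": "queued", "payload": payload})
        return job_id

    def get(self, job_id: str) -> dict:
        return json.loads(self._path(job_id).read_text())

    def update(self, job_id: str, state: str, **fields) -> dict:
        if state not in VALID_STATES:
            raise ValueError(f"unknown state: {state}")
        record = self.get(job_id)
        record.update(fields)
        record["state"] = state
        self._write(job_id, record)
        return record


# Example lifecycle in a throwaway directory.
root = Path(tempfile.mkdtemp())
store = JobStoreSketch(root)
job_id = store.create("tts", {"text": "hello"})
store.update(job_id, "running")
store.update(job_id, "done", output="out.wav")
```

Because each record lives on disk, a new `JobStoreSketch(root)` created after a process restart can still read the job's last known state.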

Media Pipeline Split

The media stack is intentionally split by concern:

| Concern | Engine | Service | Role |
| --- | --- | --- | --- |
| Still frames / styleframes | mflux (Flux Schnell/Dev/Kontext) | :8083 | Generate first/mid/last anchor frames |
| Motion / interpolation | LTX-2 MLX | :8084 | Generate video from anchor frames or text prompts |
| Speech / audio | mlx-audio (Kokoro TTS) | :8093 | TTS synthesis, default runtime |
| Voice cloning | Qwen (via mlx-audio) | :8093 | Default cloning path for character voices |
| Voice identity | avatar-labs registry | | Primary source for voice profiles and character mapping |
| Supplemental audio | data-pool | | Reference audio files for cloning |
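
The split by concern can be pictured as a small routing table from concern to service port. In the sketch below the ports are taken from the table above, while the host (`127.0.0.1`), the dictionary keys, and the `base_url` helper are assumptions made for illustration.

```python
# Ports come from the table above; the host and the key names are
# assumptions for illustration only.
HOST = "http://127.0.0.1"

SERVICE_PORTS = {
    "frames": 8083,         # mflux
    "video": 8084,          # LTX-2 MLX
    "speech": 8093,         # mlx-audio (Kokoro TTS)
    "voice_cloning": 8093,  # Qwen via mlx-audio, same service as speech
}


def base_url(concern: str) -> str:
    """Return the base URL of the service that handles the given concern."""
    try:
        port = SERVICE_PORTS[concern]
    except KeyError:
        raise ValueError(f"no service registered for concern: {concern}") from None
    return f"{HOST}:{port}"
```

Voice identity and supplemental audio are left out of the map because the table lists no service port for them.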

Video Presets (core-x/config/presets/video-ltx2.json)

Three LTX-2 presets for different quality/speed tradeoffs:

| Preset | Still Anchors | Pipeline | Resolution | Steps | Use Case |
| --- | --- | --- | --- | --- | --- |
| ltx_fast_distilled | first, last | distilled | 384x256 | 8 | Fast preview iteration |
| ltx_keyframe_interp | first, mid, last | keyframe_interpolation | 384x256 | 8 | Explicit anchor-driven motion |
| ltx_quality_twostage | first, mid, last | two_stage | 704x480 | 15 | Higher quality final render |
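
To make the tradeoffs concrete, the sketch below mirrors the table as in-memory data and checks which anchor frames a preset still needs before rendering. The values come from the table; the `VideoPreset` fields and the `missing_anchors` helper are hypothetical, and the sketch does not read video-ltx2.json, whose actual schema is not shown here.

```python
from dataclasses import dataclass


# Field names are assumptions; the values mirror the preset table above.
@dataclass(frozen=True)
class VideoPreset:
    name: str
    anchors: tuple
    pipeline: str
    width: int
    height: int
    steps: int


PRESETS = {
    p.name: p
    for p in (
        VideoPreset("ltx_fast_distilled", ("first", "last"), "distilled", 384, 256, 8),
        VideoPreset("ltx_keyframe_interp", ("first", "mid", "last"), "keyframe_interpolation", 384, 256, 8),
        VideoPreset("ltx_quality_twostage", ("first", "mid", "last"), "two_stage", 704, 480, 15),
    )
}


def missing_anchors(preset_name: str, available: set) -> list:
    """List the anchor frames a preset requires that have not been generated yet."""
    preset = PRESETS[preset_name]
    return [anchor for anchor in preset.anchors if anchor not in available]


# With only first and last frames rendered, the fast preset is ready,
# while the keyframe preset still needs a mid frame.
ready = missing_anchors("ltx_fast_distilled", {"first", "last"})
pending = missing_anchors("ltx_keyframe_interp", {"first", "last"})
```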

Draw Things Parity (Research Only)

Draw Things added LTX-2 video generation in v1.20260303.0. Core-X matches the workflow shape but does not depend on Draw Things at runtime. The mapping is documented in core-x/docs/research/2026-03-06-drawthings-ltx2-parity.md. Key difference: Core-X keeps still frames, motion, and audio as separate lanes, so each lane can be retried independently and keeps its own provenance.
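
One way to picture the separate-lanes idea is a runner that retries each lane on its own and records what happened per lane. This is a minimal sketch under stated assumptions: `run_lane`, the result fields, and the stand-in tasks are hypothetical and do not call the real services.

```python
from typing import Callable


def run_lane(name: str, task: Callable[[], str], max_attempts: int = 3) -> dict:
    """Run one lane, retrying only that lane, and keep a per-lane record."""
    errors = []
    for attempt in range(1, max_attempts + 1):
        try:
            output = task()
        except Exception as exc:  # broad on purpose: this is only a sketch
            errors.append(str(exc))
            continue
        return {"lane": name, "output": output, "attempts": attempt, "errors": errors}
    return {"lane": name, "output": None, "attempts": max_attempts, "errors": errors}


# Stand-in tasks; the video lane fails once to show an isolated retry.
calls = {"video": 0}


def flaky_video() -> str:
    calls["video"] += 1
    if calls["video"] == 1:
        raise RuntimeError("simulated timeout")
    return "clip.mp4"


lanes = {
    "frames": lambda: "frame_first.png",
    "video": flaky_video,
    "audio": lambda: "line_01.wav",
}

results = {name: run_lane(name, task) for name, task in lanes.items()}
```

In this run only the video lane is attempted twice; the frames and audio lanes finish on their first attempt and their records are unaffected by the video failure.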