Skip to content

Data & Pipelines

Pipeline Orchestrator

The master pipeline runner executes 5 phases in dependency order:

Terminal window
# Run all pipelines (requires gateway + RAG online)
python core-x/pipelines/run_all.py
# Run only local pipelines (skip RAG-dependent steps)
python core-x/pipelines/run_all.py --local

Pipeline Phases

PhasePipelineDepends OnPurpose
1sync_service_stateSync service health snapshots
2event_processorProcess queued A2A events
3knowledge_builderBuild knowledge base, RAG ingest
4ingest_user_memoryRAG onlineIndex user memory into LanceDB
5ingest_refineryRAG + phases 1,3Ingest curated refinery content

Data Flow

data-pool/raw/ → ffmpeg extract → audio, frames
→ Whisper STT → transcripts
→ Qwen Vision → scene descriptions
→ knowledge_builder → LanceDB embeddings
refinery/ → ingest_refinery → RAG-searchable knowledge

Other Pipeline Scripts

Terminal window
# Validate model zoo registry
python model-zoo/scripts/validate_registry.py
# Regenerate registries (flows, services, houses)
python scripts/generate-registries.py
# Validate ecosystem integrity
python core-x/scripts/validate-ecosystem.py
# Execute a single flow
python core-x/scripts/run_flow.py <flow-name>
# Start event bus standalone
python core-x/scripts/run_event_bus.py --host 127.0.0.1 --port 8085