Data & Pipelines
Pipeline Orchestrator
The master pipeline runner executes 5 phases in dependency order:
    # Run all pipelines (requires gateway + RAG online)
    python core-x/pipelines/run_all.py

    # Run only local pipelines (skip RAG-dependent steps)
    python core-x/pipelines/run_all.py --local

Pipeline Phases
| Phase | Pipeline | Depends On | Purpose |
|---|---|---|---|
| 1 | sync_service_state | — | Sync service health snapshots |
| 2 | event_processor | — | Process queued A2A events |
| 3 | knowledge_builder | — | Build knowledge base, RAG ingest |
| 4 | ingest_user_memory | RAG online | Index user memory into LanceDB |
| 5 | ingest_refinery | RAG + phases 1,3 | Ingest curated refinery content |
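
The table above can be read as a small scheduling rule: run phases in numeric order, skip any phase that needs RAG when RAG is unavailable or `--local` is passed, and skip any phase whose prerequisite phase was skipped. The sketch below illustrates that rule only. It is not the code in `run_all.py`; the `Phase` class, `plan_run` function, and `rag_online` parameter are hypothetical names introduced for illustration, and the mapping of `--local` to "skip phases 4 and 5" is inferred from the Depends On column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    number: int
    name: str
    needs_rag: bool = False          # "RAG online" in the Depends On column
    after: tuple[int, ...] = ()      # phase numbers that must run first


# Transcribed from the phase table; the structure itself is an assumption.
PHASES = [
    Phase(1, "sync_service_state"),
    Phase(2, "event_processor"),
    Phase(3, "knowledge_builder"),
    Phase(4, "ingest_user_memory", needs_rag=True),
    Phase(5, "ingest_refinery", needs_rag=True, after=(1, 3)),
]


def plan_run(local_only: bool = False, rag_online: bool = True) -> list[str]:
    """Return the names of the phases that would run, in phase order."""
    planned: list[int] = []
    for phase in sorted(PHASES, key=lambda p: p.number):
        if phase.needs_rag and (local_only or not rag_online):
            continue  # RAG-dependent phase, and RAG is not being used
        if not all(dep in planned for dep in phase.after):
            continue  # a prerequisite phase was skipped
        planned.append(phase.number)
    names = {p.number: p.name for p in PHASES}
    return [names[n] for n in planned]


if __name__ == "__main__":
    print("full run: ", plan_run())
    print("local run:", plan_run(local_only=True))
```

Under these assumptions a local run reduces to phases 1 through 3, since phases 4 and 5 both list RAG in their dependencies.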
Data Flow
    data-pool/raw/ → ffmpeg extract → audio, frames → Whisper STT → transcripts → Qwen Vision → scene descriptions → knowledge_builder → LanceDB embeddings
    refinery/ → ingest_refinery → RAG-searchable knowledge

Other Pipeline Scripts
    # Validate model zoo registry
    python model-zoo/scripts/validate_registry.py

    # Regenerate registries (flows, services, houses)
    python scripts/generate-registries.py

    # Validate ecosystem integrity
    python core-x/scripts/validate-ecosystem.py

    # Execute a single flow
    python core-x/scripts/run_flow.py <flow-name>

    # Start event bus standalone
    python core-x/scripts/run_event_bus.py --host 127.0.0.1 --port 8085
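
If the two validation scripts are run together often, for example before a commit, a small wrapper can chain them and stop at the first failure. The following is a minimal sketch, not part of the repository: `run_validators` is a hypothetical helper, it must be launched from the repository root, and it assumes each script signals failure with a non-zero exit code, which is not confirmed here.

```python
import subprocess
import sys
from typing import Callable, Sequence

# Script paths as listed above, relative to the repository root.
VALIDATORS = [
    "model-zoo/scripts/validate_registry.py",
    "core-x/scripts/validate-ecosystem.py",
]


def _default_run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_validators(run: Callable[[Sequence[str]], int] = _default_run) -> int:
    """Run each validator in order; return the first non-zero exit code, else 0.

    Assumption: a validator exits non-zero when validation fails.
    """
    for script in VALIDATORS:
        code = run([sys.executable, script])
        if code != 0:
            return code
    return 0
```

Calling `sys.exit(run_validators())` from a script at the repository root would propagate the result to the shell. The `run` parameter exists only so the helper can be exercised without launching the real scripts.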