# Protocols
The canonical LLM protocol for Core-X is OpenResponses (`POST /v1/responses`), not `/v1/chat/completions`.
## Why OpenResponses?

Every LLM-capable service speaks `POST /v1/responses` with SSE streaming. The legacy `/v1/chat/completions` endpoint has been removed from the codebase.
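A minimal sketch of issuing a streaming request against the endpoint. The `model`, `input`, and `stream` field names are assumptions based on common Responses-style APIs and are not confirmed by this document; only the path (`/v1/responses`), the method, and the SSE transport come from the text above.

```python
import json
import urllib.request

def build_responses_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a streaming POST /v1/responses request.

    NOTE: the body schema ("model", "input", "stream") is a hypothetical
    illustration; consult the actual OpenResponses schema before use.
    """
    body = json.dumps({"model": model, "input": prompt, "stream": True}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/responses",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )

# Target the gateway on :8090, as described in the request flow below.
req = build_responses_request("http://localhost:8090", "example-model", "Hello")
# urllib.request.urlopen(req) would then yield the SSE token stream line by line.
```

Sending the request with `urllib.request.urlopen(req)` returns a response object that can be iterated line by line to consume the SSE stream.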
## Key Properties
- Streaming: Server-Sent Events (SSE) for real-time token delivery
- Unified: Same protocol for gateway, mlx-llm, and mlx-rag
- A2A: Agent-to-Agent communication uses the event bus (:8085) over SSE
## Request Flow
```
Client → POST /v1/responses → Gateway (:8090) → mlx-llm (:8091) → mlx-rag (:8092) → [other services]
                                                      ← SSE stream ← Event Bus (A2A)
```
The event bus at :8085 provides SSE pub/sub for agent-to-agent communication. Services publish events and subscribe to channels for real-time coordination.
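A sketch of subscribing to an event-bus channel over SSE. The document specifies only the port (:8085) and the SSE pub/sub model; the `/subscribe?channel=...` path is a hypothetical placeholder, and the `parse_sse_event` helper simply implements the standard `text/event-stream` framing (events separated by blank lines, payloads carried in `data:` fields).

```python
import urllib.request

def parse_sse_event(lines):
    """Join the data: fields of one SSE event into a single payload string."""
    return "\n".join(ln[5:].lstrip() for ln in lines if ln.startswith("data:"))

def subscribe(channel: str, base_url: str = "http://localhost:8085"):
    """Yield event payloads from a channel on the event bus.

    NOTE: the subscribe path below is an assumption for illustration;
    check the event bus implementation for the real endpoint.
    """
    with urllib.request.urlopen(f"{base_url}/subscribe?channel={channel}") as resp:
        buffer = []
        for raw in resp:
            line = raw.decode("utf-8").rstrip("\r\n")
            if line == "":  # a blank line terminates one SSE event
                if buffer:
                    yield parse_sse_event(buffer)
                    buffer = []
            else:
                buffer.append(line)
```

A consumer would iterate `subscribe("some-channel")` and react to each decoded payload as it arrives.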
```bash
# Health check
curl http://localhost:8085/health

# Start standalone
python core-x/scripts/run_event_bus.py --host 127.0.0.1 --port 8085
```