
Disclaimer: This series is a personal, educational reference architecture. All diagrams, opinions, and frameworks are my own and are not affiliated with, sponsored by, or representative of my employer. I’m publishing it on my own time and without using any confidential information.
© 2026 Sean Miller. All rights reserved.
Before we talk about the AI Gateway within the Runtime Plane, we must first define the Experience Plane. The Experience Plane is the user-facing layer of the architecture that directly interacts with the AI Gateway. For a chatbot, it might be a standalone web or mobile app, or a chat window embedded into an existing product. For an automated insights generation tool, it might be a workflow run on a cron job that schedules data refresh.
The interaction surface between the Experience Plane and the Runtime Plane is the AI Gateway: the front door to any enterprise AI platform. When you make an API call, say POST /v1/ai/turn, the Experience Plane passes along a pre-defined set of parameters like attachments, context, model, and prompt that the API contract expects. The AI Gateway then routes the request to the Orchestration Engine, which manages workflow execution.
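To make the API contract idea concrete, here is a minimal sketch of how a turn request might be typed and validated at the gateway boundary. The field names (`sessionId`, `surface`, `input`, `constraints`) follow the handler shown later in this post; the exact types and the hand-rolled validator are assumptions for illustration, and a real gateway would more likely use a schema library.

```typescript
// Hypothetical shape of a POST /v1/ai/turn body (assumed, not the author's schema).
interface TurnRequest {
  sessionId?: string; // absent on the first turn of a conversation
  surface: string; // which Experience Plane surface is calling
  input: { text: string; attachments?: string[] };
  constraints?: Record<string, unknown>;
}

const isString = (v: unknown): v is string => typeof v === "string";
const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every(isString);
const isRecord = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Returns undefined for a missing optional field; throws if present but mistyped.
function optional<T>(
  v: unknown,
  guard: (x: unknown) => x is T,
  name: string
): T | undefined {
  if (v === undefined) return undefined;
  if (!guard(v)) throw new Error(`${name} has the wrong type`);
  return v;
}

// Validates an untrusted JSON body before the rest of the gateway touches it.
function parseTurnRequest(body: unknown): TurnRequest {
  if (!isRecord(body)) throw new Error("body must be a JSON object");
  const surface = body["surface"];
  if (!isString(surface) || surface === "") throw new Error("surface is required");
  const input = body["input"];
  if (!isRecord(input)) throw new Error("input is required");
  const text = input["text"];
  if (!isString(text) || text === "") throw new Error("input.text is required");
  return {
    sessionId: optional(body["sessionId"], isString, "sessionId"),
    surface,
    input: {
      text,
      attachments: optional(input["attachments"], isStringArray, "input.attachments"),
    },
    constraints: optional(body["constraints"], isRecord, "constraints"),
  };
}
```

Rejecting malformed bodies here keeps the Orchestration Engine free of defensive parsing: anything that reaches it has already passed the contract.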
Think of the AI Gateway like an API gateway, but with workflow routing, session state, AI-specific governance, and standardized output baked in. It is more than a simple, opinionated, and unified API surface. It is responsible for the enforcement of:
Figure 1: AI Gateway Component Diagram
The AI Gateway is responsible for both AuthN (authentication) and AuthZ (authorization). It handles interaction with the identity provider and the policy store, both of which are managed in the Control Plane.
In practice, a stripped-down AuthN request passes the user’s JWT to the identity provider, which returns a set of claims. If the user is not authenticated, the request is rejected and the Experience Plane is responsible for handling the error in a user-friendly way.
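The AuthN step can be sketched as follows. The `IdentityProvider` interface, the error codes, and the gateway-side expiry check are all assumptions made for illustration; a real identity provider call would be asynchronous and would verify the token signature.

```typescript
// Claims returned by the identity provider; exp is seconds since the epoch (RFC 7519).
type Claims = { sub: string; scopes: string[]; exp: number };

// Hypothetical identity provider interface; a real one would be a network call.
interface IdentityProvider {
  verify(token: string): Claims | null;
}

type AuthNResult =
  | { ok: true; claims: Claims }
  | { ok: false; status: 401; code: string };

// Extracts the bearer token, asks the identity provider for claims, and
// returns a structured rejection the Experience Plane can render.
function authenticate(
  authHeader: string | undefined,
  idp: IdentityProvider,
  nowSec: number
): AuthNResult {
  const prefix = "Bearer ";
  if (authHeader === undefined || !authHeader.startsWith(prefix)) {
    return { ok: false, status: 401, code: "MISSING_TOKEN" };
  }
  const claims = idp.verify(authHeader.slice(prefix.length));
  if (claims === null) {
    return { ok: false, status: 401, code: "INVALID_TOKEN" };
  }
  if (claims.exp <= nowSec) {
    return { ok: false, status: 401, code: "TOKEN_EXPIRED" };
  }
  return { ok: true, claims };
}
```

Returning a machine-readable `code` rather than a free-text message is what lets each surface decide how to present the failure, for example by prompting a re-login on `TOKEN_EXPIRED`.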
The AuthZ component looks for the user’s entitlements, i.e. whether they are allowed to perform the requested action based on the policy store, and returns a boolean decision and reason code. The AuthZ responsibility is also deeply tied to request validation, rate limiting, and abuse protection.
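A minimal sketch of the AuthZ decision, assuming the policy store can be reduced to a map from action name to the scopes that grant it. The `PolicyStore` shape, the action names, and the reason codes are hypothetical; the only part taken from the text is that the check returns a boolean decision plus a reason code.

```typescript
type Decision = { allowed: boolean; reason: string };

// Assumed policy shape: action name -> scopes, any one of which grants the action.
type PolicyStore = Map<string, string[]>;

// Deny by default: an action with no registered policy is never allowed.
function authorize(scopes: string[], action: string, policies: PolicyStore): Decision {
  const grantingScopes = policies.get(action);
  if (grantingScopes === undefined) {
    return { allowed: false, reason: "NO_POLICY_FOR_ACTION" };
  }
  const granted = grantingScopes.some((s) => scopes.includes(s));
  return granted
    ? { allowed: true, reason: "SCOPE_MATCH" }
    : { allowed: false, reason: "MISSING_SCOPE" };
}

// Example policy store (made-up actions and scopes).
const examplePolicies: PolicyStore = new Map([
  ["ai:turn", ["ai.user", "ai.admin"]],
  ["ai:workflow:run", ["ai.admin"]],
]);
```

For example, `authorize(["ai.user"], "ai:workflow:run", examplePolicies)` yields `{ allowed: false, reason: "MISSING_SCOPE" }`, which the gateway can log as an audit event and return to the caller.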
The AI Gateway is on the front lines for session management and context budgeting, though it does not build the session state or context itself. The Orchestration Engine compiles session state during workflow execution; the AI Gateway ingests session events, correlates the session ID with the session lifecycle, and writes session history into the session state store.
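The gateway's share of session management can be sketched with an in-memory store. The method names `getOrCreate` and `recordTurn` mirror the calls in the handler later in this post; the class itself, the ID format, and the ownership check are assumptions, and a real store would be a durable, asynchronous service.

```typescript
interface TurnRecord {
  role: "user" | "assistant";
  content: string;
  timestamp: string; // ISO 8601
}

interface Session {
  sessionId: string;
  surface: string;
  userId: string;
  history: TurnRecord[];
}

// In-memory stand-in for the session state store.
class InMemorySessionManager {
  private readonly sessions = new Map<string, Session>();
  private nextId = 1;

  // Returns the existing session for a known ID, otherwise starts a new one.
  getOrCreate(sessionId: string | undefined, surface: string, userId: string): Session {
    const existing = sessionId === undefined ? undefined : this.sessions.get(sessionId);
    if (existing !== undefined) {
      // Assumed safeguard: a session ID is only valid for the user who opened it.
      if (existing.userId !== userId) throw new Error("session belongs to a different user");
      return existing;
    }
    const session: Session = {
      sessionId: `sess-${this.nextId++}`,
      surface,
      userId,
      history: [],
    };
    this.sessions.set(session.sessionId, session);
    return session;
  }

  // Appends one turn to the session history.
  recordTurn(sessionId: string, turn: TurnRecord): void {
    const session = this.sessions.get(sessionId);
    if (session === undefined) throw new Error(`unknown session ${sessionId}`);
    session.history.push(turn);
  }
}
```

Note that the manager only stores and correlates events. Deciding which parts of the history fit into the model's context remains the Orchestration Engine's job.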
Lastly, the gateway normalizes and formats the responses from the Orchestration Engine to a consistent contract, which includes the schema of the response, citations, actions, safety flags, and telemetry hints. Such telemetry and audit events are streamed at every step of the interaction.
The AI Gateway should not be responsible for:
This delineation ensures a clean contract between the Experience Plane and the Runtime Plane, promoting modularity and separation of concerns.
Figure 2: AI Gateway Sequence Diagram
Each request from the Experience Plane to the AI Gateway is a turn. The gateway performs the checks described above, then calls the Orchestration Engine to execute a workflow. The Orchestration Engine kicks off deterministic context building, tool interactions, and model inference through the Model Gateway.
The Orchestration Engine returns the structured response to the AI Gateway, which streams (or passes, depending on the implementation) the response back to the Experience Plane. Once a response has been received, the AI Gateway writes audit events to the shared audit log in the Runtime Plane, as well as metrics and detailed traces to the telemetry store used by the enterprise’s chosen observability stack.
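The split between audit events and telemetry can be sketched like this. Every type and field name here is an assumption; the point is only that one turn outcome fans out to two separate stores that stay joinable through the correlation ID.

```typescript
// Hypothetical summary of one completed turn.
interface TurnOutcome {
  correlationId: string;
  sessionId: string;
  userId: string;
  workflowId: string;
  allowed: boolean;
  latencyMs: number;
  tokensUsed: number;
  costUsd: number;
}

// Audit: who did what, and what was decided.
interface AuditEvent {
  correlationId: string;
  sessionId: string;
  userId: string;
  workflowId: string;
  decision: "allowed" | "denied";
  timestamp: string;
}

// Telemetry: how the system performed.
interface TelemetryRecord {
  correlationId: string;
  workflowId: string;
  latencyMs: number;
  tokensUsed: number;
  costUsd: number;
}

interface Sink<T> {
  write(item: T): void;
}

// Array-backed sink standing in for the audit log or telemetry store.
class MemorySink<T> implements Sink<T> {
  readonly items: T[] = [];
  write(item: T): void {
    this.items.push(item);
  }
}

// Writes one audit event and one telemetry record for the same turn.
function emitTurnOutcome(
  o: TurnOutcome,
  audit: Sink<AuditEvent>,
  telemetry: Sink<TelemetryRecord>,
  now: Date
): void {
  audit.write({
    correlationId: o.correlationId,
    sessionId: o.sessionId,
    userId: o.userId,
    workflowId: o.workflowId,
    decision: o.allowed ? "allowed" : "denied",
    timestamp: now.toISOString(),
  });
  telemetry.write({
    correlationId: o.correlationId,
    workflowId: o.workflowId,
    latencyMs: o.latencyMs,
    tokensUsed: o.tokensUsed,
    costUsd: o.costUsd,
  });
}
```

Keeping user identity out of the telemetry record is a deliberate choice in this sketch: the observability stack can be widely accessible while the audit log stays access-controlled.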
Example API contracts shown below are illustrative only and do not represent a production or deployed service.
POST /v1/ai/turn: Handles one conversational turn. Takes sessionId, surface, input, returns normalized AI response.
POST /v1/ai/workflows/{workflowId}:run: Non-conversational. Run a workflow for a case/recordId with structured input.
GET /v1/ai/sessions/{sessionId}: Returns session metadata and last known state, not raw sensitive content.
POST /v1/ai/feedback: Captures thumbs up/down, corrections, reason codes for eval.
And the response contract should always include:
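Reading the fields off the turn handler shown later in this post, the response contract could be typed roughly as below. The element types for `citations` and `actions` are not specified in the post, so they are left as `unknown[]`; the `missingFields` helper is a hypothetical addition for contract tests.

```typescript
// Sketch of the normalized response; field names come from the turn handler,
// element types marked unknown are assumptions.
interface AIResponse {
  correlationId: string;
  sessionId: string;
  runId: string;
  workflowId: string;
  text: string;
  citations: unknown[];
  actions: unknown[];
  safetyFlags: string[];
  humanReviewRequired: boolean;
  telemetry: {
    tokensUsed: number;
    costUsd: number;
    modelsUsed: string[];
    toolsUsed: string[];
  };
  latencyMs: number;
}

const REQUIRED_FIELDS: (keyof AIResponse)[] = [
  "correlationId",
  "sessionId",
  "runId",
  "workflowId",
  "text",
  "citations",
  "actions",
  "safetyFlags",
  "humanReviewRequired",
  "telemetry",
  "latencyMs",
];

// Lists top-level contract fields that are absent from a payload.
function missingFields(payload: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS.filter((field) => payload[field] === undefined);
}
```

A check like this is useful in contract tests on each surface: empty arrays and `false` count as present, so a response with no citations still satisfies the contract as long as the field itself is there.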
All workflows, tools, audit events, and telemetry shown in the demo are simulated using mock data.
The demo is a simple React web app built with React Flow for the node-based diagrams and mock data to simulate the request lifecycle. No backend required.
A production Lattice implementation organizes services by responsibility. Here’s how the packages map to the architectural components:
services/packages % tree -L 1
.
├── ai-gateway # Runtime Plane: Front door, auth, session, telemetry
├── orchestrator # Runtime Plane: Workflow execution, agent runtime
├── context-builder # Runtime Plane: Retrieval, redaction, citations
├── tool-gateway # Runtime Plane: Governed tool execution
├── model-gateway # Runtime Plane: Model routing, cost controls
├── control-plane # Control Plane: Policy engine, registries
├── core # Shared types, clients, utilities
└── dashboard # Ops experience surface
The AI Gateway itself consists of middleware, routes, and services:
services/packages/ai-gateway % tree
.
├── Dockerfile
├── package.json
├── src
│   ├── index.ts
│   ├── middleware
│   │   ├── auth.ts          # AuthN/AuthZ, identity extraction
│   │   ├── error.ts         # Centralized error handling
│   │   └── telemetry.ts     # Correlation IDs, tracing, metrics
│   ├── routes
│   │   ├── feedback.ts      # POST /v1/ai/feedback
│   │   ├── health.ts        # Health checks
│   │   ├── session.ts       # GET /v1/ai/sessions/:id
│   │   ├── turn.ts          # POST /v1/ai/turn
│   │   └── workflow.ts      # POST /v1/ai/workflows/:id:run
│   └── services
│       ├── orchestrator-client.ts   # Client for Orchestration Engine
│       └── session.ts               # Session state management
└── tsconfig.json
The most important file in the AI Gateway is routes/turn.ts. This is the bare-bones core of the architecture, where a user’s input becomes an AI response.
// Illustrative pseudocode for a reference architecture (not production code).
// routes/turn.ts - The core request handler
turnRouter.post('/turn', async (c) => {
  const correlationId = c.get('correlationId');
  const identity = c.get('identity');
  const startTime = Date.now();

  // 1. Validate request against schema
  const body = await c.req.json();
  const request = TurnRequestSchema.parse(body);

  // 2. Get or create session (for multi-turn conversations)
  const session = await sessionManager.getOrCreate(
    request.sessionId,
    request.surface,
    identity.id
  );

  // 3. Route to appropriate workflow based on surface + input
  const workflowId = await routeToWorkflow(request.surface, request.input.text);

  // 4. Execute workflow via orchestrator (THE HANDOFF)
  const result = await orchestratorClient.executeWorkflow({
    workflowId,
    sessionId: session.sessionId,
    correlationId,
    input: {
      userInput: request.input.text,
      attachments: request.input.attachments,
      surface: request.surface,
      constraints: request.constraints
    },
    caller: {
      userId: identity.id,
      scopes: identity.scopes
    }
  });

  // 5. Record turn in session history
  await sessionManager.recordTurn(session.sessionId, {
    role: 'user',
    content: request.input.text,
    timestamp: new Date().toISOString()
  });

  // 6. Build normalized response
  const response: AIResponse = {
    correlationId,
    sessionId: session.sessionId,
    runId: result.runId,
    workflowId,
    text: result.text,
    citations: result.citations ?? [],
    actions: result.actions ?? [],
    safetyFlags: result.safetyFlags ?? [],
    humanReviewRequired: result.humanReviewRequired ?? false,
    telemetry: {
      tokensUsed: result.tokensUsed ?? 0,
      costUsd: result.costUsd ?? 0,
      modelsUsed: result.modelsUsed ?? [],
      toolsUsed: result.toolsUsed ?? []
    },
    latencyMs: Date.now() - startTime
  };

  return c.json(response);
});
This snippet is intentionally simplified to illustrate responsibilities and handoffs within the AI Gateway.
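The handler calls `routeToWorkflow` without showing it. One way it could work is a first-match routing table keyed on surface, with an optional pattern on the input text. The surface names, workflow IDs, and patterns below are made up for illustration; the handler awaits the call, so a real version may well be asynchronous and backed by a registry in the Control Plane.

```typescript
// One routing rule: match on surface, optionally narrowed by a text pattern.
interface Route {
  surface: string;
  pattern?: RegExp; // when omitted, the rule matches any input on that surface
  workflowId: string;
}

// Hypothetical routing table; order matters because the first match wins.
const ROUTES: Route[] = [
  { surface: "support-chat", pattern: /\b(refund|invoice|billing)\b/i, workflowId: "billing-assistant" },
  { surface: "support-chat", workflowId: "general-support" },
  { surface: "insights-cron", workflowId: "scheduled-insights" },
];

// Resolves a workflow ID from surface + input text, failing loudly on unknown surfaces.
function routeToWorkflow(surface: string, text: string): string {
  for (const route of ROUTES) {
    const surfaceMatches = route.surface === surface;
    const textMatches = route.pattern === undefined || route.pattern.test(text);
    if (surfaceMatches && textMatches) {
      return route.workflowId;
    }
  }
  throw new Error(`no workflow registered for surface "${surface}"`);
}
```

Keeping routing as a deterministic table lookup, rather than asking a model to pick the workflow, keeps this step cheap and auditable. Placing the specific rule before the catch-all rule for the same surface is what makes the billing pattern take priority.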
// services/orchestrator-client.ts - The handoff
async executeWorkflow(request: WorkflowExecuteRequest): Promise<WorkflowExecuteResult> {
  const response = await fetch(`${ORCHESTRATOR_URL}/v1/execute`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Correlation-ID': request.correlationId
    },
    body: JSON.stringify({
      workflowId: request.workflowId,
      sessionId: request.sessionId,
      // constraints travel inside input, as built in routes/turn.ts
      input: request.input,
      // forward the caller so downstream components can apply identity-based policy
      caller: request.caller
    })
  });

  // Fail fast on non-2xx responses instead of parsing an error body as a result
  if (!response.ok) {
    throw new Error(`Orchestrator returned HTTP ${response.status}`);
  }

  const data = await response.json();
  return {
    runId: data.runId,
    text: data.text,
    citations: data.citations ?? [],
    // ... normalized from orchestrator response
  };
}
The gateway doesn’t know or care what the orchestrator does internally. It could be a simple prompt, a multi-step agent, or a complex workflow with tool calls and human checkpoints. The contract is the same.
In the next post, we’ll deep dive into The Orchestration Engine and explore how it manages workflow execution deterministically, serving as the backbone of a governed AI platform.
This series will explore each component of the Lattice architecture in depth:
This series documents architectural patterns for enterprise AI platforms. Diagrams and frameworks are provided for educational purposes.