Architecting Enterprise AI: The Lattice Platform (Part 3)

Sean Miller
#blog #series #architecture #enterprise ai #platform engineering #lattice

Lattice abstract representation of an Enterprise AI Platform

Disclaimer: This series is a personal, educational reference architecture. All diagrams, opinions, and frameworks are my own and are not affiliated with, sponsored by, or representative of my employer. I’m publishing it on my own time and without using any confidential information.

© 2026 Sean Miller. All rights reserved.


From Gateway to Engine

The AI Gateway is the front door. It authenticates, validates, routes, and normalizes. But it doesn’t execute anything intelligent. Once the gateway accepts a request and selects a workflow, it hands off to the Orchestration Engine: the runtime that turns intent into action.

If the AI Gateway is the front door, the Orchestration Engine is the factory floor. It runs versioned, approved graphs of tool calls, model calls, and human checkpoints with persisted state and step-level auditability. Every run is governed, replayable, and testable.

Why Orchestration Matters

Ad hoc “agents” typically fail in three ways.

First, they lack determinism around non-determinism. When your agent decides what to do next, there’s no contract for what steps are allowed, what state must be preserved, or how to recover when something fails. The LLM just does whatever it does.

Second, they lack governed tool use. Tools get invoked without allowlists, schemas, retries, or idempotency guarantees. One hallucinated function call can corrupt production data or trigger an irreversible action.

Third, they lack replayability. When something goes wrong, there’s no step-level log showing what context was consumed, what decisions were made, or what outputs were produced. Debugging becomes guesswork. Auditing becomes impossible.

The Orchestration Engine solves all three problems by treating agentic behavior as workflow execution with explicit contracts at every boundary.

Orchestration Engine Responsibilities

Figure 1: Orchestration Engine Component Diagram

The Orchestration Engine owns workflow execution: running graphs and steps, handling branching, timeouts, retries, and fallbacks. It maintains run state and step state, including references to intermediate artifacts. The engine enforces tool invocation discipline through schema validation, idempotency keys, rate limits, and safe retries. It applies model invocation discipline via routing hints, structured outputs, and stop conditions. It manages human-in-the-loop checkpoints by pausing execution, requesting approval, and resuming when granted. And it ensures traceability through step-level logs, causal chains, and replay support.

The engine also enforces runtime policy: tool allowlists, data scope constraints, and context budgets. Every decision passes through policy before execution.
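To make the runtime policy check concrete, here is a minimal sketch of a deny-by-default gate covering two of the constraints above: the tool allowlist and the cost budget. The names `RunPolicies`, `StepRequest`, `PolicyDecision`, and `checkStepAllowed` are hypothetical and chosen for illustration; a real policy engine would likely be a separate service with richer inputs.

```typescript
// Hypothetical shapes for illustration; field names are assumptions.
interface RunPolicies {
  allowedTools: string[];
  maxCostUsd: number;
}

interface StepRequest {
  stepId: string;
  toolId?: string;          // present only for tool steps
  estimatedCostUsd: number; // assumed up-front estimate for this step
}

interface PolicyDecision {
  allowed: boolean;
  reason?: string;
}

// Deny by default: a step runs only if every check passes.
function checkStepAllowed(
  policies: RunPolicies,
  budgetConsumedUsd: number,
  request: StepRequest
): PolicyDecision {
  if (request.toolId !== undefined && !policies.allowedTools.includes(request.toolId)) {
    return { allowed: false, reason: `tool not allowlisted: ${request.toolId}` };
  }
  if (budgetConsumedUsd + request.estimatedCostUsd > policies.maxCostUsd) {
    return { allowed: false, reason: "cost budget exceeded" };
  }
  return { allowed: true };
}
```

Returning a reason alongside the verdict lets the engine log why a step was blocked, which is what makes the denial auditable later.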

Orchestration Engine Explicit Non-Responsibilities

The Orchestration Engine should not:

This separation keeps the engine focused on what it does well: reliable, auditable execution of governed workflows.


Inside the Engine

When people say “agentic,” you can map it to these internal components:

| Component | Responsibility |
| --- | --- |
| Workflow Executor | Runs the graph or state machine |
| Planner | Decides next steps when the workflow allows flexibility (optional) |
| Tool Dispatcher | Executes tool calls through the Tool Gateway with constraints |
| Model Invoker | Calls the Model Gateway and enforces output schemas |
| State Store Adapter | Persists run state to the Session Store or durable store |
| Checkpoint Manager | Handles HITL pauses and resumes |
| Trace Emitter | Emits structured run and step events to audit and telemetry |

Each component has a clear boundary. The Workflow Executor coordinates. The Planner advises. The dispatchers and invokers execute under strict contracts. The adapters persist. The emitters observe.

Workflow Definition

A workflow is a versioned, approved state machine with explicit schemas and policies. Each workflow defines:

This is the core architectural move: agents become workflows with guardrails. The workflow definition is code-reviewed, version-controlled, and deployed through your standard CI/CD pipeline. No more ad hoc prompt chains that drift between deploys.

# Example workflow definition (illustrative)
workflow:
  id: case-risk-summary
  version: "1.2.0"
  owner: processing-team

  steps:
    - id: gather-context
      type: tool
      tool: document-retriever
      input_schema: { caseId: string }
      output_schema: { documents: "Document[]" }  # quoted: [] is not allowed in an unquoted flow value

    - id: analyze-risk
      type: model
      model_class: high-accuracy
      input_schema: { documents: "Document[]", prompt: string }
      output_schema: { riskFactors: "RiskFactor[]", summary: string }
      constraints:
        max_tokens: 4000
        required_citations: true

    - id: human-review
      type: checkpoint
      trigger: { riskLevel: "high" }
      timeout_hours: 24
      escalation: processing-lead

  policies:
    allowed_tools: [document-retriever, policy-lookup]
    data_scope: [case-documents, policy-rules]
    max_cost_usd: 0.50
    max_duration_minutes: 5

The workflow definition is declarative. The Orchestration Engine interprets it. This separation means you can update policy constraints without touching the execution logic.
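Because the definition is declarative, it can be checked statically before it is approved. The sketch below shows one way a registry might lint a parsed definition: it rejects duplicate step ids and tool steps that reference a tool outside `allowed_tools`. The `WorkflowDef` and `StepDef` types cover only the fields needed here, and `validateWorkflow` is a hypothetical helper, not part of any real registry API.

```typescript
type StepType = "tool" | "model" | "checkpoint";

// Reduced, assumed shape of a parsed workflow definition.
interface StepDef {
  id: string;
  type: StepType;
  tool?: string; // only meaningful for tool steps
}

interface WorkflowDef {
  id: string;
  version: string;
  steps: StepDef[];
  policies: { allowed_tools: string[] };
}

// Returns a list of human-readable problems; an empty list means the checks passed.
function validateWorkflow(wf: WorkflowDef): string[] {
  const errors: string[] = [];
  const seen = new Set<string>();
  for (const step of wf.steps) {
    if (seen.has(step.id)) {
      errors.push(`duplicate step id: ${step.id}`);
    }
    seen.add(step.id);
    if (step.type === "tool") {
      if (step.tool === undefined) {
        errors.push(`tool step ${step.id} names no tool`);
      } else if (!wf.policies.allowed_tools.includes(step.tool)) {
        errors.push(`step ${step.id} uses non-allowlisted tool: ${step.tool}`);
      }
    }
  }
  return errors;
}
```

Run as a CI check, a validator like this would catch an allowlist violation at review time instead of at run time.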


The Planner

Not all workflows are linear. Sometimes the next step depends on context: the user asks something unexpected, new documents arrive mid-case, a tool call fails and needs a fallback, or the system must choose between summarizing and drilling down based on signal.

The Planner is an optional decision-maker that chooses which step to run next when the workflow permits flexibility. The safest design treats it as a recommendation engine constrained by policy.

Planner Principle

The planner suggests; the executor decides. Planning is constrained to selecting among enumerated, approved next steps, and the executor validates every decision against policy and budgets before execution.

Given:

The planner returns a plan decision:

The Workflow Executor then validates that decision against guardrails before proceeding.
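As a rough illustration of that validation, the sketch below parses a planner's raw JSON output and rejects anything that is malformed or outside the allowed set. The fields `selectedStepId`, `confidence`, and `riskLevel` come from the `determineNextStep` pseudocode later in this post; the `"low"` and `"medium"` risk values, the `[0, 1]` confidence range, and the `parsePlannerDecision` helper itself are my assumptions.

```typescript
type RiskLevel = "low" | "medium" | "high";

interface PlannerDecision {
  selectedStepId: string;
  confidence: number; // assumed to be in [0, 1]
  riskLevel: RiskLevel;
}

type ParseResult =
  | { ok: true; decision: PlannerDecision }
  | { ok: false; error: string };

function isRiskLevel(x: unknown): x is RiskLevel {
  return x === "low" || x === "medium" || x === "high";
}

// Treats the planner output as untrusted input: parse, then check every field.
function parsePlannerDecision(raw: string, allowedNextSteps: string[]): ParseResult {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return { ok: false, error: "output is not valid JSON" };
  }
  if (typeof value !== "object" || value === null) {
    return { ok: false, error: "output is not a JSON object" };
  }
  const fields = value as Record<string, unknown>;
  const selectedStepId = fields["selectedStepId"];
  const confidence = fields["confidence"];
  const riskLevel = fields["riskLevel"];
  if (typeof selectedStepId !== "string" || !allowedNextSteps.includes(selectedStepId)) {
    return { ok: false, error: "selectedStepId is not an allowed next step" };
  }
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    return { ok: false, error: "confidence must be a number in [0, 1]" };
  }
  if (!isRiskLevel(riskLevel)) {
    return { ok: false, error: "riskLevel must be low, medium, or high" };
  }
  return { ok: true, decision: { selectedStepId, confidence, riskLevel } };
}
```

Returning a result object instead of throwing keeps the failure on the normal control path, so the executor can log it and fall back to a safer route.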

Planner Types

Different scenarios call for different planner implementations:

Deterministic Planner (No AI): Best for strict compliance flows. The “plan” is if/else logic encoded in rules. No model involved. Maximum predictability.

Hybrid Planner (Rules First, Model for Ambiguity): Rules handle 80-90% of routing. The model is used for classification, intent extraction, or tie-breaking. This balances predictability with flexibility.

LLM Planner (Model Chooses the Path): The model selects among enumerated next steps. It must output strict JSON, cannot call tools directly, and can be required to report a confidence score and its reasoning. Low confidence triggers HITL or a safer fallback path.
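The hybrid approach is the easiest of the three to misread, so here is a small sketch of the control flow: rules run first, and the model is consulted only when no rule fires. Everything here is hypothetical (`PlanContext`, `Rule`, `Classifier`, `hybridPlan`), and the classifier is a synchronous stand-in for what would really be an asynchronous call through the Model Gateway.

```typescript
// Assumed minimal context; a real planner would see a state summary.
interface PlanContext {
  newDocumentsArrived: boolean;
}

interface PlanChoice {
  stepId: string;
  source: "rule" | "model";
}

// A rule returns a step id when it applies, or undefined to pass.
type Rule = (ctx: PlanContext) => string | undefined;

// Stand-in for a model call that picks one of the allowed steps.
type Classifier = (ctx: PlanContext, allowed: string[]) => string;

function hybridPlan(
  ctx: PlanContext,
  allowed: string[],
  rules: Rule[],
  classify: Classifier
): PlanChoice {
  // Rules first: the first rule that yields an allowed step wins.
  for (const rule of rules) {
    const stepId = rule(ctx);
    if (stepId !== undefined && allowed.includes(stepId)) {
      return { stepId, source: "rule" };
    }
  }
  // No rule applied: fall back to the model, but still enforce the allowed set.
  const stepId = classify(ctx, allowed);
  if (!allowed.includes(stepId)) {
    throw new Error(`model chose a step outside the allowed set: ${stepId}`);
  }
  return { stepId, source: "model" };
}
```

Recording `source` on each choice makes it possible to measure how often the model is actually consulted.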

Figure 2: Planner Decision Point Sequence

Planner Constraints

The Planner can only choose from approved steps in the workflow graph. It outputs structured JSON that gets schema-validated. The Executor enforces budgets and tool allowlists before execution. Planner decisions are logged to audit and replayable. Low confidence or high-risk routes trigger human-in-the-loop review.

These constraints prevent the failure modes that plague unconstrained agents. The Planner is powerful because it’s limited.


Processing a Workflow Run

Figure 3: Orchestration Engine Sequence Diagram with branching and HITL

A typical workflow run proceeds through several phases.

Initialization: The Workflow Executor receives the execution request from the AI Gateway, loads the workflow definition from the Workflow Registry, initializes run state, and begins at the entry step.

Step Execution: For each step, the executor checks policy constraints, invokes the appropriate dispatcher (tool, model, or checkpoint), captures the output, updates state, and emits trace events.

Branching: When the workflow reaches a decision point, the Planner evaluates the current context and selects the next step from the allowed options. The executor validates this selection against policy before proceeding.

HITL Checkpoints: When a step triggers human review, the Checkpoint Manager pauses execution, persists state, notifies the appropriate approver, and waits for a decision. On approval, execution resumes. On rejection or timeout, the workflow follows the defined escalation path.

Completion: When the workflow reaches a terminal state, the executor finalizes the run, emits completion events, and returns the structured result to the AI Gateway.
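The HITL checkpoint phase is the one with the most state, so it helps to see it as a small state machine. The sketch below resolves a pending checkpoint into one of three outcomes: keep waiting, resume, or escalate. The `Checkpoint` shape borrows `timeout_hours` and `escalation` from the workflow example; `resolveCheckpoint`, the status names, and the millisecond clock are assumptions for illustration.

```typescript
type CheckpointStatus = "pending" | "approved" | "rejected" | "timed_out";

interface Checkpoint {
  runId: string;
  stepId: string;
  requestedAtMs: number; // when approval was requested (epoch milliseconds)
  timeoutHours: number;
  escalation: string;    // who receives the run on rejection or timeout
  status: CheckpointStatus;
}

type NextAction =
  | { kind: "wait" }
  | { kind: "resume" }
  | { kind: "escalate"; to: string };

const MS_PER_HOUR = 60 * 60 * 1000;

// Pure transition: given a pending checkpoint, an optional reviewer decision,
// and the current time, return the updated checkpoint and what to do next.
function resolveCheckpoint(
  cp: Checkpoint,
  decision: "approve" | "reject" | undefined,
  nowMs: number
): { checkpoint: Checkpoint; next: NextAction } {
  if (cp.status !== "pending") {
    throw new Error(`checkpoint already resolved: ${cp.status}`);
  }
  if (decision === "approve") {
    return { checkpoint: { ...cp, status: "approved" }, next: { kind: "resume" } };
  }
  if (decision === "reject") {
    return { checkpoint: { ...cp, status: "rejected" }, next: { kind: "escalate", to: cp.escalation } };
  }
  if (nowMs - cp.requestedAtMs >= cp.timeoutHours * MS_PER_HOUR) {
    return { checkpoint: { ...cp, status: "timed_out" }, next: { kind: "escalate", to: cp.escalation } };
  }
  return { checkpoint: cp, next: { kind: "wait" } };
}
```

Keeping the transition pure (no clock reads, no I/O) means the same function can be reused when replaying a run, with the recorded timestamps passed in.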


Anatomy of Workflow Execution

The orchestrator client call from the AI Gateway triggers execution. Here’s what happens inside:

// Illustrative pseudocode for a reference architecture (not production code)
// orchestrator/src/executor.ts

async function executeWorkflow(request: WorkflowExecuteRequest): Promise<WorkflowExecuteResult> {
    const { workflowId, sessionId, correlationId, input, caller } = request;

    // 1. Load workflow definition from registry
    const workflow = await workflowRegistry.get(workflowId);
    if (!workflow) {
        throw new WorkflowNotFoundError(workflowId);
    }

    // 2. Initialize run state
    const runId = generateRunId();
    const runState: RunState = {
        runId,
        workflowId,
        workflowVersion: workflow.version,
        sessionId,
        correlationId,
        currentStepId: workflow.entryStep,
        stepHistory: [],
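        artifacts: {},
        budgetConsumed: 0,  // running cost in USD, read by the policy check and the Planner
        toolCallCount: 0,   // read by the Planner's remaining-call budget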
        startedAt: new Date().toISOString(),
        status: 'running'
    };

    await stateStore.saveRunState(runState);
    traceEmitter.emit('run.started', { runId, workflowId, correlationId });

    // 3. Execute steps until terminal state
    while (!isTerminalStep(runState.currentStepId, workflow)) {
        // Assumes the registry indexes steps by id (the YAML example lists them as an array)
        const step = workflow.steps[runState.currentStepId];

        // 4. Check policy before each step
        const policyCheck = await policyEngine.checkStepAllowed({
            workflowId,
            stepId: step.id,
            caller,
            currentBudget: runState.budgetConsumed
        });

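        if (!policyCheck.allowed) {
            // Persist the failure so the run is not left in 'running' after the throw
            runState.status = 'failed';
            await stateStore.saveRunState(runState);
            traceEmitter.emit('step.blocked', { runId, stepId: step.id, reason: policyCheck.reason });
            throw new PolicyViolationError(policyCheck.reason);
        }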

        // 5. Execute the step based on type
        const stepResult = await executeStep(step, runState, input, caller);

        // 6. Update state and history
        runState.stepHistory.push({
            stepId: step.id,
            startedAt: stepResult.startedAt,
            completedAt: stepResult.completedAt,
            output: stepResult.output,
            tokensUsed: stepResult.tokensUsed,
            costUsd: stepResult.costUsd
        });

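        runState.artifacts = { ...runState.artifacts, ...stepResult.artifacts };
        runState.budgetConsumed = (runState.budgetConsumed ?? 0) + (stepResult.costUsd ?? 0);
        // Count tool calls so the Planner's maxToolCallsRemaining reflects actual usage
        runState.toolCallCount = (runState.toolCallCount ?? 0) + (step.type === 'tool' ? 1 : 0);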

        // 7. Determine next step (may involve Planner)
        runState.currentStepId = await determineNextStep(step, stepResult, workflow, runState);

        await stateStore.saveRunState(runState);
        traceEmitter.emit('step.completed', { runId, stepId: step.id, nextStepId: runState.currentStepId });
    }

    // 8. Finalize and return
    runState.status = 'completed';
    runState.completedAt = new Date().toISOString();
    await stateStore.saveRunState(runState);
    traceEmitter.emit('run.completed', { runId, workflowId });

    return buildWorkflowResult(runState);
}

The step execution dispatcher routes to the appropriate handler:

// orchestrator/src/step-dispatcher.ts

async function executeStep(
    step: WorkflowStep,
    runState: RunState,
    input: WorkflowInput,
    caller: CallerIdentity
): Promise<StepResult> {
    // Each handler is expected to stamp startedAt and completedAt on the StepResult it returns.

    switch (step.type) {
        case 'tool':
            return await toolDispatcher.execute({
                toolId: step.tool,
                input: resolveStepInput(step.input_schema, runState, input),
                caller,
                idempotencyKey: `${runState.runId}-${step.id}`,
                constraints: step.constraints
            });

        case 'model':
            return await modelInvoker.execute({
                modelClass: step.model_class,
                input: resolveStepInput(step.input_schema, runState, input),
                outputSchema: step.output_schema,
                constraints: step.constraints
            });

        case 'checkpoint':
            return await checkpointManager.requestApproval({
                runId: runState.runId,
                stepId: step.id,
                trigger: step.trigger,
                timeout: step.timeout_hours,
                escalation: step.escalation,
                context: buildCheckpointContext(runState)
            });

        case 'rule':
            return await ruleEngine.evaluate({
                ruleId: step.rule,
                input: resolveStepInput(step.input_schema, runState, input)
            });

        default:
            throw new UnknownStepTypeError(step.type);
    }
}
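The `idempotencyKey` passed to the tool dispatcher only helps if something on the receiving side honors it. As a sketch of the idea, and not a description of how the Tool Gateway is actually built, the hypothetical `IdempotentDispatcher` below runs a call at most once per key and returns the stored result on a retry. The in-memory `Map` stands in for what would need to be a durable, shared store.

```typescript
// Hypothetical in-memory stand-in for a durable idempotency store.
class IdempotentDispatcher<T> {
  private readonly results = new Map<string, T>();

  // Runs `call` at most once per key; a repeat returns the stored result.
  execute(idempotencyKey: string, call: () => T): T {
    if (this.results.has(idempotencyKey)) {
      return this.results.get(idempotencyKey) as T;
    }
    const result = call();
    // Only successful calls are recorded; a call that throws can be retried.
    this.results.set(idempotencyKey, result);
    return result;
  }
}

// Mirrors the key format used in the dispatcher pseudocode above.
function stepIdempotencyKey(runId: string, stepId: string): string {
  return `${runId}-${stepId}`;
}
```

One caveat with a `runId-stepId` key: it assumes each step executes at most once per run. A workflow that loops back to the same step would need an attempt or iteration counter in the key, or the second visit would silently reuse the first result.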

The Planner integration for dynamic routing:

// orchestrator/src/planner.ts

async function determineNextStep(
    currentStep: WorkflowStep,
    stepResult: StepResult,
    workflow: WorkflowDefinition,
    runState: RunState
): Promise<string> {
    // Static routing: explicit next step defined
    if (currentStep.next) {
        return currentStep.next;
    }

    // Conditional routing: evaluate conditions
    if (currentStep.transitions) {
        for (const transition of currentStep.transitions) {
            if (evaluateCondition(transition.condition, stepResult, runState)) {
                return transition.target;
            }
        }
    }

    // Dynamic routing: invoke Planner
    if (currentStep.allowDynamicRouting) {
        const plannerInput: PlannerInput = {
            allowedNextSteps: currentStep.allowedNextSteps,
            currentStateSummary: summarizeState(runState),
            lastStepOutput: stepResult.output,
            constraints: {
                maxToolCallsRemaining: workflow.policies.max_tool_calls - runState.toolCallCount,
                maxCostRemainingUsd: workflow.policies.max_cost_usd - runState.budgetConsumed,
                requiredApproval: workflow.policies.hitl_required
            }
        };

        const plannerOutput = await planner.decide(plannerInput);

        // Validate planner decision
        if (!currentStep.allowedNextSteps.includes(plannerOutput.selectedStepId)) {
            traceEmitter.emit('planner.invalid', {
                runId: runState.runId,
                selected: plannerOutput.selectedStepId,
                allowed: currentStep.allowedNextSteps
            });
            throw new InvalidPlannerDecisionError(plannerOutput.selectedStepId);
        }

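        // Trigger HITL if low confidence or high risk, and log which one fired
        const lowConfidence = plannerOutput.confidence < 0.7;
        if (lowConfidence || plannerOutput.riskLevel === 'high') {
            traceEmitter.emit('planner.hitl-triggered', {
                runId: runState.runId,
                reason: lowConfidence ? 'low-confidence' : 'high-risk'
            });
            // Placeholder id: a real workflow would name its own checkpoint step
            return 'human-review-step';
        }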

        return plannerOutput.selectedStepId;
    }

    // Terminal: no next step
    return 'terminal';
}

This code is intentionally simplified. Production implementations handle concurrency, distributed state, failure recovery, and many edge cases the pseudocode omits. The structure illustrates the separation of concerns and the flow of control.


Observability and Replay

Every step emits structured trace events. These events flow to the shared audit log and telemetry store, enabling:

Debugging: Trace a failed run step by step. See exactly what context was assembled, what the model received, what it produced, and why the workflow branched where it did.

Auditing: Answer “why did the system do X?” with evidence. Every decision is logged with causal links. Regulators and auditors can follow the chain.

Regression Testing: Replay historical runs against new workflow versions. Catch behavioral changes before they reach production.

Performance Analysis: Identify slow steps, expensive model calls, and frequent HITL triggers. Optimize based on data.
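To show what replay for regression testing might look like in its simplest form, the sketch below walks a new version's routing logic using the recorded step outputs from a historical run, without calling any tool or model. It reports the path taken and the first step for which no recording exists, which signals that the new version diverged. `RecordedStep`, `NextStepFn`, and `replayRouting` are hypothetical names, and real replay would also need to compare outputs, not just routing.

```typescript
interface RecordedStep {
  stepId: string;
  output: unknown;
}

// Routing logic of the workflow version under test; undefined means terminal.
type NextStepFn = (stepId: string, output: unknown) => string | undefined;

interface ReplayResult {
  path: string[];
  divergedAt?: string; // first step the new version wants that was never recorded
}

function replayRouting(
  recorded: RecordedStep[],
  entryStep: string,
  next: NextStepFn,
  maxSteps: number = 100 // guard against routing cycles
): ReplayResult {
  const outputs = new Map<string, unknown>();
  for (const step of recorded) {
    outputs.set(step.stepId, step.output);
  }
  const path: string[] = [];
  let current: string | undefined = entryStep;
  while (current !== undefined && path.length < maxSteps) {
    if (!outputs.has(current)) {
      return { path, divergedAt: current };
    }
    path.push(current);
    current = next(current, outputs.get(current));
  }
  return { path };
}
```

A regression suite could run this over a sample of historical runs and flag any run whose path changes or diverges under the candidate version.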

The Trace Emitter produces events like:

{
  "eventType": "step.completed",
  "runId": "run_abc123",
  "correlationId": "corr_xyz789",
  "workflowId": "case-risk-summary",
  "workflowVersion": "1.2.0",
  "stepId": "analyze-risk",
  "stepType": "model",
  "startedAt": "2026-01-09T14:23:45.123Z",
  "completedAt": "2026-01-09T14:23:47.456Z",
  "durationMs": 2333,
  "tokensUsed": 1847,
  "costUsd": 0.037,
  "modelUsed": "gemini-2.5-flash",
  "outputSummary": "Identified 3 risk factors: documentation gap, income verification, DTI threshold",
  "nextStepId": "human-review"
}

These events are the foundation for observability dashboards, alerting, and continuous improvement.
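As one example of what a dashboard query over these events could compute, the sketch below groups `step.completed` events by step and finds the step with the highest mean duration. The event type keeps only the fields it needs from the sample above; `summarizeSteps` and `slowestStep` are hypothetical helpers, and a real deployment would more likely run this as a query in the telemetry store.

```typescript
// Subset of the step.completed event shown above.
interface StepCompletedEvent {
  eventType: "step.completed";
  stepId: string;
  durationMs: number;
  costUsd: number;
}

interface StepStats {
  count: number;
  totalDurationMs: number;
  totalCostUsd: number;
}

// Aggregates events per stepId.
function summarizeSteps(events: StepCompletedEvent[]): Map<string, StepStats> {
  const stats = new Map<string, StepStats>();
  for (const event of events) {
    const current = stats.get(event.stepId) ?? { count: 0, totalDurationMs: 0, totalCostUsd: 0 };
    stats.set(event.stepId, {
      count: current.count + 1,
      totalDurationMs: current.totalDurationMs + event.durationMs,
      totalCostUsd: current.totalCostUsd + event.costUsd
    });
  }
  return stats;
}

// Returns the stepId with the highest mean duration, or undefined if there are no events.
function slowestStep(stats: Map<string, StepStats>): string | undefined {
  let worstId: string | undefined = undefined;
  let worstMean = -1;
  stats.forEach((s, stepId) => {
    const mean = s.totalDurationMs / s.count;
    if (mean > worstMean) {
      worstMean = mean;
      worstId = stepId;
    }
  });
  return worstId;
}
```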


What’s Next

In the next post, we’ll explore The Tool Gateway: the governed execution boundary between AI workflows and enterprise systems. We’ll examine how tools are registered, how RBAC and policy enforcement work at the tool level, and why deterministic tool behavior is essential for reliable AI systems.


Series Roadmap

This series will explore each component of the Lattice architecture in depth:

  1. Introduction to Lattice — The Five Planes overview
  2. The AI Gateway — Front door and policy enforcement
  3. The Orchestration Engine (this post) — Workflows, not agents
  4. The Tool Gateway — Governed access to enterprise systems
  5. The Context Builder — Retrieval, redaction, and grounding
  6. The Model Gateway — Routing, cost control, and structured outputs
  7. The Control Plane — Policy, registries, and change management
  8. The Data Plane — Indexes, stores, and session state
  9. The Ingestion Plane — Document processing and embeddings
  10. MCP Integration — Standardized interoperability
  11. Preventing Hallucinations — Architectural approaches to grounding
  12. Lattice-Lite — A lighter approach for small orgs
  13. Putting It Together — End-to-end request lifecycle

This series documents architectural patterns for enterprise AI platforms. Diagrams and frameworks are provided for educational purposes.
