Workflow Systems | Engineering Log | February 15, 2024

Orchestrating Multi-Agent Workflows Without Letting Them Run the System

A multi-agent research workflow burned through paid compute because the agents owned the loop. The fix was a controller-owned state machine with typed outputs, bounded retries, cost tracking, and audit logs.

Agent Workflows, Backend Architecture, State Machines, Cost Control, Workflow Reliability, Structured Outputs, Observability

Agent frameworks are getting louder right now.

The demos look impressive: Researcher, Writer, Reviewer, Planner, Critic. Give them a goal, let them talk, and watch a report appear.

I get the appeal. It feels close to how people imagine knowledge work happens.

The problem is that software doesn’t get safer when you replace control flow with a group chat.

The API bill hit $45 in twelve minutes for one background task.

I was testing a market research pipeline. The workflow looked simple enough:

Researcher gathers data
→ Writer drafts the report
→ Reviewer checks the report
→ Writer revises if needed
→ final report gets saved

The framework made each step feel like a role in a conversation. The Researcher could message the Writer. The Writer could message the Reviewer. The Reviewer could ask for revisions. The system would keep going until the agents agreed the task was complete.

That last part was the failure.

No exception.
No crash.
No obvious error.

Just tokens burning inside a loop with no hard stop.

The first architecture was too soft

The first architecture looked like this:

goal
→ Researcher agent
→ Writer agent
→ Reviewer agent
→ natural-language revision loop
→ maybe final output

That shape is too soft for paid compute.

A background workflow needs a controller. It needs state. It needs max attempts. It needs exit conditions that aren’t negotiated in prose.

So I removed the conversational orchestration and rebuilt the flow around a finite state machine.

The agents stopped talking to each other directly.

Each step became a bounded transformation over a shared state object.

Researcher
  input: task brief
  output: structured research_items[]

Writer
  input: research_items[]
  output: draft_report

Reviewer
  input: draft_report + rubric
  output: is_approved + required_fixes[]
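Written as Python types, those contracts might look like the minimal sketch below. The field names on ResearchItem and DraftReport are hypothetical, since the contracts above only name the objects, and stub_writer stands in for a real model call. What matters is the shape: each agent is a plain function from typed input to typed output, with no handle to another agent or to the controller.

```python
from dataclasses import dataclass, field
from typing import Callable


# Field names below are hypothetical; only the object names come from the contracts.
@dataclass(frozen=True)
class ResearchItem:
    topic: str
    finding: str
    source: str


@dataclass(frozen=True)
class DraftReport:
    sections: dict[str, str]  # section name -> section text


@dataclass(frozen=True)
class RequiredFix:
    section: str
    issue: str
    required_change: str


@dataclass(frozen=True)
class ReviewResult:
    is_approved: bool
    required_fixes: list[RequiredFix] = field(default_factory=list)


# Each agent is a plain function from typed input to typed output.
# None of them can call another agent or choose the next step.
Researcher = Callable[[str], list[ResearchItem]]        # task brief -> items
Writer = Callable[[list[ResearchItem]], DraftReport]    # items -> draft
Reviewer = Callable[[DraftReport, str], ReviewResult]   # draft + rubric -> verdict


def stub_writer(items: list[ResearchItem]) -> DraftReport:
    """Stand-in for a model call: groups findings into one section per topic."""
    sections: dict[str, str] = {}
    for item in items:
        sections[item.topic] = sections.get(item.topic, "") + f"{item.finding} ({item.source}) "
    return DraftReport(sections={k: v.strip() for k, v in sections.items()})


writer: Writer = stub_writer  # any function with this signature can fill the Writer slot
```

Swapping the stub for a real model call changes what the Writer produces, not who decides what runs next.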

Figure: Controller-owned multi-agent workflow. A controller, a typed state object, bounded agent steps, a retry gate, a max-attempt guard, and the final report output.

The controller owns the workflow

The controller decided what ran next, how many times, and when to stop.

The agents owned only their assigned transformation.

That changed the system immediately. The Researcher did not decide who speaks next. It produced structured research data and stopped. The Writer did not debate the Reviewer. It received research data and generated a draft. The Reviewer did not write a long critique unless the schema allowed it.

It returned a strict result:

{
  "is_approved": false,
  "required_fixes": [
    {
      "section": "pricing",
      "issue": "missing source comparison",
      "required_change": "add competitor pricing range"
    }
  ]
}
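A strict parser for that shape could look like this minimal sketch. It accepts exactly the two top-level keys and the three fix fields shown above, and raises ValueError on anything else, including a prose reply. The rule that a rejection must list at least one fix is an assumption added for this sketch, so the Writer always has something concrete to act on.

```python
import json
from dataclasses import dataclass

REVIEW_KEYS = {"is_approved", "required_fixes"}
FIX_KEYS = {"section", "issue", "required_change"}


@dataclass(frozen=True)
class RequiredFix:
    section: str
    issue: str
    required_change: str


@dataclass(frozen=True)
class ReviewResult:
    is_approved: bool
    required_fixes: tuple[RequiredFix, ...]


def parse_review(raw: str) -> ReviewResult:
    """Turn Reviewer output into a ReviewResult or raise ValueError."""
    data = json.loads(raw)  # prose replies fail here with JSONDecodeError (a ValueError)
    if not isinstance(data, dict) or set(data) != REVIEW_KEYS:
        raise ValueError(f"expected exactly the keys {sorted(REVIEW_KEYS)}")
    if not isinstance(data["is_approved"], bool):
        raise ValueError("is_approved must be true or false")
    if not isinstance(data["required_fixes"], list):
        raise ValueError("required_fixes must be a list")

    fixes = []
    for fix in data["required_fixes"]:
        if not isinstance(fix, dict) or set(fix) != FIX_KEYS:
            raise ValueError(f"malformed fix: {fix!r}")
        if not all(isinstance(fix[k], str) and fix[k].strip() for k in FIX_KEYS):
            raise ValueError(f"fix fields must be non-empty strings: {fix!r}")
        fixes.append(RequiredFix(**fix))

    # Assumption for this sketch: a rejection must say what to change.
    if not data["is_approved"] and not fixes:
        raise ValueError("a rejected review must list at least one required fix")
    return ReviewResult(data["is_approved"], tuple(fixes))
```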

Figure: Typed workflow state. A Pydantic state object for the research workflow, holding task metadata, research items, the draft report, the review result, the attempt count, the status, and a failure reason.
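The figure describes a Pydantic model. The sketch below mirrors the same fields with a stdlib dataclass so it runs without dependencies; unlike Pydantic, it does not validate field types at construction. Status values other than approved, needs_revision, and manual_review_required are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(str, Enum):
    # approved / needs_revision / manual_review_required appear in the router;
    # the other values are hypothetical.
    PENDING = "pending"
    RESEARCHED = "researched"
    DRAFTED = "drafted"
    NEEDS_REVISION = "needs_revision"
    APPROVED = "approved"
    MANUAL_REVIEW_REQUIRED = "manual_review_required"
    FAILED = "failed"


@dataclass
class WorkflowState:
    # Task metadata
    job_id: str
    task_brief: str
    # Step outputs: empty or None until the step that fills them has run
    research_items: list[dict] = field(default_factory=list)
    draft_report: Optional[str] = None
    review_result: Optional[dict] = None
    # Controller bookkeeping; no agent writes these fields
    attempt_count: int = 0
    status: Status = Status.PENDING
    failure_reason: Optional[str] = None

    def fail(self, reason: str) -> None:
        """Move to a terminal failure state and keep the reason for the audit log."""
        self.status = Status.FAILED
        self.failure_reason = reason
```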

Routing became normal backend logic

Once the output had structure, routing became normal backend logic.

If research_items exists, run the Writer.
If draft_report exists, run the Reviewer.
If is_approved is true, save the report.
If is_approved is false and attempts remain, send only the required fixes back to the Writer.
If attempts are exhausted, stop and mark the job for human review.
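Those five rules fit in one function that reads only state. The minimal sketch below adds a small run loop with stub agents to show the effect: a Reviewer that never approves cannot keep the job alive. MAX_REVISIONS, MAX_TRANSITIONS, and the State fields are assumptions for illustration; the transition ceiling is an extra safeguard for steps that keep returning nothing usable.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

MAX_REVISIONS = 2     # assumed revision budget
MAX_TRANSITIONS = 20  # assumed hard ceiling on controller steps


@dataclass
class State:
    research_items: list[str] = field(default_factory=list)
    draft_report: Optional[str] = None
    is_approved: Optional[bool] = None  # None means "not reviewed yet"
    required_fixes: list[str] = field(default_factory=list)
    attempt_count: int = 0
    status: str = "running"


def next_step(s: State) -> str:
    """Choose the next step from typed state fields alone."""
    if not s.research_items:
        return "research"
    if s.draft_report is None:
        return "write"          # research_items exists -> run the Writer
    if s.is_approved is None:
        return "review"         # draft_report exists -> run the Reviewer
    if s.is_approved:
        return "save"
    if s.attempt_count < MAX_REVISIONS:
        return "revise"         # rejected, attempts remain
    return "manual_review"      # attempts exhausted


def run(
    s: State,
    researcher: Callable[[], list[str]],
    writer: Callable[[list[str], list[str]], str],
    reviewer: Callable[[str], tuple[bool, list[str]]],
) -> State:
    for _ in range(MAX_TRANSITIONS):
        step = next_step(s)
        if step == "research":
            s.research_items = researcher()
        elif step == "write":
            s.draft_report = writer(s.research_items, [])
        elif step == "review":
            s.is_approved, s.required_fixes = reviewer(s.draft_report)
        elif step == "revise":
            s.attempt_count += 1
            # Only the required fixes go back to the Writer, not a conversation.
            s.draft_report = writer(s.research_items, s.required_fixes)
            s.is_approved = None
        elif step == "save":
            s.status = "approved"
            return s
        else:
            s.status = "manual_review_required"
            return s
    # Ceiling hit: some step keeps returning unusable output.
    s.status = "failed"
    return s
```

With MAX_REVISIONS = 2, a Reviewer that rejects every draft ends the job as manual_review_required after two revisions. The loop exits because the counter says so, not because the agents agree.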

No agent got to decide whether the loop should continue.

The retry rule lived in code:

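MAX_REVISIONS = 3  # illustrative value; set the limit per workflow

def next_status(review, state) -> str:
    # Approval ends the loop no matter how many attempts were used.
    if review.is_approved:
        return "approved"
    # Rejected, but the revision budget is not spent yet.
    if state.attempt_count < MAX_REVISIONS:
        return "needs_revision"
    # Budget exhausted: stop paying for retries and hand off to a human.
    return "manual_review_required"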

Figure: Deterministic workflow router. A Python router that reads the typed workflow state, advances to the next step, applies the max revision limit, and stops failed jobs cleanly.
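A router like this only stays deterministic if the set of states is closed. One way to enforce that is an explicit transition table, sketched below under assumed state names: any move not listed raises, so an unexpected state halts the job instead of letting it drift.

```python
# Allowed moves between statuses. Names other than approved, needs_revision,
# and manual_review_required are hypothetical.
ALLOWED: dict[str, frozenset[str]] = {
    "pending": frozenset({"researched", "failed"}),
    "researched": frozenset({"drafted", "failed"}),
    "drafted": frozenset({"approved", "needs_revision", "manual_review_required", "failed"}),
    "needs_revision": frozenset({"drafted", "failed"}),
    # Terminal states: nothing leaves them.
    "approved": frozenset(),
    "manual_review_required": frozenset(),
    "failed": frozenset(),
}

TERMINAL = {name for name, targets in ALLOWED.items() if not targets}


class InvalidTransition(RuntimeError):
    """Raised when the controller is asked for a move the table does not allow."""


def transition(current: str, target: str) -> str:
    """Return the new status, or stop the job if the move is not in the table."""
    if current not in ALLOWED:
        raise InvalidTransition(f"unknown state {current!r}")
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current!r} -> {target!r} is not allowed")
    return target
```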

Cost controls were boring and necessary

In practice, that meant:

max revision count
max tokens per step
timeout per agent call
structured JSON outputs
validation failure retries
hard stop on invalid state
job-level cost tracking
audit log per transition
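Job-level cost tracking can be a small accumulator that every agent call reports into. In this minimal sketch the per-token prices are placeholders, not any provider's real rates, and CostTracker and BudgetExceeded are hypothetical names. The per-step token cap would normally also be sent to the model API as a request limit; checking the reported usage here is a second line of defense.

```python
from dataclasses import dataclass, field

# Placeholder prices in USD per million tokens; not any provider's real rates.
INPUT_USD_PER_M = 3.00
OUTPUT_USD_PER_M = 15.00


class BudgetExceeded(RuntimeError):
    """Hard stop: the job broke a token or spend limit."""


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


@dataclass
class CostTracker:
    job_budget_usd: float
    max_output_tokens_per_step: int
    spent_usd: float = 0.0
    ledger: list[tuple[str, float]] = field(default_factory=list)

    def record(self, agent_name: str, input_tokens: int, output_tokens: int) -> float:
        """Charge one agent call to the job, then enforce the limits."""
        cost = estimate_cost(input_tokens, output_tokens)
        self.spent_usd += cost
        self.ledger.append((agent_name, cost))
        # The call has already been paid for; raising prevents the next one.
        if output_tokens > self.max_output_tokens_per_step:
            raise BudgetExceeded(f"{agent_name} returned {output_tokens} output tokens")
        if self.spent_usd > self.job_budget_usd:
            raise BudgetExceeded(
                f"job spent ${self.spent_usd:.4f} against a ${self.job_budget_usd:.2f} budget"
            )
        return cost
```

The money for the call that trips a limit is already spent. Raising stops the next call, which is what bounds the bill.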

The audit log mattered more than I expected. Without it, a failed agent workflow is hard to inspect because the conversation looks busy even when the system is doing nothing useful.

I started logging each transition:

job_id
previous_state
next_state
agent_name
input_token_count
output_token_count
cost_estimate
latency_ms
validation_result
failure_reason

Figure: Workflow transition logging. A logger that records state changes, token usage, latency, validation result, and cost for each agent step.
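One JSON line per transition is enough to reconstruct a failed job. The sketch below uses the field list above as-is; the timed helper and the example validation_result values are assumptions.

```python
import json
import sys
import time
from dataclasses import asdict, dataclass
from typing import Any, Callable, Optional, TextIO


@dataclass(frozen=True)
class TransitionRecord:
    job_id: str
    previous_state: str
    next_state: str
    agent_name: str
    input_token_count: int
    output_token_count: int
    cost_estimate: float
    latency_ms: int
    validation_result: str  # e.g. "passed" or "failed"; values are an assumption
    failure_reason: Optional[str] = None


def timed(call: Callable[[], Any]) -> tuple[Any, int]:
    """Run one agent call and return (result, latency in whole milliseconds)."""
    start = time.perf_counter()
    result = call()
    return result, int((time.perf_counter() - start) * 1000)


def log_transition(record: TransitionRecord, sink: TextIO = sys.stdout) -> str:
    """Append one JSON line per transition so a failed job can be read back step by step."""
    line = json.dumps(asdict(record), sort_keys=True)
    sink.write(line + "\n")
    return line
```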

That made failures easier to understand.

If the Researcher returned weak data, the log showed it. If the Writer ignored required fields, schema validation caught it. If the Reviewer kept rejecting the same section, the retry counter stopped the loop and preserved the report for manual review.
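Validation retries deserve their own small cap, separate from the revision budget. This minimal sketch re-runs a model call only when its output fails validation and hard-stops after max_attempts; call_with_validation, validate_draft, and the title and sections fields are hypothetical.

```python
import json
from typing import Callable, TypeVar

T = TypeVar("T")


class StepFailed(RuntimeError):
    """Hard stop: the step never produced output that passed validation."""


def call_with_validation(
    call_model: Callable[[], str],
    validate: Callable[[str], T],
    max_attempts: int = 2,
) -> T:
    """Re-run a model call only when its output fails validation, up to max_attempts."""
    errors: list[str] = []
    for attempt in range(1, max_attempts + 1):
        raw = call_model()
        try:
            return validate(raw)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError too
            errors.append(f"attempt {attempt}: {exc}")
    raise StepFailed("; ".join(errors))


def validate_draft(raw: str) -> dict:
    """Hypothetical Writer schema: a JSON object with title and sections."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"title", "sections"} - set(data)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return data
```

Feeding the validation error back into the next prompt is a common refinement; it is left out here to keep the control flow visible.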

The pipeline became less magical and more useful

The agents still used language models. They still handled messy tasks like extraction, synthesis, and review. But the workflow itself became deterministic. The system no longer depended on natural-language agreement to decide when paid compute should stop.

That distinction matters.

LLMs are useful inside workflow steps. They make poor owners of the workflow boundary.

For a market research pipeline, I want the model to read messy inputs, summarize findings, draft sections, and detect gaps. I don’t want it deciding how many times the job should loop, whether the budget is acceptable, or whether the final state has been reached.

That belongs to the controller.

Result

After the rebuild, the same task completed for cents instead of burning dollars in an open-ended revision loop. More importantly, the output became inspectable. Every step had input, output, schema, status, cost, and failure reason.

This is where I think agent systems need to mature.

The useful version isn’t a room full of chatbots talking until they feel done. It’s a controlled workflow with typed state, bounded retries, narrow model responsibilities, and backend-owned routing.

The models can do the judgment-heavy parts.

The system still needs to own the job.

On to the next one. Let’s keep sharpening that edge.

First written on February 15, 2024.
