Orchestrating Multi-Agent Workflows Without Letting Them Run the System
A multi-agent research workflow burned through paid compute because the agents owned the loop. The fix was a controller-owned state machine with typed outputs, bounded retries, cost tracking, and audit logs.
Agent frameworks are getting louder right now.
The demos look impressive: Researcher, Writer, Reviewer, Planner, Critic. Give them a goal, let them talk, and watch a report appear.
I get the appeal. It feels close to how people imagine knowledge work happens.
The problem is that software doesn’t get safer when you replace control flow with a group chat.
The API bill hit $45 in twelve minutes for one background task.
I was testing a market research pipeline. The workflow looked simple enough:
Researcher gathers data
→ Writer drafts the report
→ Reviewer checks the report
→ Writer revises if needed
→ final report gets saved
The framework made each step feel like a role in a conversation. The Researcher could message the Writer. The Writer could message the Reviewer. The Reviewer could ask for revisions. The system would keep going until the agents agreed the task was complete.
That last part was the failure.
No exception.
No crash.
No obvious error.
Just tokens burning inside a loop with no hard stop.
The first architecture was too soft
The first architecture looked like this:
goal
→ Researcher agent
→ Writer agent
→ Reviewer agent
→ natural-language revision loop
→ maybe final output
That shape is too soft for paid compute.
A background workflow needs a controller. It needs state. It needs max attempts. It needs exit conditions that aren’t negotiated in prose.
So I removed the conversational orchestration and rebuilt the flow around a finite state machine.
The agents stopped talking to each other directly.
Each step became a bounded transformation over a shared state object.
Researcher
    input: task brief
    output: structured research_items[]
Writer
    input: research_items[]
    output: draft_report
Reviewer
    input: draft_report + rubric
    output: is_approved + required_fixes[]
The agents transform state. The controller owns routing, retries, failure handling, and final job status.
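A minimal sketch of what that shared state object could look like, using Python dataclasses. The names `research_items`, `draft_report`, `is_approved`, `required_fixes`, and `attempt_count` come from this post; the `ResearchItem` fields, `task_brief`, and the `status` default are assumptions added for illustration, not the original implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResearchItem:
    claim: str   # hypothetical field: one finding from the research step
    source: str  # hypothetical field: where the finding came from


@dataclass
class RequiredFix:
    section: str
    issue: str
    required_change: str


@dataclass
class ReviewResult:
    is_approved: bool
    required_fixes: list[RequiredFix] = field(default_factory=list)


@dataclass
class WorkflowState:
    task_brief: str
    research_items: list[ResearchItem] = field(default_factory=list)
    draft_report: Optional[str] = None
    review: Optional[ReviewResult] = None
    attempt_count: int = 0
    status: str = "pending"  # assumed initial status; set only by the controller
```

Each agent reads the fields it needs and fills in the one it owns. Nothing in the state describes who talks next, so that decision has nowhere to live except the controller.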
The controller owns the workflow
The controller owned the workflow.
The agents only owned their assigned transformation.
That changed the system immediately. The Researcher did not decide who speaks next. It produced structured research data and stopped. The Writer did not debate the Reviewer. It received research data and generated a draft. The Reviewer did not write a long critique unless the schema allowed it.
It returned a strict result:
{
  "is_approved": false,
  "required_fixes": [
    {
      "section": "pricing",
      "issue": "missing source comparison",
      "required_change": "add competitor pricing range"
    }
  ]
}
The workflow state made each agent output inspectable, validatable, and safe to route through backend logic.
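One way to enforce that shape before the output reaches any routing logic is a small validator. This is a sketch using only the standard library. `parse_review` and `ValidationError` are hypothetical names, and the rule that a rejection must carry at least one fix is an assumption, not something the original pipeline is known to enforce.

```python
import json

REQUIRED_FIX_KEYS = {"section", "issue", "required_change"}


class ValidationError(ValueError):
    """Raised when the Reviewer output does not match the expected shape."""


def parse_review(raw: str) -> dict:
    """Parse the Reviewer's raw text and reject anything off-schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValidationError("top-level value must be an object")
    if not isinstance(data.get("is_approved"), bool):
        raise ValidationError("is_approved must be a boolean")

    fixes = data.get("required_fixes")
    if not isinstance(fixes, list):
        raise ValidationError("required_fixes must be a list")

    for i, fix in enumerate(fixes):
        if not isinstance(fix, dict) or set(fix) != REQUIRED_FIX_KEYS:
            raise ValidationError(
                f"required_fixes[{i}] must have exactly {sorted(REQUIRED_FIX_KEYS)}"
            )
        if not all(isinstance(v, str) and v.strip() for v in fix.values()):
            raise ValidationError(f"required_fixes[{i}] values must be non-empty strings")

    # Assumption: a rejection with nothing to fix gives the Writer nothing to act on.
    if not data["is_approved"] and not fixes:
        raise ValidationError("a rejection must list at least one required fix")
    return data
```

A free-form critique fails at the first check, so the controller sees a validation failure it can count and retry rather than prose it has to interpret.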
Routing became normal backend logic
Once the output had structure, routing became normal backend logic.
If research_items exists, run the Writer.
If draft_report exists, run the Reviewer.
If is_approved is true, save the report.
If is_approved is false and attempts remain, send only the required fixes back to the Writer.
If attempts are exhausted, stop and mark the job for human review.
No agent got to decide whether the loop should continue.
The retry rule lived in code:
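if review.is_approved:
    # Reviewer accepted the draft: the job can finish.
    status = "approved"
elif state.attempt_count < MAX_REVISIONS:
    # Rejected, but revision attempts remain.
    status = "needs_revision"
else:
    # Attempts exhausted: stop spending and hand off to a human.
    status = "manual_review_required"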
The router reads state, applies bounded retry rules, and stops jobs cleanly instead of letting agents negotiate the loop.
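The routing rules and the retry rule can be combined into one controller loop. The sketch below is an illustration under assumptions, not the original code. `State`, `next_step`, `run`, and the `steps` mapping are hypothetical names, `MAX_REVISIONS = 2` is an assumed value, and the agent calls are stand-in callables that mutate the state.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

MAX_REVISIONS = 2  # assumed value; the post does not state the real limit


@dataclass
class State:
    research_items: list = field(default_factory=list)
    draft_report: Optional[str] = None
    is_approved: Optional[bool] = None  # None means "not reviewed yet"
    required_fixes: list = field(default_factory=list)
    attempt_count: int = 0
    status: str = "running"


def next_step(state: State) -> str:
    """Pure routing: read the state, return the next step or a final status."""
    if not state.research_items:
        return "research"
    if state.draft_report is None:
        return "write"
    if state.is_approved is None:
        return "review"
    if state.is_approved:
        return "approved"
    if state.attempt_count < MAX_REVISIONS:
        return "needs_revision"
    return "manual_review_required"


def run(state: State, steps: dict[str, Callable[[State], None]]) -> State:
    """Drive the job to a final status within a fixed number of transitions."""
    # research + write + review, then (write + review) per revision,
    # plus one iteration to observe the final status.
    max_iterations = 3 + 2 * MAX_REVISIONS + 1
    for _ in range(max_iterations):
        step = next_step(state)
        if step in ("approved", "manual_review_required"):
            state.status = step
            return state
        if step == "needs_revision":
            state.attempt_count += 1  # the controller counts attempts, not the agent
            state.is_approved = None  # force a fresh review of the revised draft
            step = "write"
        steps[step](state)
    # Ceiling reached (for example, research kept returning nothing): stop anyway.
    state.status = "manual_review_required"
    return state
```

The loop has two independent stops: the revision counter and the iteration ceiling. Even if an agent keeps returning empty or rejected output, the job ends in a known status after a fixed number of calls.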
Cost controls were boring and necessary
The cost controls were boring and necessary:
- max revision count
- max tokens per step
- timeout per agent call
- structured JSON outputs
- validation failure retries
- hard stop on invalid state
- job-level cost tracking
- audit log per transition
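Several of these controls fit in one small configuration object plus a running cost total. The sketch below is an assumption about how that could look. `Limits`, `CostTracker`, and `BudgetExceeded` are hypothetical names, every default value is a placeholder, and the per-1k-token prices must be supplied by the caller because real prices depend on the model and provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    # All values are illustrative placeholders, not the original settings.
    max_revisions: int = 2
    max_tokens_per_step: int = 4000
    timeout_seconds: float = 60.0
    max_validation_retries: int = 1
    max_job_cost_usd: float = 1.00  # assumed job-level budget cap


class BudgetExceeded(RuntimeError):
    """Raised when a job's estimated spend passes its cap."""


class CostTracker:
    """Accumulates an estimated cost per job and enforces the cap."""

    def __init__(self, limits: Limits, input_price_per_1k: float, output_price_per_1k: float):
        self.limits = limits
        self.input_price_per_1k = input_price_per_1k
        self.output_price_per_1k = output_price_per_1k
        self.total_usd = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one call's estimated cost; raise if the job is over budget."""
        cost = (
            input_tokens / 1000 * self.input_price_per_1k
            + output_tokens / 1000 * self.output_price_per_1k
        )
        self.total_usd += cost
        if self.total_usd > self.limits.max_job_cost_usd:
            raise BudgetExceeded(
                f"job cost {self.total_usd:.4f} USD exceeds cap "
                f"{self.limits.max_job_cost_usd:.4f} USD"
            )
        return cost
```

The token and timeout limits would be passed to whatever model client makes the call, which is not shown here. The controller would call `record` after every agent step and treat `BudgetExceeded` as a hard stop.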
The audit log mattered more than I expected. Without it, a failed agent workflow is hard to inspect because the conversation looks busy even when the system is doing nothing useful.
I started logging each transition:
job_id
previous_state
next_state
agent_name
input_token_count
output_token_count
cost_estimate
latency_ms
validation_result
failure_reason
Each transition recorded cost, latency, validation status, and failure reason so the workflow could be debugged like a backend job.
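A sketch of that log record as a dataclass written out as one JSON object per line. The field names are the ones listed above; the types, the `Transition` and `log_transition` names, and the JSON-lines format are assumptions made for illustration.

```python
import json
import sys
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Transition:
    job_id: str
    previous_state: str
    next_state: str
    agent_name: str
    input_token_count: int
    output_token_count: int
    cost_estimate: float
    latency_ms: int
    validation_result: str  # assumed to be a short label such as "ok" or "schema_error"
    failure_reason: Optional[str] = None


def log_transition(transition: Transition, sink=sys.stdout) -> None:
    """Write one transition as a single JSON line to any object with write()."""
    sink.write(json.dumps(asdict(transition), sort_keys=True) + "\n")


if __name__ == "__main__":
    # Example record with made-up values.
    log_transition(
        Transition(
            job_id="job-001",
            previous_state="review",
            next_state="needs_revision",
            agent_name="Reviewer",
            input_token_count=1800,
            output_token_count=120,
            cost_estimate=0.004,
            latency_ms=2300,
            validation_result="ok",
        )
    )
```

One line per transition keeps the log greppable by `job_id`, and summing `cost_estimate` over a job gives the job-level cost without a separate store.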
That made failures easier to understand.
If the Researcher returned weak data, the log showed it. If the Writer ignored required fields, schema validation caught it. If the Reviewer kept rejecting the same section, the retry counter stopped the loop and preserved the report for manual review.
The pipeline became less magical and more useful
The pipeline became less magical and more useful.
The agents still used language models. They still handled messy tasks like extraction, synthesis, and review. But the control flow around them became deterministic: given the same state, the controller always made the same routing decision. The system no longer depended on natural language agreement to decide when paid compute should stop.
That distinction matters.
LLMs are useful inside workflow steps. They are bad owners of the workflow boundary.
For a market research pipeline, I want the model to read messy inputs, summarize findings, draft sections, and detect gaps. I don’t want it deciding how many times the job should loop, whether the budget is acceptable, or whether the final state has been reached.
That belongs to the controller.
Result
After the rebuild, the same task completed for cents instead of burning dollars in an open-ended revision loop. More importantly, the output became inspectable. Every step had input, output, schema, status, cost, and failure reason.
This is where I think agent systems need to mature.
The useful version isn’t a room full of chatbots talking until they feel done. It’s a controlled workflow with typed state, bounded retries, narrow model responsibilities, and backend-owned routing.
The models can do the judgment-heavy parts.
The system still needs to own the job.
Onto the next one. Let’s keep sharpening that edge.
First written on February 15, 2024.