AI SystemsEngineering Log November 30, 2022 7 min read

The ChatGPT Shift: Conversational State as a Backend Problem

A chat app hit the context limit because the frontend kept sending the full conversation back to the model. The fix was treating conversation history as backend-owned structured state with sessions, turns, summaries, archives, and bounded prompt assembly.

ChatGPTConversation StatePostgreSQLPrompt AssemblyToken BudgetsBackend ArchitectureAI SystemsCost Control

OpenAI launched ChatGPT today, and the reaction is loud for obvious reasons.

The interface feels different. People aren’t just sending one-off prompts anymore. They’re asking follow-up questions, correcting the model, refining answers, and expecting the system to remember the shape of the conversation.

That changes the backend problem.

The API threw a maximum token limit error on turn fifteen.

The chat still looked fine in the browser. The user could scroll through the conversation. The frontend had every message. The failure happened when the app tried to send the entire history back into the model.

Early chat interfaces are using a very simple pattern:

user sends message
→ frontend appends it to chat history
→ full conversation gets sent to the backend
→ backend sends the full prompt to the model
→ model returns the next response

That works for a short demo. It fails once the conversation becomes real.

A conversation grows every turn. The prompt gets heavier. Latency goes up. Cost goes up. Eventually the payload hits the context window and the model rejects it.

The mistake is treating conversation history like one long text file.

For a backend system, it should be structured state.

Conversation history became database state

I changed the pilot schema around two tables:

sessions
turns

The session stores the user/session reference, current summary, active token estimate, status, and timestamps. The turns table stores each user and assistant message as its own row, tied back to the session.

PostgreSQL schema for sessions and turns, including role, content, token estimate, archived flag, summary pointer, and timestamps. — PostgreSQL sessions and turns schema

That gave the backend something useful to work with. It could count, trim, summarize, archive, and reconstruct context without trusting the frontend to send the correct history.

The backend assembled bounded context

New request flow:

client sends latest message
→ API loads session
→ API writes the new turn
→ worker checks active token count
→ older turns are summarized when needed
→ backend assembles bounded context
→ model receives summary plus recent turns
→ response is stored as a new turn

Chat session architecture showing API, PostgreSQL sessions and turns tables, token-count worker, summary state, bounded prompt assembly, and model call. — Backend-owned chat session architecture

The active prompt stopped being “everything the user has ever said.”

It became:

system instructions
→ compressed session summary
→ recent relevant turns
→ latest user message

That structure gave the system a ceiling. The model still received the useful shape of the conversation, but the backend controlled how much raw history entered the request.

The summary worker controlled the hot path

The summary worker handled the cleanup.

When the active token estimate crossed a threshold, it selected older turns, summarized them into a smaller state block, stored that summary on the session, and marked the raw turns as archived.

Background worker that monitors token count, summarizes older turns, updates the session summary, and marks old turns as archived. — Summary worker

The raw turns were still stored for audit and debugging. They just weren’t blindly sent into every model call.

That separation mattered.

The database kept the full history.
The prompt only carried what the model needed right now.

It also made cost easier to reason about. Without this boundary, one active user can make every later request more expensive than the last. A chat session becomes a growing payload. With bounded context assembly, each request has a predictable upper range.

The backend contract became clearer

The backend contract became clearer:

the client sends only the latest message
PostgreSQL stores the conversation as structured turns
the worker maintains compressed memory
the backend assembles the prompt
the model receives bounded context
archived turns stay available outside the hot path

This is the part I keep coming back to with language models. The interface looks conversational, but the system underneath still needs normal data engineering.

State has to live somewhere.
Context has to be selected.
History has to be compressed.
Costs have to be bounded.
Failures have to be recoverable.

Result

ChatGPT makes the user experience feel obvious. The backend work behind that experience is still wide open.

If chat becomes the default interface for software, conversation history can’t stay as frontend string concatenation. It needs schemas, workers, summaries, archive rules, token budgets, and prompt assembly controlled by the server.

That is the real systems work starting to show up now.

Onto the next one. Let’s keep sharpening that edge.

First written on November 30, 2022.

Want to implement this architecture in your business?

Discuss Your Project