Infrastructure · Engineering Log · December 02, 2021

Streaming Architecture for Sensitive Text Generation Workloads

A text generation endpoint hit a gateway timeout while the instance kept burning compute. The fix was a Server-Sent Events lifecycle with validation, controlled streaming, disconnect cancellation, frontend buffering, and safe operational logging.

Streaming · Server-Sent Events · Text Generation · FastAPI · Cancellation · Safe Logging · Backend Architecture · Reliability

The first failure was a thirty-second timeout.

A user submitted a prompt for a generated product description. The backend accepted it, passed it into the generation service, and waited for the complete text before returning a response.

Nginx closed the connection before the model finished.

The frontend showed a 504 Gateway Timeout. The instance kept generating in the background. That meant compute was still running after the user had already received a failed page.

For a normal classification API, the old request path was fine:

client
→ API
→ validation
→ inference
→ JSON response

Those requests were short and predictable. Text generation had a much wider runtime range, because runtime grows with the length of the output: a short output could finish quickly, while a longer one could cross gateway, browser, or app-server timeout limits.

The issue became sharper because the prompts and outputs could contain sensitive commercial data: product details, internal notes, pricing language, supplier terms, and draft copy that wasn’t meant to leak into logs or failed retry paths.

So the architecture had to handle three things at the same time:

long-running generation
safe cancellation
controlled visibility of partial output

The endpoint became a stream

I moved the endpoint away from a single blocking response and rebuilt it around Server-Sent Events.

New flow:

client submits prompt
→ API validates and redacts unsafe fields
→ generation worker starts
→ backend streams chunks
→ frontend buffers and renders partial output
→ stream closes, fails, or cancels cleanly

Figure: SSE text generation flow, covering API validation, the generation worker, streamed chunks, the frontend buffer, disconnect handling, and the cancellation path.
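The "validates and redacts unsafe fields" step at the front of that flow can be sketched as below. This is a minimal sketch, not the original service code: the payload shape, the `MAX_PROMPT_CHARS` limit, and the `SENSITIVE_FIELDS` names are all hypothetical. The idea is that the full payload only ever reaches the generation worker, while logs and error paths get the redacted view.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical limit and field names; the post does not specify either.
MAX_PROMPT_CHARS = 8_000
SENSITIVE_FIELDS = {"internal_notes", "pricing", "supplier_terms"}


class ValidationError(ValueError):
    """Rejected request. Messages describe sizes and types, never content."""


@dataclass(frozen=True)
class ValidatedRequest:
    prompt: str
    payload: Dict[str, Any]   # full input, handed only to the generation worker
    redacted: Dict[str, Any]  # safe view for logs, metrics, and error events


def validate_and_redact(payload: Dict[str, Any]) -> ValidatedRequest:
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValidationError("prompt must be a non-empty string")
    if len(prompt) > MAX_PROMPT_CHARS:
        # Report the size, not the text.
        raise ValidationError(
            f"prompt too long: {len(prompt)} chars (max {MAX_PROMPT_CHARS})"
        )
    # Keep the shape of the request; drop sensitive values and the prompt text.
    redacted = {
        key: "[REDACTED]" if key in SENSITIVE_FIELDS else value
        for key, value in payload.items()
        if key != "prompt"
    }
    redacted["prompt_chars"] = len(prompt)
    return ValidatedRequest(prompt=prompt, payload=dict(payload), redacted=redacted)
```

Keeping the redacted view next to the validated request means later stages never have to decide for themselves what is safe to print.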

The API contract changed from “return the final text” to “manage a stream.”

That forced a clearer lifecycle:

request accepted
generation started
chunk emitted
client disconnected
generation cancelled
generation completed
error returned without dumping sensitive payloads
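One way to map that lifecycle onto the wire is one named SSE event per server-visible state. The event names below are hypothetical; the framing (an `event:` line, a `data:` line, and a blank line ending the message) follows the Server-Sent Events format.

```python
import json
from enum import Enum


class StreamEvent(str, Enum):
    """Hypothetical SSE event names, one per server-visible lifecycle state."""

    ACCEPTED = "accepted"
    STARTED = "started"
    CHUNK = "chunk"
    CANCELLED = "cancelled"
    COMPLETED = "completed"
    ERROR = "error"


def format_sse(event: StreamEvent, data: dict) -> str:
    """Frame one SSE message: an `event:` line, a `data:` line, then a blank line.

    json.dumps escapes newlines inside strings, so generated text with line
    breaks still fits on a single `data:` line.
    """
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event.value}\ndata: {payload}\n\n"


if __name__ == "__main__":
    print(format_sse(StreamEvent.STARTED, {"request_id": "req-1"}), end="")
    print(format_sse(StreamEvent.CHUNK, {"text": "Line one\nLine two"}), end="")
```

"Client disconnected" has no event of its own, because nobody is left to receive it; the server detects it and records the cancellation in its logs. A `cancelled` event is only useful when the server cancels for its own reasons, such as a timeout, while the client is still connected.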

Cancellation became part of the backend contract

The cancellation path was the main backend fix. While streaming, the worker checked the connection state and stopped generating as soon as the client disconnected. That prevented the instance from burning compute on a response nobody would receive.
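A minimal sketch of that check, assuming generation is exposed as an async iterator of text chunks. `stream_with_cancellation` and `on_cancel` are hypothetical names. `is_disconnected` has the same shape as Starlette's `Request.is_disconnected()` coroutine (which FastAPI exposes), but here it is injected so the sketch runs without a server.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def stream_with_cancellation(
    chunks: AsyncIterator[str],
    is_disconnected: Callable[[], Awaitable[bool]],
    on_cancel: Callable[[str], None],
) -> AsyncIterator[str]:
    """Relay generated chunks until generation ends or the client goes away."""
    try:
        async for chunk in chunks:
            # Check once per chunk; a disconnect is noticed at the next chunk boundary.
            if await is_disconnected():
                on_cancel("client_disconnected")
                return
            yield chunk
    finally:
        # Closing the upstream async generator stops it at its current `yield`,
        # so no further tokens are produced for a client that is gone.
        aclose = getattr(chunks, "aclose", None)
        if aclose is not None:
            await aclose()


if __name__ == "__main__":
    async def fake_model() -> AsyncIterator[str]:
        for i in range(10):
            await asyncio.sleep(0)  # stand-in for a real model step
            yield f"tok{i} "

    async def demo() -> None:
        checks = 0

        async def is_disconnected() -> bool:
            nonlocal checks
            checks += 1
            return checks > 3  # simulate the client leaving after three chunks

        received = [c async for c in stream_with_cancellation(fake_model(), is_disconnected, print)]
        print(received)

    asyncio.run(demo())
```

In a FastAPI handler, this generator could be passed to `StreamingResponse(..., media_type="text/event-stream")` with `request.is_disconnected` as the check. Closing the upstream iterator in `finally` is what tells the producer to stop; if generation runs in a separate thread or process, that close would need to propagate an explicit cancel signal to it.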

The frontend also needed a safer update path. Streaming many small chunks directly into React state created too many renders. I buffered chunks and rendered at a controlled interval, so the user saw progress quickly without turning the browser into part of the bottleneck.
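The buffering idea does not depend on React, so here it is sketched in Python to match the other examples. `ChunkBuffer`, the 50 ms default interval, and the `render` callback (standing in for a single React state update) are assumptions; the post does not give the actual interval.

```python
import time
from typing import Callable, List


class ChunkBuffer:
    """Coalesce streamed chunks and hand them to `render` at most once per interval.

    `render` stands in for one UI state update; `clock` is injectable so the
    flushing logic can be tested deterministically.
    """

    def __init__(
        self,
        render: Callable[[str], None],
        interval_s: float = 0.05,  # hypothetical interval
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._render = render
        self._interval = interval_s
        self._clock = clock
        self._pending: List[str] = []
        self._last_flush = clock()

    def push(self, chunk: str) -> None:
        """Queue a chunk; render only if the interval has elapsed since the last flush."""
        self._pending.append(chunk)
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self) -> None:
        """Render everything queued so far as one update."""
        if self._pending:
            self._render("".join(self._pending))
            self._pending.clear()
        self._last_flush = self._clock()
```

This version only flushes when a new chunk arrives, so a final `flush()` is needed when the stream closes. In a browser, a timer such as `setInterval` or `requestAnimationFrame` would also flush buffered text during pauses in the stream.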

Figure: FastAPI SSE endpoint, with request validation, an async chunk generator, a disconnect check, and a safe error event.
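The safe-error-event part of that endpoint can be sketched as a wrapper around the SSE message stream. `with_safe_errors`, `sse_message`, and the error text are hypothetical. In FastAPI, the wrapped generator would typically be returned as `StreamingResponse(..., media_type="text/event-stream")`.

```python
import asyncio
import json
import logging
from typing import AsyncIterator

logger = logging.getLogger("generation.stream")


def sse_message(event: str, data: dict) -> str:
    """Frame one SSE message; JSON keeps the data on a single line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def with_safe_errors(
    messages: AsyncIterator[str], request_id: str
) -> AsyncIterator[str]:
    """Relay SSE messages; turn any failure into an error event with no payload."""
    try:
        async for message in messages:
            yield message
    except Exception as exc:  # asyncio.CancelledError is a BaseException, so it passes through
        # Log the exception type only: str(exc) may quote prompt or output text.
        logger.error(
            "generation failed request_id=%s error_type=%s",
            request_id,
            type(exc).__name__,
        )
        yield sse_message(
            "error", {"request_id": request_id, "message": "Generation failed."}
        )


async def _demo() -> None:
    async def failing_generation() -> AsyncIterator[str]:
        yield sse_message("chunk", {"text": "Draft copy"})
        raise RuntimeError("model crashed on prompt: <sensitive text>")

    async for message in with_safe_errors(failing_generation(), "req-123"):
        print(message, end="")


if __name__ == "__main__":
    asyncio.run(_demo())
```

Catching `Exception` rather than `BaseException` matters here: since Python 3.8, `asyncio.CancelledError` derives from `BaseException`, so a cancellation caused by a disconnect is not turned into an error event.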

Logs kept operational shape, not sensitive text

The logs had to change too. The service could not casually log full prompts or generated text. I kept operational fields instead:

request ID
prompt size
output token count
generation duration
cancellation reason
model version
validation result
timeout source
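Those fields fit naturally into a structured log record that has no slot for prompt or output text, so sensitive content cannot end up in it by accident. This is a sketch; the field names and units (characters for prompt size, milliseconds for duration) are assumptions.

```python
import json
import logging
from dataclasses import asdict, dataclass
from typing import Optional

logger = logging.getLogger("generation.ops")


@dataclass(frozen=True)
class GenerationLogRecord:
    """Operational shape of one request. Deliberately has no prompt or output field."""

    request_id: str
    prompt_chars: int              # prompt size (unit assumed: characters)
    output_tokens: int
    duration_ms: float
    model_version: str
    validation_result: str         # e.g. "ok" or "rejected:too_long"
    cancellation_reason: Optional[str] = None  # e.g. "client_disconnected"
    timeout_source: Optional[str] = None       # e.g. "gateway", "app_server"

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def log_generation(record: GenerationLogRecord) -> None:
    # Pass the JSON as an argument so any '%' in it is not treated as a format directive.
    logger.info("%s", record.to_json())
```

Because the record type has no text fields, a code review only has to check what gets passed into it, not every log call in the service.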

That gave enough visibility to debug the system without storing sensitive content in application logs.

Result

After the change, users started seeing output almost immediately. Long generations no longer died as silent 504 failures. Disconnected clients stopped consuming worker time. Failed requests returned controlled stream events instead of broken pages.

The model didn’t need to be treated as a special magical component. It behaved like any other slow backend workload: validate the input, control the request lifecycle, stream progress, cancel abandoned work, avoid leaking sensitive data, and log the operational shape without storing the payload.

For text generation, the serving architecture matters as much as the model call. A normal REST endpoint can hide too much: timeouts, wasted compute, unsafe logs, bad retries, and unclear failure states.

The streaming boundary made those failure modes visible enough to manage.

On to the next one. Let’s keep sharpening that edge.

First written on December 02, 2021.
