Async Job Architecture for Image Generation APIs
An image generation request exceeded the gateway timeout before the result was ready. The fix was moving generation behind an async job boundary with a queue, worker, job table, object storage, and WebSocket completion events.
Hello world, how’s everyone doing?
It’s been a while since my last post. Life got busy with a few positive changes and transitions. Around this time, things were starting to feel a little lighter: pandemic restrictions were easing all over the world, borders were reopening, and the world felt like it was slowly trying to move again.
Anyway, let’s start.
The Nginx proxy timeout was set to thirty seconds.
The image generation request took closer to forty-five.
The connection died before the image was ready.
I was experimenting with early diffusion-based image generation APIs. The output quality still had obvious problems: strange hands, uneven lighting, warped object edges, and that unnatural look you could spot immediately.
The more interesting issue was the backend shape. A prompt-to-image request doesn’t behave like a normal API call.
A normal endpoint usually does something bounded:
client
→ API
→ database or service call
→ JSON response
Image generation is slower and heavier. The user submits a short prompt, but the backend may need a GPU worker, object storage, job metadata, timeout handling, and a way to notify the frontend when the result is ready.
The direct request path failed
The first version used a direct request path:
client submits prompt
→ API accepts prompt
→ image generation starts
→ backend waits
→ image URL returns
That path failed once generation crossed the gateway timeout. Nginx gave up on the upstream and the browser received a 504 Gateway Timeout, while the server kept spending compute on a result the user would never see.
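The failure is easy to reproduce in miniature. This is a toy simulation under stated assumptions, not the original service: times are scaled down 1000x, `generate_image` is a hypothetical stand-in for the diffusion call, and `asyncio.shield` models the fact that a proxy giving up on the wait does not stop the upstream work.

```python
import asyncio

# Times scaled down 1000x: 30 ms stands in for the 30 s proxy timeout,
# 45 ms for roughly 45 s of generation. Both values are illustrative.
GATEWAY_TIMEOUT = 0.030
GENERATION_TIME = 0.045


async def generate_image(prompt: str, log: list) -> str:
    """Hypothetical stand-in for the slow diffusion call."""
    await asyncio.sleep(GENERATION_TIME)
    log.append("generation finished")
    return f"image-for:{prompt}"


async def direct_request(prompt: str, log: list) -> str:
    # The generation runs as its own task, like upstream work the proxy cannot stop.
    work = asyncio.create_task(generate_image(prompt, log))
    try:
        # The "gateway" waits at most GATEWAY_TIMEOUT. shield() means giving up
        # on the wait does not cancel the underlying work.
        return await asyncio.wait_for(asyncio.shield(work), timeout=GATEWAY_TIMEOUT)
    except asyncio.TimeoutError:
        log.append("client got 504")
        await work  # the server still burns compute on a result nobody will receive
        return "504 Gateway Timeout"


async def main() -> list:
    log: list = []
    status = await direct_request("a red bicycle", log)
    log.append(status)
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
    # ['client got 504', 'generation finished', '504 Gateway Timeout']
```

The client gets its error first, and the generation still runs to completion afterwards. That tail is the compute the direct path throws away.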
Polling was the obvious workaround, but I didn’t like the shape of it. Hitting the API every two seconds just to ask “ready yet?” creates noisy traffic and still leaves the system with awkward state handling.
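To put a rough number on “noisy”, here is the back-of-the-envelope arithmetic, assuming a 45-second job polled every two seconds. The 100 jobs in flight in the second call are a purely illustrative figure.

```python
import math


def polls_per_job(generation_seconds: float, interval_seconds: float) -> int:
    """Status requests one client sends when polling on a fixed interval.

    Assumes the first poll fires one interval after submission and the client
    stops at the first poll that sees the finished job.
    """
    return math.ceil(generation_seconds / interval_seconds)


def polls_per_minute(jobs_in_flight: int, interval_seconds: float) -> float:
    """Status requests per minute hitting the API while jobs are running."""
    return jobs_in_flight * 60 / interval_seconds


print(polls_per_job(45, 2))      # 23: 22 "not ready yet" answers, then the result
print(polls_per_minute(100, 2))  # 3000.0
```

Almost every one of those requests carries no new information, which is the traffic a push-based completion event avoids.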
I moved generation behind an async job boundary
I rebuilt the flow around an async job boundary.
New flow:
client submits prompt
→ API validates prompt
→ job ID is created
→ worker picks up generation job
→ image is written to object storage
→ job status updates
→ frontend receives completion event
→ image URL renders
The HTTP request created the job. The worker handled generation. Object storage held the result. WebSocket events handled completion.
The request stopped pretending the image would be ready inside one HTTP cycle.
The API returned quickly with a job ID. The expensive work moved to a background worker. The frontend kept a persistent WebSocket connection open for status updates. When the worker finished the diffusion process and wrote the image to storage, it published a completion event with the image URL.
That changed the service boundary.
- The API handled validation and job creation.
- The worker handled GPU-bound generation.
- Object storage held the final image.
- The WebSocket layer handled user-facing progress and completion.
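The flow and the service split above can be sketched in a single process. This is a minimal sketch under stated assumptions: plain dicts stand in for the job table and object storage, an `asyncio.Queue` stands in for the job queue, per-job queues stand in for WebSocket pushes, and `generate`, `ImageService`, and the `images/<job_id>.png` key format are all hypothetical.

```python
import asyncio
import contextlib
import uuid


async def generate(prompt: str) -> bytes:
    """Hypothetical stand-in for GPU-bound diffusion; returns fake image bytes."""
    await asyncio.sleep(0.01)
    return f"png:{prompt}".encode()


class ImageService:
    """In-memory sketch. Dicts stand in for the job table and object storage,
    an asyncio.Queue for the job queue, and per-job queues for WebSocket pushes."""

    def __init__(self) -> None:
        self.jobs: dict = {}       # job table: job_id -> record
        self.storage: dict = {}    # object storage: key -> bytes
        self.queue: asyncio.Queue = asyncio.Queue()
        self.sockets: dict = {}    # job_id -> event queue for a connected client

    def submit(self, prompt: str) -> str:
        """API handler: validate, create the job, enqueue it, return the ID right away."""
        if not prompt.strip():
            raise ValueError("empty prompt")
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = {"status": "pending", "prompt": prompt, "storage_url": None}
        self.queue.put_nowait(job_id)
        return job_id

    def subscribe(self, job_id: str) -> asyncio.Queue:
        """Frontend side: open a socket-like channel for one job's events."""
        return self.sockets.setdefault(job_id, asyncio.Queue())

    async def worker(self) -> None:
        """Background worker: take a job, generate, store, update the table, publish."""
        while True:
            job_id = await self.queue.get()
            job = self.jobs[job_id]
            job["status"] = "running"
            image = await generate(job["prompt"])
            key = f"images/{job_id}.png"   # hypothetical object key
            self.storage[key] = image      # 1. write the result to storage
            job["status"] = "completed"    # 2. update the job table
            job["storage_url"] = key
            channel = self.sockets.get(job_id)
            if channel is not None:        # 3. push only if a client is listening
                channel.put_nowait({"job_id": job_id, "status": "completed", "storage_url": key})
            self.queue.task_done()


async def main() -> dict:
    service = ImageService()
    worker = asyncio.create_task(service.worker())
    job_id = service.submit("a lighthouse at dusk")  # returns immediately with a job ID
    events = service.subscribe(job_id)
    event = await events.get()                       # completion event arrives later
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return event


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The ordering inside the worker is the important part: the image is stored and the job table updated before any event goes out, so a client that misses the push can still find the result.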
The job record became the source of truth
The job record became the coordination point. It needed enough state to make the system recoverable:
- job ID
- user/session reference
- prompt hash or sanitized prompt reference
- status
- created time
- started time
- completed time
- failure reason
- storage URL
- model or generation version
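One way to model that record, assuming Python dataclasses rather than any particular database schema. The field names, the `JobStatus` values, and the SHA-256 prompt hash are illustrative choices, not a prescribed format.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class JobStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


def utcnow() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class GenerationJob:
    session_id: str       # user/session reference
    prompt_hash: str      # SHA-256 of the prompt, not the raw text
    model_version: str    # which model produced the image
    job_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: JobStatus = JobStatus.PENDING
    created_at: datetime = field(default_factory=utcnow)
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    failure_reason: Optional[str] = None
    storage_url: Optional[str] = None

    @classmethod
    def create(cls, session_id: str, prompt: str, model_version: str) -> "GenerationJob":
        digest = hashlib.sha256(prompt.strip().encode("utf-8")).hexdigest()
        return cls(session_id=session_id, prompt_hash=digest, model_version=model_version)

    def _move(self, allowed_from: tuple, to: JobStatus) -> None:
        # Reject transitions that would corrupt the record, e.g. completing a failed job.
        if self.status not in allowed_from:
            raise ValueError(f"cannot move job from {self.status.value} to {to.value}")
        self.status = to

    def start(self) -> None:
        self._move((JobStatus.PENDING,), JobStatus.RUNNING)
        self.started_at = utcnow()

    def complete(self, storage_url: str) -> None:
        self._move((JobStatus.RUNNING,), JobStatus.COMPLETED)
        self.completed_at = utcnow()
        self.storage_url = storage_url

    def fail(self, reason: str) -> None:
        self._move((JobStatus.PENDING, JobStatus.RUNNING), JobStatus.FAILED)
        self.completed_at = utcnow()
        self.failure_reason = reason
```

Guarding transitions in one place means a late worker callback can't flip a failed job back to completed.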
The job table made slow generation recoverable outside the lifetime of a single HTTP request.
This made retries and failure handling cleaner. If a worker crashed, the job record still existed, so the job could be retried or marked as failed instead of disappearing inside a dead HTTP request. If the frontend disconnected, the generation job could still finish and the result could still be retrieved later.
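A crashed worker can't update its own job, so something else has to notice. A common approach, sketched here as an assumption rather than what the original system did, is a periodic reaper: any job stuck in `running` longer than a lease is put back to `pending`, or marked `failed` once its retry budget is spent. `LEASE`, `MAX_ATTEMPTS`, and the `attempts` counter (assumed to be incremented each time a worker claims the job) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

LEASE = timedelta(minutes=5)  # hypothetical upper bound on a single generation run
MAX_ATTEMPTS = 3              # hypothetical retry budget


@dataclass
class Job:
    job_id: str
    status: str                        # "pending" | "running" | "completed" | "failed"
    started_at: Optional[datetime] = None
    attempts: int = 0                  # assumed: incremented each time a worker claims the job
    failure_reason: Optional[str] = None


def reap_stale_jobs(jobs: List[Job], now: datetime) -> List[str]:
    """Recover jobs whose worker probably died. Returns the IDs put back to pending."""
    requeued = []
    for job in jobs:
        if job.status != "running" or job.started_at is None:
            continue
        if now - job.started_at < LEASE:
            continue  # still inside its lease; assume the worker is alive
        if job.attempts < MAX_ATTEMPTS:
            job.status, job.started_at = "pending", None  # eligible for another worker
            requeued.append(job.job_id)
        else:
            job.status = "failed"
            job.failure_reason = "worker lease expired; retry budget exhausted"
    return requeued


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    jobs = [
        Job("a", "running", now - timedelta(minutes=10), attempts=1),
        Job("b", "running", now - timedelta(minutes=10), attempts=3),
        Job("c", "running", now - timedelta(minutes=1), attempts=1),
    ]
    print(reap_stale_jobs(jobs, now))  # ['a']
    print([j.status for j in jobs])    # ['pending', 'failed', 'running']
```

Retries only stay safe if re-running a job is harmless, for example by writing the image under a key derived from the job ID so a second attempt overwrites rather than duplicates.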
The WebSocket was there for the user experience, but the backend couldn’t depend on the socket as the only source of truth. The job table had to remain authoritative. A browser tab can close. A network can drop. A worker can finish after the client disappears.
That was the main architecture lesson.
The image generation model was slow, expensive, and visually imperfect, but the backend still needed normal production mechanics: job state, queue boundaries, object storage, retries, timeouts, and a way to reconnect without losing the result.
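Keeping the job table authoritative mostly shows up on reconnect. This sketch assumes in-memory stand-ins again (a dict for the table, `asyncio.Queue`s for sockets, and hypothetical `JobHub` names): a connecting client reads the table first and only subscribes if the job is still unfinished, so a result that completed while the tab was closed is still delivered.

```python
import asyncio
from typing import Dict, List


class JobHub:
    """Sketch: the job table is authoritative, socket pushes are a best-effort hint.
    A dict stands in for the database and asyncio.Queues for WebSocket connections."""

    def __init__(self) -> None:
        self.jobs: Dict[str, dict] = {}
        self.listeners: Dict[str, List[asyncio.Queue]] = {}

    def finish(self, job_id: str, storage_url: str) -> None:
        # Write the job table first, then notify whoever happens to be connected.
        self.jobs[job_id] = {"status": "completed", "storage_url": storage_url}
        for channel in self.listeners.pop(job_id, []):
            channel.put_nowait(self.jobs[job_id])

    async def wait_for_result(self, job_id: str) -> dict:
        """What a connecting or reconnecting client does."""
        snapshot = self.jobs.get(job_id, {"status": "pending"})
        if snapshot["status"] in ("completed", "failed"):
            return snapshot  # the push was missed, but the table still has the answer
        # No await between the check above and subscribing, so no event can slip through.
        channel: asyncio.Queue = asyncio.Queue()
        self.listeners.setdefault(job_id, []).append(channel)
        return await channel.get()


async def demo() -> tuple:
    hub = JobHub()
    hub.jobs["job-1"] = {"status": "pending"}
    live = asyncio.create_task(hub.wait_for_result("job-1"))  # tab open during generation
    await asyncio.sleep(0)                                    # let it subscribe
    hub.finish("job-1", "images/job-1.png")
    late = await hub.wait_for_result("job-1")                 # tab reopened after completion
    return await live, late


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Across processes, the check and the subscribe are no longer atomic. One common ordering is to subscribe first, then read the table, and treat duplicate completion events as harmless.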
Result
After the change, the gateway timeout stopped being the control point for the whole feature. The HTTP request only created the job. The worker did the heavy compute. The frontend received progress and completion without hammering the API with polling.
The visual quality would obviously improve over time. The serving pattern mattered immediately.
Once an API starts synthesizing data instead of retrieving it, latency becomes part of the architecture. You need async jobs, background workers, persistent status updates, and storage-backed results. Otherwise, the first slow generation turns into a broken page and wasted compute.
Onto the next one. Let’s keep sharpening that edge.
First written on April 18, 2022.