Building a Self-Filling Puzzle Pool: Server-Driven Image Generation with S3 Caching
We had a performance problem. Every time a kid tapped a puzzle theme, the app made a round-trip to generate an AI image — 17 seconds of staring at a loading screen. The same image, for the same puzzle, every time. We were burning Hugging Face credits regenerating images that already existed.
The fix: a server-side puzzle pool that generates once and caches forever.
The Old Architecture
The client did everything:
- Kid taps “Ice Kingdom”
- Client picks a random puzzle from 4 hardcoded options
- Client calls `POST /api/image/generate` with the prompt
- Server proxies to the AI service (FLUX.1-schnell on Hugging Face)
- 17 seconds later, client gets a base64 image
- Client slices it into puzzle pieces
Every session, every user, same 4 puzzles, same 17-second wait. The server had an in-memory cache that helped on repeat visits within the same server session, but a restart wiped it.
The New Architecture
```
Client: GET /api/puzzles/frozen/next?index=5
          ↓
Server: Pool has index 5? ──yes──→ Return S3 URL (5ms)
                          ──no───→ Generate AI image (17s)
                                   Upload to S3
                                   Save to pool DB
                                   Return S3 URL
```
The client doesn’t know about AI generation. It sends an index, gets back a URL. The server handles the rest.
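The whole client contract fits in a couple of lines. A sketch, with the `{ imageUrl }` response shape assumed from the behavior described above:

```js
// Ask for the puzzle at index 5 of the "frozen" station.
// The { imageUrl } response shape is an assumption, not a documented API.
const res = await fetch("/api/puzzles/frozen/next?index=5");
const { imageUrl } = await res.json();
```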
The Pool Model
Each pool entry is a MongoDB document:
```js
{
  stationId: "frozen",
  promptIndex: 5,
  imageUrl: "https://aws-platform-puzzle-images.s3.amazonaws.com/puzzles/frozen/67d4...png"
}
```
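If you model that with Mongoose, it's tiny. A sketch: the unique compound index is my assumption, the natural way to keep two concurrent misses from writing duplicate entries for the same slot:

```js
const mongoose = require("mongoose");

// Pool entry: one document per (station, prompt index) slot
const puzzlePoolSchema = new mongoose.Schema({
  stationId: { type: String, required: true },
  promptIndex: { type: Number, required: true },
  imageUrl: { type: String, required: true },
});

// Assumed: unique index so the same slot can't be cached twice
puzzlePoolSchema.index({ stationId: 1, promptIndex: 1 }, { unique: true });

const PuzzlePool = mongoose.model("PuzzlePool", puzzlePoolSchema);
```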
The `promptIndex` maps to a prompt bank — 40 unique prompts per station (400 total across 10 themes). “Polar Bear Slide”, “Ice Palace Dawn”, “Penguin Family”, and so on. Each prompt gets a `cute children's book illustration, vibrant colors, friendly, adorable` suffix before hitting the AI model.
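The lookup itself is just string concatenation. A sketch, with the bank structure and the names (`PROMPT_BANK`, `STYLE_SUFFIX`) as illustrative stand-ins:

```js
// Illustrative structure -- the real bank has 40 prompts per station
const PROMPT_BANK = {
  frozen: ["Polar Bear Slide", "Ice Palace Dawn", "Penguin Family" /* ... */],
  // ...9 more stations
};

const STYLE_SUFFIX =
  "cute children's book illustration, vibrant colors, friendly, adorable";

function buildPrompt(stationId, promptIndex) {
  return `${PROMPT_BANK[stationId][promptIndex]}, ${STYLE_SUFFIX}`;
}
```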
Self-Filling Pool
Here’s the part I like: the pool fills itself as kids play.
- First kid hits Ice Kingdom index 0 → pool miss → generates “Polar Bear Slide” → uploads to S3 → saves to DB → serves image (slow, ~20s)
- Second kid hits index 0 → pool hit → serves S3 URL (fast, ~5ms)
- Third kid completes the puzzle, advances to index 1 → pool miss → generates “Ice Palace Dawn”
- Every kid after gets index 1 instantly
No background jobs. No pre-warming scripts. The pool fills organically. After 40 plays of a station, every image is cached and every future request is instant.
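The whole thing is a read-through cache. Here's a condensed sketch of the route, assuming Express and the model above; `generateImage()` and `uploadToS3()` are stand-ins for the Hugging Face call and the S3 upload:

```js
app.get("/api/puzzles/:stationId/next", async (req, res) => {
  const { stationId } = req.params;
  const promptIndex = Number(req.query.index);

  // Pool hit: serve the cached S3 URL, ~5ms, no AI involved
  let entry = await PuzzlePool.findOne({ stationId, promptIndex });

  if (!entry) {
    // Pool miss: generate once (~17s), upload (~2s), cache forever
    const prompt = buildPrompt(stationId, promptIndex);
    const imageBuffer = await generateImage(prompt);            // hypothetical helper
    const imageUrl = await uploadToS3(stationId, imageBuffer);  // hypothetical helper
    entry = await PuzzlePool.create({ stationId, promptIndex, imageUrl });
  }

  res.json({ imageUrl: entry.imageUrl });
});
```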
Client-Side Index Tracking
The client tracks one number per station in localStorage:
{"frozen": 3, "rainbows": 1, "kpop": 0}
This index only advances when you complete a puzzle. Back out without finishing? You get the same puzzle next time. Hit “Next Puzzle” on the completion screen? The index advances immediately.
This is the entire client-side state for puzzle selection. No played-IDs, no dedup logic, no prompt bank. One number per station.
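In code, that's two tiny helpers. A sketch; the storage key name is illustrative:

```js
const KEY = "puzzleIndexes"; // assumed key name

function getIndex(stationId) {
  const indexes = JSON.parse(localStorage.getItem(KEY) || "{}");
  return indexes[stationId] || 0;
}

// Called only on puzzle completion (or "Next Puzzle"), never on back-out
function advanceIndex(stationId) {
  const indexes = JSON.parse(localStorage.getItem(KEY) || "{}");
  indexes[stationId] = (indexes[stationId] || 0) + 1;
  localStorage.setItem(KEY, JSON.stringify(indexes));
}
```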
Why This Architecture
Three reasons:
1. Performance. Pool hits serve from S3 through CloudFront. Sub-100ms globally. No AI generation, no server compute.
2. Consistency. Every user sees the same image for index 5 of the frozen station. This matters for future features like leaderboards, shared progress, or “play the same puzzle as your friend.”
3. Offline support. This is the big one. Since puzzles are sequential and indexed, a React Native app can prefetch the next N images when on WiFi:
```js
// While connected, prefetch the next 10 puzzles (e.g., indexes 5-14)
for (let i = current; i < current + 10; i++) {
  fetch(`/api/puzzles/frozen/next?index=${i}`)
    .then(res => res.json())
    .then(data => cacheToDevice(data.imageUrl));
}
```
When offline, intercept the request and serve from the device cache. The index-based design makes this trivial — you know exactly what to prefetch.
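A sketch of that cache-first lookup; `deviceCache` is a placeholder for whatever the RN app ends up using (filesystem cache, AsyncStorage, etc.):

```js
async function getPuzzleImage(stationId, index) {
  // Already prefetched on WiFi (or we're offline): serve from the device
  const cached = await deviceCache.get(`${stationId}:${index}`);
  if (cached) return cached;

  // Otherwise fall through to the server
  const res = await fetch(`/api/puzzles/${stationId}/next?index=${index}`);
  const { imageUrl } = await res.json();
  await deviceCache.set(`${stationId}:${index}`, imageUrl);
  return imageUrl;
}
```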
The Numbers
- 40 prompts per station, 10 stations = 400 unique puzzles
- Pool hit: ~5ms server, ~100ms with S3/CDN
- Pool miss: ~17s AI generation + ~2s S3 upload
- S3 storage: ~80KB per image, ~32MB for a full pool
- After 400 total plays across all stations, the pool is full and every request is instant
What’s Next
The pool is filling up as our first users play. We’re watching the MongoDB collection grow:
```
frozen:   10/40 cached
rainbows:  0/40
kpop:      0/40
...
```
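That listing is just a per-station count; one way to pull it from the mongo shell (collection name assumed):

```js
db.puzzlepools.aggregate([
  { $group: { _id: "$stationId", cached: { $sum: 1 } } },
  { $sort: { _id: 1 } }
])
```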
Next up: React Native offline support using this same index-based architecture, and a monitoring alert for when HF credits run low (we learned that one the hard way).