Building a Self-Filling Puzzle Pool: Server-Driven Image Generation with S3 Caching


We had a performance problem. Every time a kid tapped a puzzle theme, the app made a round-trip to generate an AI image — 17 seconds of staring at a loading screen. The same image, for the same puzzle, every time. We were burning Hugging Face credits regenerating images that already existed.

The fix: a server-side puzzle pool that generates once and caches forever.

The Old Architecture

The client did everything:

  1. Kid taps “Ice Kingdom”
  2. Client picks a random puzzle from 4 hardcoded options
  3. Client calls POST /api/image/generate with the prompt
  4. Server proxies to AI service (FLUX.1-schnell on Hugging Face)
  5. 17 seconds later, client gets a base64 image
  6. Client slices it into puzzle pieces

Every session, every user, same 4 puzzles, same 17-second wait. The server had an in-memory cache that helped on repeat visits within the same server session, but a restart wiped it.

The New Architecture

Client: GET /api/puzzles/frozen/next?index=5

Server: Pool has index 5? ──yes──→ Return S3 URL (5ms)
                          ──no───→ Generate AI image (17s)
                                   Upload to S3
                                   Save to pool DB
                                   Return S3 URL

The client doesn’t know about AI generation. It sends an index, gets back a URL. The server handles the rest.
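
The flow above could be sketched roughly as follows. This is not the production handler, just a minimal illustration of the hit/miss branch; `getOrCreatePuzzle` and the injected `pool`, `generateImage`, and `uploadImage` dependencies are hypothetical names standing in for the database, the AI service call, and the S3 upload.

```javascript
// Sketch of the pool lookup behind GET /api/puzzles/:stationId/next?index=N.
// All dependencies are injected and hypothetical:
//   pool.find / pool.save -> wrap the pool database
//   generateImage         -> calls the AI image service, resolves to image bytes
//   uploadImage           -> uploads the bytes to S3, resolves to a public URL
async function getOrCreatePuzzle(stationId, promptIndex, deps) {
  const { pool, generateImage, uploadImage } = deps;

  // Pool hit: the image already exists, so return its URL right away.
  const existing = await pool.find(stationId, promptIndex);
  if (existing) {
    return { imageUrl: existing.imageUrl, cached: true };
  }

  // Pool miss: generate once, upload, record, then return the new URL.
  const imageBytes = await generateImage(stationId, promptIndex);
  const imageUrl = await uploadImage(stationId, promptIndex, imageBytes);
  await pool.save({ stationId, promptIndex, imageUrl });
  return { imageUrl, cached: false };
}
```

Passing the dependencies in keeps the branch logic testable without touching the AI service or S3. The `cached` flag is an addition for illustration; the client only needs the URL.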

The Pool Model

Each pool entry is a MongoDB document:

{
  stationId: "frozen",
  promptIndex: 5,
  imageUrl: "https://aws-platform-puzzle-images.s3.amazonaws.com/puzzles/frozen/67d4...png"
}
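
Since a pool entry is identified by the pair (stationId, promptIndex), one way to enforce that and look entries up is sketched below. It assumes `collection` is a MongoDB Node.js driver Collection; the helper names are illustrative and not taken from the project.

```javascript
// `collection` is assumed to be a MongoDB Node.js driver Collection holding
// pool entries; the helper names here are illustrative.

// One entry per (stationId, promptIndex): a unique compound index enforces it.
function ensurePoolIndex(collection) {
  return collection.createIndex(
    { stationId: 1, promptIndex: 1 },
    { unique: true }
  );
}

// Look up the cached image for a station and prompt index (null on a miss).
function findPoolEntry(collection, stationId, promptIndex) {
  return collection.findOne({ stationId, promptIndex });
}
```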

The promptIndex maps to a prompt bank: 40 unique prompts per station (400 total across 10 themes), such as “Polar Bear Slide”, “Ice Palace Dawn”, and “Penguin Family”. Before hitting the AI model, each prompt gets a style suffix appended: “cute children's book illustration, vibrant colors, friendly, adorable”.
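
As a rough illustration of the index-to-prompt mapping, here is a small sketch. The data layout, the `buildPrompt` name, the comma separator, and the position of “Penguin Family” at index 2 are all assumptions; only the prompt names and the suffix text come from this post. The post does not say what happens past the last index, so this sketch simply throws.

```javascript
// Illustrative prompt bank: only three of the 40 "frozen" prompts are shown,
// and the data layout is an assumption.
const STYLE_SUFFIX =
  "cute children's book illustration, vibrant colors, friendly, adorable";

const PROMPT_BANK = {
  frozen: ["Polar Bear Slide", "Ice Palace Dawn", "Penguin Family"],
};

// Resolve (stationId, promptIndex) to the full prompt sent to the image model.
function buildPrompt(stationId, promptIndex) {
  const prompts = PROMPT_BANK[stationId];
  const valid =
    prompts &&
    Number.isInteger(promptIndex) &&
    promptIndex >= 0 &&
    promptIndex < prompts.length;
  if (!valid) {
    throw new RangeError(`No prompt for ${stationId}[${promptIndex}]`);
  }
  return `${prompts[promptIndex]}, ${STYLE_SUFFIX}`;
}
```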

Self-Filling Pool

Here’s the part I like: the pool fills itself as kids play.

  • First kid hits Ice Kingdom index 0 → pool miss → generates “Polar Bear Slide” → uploads to S3 → saves to DB → serves image (slow, ~20s)
  • Second kid hits index 0 → pool hit → serves S3 URL (fast, ~5ms)
  • Third kid completes the puzzle, advances to index 1 → pool miss → generates “Ice Palace Dawn”
  • Every kid after gets index 1 instantly

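No background jobs. No pre-warming scripts. The pool fills organically. Once each of a station's 40 indices has been requested once (40 plays at minimum, if every play lands on a new index), every image is cached and every future request for that station is instant.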
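
The sequence in the list above can be replayed with a toy model. The Set below is only a stand-in for the real pool, and `replayRequests` is a hypothetical name; the point is that the first request for an index is a miss and every later one is a hit.

```javascript
// Replay a sequence of requests against an empty pool and label each one.
// The Set stands in for the pool database; keys are "stationId:index".
function replayRequests(requests) {
  const pool = new Set();
  return requests.map(({ stationId, index }) => {
    const key = `${stationId}:${index}`;
    if (pool.has(key)) return "hit";
    pool.add(key); // the first request for this index pays the generation cost
    return "miss";
  });
}

const outcomes = replayRequests([
  { stationId: "frozen", index: 0 }, // first kid
  { stationId: "frozen", index: 0 }, // second kid
  { stationId: "frozen", index: 1 }, // third kid, after finishing puzzle 0
  { stationId: "frozen", index: 1 }, // every kid after
]);
console.log(outcomes); // [ 'miss', 'hit', 'miss', 'hit' ]
```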

Client-Side Index Tracking

The client tracks one number per station in localStorage:

{"frozen": 3, "rainbows": 1, "kpop": 0}

This index only advances when you complete a puzzle. Back out without finishing? You get the same puzzle next time. Hit “Next Puzzle” on the completion screen? The index advances immediately.

This is the entire client-side state for puzzle selection. No played-IDs, no dedup logic, no prompt bank. One number per station.
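
A minimal sketch of that state handling might look like the following. The storage key `puzzleProgress` and the function names are made up for illustration; `storage` is anything exposing the Web Storage `getItem`/`setItem` methods, such as `localStorage` in a browser.

```javascript
// Sketch of per-station progress tracking. `storage` is anything with the
// Web Storage getItem/setItem methods (localStorage in a browser).
// The key name "puzzleProgress" is a made-up placeholder.
const PROGRESS_KEY = "puzzleProgress";

function readProgress(storage) {
  const raw = storage.getItem(PROGRESS_KEY);
  return raw ? JSON.parse(raw) : {};
}

// Index of the puzzle to request next; stations never played start at 0.
function currentIndex(storage, stationId) {
  return readProgress(storage)[stationId] ?? 0;
}

// Call only when a puzzle is completed; backing out leaves the index alone.
function advanceIndex(storage, stationId) {
  const progress = readProgress(storage);
  progress[stationId] = (progress[stationId] ?? 0) + 1;
  storage.setItem(PROGRESS_KEY, JSON.stringify(progress));
  return progress[stationId];
}
```

In a browser this would be called as `currentIndex(localStorage, "frozen")` before requesting a puzzle and `advanceIndex(localStorage, "frozen")` on completion.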

Why This Architecture

Three reasons:

1. Performance. Pool hits serve from S3 through CloudFront, in roughly 100ms globally. No AI generation, and no server compute beyond the pool lookup.

2. Consistency. Every user sees the same image for index 5 of the frozen station. This matters for future features like leaderboards, shared progress, or “play the same puzzle as your friend.”

3. Offline support. This is the big one. Since puzzles are sequential and indexed, a React Native app can prefetch the next N images when on WiFi:

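// While connected, prefetch the next 10 puzzles (indices 5-14 when current is 5).
// Assumes the endpoint returns JSON with an imageUrl field; cacheToDevice is the app's own helper.
for (let i = current; i < current + 10; i++) {
  fetch(`/api/puzzles/frozen/next?index=${i}`)
    .then(res => res.json()) // fetch resolves to a Response, so parse the body first
    .then(data => cacheToDevice(data.imageUrl))
    .catch(err => console.warn(`Prefetch failed for index ${i}`, err))
}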

When offline, intercept the request and serve from the device cache. The index-based design makes this trivial — you know exactly what to prefetch.
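
The "serve from the device cache" side could be sketched as a cache-first lookup. Everything here is hypothetical scaffolding rather than a real React Native API: `deviceCache` is a Map-like store, `isOnline` reports connectivity, and `fetchAndStore` downloads an image and resolves to its local path.

```javascript
// Cache-first lookup sketch. Every dependency is a hypothetical stand-in:
//   deviceCache   -> Map-like store of "stationId:index" -> local image path
//   isOnline      -> function returning true when the network is reachable
//   fetchAndStore -> downloads the image and resolves to its local path
async function resolvePuzzleImage(stationId, index, deps) {
  const { deviceCache, isOnline, fetchAndStore } = deps;
  const key = `${stationId}:${index}`;

  // Serve from the device whenever possible, online or not.
  if (deviceCache.has(key)) return deviceCache.get(key);

  if (!isOnline()) {
    throw new Error(`Puzzle ${key} is not cached and the device is offline`);
  }

  const localPath = await fetchAndStore(stationId, index);
  deviceCache.set(key, localPath);
  return localPath;
}

// Which indices to prefetch: the next `count` puzzles starting at `current`.
function indicesToPrefetch(current, count) {
  return Array.from({ length: count }, (_, i) => current + i);
}
```

Because the indices are sequential, the prefetch list is just a range; no server call is needed to decide what to download.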

The Numbers

  • 40 prompts per station, 10 stations = 400 unique puzzles
  • Pool hit: ~5ms server, ~100ms with S3/CDN
  • Pool miss: ~17s AI generation + ~2s S3 upload
  • S3 storage: ~80KB per image, ~32MB for a full pool
  • Once all 400 indices have been requested at least once, the pool is full and every request is instant
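
The arithmetic behind these figures, as a quick check. It treats 1 MB as 1000 KB, takes the approximate numbers above at face value, and adds one derived figure that is not in the list: the cumulative generation time if every miss is paid one after another.

```javascript
// Figures from the list above; MB here means 1000 KB.
const stations = 10;
const promptsPerStation = 40;
const imageSizeKB = 80;
const missSeconds = 17 + 2; // AI generation + S3 upload

const totalPuzzles = stations * promptsPerStation;          // 400
const fullPoolMB = (totalPuzzles * imageSizeKB) / 1000;     // 32
const totalFillMinutes = (totalPuzzles * missSeconds) / 60; // about 126.7

console.log({ totalPuzzles, fullPoolMB, totalFillMinutes });
```

So filling the whole pool costs roughly two hours of waiting in total, spread across whichever players hit each index first.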

What’s Next

The pool is filling up as our first users play. We’re watching the MongoDB collection grow:

frozen: 10/40 cached
rainbows: 0/40
kpop: 0/40
...
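
A report like the one above could be produced from the pool entries with something like this sketch. `poolReport` is a hypothetical helper, and `entries` is assumed to be an array of documents shaped like the pool model.

```javascript
// Format a fill report from pool entries. `entries` is assumed to be an array
// of documents shaped like the pool model ({ stationId, promptIndex, ... }).
function poolReport(entries, stationIds, poolSize = 40) {
  const counts = new Map(stationIds.map((id) => [id, 0]));
  for (const { stationId } of entries) {
    if (counts.has(stationId)) counts.set(stationId, counts.get(stationId) + 1);
  }
  return stationIds.map((id) => `${id}: ${counts.get(id)}/${poolSize}`);
}
```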

Next up: React Native offline support using this same index-based architecture, and a monitoring alert for when HF credits run low (we learned that one the hard way).