Mar 18, 2026

Better Puzzle Images: Switching to FLUX and Theme-Aware Generation

The puzzle images weren’t good enough. Washed-out colors, subjects floating in plain gray backgrounds, no personality. A snowman puzzle looked like a clipart asset. A helicopter puzzle had a subject so small that half the tiles were just empty sky.

The fix turned out to be two things: a better model and smarter prompts.

What Was Wrong

Our image generation pipeline was using stabilityai/stable-diffusion-xl-base-1.0 — the base model, not even the refiner — with generic prompts like:

"a cheerful snowman with a carrot nose, cute children's book illustration, vibrant colors"

No negative prompts. No composition hints. No awareness of which theme the puzzle belonged to. Every image came out the same way: small subject, large plain background, muted palette.

The irony: the carousel art (the theme covers and full-screen backgrounds that look great) was generated with the same model, but with much richer, more detailed prompts. So most of the gap came from prompt quality, not from the model alone.

FLUX.1-schnell: Same API, Much Better Output

FLUX.1-schnell from Black Forest Labs is an open-weights text-to-image model released under the Apache 2.0 license. It’s available on the Hugging Face Inference API, so the call goes through the same endpoint with a different model ID:

// Before
const DEFAULT_MODEL = "stabilityai/stable-diffusion-xl-base-1.0";

// After
const DEFAULT_MODEL = "black-forest-labs/FLUX.1-schnell";

The “schnell” variant is distilled for speed: it produces great results in just 4 inference steps where our SDXL setup used 30, which means faster generation and less inference work per request. The parameters change too:

// SDXL (before)
parameters: {
  guidance_scale: 7.5,
  num_inference_steps: 30,
}

// FLUX schnell (after)
parameters: {
  num_inference_steps: 4,
  width: 1024,
  height: 1024,
}

FLUX.1-schnell is distilled to run without classifier-free guidance, so guidance_scale has nothing to control and is dropped. Output resolution is bumped to 1024×1024.
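While both models are still around (for example during a rollback), it can help to pick the parameter set from the model ID in one place. The sketch below is only an illustration of that idea: buildRequestBody and ImageRequestBody are hypothetical names, not the project's actual code, and the inputs/parameters body shape is assumed from the snippets above.

```typescript
// Hypothetical helper: names and body shape are assumptions for illustration.
interface ImageRequestBody {
  inputs: string;
  parameters: Record<string, number>;
}

const SDXL_MODEL = "stabilityai/stable-diffusion-xl-base-1.0";
const FLUX_MODEL = "black-forest-labs/FLUX.1-schnell";

function buildRequestBody(model: string, prompt: string): ImageRequestBody {
  if (model === FLUX_MODEL) {
    // schnell: 4 steps, no guidance_scale, explicit output size
    return {
      inputs: prompt,
      parameters: { num_inference_steps: 4, width: 1024, height: 1024 },
    };
  }
  // Anything else falls back to the old SDXL-style parameters
  return {
    inputs: prompt,
    parameters: { guidance_scale: 7.5, num_inference_steps: 30 },
  };
}
```

Keeping the branch in one function means the rest of the pipeline only ever passes a model ID and a prompt.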

Theme-Aware Prompts

The bigger improvement is context. Each puzzle now knows which theme it belongs to, and the prompt reflects that.

On the server, we added a THEME_STYLES map — one style string per theme describing the world the image should live in:

const THEME_STYLES: Record<string, string> = {
  frozen: "icy winter wonderland setting, sparkling snow, frozen mountains, glowing blue aurora sky, magical ice crystals",
  "paw-patrol": "sunny Adventure Bay coastal town setting, colorful rescue vehicles, bright cheerful sky, heroic rescue theme",
  cocomelon: "bright sunny nursery school setting, rainbows, colorful toys, cheerful garden, warm playful atmosphere",
  cars: "desert race track setting, red rock formations, dramatic sunset sky, speedway crowd and flags",
  bluey: "warm sunny Australian suburb setting, eucalyptus trees, clear blue sky, green backyard and hills",
  // ... all 10 themes
};

The client now passes themeId alongside subject in the generate request:

// PuzzleBoard.tsx
api.post<{ image: string }>(API_ROUTES.IMAGE_GENERATE, {
  subject: animal.aiPrompt,
  themeId: station.id,   // new
})
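On the receiving side, it seems safest to treat themeId as optional, since the prompt-building code that follows already copes with a missing theme style. Here is a minimal sketch of that parsing step. GenerateRequest and parseGenerateRequest are hypothetical names used for illustration, not the project's own.

```typescript
// Hypothetical request shape, inferred from the client call above.
interface GenerateRequest {
  subject: string;
  themeId?: string;
}

// Returns a cleaned request, or null when the body is unusable.
function parseGenerateRequest(body: unknown): GenerateRequest | null {
  if (typeof body !== "object" || body === null) return null;
  const { subject, themeId } = body as Record<string, unknown>;
  if (typeof subject !== "string" || subject.trim() === "") return null;

  const parsed: GenerateRequest = { subject: subject.trim() };
  // A missing or non-string themeId is dropped rather than rejected.
  if (typeof themeId === "string") parsed.themeId = themeId;
  return parsed;
}
```

Dropping a malformed themeId instead of rejecting the request means a bad theme value degrades to a generic image rather than a failed puzzle load.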

And the server builds the final prompt by combining subject + theme style + base quality hints:

const BASE_STYLE = "vibrant colors, highly detailed, filling the frame, centered composition, children's book illustration style, cute and friendly";

const prompt = themeStyle
  ? `${subject}, ${themeStyle}, ${BASE_STYLE}`
  : `${subject}, ${BASE_STYLE}`;
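The snippet above doesn't show where themeStyle comes from. One plausible way to wire it up is a small function that looks the theme up and falls back to the base style for unknown IDs. This is a sketch under assumptions: buildPrompt is a hypothetical name, and the theme strings are abbreviated.

```typescript
// Abbreviated copy of the map above; the real one covers every theme.
const THEME_STYLES: Record<string, string> = {
  frozen: "icy winter wonderland setting, sparkling snow",
  "paw-patrol": "sunny Adventure Bay coastal town setting",
};

const BASE_STYLE =
  "vibrant colors, highly detailed, filling the frame, centered composition, children's book illustration style, cute and friendly";

// Hypothetical helper: combines subject + theme style + base quality hints.
function buildPrompt(subject: string, themeId?: string): string {
  // Own-property check so IDs like "constructor" don't resolve to
  // something inherited from Object.prototype.
  const themeStyle: string | undefined =
    themeId !== undefined &&
    Object.prototype.hasOwnProperty.call(THEME_STYLES, themeId)
      ? THEME_STYLES[themeId]
      : undefined;

  return themeStyle
    ? `${subject}, ${themeStyle}, ${BASE_STYLE}`
    : `${subject}, ${BASE_STYLE}`;
}
```

Because themeId arrives from the client, the own-property check is worth the extra line: a plain index lookup on an object literal would happily return inherited members for unexpected keys.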

So a helicopter puzzle in the Paw Patrol theme now gets:

“a colorful rescue helicopter flying over a mountain with a puppy looking out the window, sunny Adventure Bay coastal town setting, colorful rescue vehicles, bright cheerful sky, heroic rescue theme, vibrant colors, highly detailed, filling the frame…”

A snowflake in the Frozen theme gets the aurora sky and ice crystal treatment. A Krabby Patty in SpongeBob gets coral reef and ocean light rays.

No More Duplicate Images

The original code had an in-memory cache that returned the same image for the same subject string. This made sense as a cost control measure but meant kids saw the same image every time they replayed a puzzle.

We removed the cache entirely. Each puzzle start generates a fresh image. The sampler’s randomness handles the rest: we don’t pin a seed, so identical prompts produce meaningfully different compositions each time.

The “Play Again” button was also reusing the previous image. That’s fixed — it now triggers a full new generation:

onPlayAgain={() => {
  setCompleted(false);
  setGhostImage("");
  setLoading(true);
  // re-trigger AI generation...
}}

The Result

The difference is immediate. Images now fill the frame. The subject is large, centered, and detailed. Colors are saturated. And the theme context shows — Frozen puzzles feel icy and magical, Paw Patrol puzzles look like they’re set in Adventure Bay.

The generation script for carousel/background art was updated to use FLUX too. Run bun run scripts/generate-theme-images.ts --force to regenerate all theme art with the new model.

What’s Next

We’re still generating images on-demand per user play. The next step is pre-generating a pool of 3–5 images per puzzle and rotating through them — faster load times and still fresh on every play.
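The rotation itself could be as simple as a round-robin cursor over the pre-generated images. This is a rough sketch of the idea, not a committed design: createImagePool is a hypothetical name, and how the pool gets filled and stored is left open.

```typescript
// Hypothetical sketch: round-robin over a pre-generated set of images.
// Each entry could be a URL or a base64 string; the pool doesn't care.
function createImagePool(images: readonly string[]): () => string {
  if (images.length === 0) {
    throw new Error("image pool must not be empty");
  }
  let cursor = 0;
  return () => {
    const image = images[cursor] as string;
    // Wrap around so replays keep cycling through the pool.
    cursor = (cursor + 1) % images.length;
    return image;
  };
}

// Usage: one pool per puzzle, advanced on every play.
const nextSnowmanImage = createImagePool(["img-a", "img-b", "img-c"]);
nextSnowmanImage(); // "img-a"
nextSnowmanImage(); // "img-b"
```

Round-robin guarantees a replay never shows the same image twice in a row as long as the pool holds at least two images, which a random pick would not.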

The rate limit (one generation per 24 hours per user) remains in place for production to control Hugging Face API costs. With FLUX schnell’s 4-step inference, each generation is faster and cheaper than before.
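For readers curious what a limit like that looks like in code, here is a minimal in-memory sketch. It is an illustration under assumptions, not our production limiter: createDailyLimiter is a hypothetical name, and a real deployment would likely need shared storage so the limit survives restarts and works across multiple server instances.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical sketch: allows one generation per user per 24 hours.
// The clock is injectable so the behavior can be tested without waiting.
function createDailyLimiter(now: () => number = Date.now) {
  const lastGeneration = new Map<string, number>();

  // Returns true and records the time if the user may generate now.
  return function tryConsume(userId: string): boolean {
    const last = lastGeneration.get(userId);
    const current = now();
    if (last !== undefined && current - last < DAY_MS) {
      return false; // still inside the 24-hour window
    }
    lastGeneration.set(userId, current);
    return true;
  };
}
```

A rejected attempt does not reset the window, so repeated retries can't push a user's next allowed generation further into the future.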