Building an SEO Content Pipeline: AI Stories and Coloring Pages at Scale


We had 91 AI-generated puzzle images and zero organic traffic. Every image had a theme, a subject, and a visual story behind it — but none of that was visible to Google. The images lived in MongoDB and were served only to the game. They didn’t exist anywhere a search crawler could find them.

The goal: turn each puzzle image into a public SEO page. A story page and a coloring page, in 5 languages, published automatically whenever a new image is generated.

What We Built

Scheduler (every 1 min)
  → POST /api/admin/generate-seo (puzzle server)
    → pick 1 unprocessed image ($sample)
    → call OpenAI gpt-4o-mini for story + SEO fields
    → save to MongoDB (5 locales)
    → trigger GitHub repository_dispatch
      → marketing site rebuilds
        → static pages at /puzzle/stories/[slug]
        → static pages at /puzzle/coloring/[slug]

One cron tick, one image. No batching. We’ll get to why.

The Content Schema

Each PuzzlePool document gains a translations map and an seo map, keyed by locale:

translations: {
  en: {
    title: string;
    story: string;        // 10+ paragraphs, 350+ words
    seo: {
      blurb: string;      // 2-3 sentence description
      learningNote: string;
      conversationStarter: string;
      funFact: string;
    }
  },
  es: { ... },
  fr: { ... },
  de: { ... },
  pt: { ... },
}
seoGeneratedAt: Date;
slug: string;             // "frozen-polar-bear-slide"

We also added a subject field during migration — a one-line description of what the AI actually drew. This is what the story is written about, not the theme name. “A polar bear cub sliding down an icy hill” gives the LLM something concrete to write from.
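Slugs like “frozen-polar-bear-slide” can be derived from the subject or title text. The exact derivation in our codebase isn’t shown here; this is a minimal sketch of the idea:

```typescript
// Sketch: derive a URL-safe slug from a title or subject line.
// Assumes mostly-Latin input; other scripts would need real transliteration.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .normalize("NFD")                 // split accented chars into base + mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .replace(/[^a-z0-9]+/g, "-")      // collapse everything else to hyphens
    .replace(/^-+|-+$/g, "");         // trim leading/trailing hyphens
}
```

So `slugify("Frozen Polar Bear Slide!")` yields `frozen-polar-bear-slide`, which doubles as the static page path.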

The Generator

The SEO generator calls the ai-service’s /api/v1/text/generate endpoint once — a single prompt that asks for all 5 locales in one shot:

const prompt = `
You are a creative writer for a children's educational app (ages 2-7).

Image subject: "${subject}" (from the ${themeName} world)

Write a LONG, engaging children's story about this image.
MINIMUM 10 paragraphs. At least 350 words total.
Full narrative arc: beginning, middle, and end.

Return JSON with keys: en, es, fr, de, pt
Each locale: { title, story, blurb, learningNote, conversationStarter, funFact }
`;

The story prompt is intentionally prescriptive about length. Our first version asked for “a short story” and got 3 paragraphs. Kids read these while waiting for a puzzle to load — we needed something that could actually hold attention.
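Because one response carries all five locales, it’s worth validating the parsed JSON before saving anything. A sketch of the check we’d want (`findBadLocales` is a hypothetical helper, not our exact code):

```typescript
const LOCALES = ["en", "es", "fr", "de", "pt"] as const;
const MIN_WORDS = 350;

interface LocaleContent {
  title: string;
  story: string;
  blurb: string;
  learningNote: string;
  conversationStarter: string;
  funFact: string;
}

// Returns the locales that are missing or whose story is shorter than the
// prompt demands, so the job can fail loudly instead of saving partial content.
function findBadLocales(raw: Record<string, Partial<LocaleContent>>): string[] {
  return LOCALES.filter((loc) => {
    const c = raw[loc];
    if (!c || !c.title || !c.story) return true;
    return c.story.trim().split(/\s+/).length < MIN_WORDS;
  });
}
```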

Why BATCH_SIZE=1

We started with BATCH_SIZE=5. Five images, five long multilingual LLM calls, one HTTP request to the job endpoint. The scheduler timed out at 118 seconds. Then at 234 seconds. The puzzle server’s idle timeout was killing connections mid-generation.

The issue: OpenAI can take 30-250 seconds for a long multilingual story. Five in a row is a guaranteed timeout. Bun’s idleTimeout has a maximum of 255 seconds — you can’t set it higher without crashing on startup. One image per tick fits inside that window. Five does not.
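The arithmetic is blunt enough to encode as a guard. The numbers come from the observed latency range; `batchFits` is illustrative, not production code:

```typescript
// Bun's Bun.serve() caps idleTimeout at 255 seconds.
const MAX_IDLE_TIMEOUT_S = 255;
// Observed worst case for one long multilingual story call.
const WORST_CASE_CALL_S = 250;

// Can a batch of sequential LLM calls complete inside the idle-timeout window?
function batchFits(batchSize: number, perCallSeconds = WORST_CASE_CALL_S): boolean {
  return batchSize * perCallSeconds <= MAX_IDLE_TIMEOUT_S;
}
```

One image squeaks in at 250 of 255 seconds; five would need 1250.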

The tradeoff: 91 images at one per minute takes 91 minutes to process. That’s fine for a background job. You’re not shipping a product — you’re filling an index.

Switching from Groq to OpenAI

We originally used Groq (llama-3.3-70b) for text generation because it’s near-instant. It worked great until we hit Groq’s free-tier limit of 100K tokens per day. At that point, every job started failing silently: the LLM call returned an error, but the job still returned 200 because we were only checking processed > 0 at the end.

We fixed two things:

1. Return 503 on any failure. If the LLM call fails, the job fails. The scheduler records it as failed, shows the error in the admin UI, and retries up to 3 times. No more phantom successes.

2. Switch to OpenAI gpt-4o-mini. No rate limits at this scale. Cost: ~$0.003 per image (5 locales × long story). For 91 images that’s $0.27 total. For ongoing generation it’s a rounding error.
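The failure-propagation rule is simple to state in code. A sketch (the real handler does more bookkeeping around it):

```typescript
interface ImageResult {
  slug: string;
  ok: boolean;
  error?: string;
}

// Map a job's per-image results to an HTTP status. Any failure fails the
// whole job, so the scheduler records it and retries instead of logging a
// phantom success.
function jobStatus(results: ImageResult[]): number {
  if (results.length === 0) return 200; // nothing left to process
  return results.every((r) => r.ok) ? 200 : 503;
}
```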

The GitHub Dispatch Trigger

Once an image is processed, we trigger a rebuild of the marketing site:

await fetch(`https://api.github.com/repos/${MARKETING_REPO}/dispatches`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${GITHUB_DISPATCH_TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ event_type: "seo-content-updated" }),
});

Three things bit us here:

Token 403. The GITHUB_DISPATCH_TOKEN was in AWS Secrets Manager but the ExternalSecret manifest had never been re-applied after we added the new key. The pod was running with the old secret version. kubectl apply -f external-secret.yaml fixed it.

“Nothing to commit” builds. GitHub Actions was using github.sha as the Docker image tag. repository_dispatch workflows run against the head of the default branch, and since a rebuild pushes no new commits, every rebuild saw the same sha and Docker skipped the push as “nothing changed.” Fix: tag with ${{ github.sha }}-${{ github.run_id }} so every build produces a unique tag.

Rebuild frequency. With 91 images processing over 91 minutes, we were triggering a rebuild every minute. Each rebuild is a new ArgoCD sync, a new pod deployment, a new rolling update. Rolling updates need a spare pod slot.
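The unique-tag fix from the second issue looks roughly like this in the workflow file (assuming docker/build-push-action; ECR_REPO is a placeholder for the real registry path):

```yaml
- uses: docker/build-push-action@v5
  with:
    push: true
    # github.run_id differs on every run, so repeated dispatches of the
    # same commit still produce (and push) a unique image tag.
    tags: ${{ env.ECR_REPO }}:${{ github.sha }}-${{ github.run_id }}
```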

The Node Pressure Incident

At 17 pods per node (the default for t3.medium with standard ENI addressing), a rolling update needs 18 slots — one extra. We had 17/17 filled. New pods went Pending. The marketing site degraded.

The root cause: we’d enabled ENABLE_PREFIX_DELEGATION=true on the VPC CNI, which should raise the pod limit to 110 — but kubelet bootstraps before CNI runs and uses the old ENI formula. The CNI setting was there; kubelet didn’t know about it.

Fix: a launch template with an AL2023 nodeadm config that explicitly sets maxPods: 110:

apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  kubelet:
    config:
      maxPods: 110

AWS doesn’t let you add a launch template to an existing node group. We created a new node group (workloads-v2), drained the old nodes, terminated them, and cleaned up the Terraform state. Zero downtime — the scheduler kept running through the whole migration.

The Marketing Site

The Astro marketing site builds story and coloring pages statically from the API:

// src/pages/puzzle/stories/[slug].astro
export async function getStaticPaths() {
  const stories = await fetchStories(); // GET /api/public/stories
  return stories.map(story => ({
    params: { slug: story.slug },
    props: { story },
  }));
}
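fetchStories is a thin build-time wrapper around the public API. A hypothetical shape (PUZZLE_API_BASE and the response format are assumptions, not our documented config):

```typescript
interface Story {
  slug: string;
  translations: Record<string, { title: string; story: string }>;
}

// Hypothetical build-time fetch of all published stories.
// Fails the Astro build loudly rather than emitting an empty site.
async function fetchStories(): Promise<Story[]> {
  const res = await fetch(`${process.env.PUZZLE_API_BASE}/api/public/stories`);
  if (!res.ok) throw new Error(`stories fetch failed: ${res.status}`);
  return res.json() as Promise<Story[]>;
}
```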

At build time, Astro fetches all 91 stories, generates one page per slug, and outputs static HTML. No server, no runtime API calls. Cloudflare’s CDN serves each page from the edge in under 50ms.

The coloring pages work the same way, but render the puzzle image through a CSS filter stack:

.coloring-image {
  filter: grayscale(1) invert(1) blur(0px);
  mix-blend-mode: color-dodge;
}

Combined with an SVG gamma filter, this turns any photo into something that looks like a coloring page outline. When the image is served with CORS headers (ours is: S3 sends Access-Control-Allow-Origin, and we load every image with crossOrigin="anonymous"), we also run a Sobel edge detection pass in canvas for sharper lines. The CSS filter alone works everywhere as a fallback.

No separate coloring images are generated or stored. It’s entirely client-side.
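For the canvas path, the Sobel pass reduces to a small kernel convolution over a grayscale buffer. A self-contained sketch, assuming the production version reads pixels via getImageData and writes the magnitudes back:

```typescript
// Sobel edge magnitude over a grayscale image stored row-major in a
// Float32Array (values 0-255). Returns a same-sized array of gradient
// magnitudes; border pixels are left at 0.
function sobel(gray: Float32Array, width: number, height: number): Float32Array {
  const out = new Float32Array(width * height);
  // Standard 3x3 Sobel kernels: horizontal (kx) and vertical (ky) gradients.
  const kx = [-1, 0, 1, -2, 0, 2, -1, 0, 1];
  const ky = [-1, -2, -1, 0, 0, 0, 1, 2, 1];
  for (let y = 1; y < height - 1; y++) {
    for (let x = 1; x < width - 1; x++) {
      let gx = 0, gy = 0, k = 0;
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++, k++) {
          const v = gray[(y + dy) * width + (x + dx)];
          gx += kx[k] * v;
          gy += ky[k] * v;
        }
      }
      out[y * width + x] = Math.hypot(gx, gy);
    }
  }
  return out;
}
```

Thresholding the magnitudes and painting dark pixels onto a white canvas gives the outline; flat regions produce zero gradient and stay blank.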

The Numbers

  • 91 images processed → 91 story pages + 91 coloring pages
  • 5 locales per page → 455 story pages + 455 coloring pages indexed across EN, ES, FR, DE, PT
  • ~$0.003 per image for OpenAI gpt-4o-mini
  • Total generation cost for all 91 images: ~$0.27
  • Build time: ~45 seconds per Astro rebuild
  • Time from “new image in DB” to “live page”: ~3 minutes (generation + dispatch + build + deploy)

What We’d Do Differently

Start with batch size 1. We wasted time debugging timeouts that were predictable in hindsight. Long LLM calls and batched processing don’t mix.

Test the dispatch token before deploying. We spent 20 minutes thinking the rebuild wasn’t triggering when it was just a 403 on the token endpoint.

Keep the subject field from the start. We added it late, which meant migrating 91 existing documents to extract subjects from their prompt names. Not hard, but avoidable with a bit of upfront schema planning.

What’s Next

The index pages have category filter pills (Magic Dollhouse, Frozen Kingdom, etc.). We’ll add theme-specific landing pages — one per puzzle world — that aggregate all its stories and coloring pages. Those will be the real SEO targets: “Frozen coloring pages for kids” and “Magic Dollhouse bedtime stories.”

The pipeline runs every minute. As new puzzle images are generated by the game, they’ll automatically get SEO pages within 3 minutes. No manual steps.