Extracting AI Into a Microservice: Building an AI Gateway for Kids Games
The puzzle app was doing too much. Somewhere between routing requests, managing user sessions, and generating AI images, the backend had become a monolith with an identity crisis. AI calls lived alongside auth middleware. Hugging Face credentials sat in the same secrets bundle as MongoDB URIs. Adding a second game would mean copy-pasting the entire AI stack.
So we extracted it.
The Problem With Coupling AI to the App
The puzzle app’s original AI pipeline was direct: the server called Hugging Face, got an image back, and returned it to the client. Straightforward, and fine for one app.
But we’re building a platform, not a product. There will be more games. Eventually TTS for reading out puzzle pieces, video generation for animated themes, maybe LLM-based hint systems. Each of those needs:
- An API key managed somewhere
- Timeout and retry logic
- Observability (which model? how long? did it fail?)
- A provider abstraction so you can swap from Hugging Face to Replicate without touching app code
Duplicating that per-app is exactly the kind of thing that creates maintenance debt. The right move is to solve it once.
What We Built
ai-service — a standalone Bun/TypeScript microservice that acts as an AI gateway for the entire Kids Games platform.
```
Web App (puzzle.kidsgamesapp.com)
  └─► AI Service (internal, :3002)
        └─► Hugging Face API (FLUX.1-schnell)
```
Any current or future Kids Games app calls the same service. New AI capability? Add it once. New provider? Swap it with an env var. New consumer? Pass a different x-consumer-id header.
The Design
Provider Abstraction
Each AI capability has its own provider interface. Image generation looks like this:
```typescript
interface ImageProvider {
  name: string;
  generate(request: ImageGenerateRequest): Promise<ImageResult>;
  transform(request: ImageTransformRequest): Promise<ImageResult>;
  healthCheck(): Promise<boolean>;
}
```
A registry maps provider names to implementations. The active provider is selected via env var:
```
IMAGE_PROVIDER=huggingface
```
No runtime switching. Change the env var, redeploy. Adding a new provider (say, Replicate) means implementing the interface and registering it — no route changes, no API contract changes.
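Sketched in TypeScript, the registry pattern looks roughly like this. Names such as `resolveImageProvider` and the fallback default are illustrative, not the service's actual exports:

```typescript
// Minimal sketch of a provider registry resolved from an env var.
interface ImageProvider {
  name: string;
  healthCheck(): Promise<boolean>;
}

// One entry per implemented provider; adding Replicate later
// would just mean registering another entry here.
const huggingface: ImageProvider = {
  name: "huggingface",
  healthCheck: async () => true,
};

const imageProviders = new Map<string, ImageProvider>([
  ["huggingface", huggingface],
]);

// Resolved once at startup; unknown names fail fast.
function resolveImageProvider(
  env: Record<string, string | undefined> = process.env,
): ImageProvider {
  const name = env.IMAGE_PROVIDER ?? "huggingface";
  const provider = imageProviders.get(name);
  if (!provider) {
    throw new Error(`Unknown image provider: ${name}`);
  }
  return provider;
}
```

Failing fast on an unknown name means a typo in the env var surfaces at deploy time, not on the first request.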
Separate generate and transform
The old code had a quirk: even image generation used the transform() method with an empty image string. provider.transform("", prompt) was how you generated a new image from a prompt.
The new service fixes this. generate() takes a text prompt and returns an image. transform() takes an existing image (base64) and a style prompt. They’re distinct operations with distinct request shapes. Cleaner API, cleaner implementation.
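The two request shapes could look something like this; the field names are assumptions, not the service's published contract:

```typescript
// Hypothetical request shapes for the two distinct operations.
interface ImageGenerateRequest {
  prompt: string;       // text prompt only
}

interface ImageTransformRequest {
  image: string;        // base64-encoded source image
  prompt?: string;      // optional style prompt
}

// A route handling both paths can distinguish them structurally:
function isTransformRequest(
  body: ImageGenerateRequest | ImageTransformRequest,
): body is ImageTransformRequest {
  return "image" in body && typeof body.image === "string";
}
```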
Authentication
The service is internal-only (ClusterIP, no ingress), but it still validates callers:
```
x-api-key: <AI_SERVICE_API_KEY>
x-consumer-id: puzzle-web   (optional — for metrics/tracing)
x-request-id: <uuid>        (optional — for cross-service trace correlation)
```
The x-request-id is interesting: if the puzzle app generates a request ID and passes it through, both services log with that ID. You can link a Langfuse trace in the puzzle app to the corresponding trace in the AI service. Distributed tracing without a distributed tracing platform.
Observability
Every capability route records:
```
ai_request_total{capability, provider, status}
ai_request_duration_seconds{capability, provider}
ai_provider_health{capability, provider}
```
Prometheus scrapes /api/metrics via a ServiceMonitor. The health check at /api/health returns provider status per capability — it’s what Kubernetes uses for readiness/liveness probes.
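To make the counter concrete, here is a hand-rolled sketch of a labelled counter rendered in Prometheus exposition format. In practice a metrics library does this wiring; the mechanics are the same:

```typescript
// Illustrative labelled counter, not the service's actual metrics code.
type Labels = { capability: string; provider: string; status: string };

const aiRequestTotal = new Map<string, number>();

// Increment the counter for one (capability, provider, status) combination.
function recordRequest({ capability, provider, status }: Labels): void {
  const key = `capability="${capability}",provider="${provider}",status="${status}"`;
  aiRequestTotal.set(key, (aiRequestTotal.get(key) ?? 0) + 1);
}

// Render all series in Prometheus exposition format for /api/metrics.
function renderMetrics(): string {
  return [...aiRequestTotal.entries()]
    .map(([labels, value]) => `ai_request_total{${labels}} ${value}`)
    .join("\n");
}
```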
Langfuse traces every provider call: input prompt, model name, provider, duration, success/failure. Same Langfuse instance as the puzzle app.
The Migration
The puzzle app now calls ai-service through a thin HTTP client:
```typescript
// server/src/services/aiClient.ts
export async function generateImage(prompt: string): Promise<AiImageResult> {
  return callAiService("/api/v1/image/generate", { prompt });
}

export async function transformImage(imageBase64: string, prompt?: string): Promise<AiImageResult> {
  return callAiService("/api/v1/image/transform", { image: imageBase64, prompt });
}
```
The routes (generateIllustration.ts, imageTransform.ts) kept all their business logic — in-memory cache, rate limiting, prompt construction — and just changed the final call. The web client and React Native app didn’t change at all. Their API contract with the puzzle app is unchanged.
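`callAiService` itself isn't shown above. A plausible sketch, with `fetch` injectable so it can be tested offline; the header names match the auth section, everything else is an assumption:

```typescript
// Sketch of the thin HTTP client behind generateImage/transformImage.
async function callAiService<T>(
  path: string,
  body: unknown,
  fetchFn: typeof fetch = fetch,   // injectable for testing
): Promise<T> {
  // Read env vars lazily, inside the function (see the Bun note below).
  const baseUrl = process.env.AI_SERVICE_URL;
  const apiKey = process.env.AI_SERVICE_API_KEY;
  if (!baseUrl || !apiKey) {
    throw new Error("AI service env vars are not configured");
  }
  const res = await fetchFn(`${baseUrl}${path}`, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "x-api-key": apiKey,
      "x-consumer-id": "puzzle-web",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`AI service returned ${res.status}`);
  }
  return (await res.json()) as T;
}
```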
One Subtle Bug
Reading env vars at module top-level doesn’t work the way you’d expect in Bun:
```typescript
// This doesn't work — reads before .env is loaded
const AI_SERVICE_URL = process.env.AI_SERVICE_URL;

export async function generateImage(prompt: string) {
  if (!AI_SERVICE_URL) throw new Error("...");
}
```
Bun hoists static imports before running any top-level code. So by the time index.ts runs its custom .env loader, aiClient.ts has already captured undefined into that constant.
The fix is lazy reads — inside the function, not at module level:
```typescript
async function callAiService<T>(path: string, body: unknown): Promise<T> {
  const AI_SERVICE_URL = process.env.AI_SERVICE_URL; // read here, not at top
  const AI_SERVICE_API_KEY = process.env.AI_SERVICE_API_KEY;
  // ...
}
```
Worth knowing if you’re using Bun workspaces with a custom env loader.
Infrastructure
The service deploys to the same EKS cluster as the puzzle app. It gets its own:
- Helm release (reusing the existing web-app chart)
- ClusterIP service (port 3002, no ingress — internal only)
- ExternalSecret pulling `AI_SERVICE_API_KEY` and `HF_API_KEY` from AWS Secrets Manager
- ServiceMonitor for Prometheus scraping
- NetworkPolicy rules allowing the puzzle server and Prometheus to reach it
Resource limits are set conservatively: 100m/500m CPU, 128Mi/512Mi memory. The service is mostly I/O — it proxies to external APIs, does no heavy compute. Memory is sized to handle base64 image payloads (10MB+ per concurrent request) without OOMing.
The CI/CD pipeline follows the same pattern as the puzzle app: push to main → GitHub Actions builds a Docker image → pushes to ECR → updates the image tag in the infra repo → ArgoCD auto-deploys.
Local Development
For local development, the service runs identically to production:
```shell
cd /Users/itay/dev/ai-service
cp .env.example .env   # fill in HF_API_KEY and AI_SERVICE_API_KEY
bun run dev            # runs on localhost:3002
```
The puzzle app’s .env gets two new vars:
```
AI_SERVICE_URL=http://localhost:3002
AI_SERVICE_API_KEY=local-dev-key
```
There’s also a CLI utility for one-off scripting:
```shell
bun run scripts/generate.ts image "Cute cartoon lion playing puzzle, marketing banner"
```
Good for generating marketing assets, testing prompts, or just poking the API.
What’s Next
The service has stub routes for TTS (/api/v1/tts/synthesize) and video generation (/api/v1/video/generate) — both return 501 for now. When we get to those capabilities, the provider interfaces are already defined and the routing skeleton is in place.
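A stub like that can be as small as a handler that always answers 501; this is a hypothetical sketch, not the actual route code:

```typescript
// Placeholder handler for capabilities that aren't implemented yet,
// e.g. the TTS and video routes mentioned above.
function notImplemented(capability: string): Response {
  return new Response(
    JSON.stringify({ error: `${capability} is not implemented yet` }),
    { status: 501, headers: { "content-type": "application/json" } },
  );
}
```

Returning a real 501 (rather than 404) tells callers the route exists and is planned, just not live.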
The bigger picture: as more Kids Games apps come online, they call the same service with different x-consumer-id values. The Grafana dashboard shows per-consumer usage. Langfuse traces are grouped by consumer. One API, full visibility.
One session. Fully deployed. The puzzle app is cleaner, the AI stack is reusable, and the platform is one step closer to actually being a platform.