Building a Kids Game Sound System with Web Audio API (No Libraries)


Sound is the last thing you add and the first thing kids notice. A tile snapping into place with no audio feels broken. A celebration with no fanfare falls flat. We had a fully working puzzle app and it was completely silent.

Time to fix that.

Why Not Just Use Howler.js

The obvious move is to reach for Howler.js or Tone.js. Both are solid. But we had a specific constraint: we needed exactly six types of sound (tile drag, tile snap, tile wrong, celebration, carousel swipe, level-up), plus background music and voice encouragement. For six sound effects, pulling in a 40KB library felt like overkill.

Web Audio API is built into every modern browser. It’s lower-level, but for what we needed it was actually simpler: you describe sounds as signal graphs and the browser synthesizes them. No files to load, no network requests, no bundle cost.

The decision: Web Audio API directly for SFX, files for music and voice (because synthesis can’t replicate those convincingly).

The AudioContext Problem

The first thing you learn about Web Audio API is that you can’t just create an AudioContext at module load time:

// Created before any user gesture, so autoplay policies leave it suspended
const ctx = new AudioContext();

Browsers block audio until a user gesture. An AudioContext created before any interaction starts in the suspended state, and your sounds don’t play until it is resumed. The fix is to lazy-initialize:

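// Created lazily so the context is first constructed from inside a user gesture
let _ctx: AudioContext | null = null;

function getCtx(): AudioContext {
  if (!_ctx) _ctx = new AudioContext();
  // resume() returns a promise; nothing here needs to wait on it
  if (_ctx.state === 'suspended') void _ctx.resume();
  return _ctx;
}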

Every sound function calls getCtx() first. The first user tap — anywhere in the app — creates the context. After that, it’s reused for everything.
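If you would rather not depend on the first sound call happening inside a gesture, one option is a one-time listener that unlocks audio on the first interaction. The sketch below is an assumption, not part of our engine: unlockOnFirstGesture, GestureTarget, and the list of event types are hypothetical names chosen for illustration.

```typescript
// Minimal shape of the object we listen on; in a browser this would be `window`.
interface GestureTarget {
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

// Event types treated as user gestures here (an assumption; adjust for your app).
const GESTURE_EVENTS = ['pointerup', 'touchend', 'keydown'];

// Runs `unlock` exactly once, on whichever gesture event arrives first.
function unlockOnFirstGesture(target: GestureTarget, unlock: () => void): void {
  const handler = () => {
    // Remove every listener before unlocking so later gestures are ignored
    for (const type of GESTURE_EVENTS) target.removeEventListener(type, handler);
    unlock();
  };
  for (const type of GESTURE_EVENTS) target.addEventListener(type, handler);
}

// In the app this might be called once at startup:
// unlockOnFirstGesture(window, () => { getCtx(); });
```

Taking the target as a parameter keeps the helper free of browser globals, which also makes it easy to exercise with a fake in a unit test.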

Synthesized SFX

Tile sounds are pure oscillator synthesis. No files, no loading states, instant playback:

export function playTileSnap() {
  const ctx = getCtx();
  const now = ctx.currentTime;
  const osc = ctx.createOscillator();
  const gain = ctx.createGain();

  // Signal graph: oscillator -> gain -> speakers
  osc.connect(gain);
  gain.connect(ctx.destination);

  // Pitch sweeps up one octave (440Hz to 880Hz) over 50ms
  osc.frequency.setValueAtTime(440, now);
  osc.frequency.exponentialRampToValueAtTime(880, now + 0.05);
  // Volume decays to near silence over 150ms (exponential ramps cannot reach 0)
  gain.gain.setValueAtTime(0.3, now);
  gain.gain.exponentialRampToValueAtTime(0.001, now + 0.15);

  osc.start(now);
  osc.stop(now + 0.15);
}

The snap is a short pitch-up chirp. Wrong placement is a lower buzz with a downward pitch sweep. Drag is a soft noise burst. Celebration chains three ascending tones. Each is about 10 lines of code and takes under 1ms to schedule.
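To make "chains three ascending tones" concrete, here is a minimal sketch of how the note timing could be computed. The specific notes (C5, E5, G5), the 120ms note length, and the names ToneEvent and celebrationTones are assumptions for illustration, not our actual values.

```typescript
interface ToneEvent {
  frequency: number; // Hz
  start: number;     // seconds, on the AudioContext clock
  stop: number;      // seconds, on the AudioContext clock
}

// C5, E5, G5: an assumed choice of three ascending notes.
const CELEBRATION_NOTES = [523.25, 659.25, 783.99];

// Lays the notes out back to back, starting at `startTime`.
function celebrationTones(startTime: number, noteLength = 0.12): ToneEvent[] {
  return CELEBRATION_NOTES.map((frequency, i) => ({
    frequency,
    start: startTime + i * noteLength,
    stop: startTime + (i + 1) * noteLength,
  }));
}
```

Each entry would then map to one oscillator, started at start and stopped at stop, with the same kind of gain envelope as playTileSnap. Scheduling all three up front against ctx.currentTime keeps the timing on the audio clock rather than on setTimeout.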

The key insight: for UI feedback sounds, synthesis is actually better than files. Files have to load and decode, even when cached; synthesis needs no assets, so it can start right away. Kids tap fast, and you want no perceptible latency on the snap.
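The drag sound mentioned above is a noise burst rather than an oscillator, so it needs sample data. A rough sketch, assuming white noise with a linear fade-out (makeNoiseBurst is a hypothetical name and the envelope shape is a guess at something that sounds soft):

```typescript
// Fills a buffer with white noise that fades linearly from full amplitude toward silence.
function makeNoiseBurst(sampleRate: number, seconds: number): Float32Array {
  const length = Math.floor(sampleRate * seconds);
  const samples = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    const decay = 1 - i / length;                  // 1 at the start, near 0 at the end
    samples[i] = (Math.random() * 2 - 1) * decay;  // random value in [-1, 1), scaled down
  }
  return samples;
}

// In the browser the samples could be played through a buffer source:
// const samples = makeNoiseBurst(ctx.sampleRate, 0.05);
// const buffer = ctx.createBuffer(1, samples.length, ctx.sampleRate);
// buffer.copyToChannel(samples, 0);
// const src = ctx.createBufferSource();
// src.buffer = buffer;
// src.connect(gain);
// src.start();
```

The buffer is cheap enough to regenerate per play, though it could also be built once and reused.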

Music with Fade Transitions

Background music is .mp3 files loaded via fetch + decodeAudioData. We have 6 tracks that rotate randomly per screen, avoiding repeats:

function pickRandomIndex(length: number, lastIndex: number): number {
  if (length <= 1) return 0;
  let next: number;
  do { next = Math.floor(Math.random() * length); } while (next === lastIndex);
  return next;
}
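On the loading side, fetch + decodeAudioData is worth doing only once per track. A minimal sketch of a cache that memoizes the in-flight promise per URL follows; createTrackCache is a hypothetical helper, and the loader is injected so the snippet does not depend on browser globals.

```typescript
// Wraps a loader so each URL is loaded at most once; concurrent callers share one promise.
function createTrackCache<T>(load: (url: string) => Promise<T>): (url: string) => Promise<T> {
  const pending = new Map<string, Promise<T>>();
  return (url: string): Promise<T> => {
    let entry = pending.get(url);
    if (!entry) {
      entry = load(url);
      pending.set(url, entry);
      // Forget failed loads so a later call can retry
      entry.catch(() => pending.delete(url));
    }
    return entry;
  };
}

// In the browser the loader might look like this (getCtx as defined earlier):
// const getTrack = createTrackCache((url) =>
//   fetch(url)
//     .then((res) => res.arrayBuffer())
//     .then((data) => getCtx().decodeAudioData(data))
// );
```

Caching the promise rather than the decoded buffer means two screens asking for the same track at the same time do not trigger two downloads.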

The tricky part was transitions. When a kid moves from the theme carousel to the puzzle board, we don’t want a hard cut. Abrupt music stops sound jarring, especially for young kids who are sensitive to audio changes.

The solution: each screen gets its own music loop. On mount, the puzzle board fades in its music. On unmount, it fades out. Since carousel and puzzle use different track pools, you get a natural crossfade effect when navigating between them:

export function setMusicEnabled(enabled: boolean) {
  if (!musicGain) return;
  const ctx = getCtx();
  const now = ctx.currentTime;
  const current = musicGain.gain.value;
  // Drop pending ramps, then pin the current value so the new ramp has a defined start
  musicGain.gain.cancelScheduledValues(now);
  musicGain.gain.setValueAtTime(current, now);
  // 300ms linear fade toward full volume or silence
  musicGain.gain.linearRampToValueAtTime(enabled ? MUSIC_VOLUME : 0, now + 0.3);
}

The 300ms ramp is barely perceptible but makes the difference between “music stopped” and “music faded out.”
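The per-screen fade in on mount and fade out on unmount can share one helper. The sketch below is an assumption about how that might be factored: fadeTo and RampParam are hypothetical names, and RampParam is just the subset of AudioParam the helper touches.

```typescript
// The subset of AudioParam this helper needs; a real GainNode's `gain` satisfies it.
interface RampParam {
  readonly value: number;
  cancelScheduledValues(time: number): void;
  setValueAtTime(value: number, time: number): void;
  linearRampToValueAtTime(value: number, time: number): void;
}

// Fades `param` from whatever it currently is to `target` over `seconds`.
function fadeTo(param: RampParam, target: number, now: number, seconds = 0.3): void {
  const current = param.value;           // read before cancelling anything
  param.cancelScheduledValues(now);      // drop ramps still in flight
  param.setValueAtTime(current, now);    // anchor the start of the new ramp
  param.linearRampToValueAtTime(target, now + seconds);
}

// A screen component could then do something like:
// useEffect(() => {
//   fadeTo(musicGain.gain, MUSIC_VOLUME, getCtx().currentTime); // fade in on mount
//   return () => fadeTo(musicGain.gain, 0, getCtx().currentTime); // fade out on unmount
// }, []);
```

Anchoring with setValueAtTime matters when a fade is interrupted halfway: the new ramp starts from the current level instead of jumping.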

ElevenLabs Voice Encouragement

Tile snaps are easy to synthesize. “Amazing job, superstar!” is not.

We generated 12 voice encouragement phrases using ElevenLabs — a cheerful, kid-friendly voice saying things like “You did it!”, “Incredible!”, “You’re a puzzle genius!”. A small script hits the ElevenLabs API for each phrase and saves the MP3s to client/public/sounds/voice/.

// scripts/generate-voice-assets.ts
const phrases = [
  "Amazing job!", "You did it!", "Superstar!",
  "Incredible work!", "You're a puzzle genius!", ...
];

for (const phrase of phrases) {
  const audio = await elevenlabs.generate({ text: phrase, voice_id: VOICE_ID });
  await Bun.write(`client/public/sounds/voice/${slug(phrase)}.mp3`, audio);
}

The script runs once (or when you want to add phrases). The assets ship as static files. Total cost: a few cents for 12 clips.
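The script above relies on a slug helper to turn each phrase into a filename. Its implementation is not shown here, so the following is only one plausible version, assuming lowercase, hyphen-separated ASCII names:

```typescript
// Hypothetical slug helper: "You're a puzzle genius!" -> "youre-a-puzzle-genius"
function slug(phrase: string): string {
  return phrase
    .toLowerCase()
    .replace(/['’]/g, '')          // drop apostrophes so "you're" becomes "youre"
    .replace(/[^a-z0-9]+/g, '-')   // collapse every other run of non-alphanumerics to a hyphen
    .replace(/^-+|-+$/g, '');      // trim hyphens left at either end
}
```

Keeping the filenames predictable means the client can derive the URL from the phrase text without a separate manifest.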

On puzzle completion, we pick a random phrase and play it after the celebration SFX. Same pickRandomIndex logic to avoid repeating the same phrase twice in a row.
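One way to package the no-repeat rule for phrases is a small closure that remembers the last pick. This is a sketch under assumptions: createNoRepeatPicker is a hypothetical name, and the real engine may simply track lastIndex in a module variable instead.

```typescript
// Returns a function that picks a random item, never the same one twice in a row.
function createNoRepeatPicker<T>(items: readonly T[]): () => T {
  if (items.length === 0) throw new Error('need at least one item');
  let last = -1; // index of the previous pick; -1 means nothing picked yet
  return () => {
    let next = 0;
    if (items.length > 1) {
      // Re-roll until the index differs from the previous one
      do {
        next = Math.floor(Math.random() * items.length);
      } while (next === last);
    }
    last = next;
    return items[next];
  };
}

// Example: const nextPhrase = createNoRepeatPicker(voiceFiles);
// where voiceFiles is a hypothetical array of clip URLs.
```

With a single item the picker has no alternative, so it returns that item every time rather than looping forever.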

Wiring It Up with React Context

Components don’t import soundEngine directly. Everything goes through SoundContext:

// context/SoundContext.tsx
import React, { useEffect, useState } from 'react';
export function SoundProvider({ children }: { children: React.ReactNode }) {
  const [sfxEnabled, setSfxEnabled] = useState(
    () => localStorage.getItem('sfx') !== 'false'
  );
  const [musicEnabled, setMusicEnabled] = useState(
    () => localStorage.getItem('music') !== 'false'
  );

  // sync to engine and localStorage on change
  useEffect(() => {
    soundEngine.setSfxEnabled(sfxEnabled);
    localStorage.setItem('sfx', String(sfxEnabled));
  }, [sfxEnabled]);

  // ...expose play functions, toggle functions
}

SoundProvider wraps the app at the root. Any component calls useSoundSystem() and gets the play functions along with the current enabled state. Settings persist to localStorage and survive page refreshes.
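The `!== 'false'` comparison in the provider is what makes both settings default to on: a key that was never written reads back as null, which is not the string 'false'. A minimal sketch of that logic in isolation, with hypothetical helper names (readFlag, writeFlag) and a storage interface narrowed to the two methods used:

```typescript
// The subset of the Storage API used here; `localStorage` satisfies it.
interface FlagStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Missing keys count as enabled; only the literal string 'false' disables.
function readFlag(storage: FlagStorage, key: string): boolean {
  return storage.getItem(key) !== 'false';
}

// Stores the flag as the string 'true' or 'false'.
function writeFlag(storage: FlagStorage, key: string, enabled: boolean): void {
  storage.setItem(key, String(enabled));
}

// In the app: readFlag(localStorage, 'sfx') and writeFlag(localStorage, 'sfx', false)
```

A first-time visitor therefore hears sound without any setup, and the toggle only takes effect once someone has explicitly turned it off.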

The Profile page has toggles for both SFX and music. Flipping the toggle fires immediately — if you turn off music mid-puzzle, it fades out within 300ms.

What It Sounds Like In Practice

The full sound map:

| Event | Sound |
| --- | --- |
| Theme carousel swipe | Soft whoosh (synthesized) |
| Theme card tap | Short click (synthesized) |
| Tile drag | Subtle noise burst (synthesized) |
| Tile snap (correct) | Pitch-up chirp (synthesized) |
| Tile wrong placement | Low buzz (synthesized) |
| Puzzle start | Music fades in |
| Puzzle complete | Ascending fanfare + voice phrase |
| Level up | Three-note ascending chord |
| Screen change | Music crossfade |

The whole thing is about 250 lines of TypeScript. No audio library. No audio files for SFX. No third-party dependencies beyond the ElevenLabs script (which runs ahead of time, outside the app, and produces static assets).

Web Audio API is verbose but it’s exactly as powerful as you need it to be.