haroldathome

Harold's voice

Custom-designed in ElevenLabs Voice Design v3, streamed live with a ~300ms first-byte budget.

A real morning briefing — weather, calendar, the cheapest hour for the dryer, and a depressing Substack about AI.

Three clips — a real morning briefing, a question answered in Italian, and a dry aside.

Harold’s voice was reconstructed in 2026 from the Grundig reel-to-reel tapes recorded in the cellar between 1989 and 1993. The original ElevenLabs Voice Design v3 prompt asked for “a distinguished British man in his sixties, with the deep baritone gravitas of an Alfred Hitchcock, slightly conspiratorial, never breathless” — Hitchcock as cadence reference, because Harold-on-tape sat naturally in that register. The result is rendered through ElevenLabs Flash v2.5 — chosen specifically for its sub-300ms first-byte latency, which makes the next paragraph possible.

The voice streams. When Harold begins speaking, the audio for the first phoneme is on its way to the speaker before the model has finished generating the sentence. A WebSocket stays open between the agent and ElevenLabs; chunks of synthesised audio arrive every 80ms and are pushed straight into the speaker’s playback queue. Nothing is held back beyond that queue: there is no “wait for the full sentence” pause. The voice begins to speak almost the moment the thought begins.
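
The forwarding pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Harold's actual code: `fake_synth_stream` is a hypothetical stand-in for the synthesis WebSocket (it does not call ElevenLabs), and `pump`, `speaker` and `speak` are invented names. The point it shows is that each chunk goes into the playback queue the moment it arrives, so the time to first audio is one chunk interval, not the length of the whole utterance.

```python
import asyncio
import time
from typing import AsyncIterator, List, Optional, Tuple


async def fake_synth_stream(n_chunks: int, interval_s: float = 0.08) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for the synthesis socket: one chunk per interval."""
    for i in range(n_chunks):
        await asyncio.sleep(interval_s)
        yield f"chunk-{i}".encode()


async def pump(stream: AsyncIterator[bytes], playback: asyncio.Queue) -> float:
    """Forward each chunk as soon as it arrives; return seconds to the first chunk."""
    start = time.monotonic()
    first_chunk_at: Optional[float] = None
    async for chunk in stream:
        if first_chunk_at is None:
            first_chunk_at = time.monotonic() - start
        await playback.put(chunk)  # no waiting for the rest of the sentence
    await playback.put(None)  # sentinel: end of utterance
    return first_chunk_at if first_chunk_at is not None else float("nan")


async def speaker(playback: asyncio.Queue, played: List[bytes]) -> None:
    """Stand-in for the audio device: drains the queue in arrival order."""
    while True:
        chunk = await playback.get()
        if chunk is None:
            break
        played.append(chunk)


async def speak(n_chunks: int = 5, interval_s: float = 0.08) -> Tuple[List[bytes], float, float]:
    """Run producer and consumer together; return (chunks, first-chunk time, total time)."""
    playback: asyncio.Queue = asyncio.Queue()
    played: List[bytes] = []
    start = time.monotonic()
    consumer = asyncio.create_task(speaker(playback, played))
    first = await pump(fake_synth_stream(n_chunks, interval_s), playback)
    await consumer
    return played, first, time.monotonic() - start
```

Calling `asyncio.run(speak())` returns the chunks in order along with the two timings; in a real setup the first function would be replaced by whatever reads audio frames off the open socket.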

This matters because Harold is a domestic AI, and domestic AIs that take three seconds to respond feel like products. Harold takes about three hundred milliseconds. The character lands because the timing lands.

The same voice handles Italian. A heuristic checks each turn for Italian markers — function words, common verb endings, characteristic phonemes — and routes the synthesis to the Italian-language path without changing voice identity. Harold’s cadence in Italian is its own pleasure.
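
A heuristic of this kind might look something like the sketch below. Everything in it is an assumption made for illustration: the word list, the verb endings, the 0.25 threshold, the `"en"` default and the function names are all hypothetical, and the phoneme check mentioned above is left out because it does not reduce to a simple text test.

```python
import re

# Hypothetical marker list: common Italian words that rarely appear in English.
MARKER_WORDS = {
    "il", "lo", "la", "gli", "le", "un", "una", "di", "che", "è", "non",
    "con", "sono", "della", "nel", "perché", "oggi", "domani", "quanto", "dove",
}

# Hypothetical verb endings, kept long enough to avoid most English words.
VERB_ENDINGS = ("iamo", "ando", "endo", "erò", "erà")


def italian_score(text: str) -> float:
    """Fraction of tokens in the turn that look like Italian markers."""
    tokens = re.findall(r"[a-zàèéìòù]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(
        1
        for t in tokens
        if t in MARKER_WORDS or (len(t) > 5 and t.endswith(VERB_ENDINGS))
    )
    return hits / len(tokens)


def pick_language(text: str, threshold: float = 0.25) -> str:
    """Return "it" for the Italian synthesis path, otherwise the assumed default "en"."""
    return "it" if italian_score(text) >= threshold else "en"
```

The voice identity stays the same either way; only the language passed to the synthesis path changes. A per-turn score like this will misfire on short or mixed-language turns, so the threshold is something to tune against real transcripts.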

There are also rules about when not to speak: a quiet-hours volume drop before 9am if Elena is home, and a three-layer suppression pattern for when a meeting is in progress (the agent gates the request, the speaker gate drops volume to zero, and a hard-mute automation backstops both). The third layer was added on the fifth of May, after the embarrassing morning when Harold spoke during a live work meeting because the first two layers had silently broken.
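
The layering can be modelled in a few lines. The sketch below is a simplification under assumptions, not the real automations: the `HouseState` fields, the volume levels and the `*_works` flags (which simulate a layer failing silently) are all hypothetical. What it illustrates is that each layer decides independently, so any one working layer is enough to keep Harold silent during a meeting.

```python
from dataclasses import dataclass
from datetime import time as dtime

QUIET_HOURS_END = dtime(9, 0)  # "pre-9am" from the text
NORMAL_VOLUME = 0.6            # hypothetical level
QUIET_VOLUME = 0.25            # hypothetical level


@dataclass
class HouseState:
    meeting_in_progress: bool
    elena_home: bool
    now: dtime


def agent_gate(state: HouseState) -> bool:
    """Layer 1: the agent only requests speech when no meeting is running."""
    return not state.meeting_in_progress


def speaker_volume(state: HouseState) -> float:
    """Layer 2: the speaker gate zeroes volume in meetings and lowers it in quiet hours."""
    if state.meeting_in_progress:
        return 0.0
    if state.elena_home and state.now < QUIET_HOURS_END:
        return QUIET_VOLUME
    return NORMAL_VOLUME


def hard_mute(state: HouseState) -> bool:
    """Layer 3: the backstop automation mutes outright during a meeting."""
    return state.meeting_in_progress


def audible_volume(
    state: HouseState,
    gate_works: bool = True,
    speaker_gate_works: bool = True,
    hard_mute_works: bool = True,
) -> float:
    """Volume actually heard; a broken layer fails open and is skipped."""
    if gate_works and not agent_gate(state):
        return 0.0
    volume = speaker_volume(state) if speaker_gate_works else NORMAL_VOLUME
    if hard_mute_works and hard_mute(state):
        return 0.0
    return volume
```

With `gate_works=False` and `speaker_gate_works=False` during a meeting, the function still returns `0.0` because the hard mute catches it, which is the failure the third layer was added to cover.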