Audio grows up — synchronized read-along, gapless playback, and a search that understands what you mean

Thimo Buchheister · June 19, 2026

Tags: audio, vorlesen, mitlesen, barrierefreiheit, suche, azure-ai-search, benachrichtigungen, qualität, alpha

Black headphones against a warm background: every story is now also available to listen to, and properly so. (Photo: Unsplash)

The previous weeks revolved around the foundation, the invisible things that simply have to be right: maintainable legal texts, one consistent voice in every email, a safety net that clicks through the live platform the way a person would. That was the work below the waterline. The last few days were again about something you experience directly, and across two senses at once: listening and finding.

OutaStory has been able to read stories aloud for a while. That the voices sound human was the topic of its own post (the ElevenLabs grant, which we wrote about at length elsewhere). This is about the other side of the same coin: not how the voice sounds, but what it feels like to listen to a story. Because a good voice alone doesn't make an audiobook. You need a player that doesn't stutter, that you can operate from the lock screen, that knows where in the chapter you currently are, and ideally text you can read along with while listening. That's exactly where the focus was: turning narration into a listening experience, making finished audio versions instantly available, rebuilding search so it understands what's meant, and underneath all of it, a small, honest lesson about care.

From narration to a real listening experience

The first and biggest thread was a complete overhaul of how audio is generated and how it's played back — on every platform, from the browser through Android and iOS to the Windows app.

Until now, an audio version was generated as one big block per chapter and played back in one piece. That works, but it has two weaknesses. First, generation is fragile: if something hiccups halfway through, the entire chapter starts over from scratch, which is expensive and slow. Second, a single big block is hard to navigate: scrubbing is coarse, and a read-along that knows precisely where in the text the voice currently is was barely feasible.

That's why an audio version is now generated in segments. Each chapter is broken into manageable pieces that are narrated one after another — with overlapping context so the narrative style stays consistent across the cuts. Each finished piece is saved immediately. If generation is interrupted, it resumes exactly where it left off, without regenerating a single already-finished piece. If narration hits a passage the provider blocks for safety reasons, the pipeline automatically tries to gently rephrase just that spot instead of letting the whole chapter fail. What used to be a fragile, single-block process has become a robust, resumable stream of small, verified pieces.
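
The resumable part of that pipeline can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the `synthesize` callback stands in for whatever narration provider is used, and the file naming, the paragraph-based splitting, and the 200-character context tail are all assumptions made for the example. The automatic rephrasing of blocked passages is left out.

```python
from pathlib import Path
from typing import Callable, List


def split_into_segments(paragraphs: List[str], max_chars: int) -> List[str]:
    """Group whole paragraphs into segments of at most max_chars where possible."""
    segments: List[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            segments.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


def narrate_chapter(
    segments: List[str],
    out_dir: Path,
    synthesize: Callable[[str, str], bytes],  # hypothetical: (text, context) -> audio
) -> int:
    """Narrate only the segments that are not on disk yet; return how many were new."""
    out_dir.mkdir(parents=True, exist_ok=True)
    generated = 0
    for i, text in enumerate(segments):
        target = out_dir / f"segment-{i:04d}.bin"
        if target.exists():
            continue  # finished in an earlier run, never regenerated
        # Overlapping context: pass the tail of the previous segment along.
        context = segments[i - 1][-200:] if i > 0 else ""
        audio = synthesize(text, context)
        # Write to a temp file first so an interruption never leaves a half-written segment.
        tmp = target.with_suffix(".tmp")
        tmp.write_bytes(audio)
        tmp.replace(target)
        generated += 1
    return generated
```

If a run fails partway through, calling `narrate_chapter` again with the same arguments skips every segment file that already exists and continues with the first missing one.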

On the playback side, this same segment structure pays off twice over. The individual pieces are stitched together into gapless playback — you don't hear a seam between them, the chapter plays through as if it were a single take. At the same time, the player always knows which segment it's currently in. That produces a player that finally feels like a real audio player:

Skip forward and back by chapter, right in the mobile player and on the lock screen — finish listening to a story without bringing the app to the foreground.
Scrub to any point, precisely instead of coarsely.
Control from the lock screen and via the system's media keys — on Android and iOS, playback runs as a full-fledged background media session, with title, chapter, and the familiar controls right where the operating system expects them.

All of this lives inside a purpose-built, cross-platform audio engine: the same core delivers the same reliable playback everywhere, instead of four different stopgap solutions, one per platform.
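
How a player can always know which segment it is in, and scrub precisely across segment boundaries, comes down to a small lookup over cumulative segment durations. The sketch below shows one common way to do that; the function names are hypothetical and the engine described here may work differently.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def build_offsets(durations: List[float]) -> List[float]:
    """Start time of every segment within the chapter, plus the chapter end."""
    return [0.0] + list(accumulate(durations))


def locate(offsets: List[float], position: float) -> Tuple[int, float]:
    """Map a chapter-wide position (seconds) to (segment index, seconds into it)."""
    total = offsets[-1]
    position = min(max(position, 0.0), total)  # clamp scrubbing to the chapter
    # The last offset is the chapter end, so cap the index at the final segment.
    index = min(bisect_right(offsets, position) - 1, len(offsets) - 2)
    return index, position - offsets[index]


offsets = build_offsets([10.0, 5.0, 7.5])
print(locate(offsets, 12.0))  # (1, 2.0): two seconds into the second segment
```

Scrubbing to any point then means: locate the segment, seek inside it, and queue the following segments behind it for gapless continuation.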

Reading along while the voice reads

An open book with bright pages: text and voice now run in sync, sentence by sentence. (Photo: Unsplash)

When the player knows exactly where in the text the voice currently is, a feature emerges that's especially close to my heart: synchronized read-along.

In the read-along view, the full chapter text scrolls by, and exactly the sentence currently being read aloud lights up and gently scrolls into the middle. Anyone listening can follow along with their eyes at any time; anyone who wants to reread a spot can immediately see where it is. For readers who find it hard to concentrate on pure listening, for language learners who want to connect sound and script, and for anyone who simply likes doing both at once — hearing and seeing — this is a noticeable difference. It's the kind of feature you use once and then miss when it's gone.

Technically it's more demanding than it looks: the text is broken into sentences, each sentence gets an exact position within the chapter, and playback continuously reports back where it is — fine-grained enough that the highlight follows the sound without jittering. This is exactly where the new segment structure pays off: without it, this precise following would be hard to pull off cleanly.
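
A rough sketch of that mechanism, under stated assumptions: sentences are split with a simple regular expression, each one records its character range in the chapter, and the playback position inside a segment is mapped to a character position proportionally. The proportional mapping and all names are assumptions for illustration; the post does not say how the real player aligns time and text.

```python
import re
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sentence:
    start: int  # character offset of the first character in the chapter
    end: int    # character offset just past the last character
    text: str


# Naive splitter: text up to a run of ., ! or ?, or a trailing remainder.
_SENTENCE = re.compile(r"[^.!?]+[.!?]+|[^.!?]+$")


def split_sentences(chapter: str) -> List[Sentence]:
    sentences: List[Sentence] = []
    for match in _SENTENCE.finditer(chapter):
        raw = match.group()
        stripped = raw.strip()
        if stripped:
            start = match.start() + (len(raw) - len(raw.lstrip()))
            sentences.append(Sentence(start, start + len(stripped), stripped))
    return sentences


def active_sentence(
    sentences: List[Sentence],
    seg_start_char: int,
    seg_end_char: int,
    seg_elapsed: float,
    seg_duration: float,
) -> int:
    """Index of the sentence to highlight for the reported playback position."""
    fraction = 0.0
    if seg_duration > 0:
        fraction = min(max(seg_elapsed / seg_duration, 0.0), 1.0)
    char_pos = seg_start_char + fraction * (seg_end_char - seg_start_char)
    starts = [s.start for s in sentences]
    return max(bisect_right(starts, char_pos) - 1, 0)
```

Because the estimate is made per segment rather than per chapter, any drift resets at every segment boundary, which is one reason shorter segments make the highlight easier to keep in step.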

Synchronized read-along is a premium feature — but one that shows up exactly where it makes sense: anyone who opens an audio version sees the read-along view and a friendly note about what it does and how to unlock it — not a hidden feature, but a clear offer at the right moment. Regular listening stays free for everyone; the synchronized read-along is the premium add-on.

Audiobooks that are there instantly

A freshly generated audio version costs compute time and — with the high-quality voices — real money per character. For the flagship stories we curate ourselves, it would be wasteful to regenerate every audio version every time the platform is built. That's why there's now a way to ship finished narrations along with the build: an audio version generated once is saved and simply carried over as a finished audiobook the next time the platform is built — no regeneration, no waiting, no additional consumption of narration credits.
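
One way to picture this carry-over is a lookup that runs before any generation. The sketch below is an assumption about how such a mechanism could look, not a description of the actual build: the content-hash key, the in-memory `prebuilt` dictionary, and the `generate` callback are all hypothetical.

```python
import hashlib
from typing import Callable, Dict


def narration_key(text: str, voice_id: str) -> str:
    """Stable key: the same text with the same voice always maps to the same audio."""
    return hashlib.sha256(f"{voice_id}\n{text}".encode("utf-8")).hexdigest()


def get_audio(
    text: str,
    voice_id: str,
    prebuilt: Dict[str, bytes],             # narrations shipped with the build
    generate: Callable[[str, str], bytes],  # hypothetical provider call
) -> bytes:
    key = narration_key(text, voice_id)
    if key in prebuilt:
        return prebuilt[key]  # no regeneration, no narration credits used
    audio = generate(text, voice_id)
    prebuilt[key] = audio     # saved so the next build can carry it over
    return audio
```

Keying on the text and the voice together means an edited story or a changed voice misses the lookup and is narrated fresh, while unchanged stories are reused.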

The result for listeners is refreshingly simple: for curated stories, the audiobook is instantly there. No "being generated, please check back later" — just press play and listen. And thanks to the segment format described above, these pre-generated audio versions come with everything gapless playback and synchronized read-along need.

Search that understands what you mean

The second big thread has nothing to do with audio, and yet aims at the same goal: making the platform feel right. It's about search — and it was rebuilt from the ground up.

A library of thousands of public-domain classics and flagship stories is only as good as your ability to find what you're looking for in it. The previous search was simple: it compared the entered text more or less directly against titles and names. That works for "Goethe", but fails at everything a person actually types: a swapped word, a typo, a first name instead of a last name, a different inflection. Anyone typing "Venice death" instead of "Death in Venice" used to find nothing — even though the story is sitting right there.

That's why search was rebuilt in several stages, cleanly and step by step:

First, a clean separation. The search logic was placed behind a clear interface so the search provider can be swapped out without the rest of the platform noticing anything. That sounds like inner architecture — but it's the prerequisite that let the following stages be introduced risk-free.

Then real German full-text search. As a reliable, universally available baseline, a genuine full-text search with a German language analyzer was added: it understands word stems and inflections, is insensitive to word order, and also finds authors via their bio text and role. "Venice death" now finds "Death in Venice". This stage needs no external service and works everywhere, including locally in development, which makes it a dependable fallback.

And finally, Azure AI Search. For production, Azure AI Search is now connected — a dedicated search engine that forgives typos, ranks by relevance, and scales with the growing library. The search index is deliberately just a derived, rebuildable-at-any-time mirror: the authoritative data stays in our database, and the index only holds author names, bios, titles, and descriptions as a searchable copy. When an account is deleted, the associated profile also disappears from the index. This privacy classification isn't just something buried in the engine room — it's stated openly in the privacy notice, where Azure AI Search is named as its own sub-processor, with exactly this explanation of what it does and doesn't receive.
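
The "derived mirror" idea can be illustrated with plain dictionaries. This sketch covers author profiles only and assumes a field whitelist; the field names and functions are hypothetical, and no real search SDK is called.

```python
from typing import Dict

# Only these fields are ever copied into the index (assumed names).
INDEXED_AUTHOR_FIELDS = ("name", "bio")

Record = Dict[str, str]


def to_index_document(record: Record) -> Record:
    """Project a database record down to its searchable copy."""
    return {key: record[key] for key in INDEXED_AUTHOR_FIELDS if key in record}


def rebuild_index(database: Dict[str, Record]) -> Dict[str, Record]:
    """The index holds nothing of its own, so it can be rebuilt from the database."""
    return {record_id: to_index_document(r) for record_id, r in database.items()}


def delete_account(account_id: str, database: Dict[str, Record], index: Dict[str, Record]) -> None:
    """Deleting an account removes the profile from both the database and the index."""
    database.pop(account_id, None)
    index.pop(account_id, None)
```

The whitelist is the important design choice: a field such as an email address never reaches the index because it is never selected, rather than because something remembers to strip it.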

The choice of which stage applies happens automatically: if the dedicated search engine is available, it takes over; otherwise the platform falls back seamlessly to the German full-text search. For the person searching, that simply means they type something approximate and get the right result, without needing to know what's working in the background.
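
The three stages can be sketched together: an interface, a local fallback, and automatic selection. Everything here is illustrative. The class and function names are hypothetical, `DedicatedEngine` is a stand-in that calls no real service, and the fallback only does accent folding and order-insensitive token matching; it does not stem words the way a real language analyzer does.

```python
import re
import unicodedata
from typing import Dict, List, Optional, Protocol, Set


def tokens(text: str) -> Set[str]:
    """Lowercase, strip accents, and split into a set of word tokens."""
    folded = unicodedata.normalize("NFKD", text.casefold())
    folded = "".join(c for c in folded if not unicodedata.combining(c))
    return set(re.findall(r"[a-z0-9]+", folded))


class SearchProvider(Protocol):
    def is_available(self) -> bool: ...
    def search(self, query: str) -> List[str]: ...


class FullTextFallback:
    """Local baseline: every query token must appear in the title, in any order."""

    def __init__(self, titles: List[str]) -> None:
        self._titles = titles

    def is_available(self) -> bool:
        return True  # needs no external service

    def search(self, query: str) -> List[str]:
        wanted = tokens(query)
        if not wanted:
            return []
        return [t for t in self._titles if wanted <= tokens(t)]


class DedicatedEngine:
    """Stand-in for a remote search engine with canned results."""

    def __init__(self, reachable: bool, results: Dict[str, List[str]]) -> None:
        self._reachable = reachable
        self._results = results

    def is_available(self) -> bool:
        return self._reachable

    def search(self, query: str) -> List[str]:
        return self._results.get(query, [])


def pick_provider(preferred: Optional[SearchProvider], fallback: SearchProvider) -> SearchProvider:
    """Use the dedicated engine when it is configured and reachable, else fall back."""
    if preferred is not None and preferred.is_available():
        return preferred
    return fallback


fallback = FullTextFallback(["Death in Venice", "Faust"])
print(fallback.search("Venice death"))  # ['Death in Venice']
```

Because callers only ever see `SearchProvider`, swapping the engine behind it does not touch the rest of the code, which is the point of the first stage.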

A notification when the voice is ready

Generating audio takes time: several minutes for a long story with a high-quality voice. Until now, you had to keep an eye on the page to know when the audio version was finished. Not anymore: as soon as the narration of a story is complete, its author gets a notification in the app, as a push to the device, and as an email, the latter with a direct link to the story, where the audio version plays instantly.

The notification fits right into the notification system introduced in the previous weeks: it can be switched off (anyone who doesn't want it can disable it per channel in settings), and it goes exclusively to the right person, the author of the respective story. It's a small feature, but one that makes an annoying "keep checking back" moment disappear.
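
The routing rules described here (three channels, per-channel opt-out, author only, link in the email) fit in a few lines. The sketch below is hypothetical throughout: the channel names, the dictionary shapes, and the assumption that channels are enabled unless switched off are chosen for illustration.

```python
from typing import Dict, List

CHANNELS = ("in_app", "push", "email")  # assumed channel identifiers


def enabled_channels(prefs: Dict[str, bool]) -> List[str]:
    """Channels the user has not switched off (assumed to default to on)."""
    return [channel for channel in CHANNELS if prefs.get(channel, True)]


def narration_ready_notifications(
    story: Dict[str, str],
    prefs_by_user: Dict[str, Dict[str, bool]],
) -> List[Dict[str, str]]:
    """Build one notification per enabled channel, addressed only to the author."""
    author_id = story["author_id"]
    notifications = []
    for channel in enabled_channels(prefs_by_user.get(author_id, {})):
        note = {
            "to": author_id,
            "channel": channel,
            "text": f'The audio version of "{story["title"]}" is ready.',
        }
        if channel == "email":
            note["link"] = story["url"]  # direct link to the story
        notifications.append(note)
    return notifications
```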

A lesson in honesty: a bug that quadrupled every duration

An open book in warm light: sometimes the most important discovery isn't in a feature, but in a number that was wrong. (Photo: Unsplash)

This phase also includes a small story I'm deliberately telling here because it shows how this kind of work really unfolds — and because it contains a nice lesson in care.

While narrating, the platform measures, for every audio piece, how long it is. This duration isn't just cosmetic: it feeds the progress indicator, the scrubbing — and exactly the synchronized highlighting during read-along described above. If the duration is wrong, the marker falls out of step.

While building the pre-generated audio versions, it turned out that the measured durations were far too long: a good two-minute piece was logged as over fourteen minutes. The factor was suspiciously clean: exactly quadruple. And that cleanliness was the crucial clue. The cause lay in a tiny, well-hidden assumption: the audio library used to measure duration returns a length that is counted not in samples but in bytes of the decoded audio stream, which for the mono recordings used here means four bytes per sample. The calculation simply divided by the wrong size and, as a result, was off by exactly this factor of four.
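
The arithmetic behind the factor of four is short enough to write out. The function names and the 44,100 Hz sample rate are assumptions for the example; only the four bytes per sample and the mono channel come from the description above.

```python
def duration_seconds(decoded_bytes: int, sample_rate: int, channels: int, bytes_per_sample: int) -> float:
    """Correct: a length in bytes must be divided by the bytes consumed per second."""
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return decoded_bytes / bytes_per_second


def old_duration_seconds(decoded_bytes: int, sample_rate: int) -> float:
    """The mistake: treating the byte count as if it were a sample count."""
    return decoded_bytes / sample_rate


# Ten seconds of mono audio at 44,100 Hz with four bytes per sample.
length = 10 * 44_100 * 1 * 4
print(duration_seconds(length, 44_100, 1, 4))  # 10.0
print(old_duration_seconds(length, 44_100))    # 40.0, exactly four times too long
```

The error factor equals `channels * bytes_per_sample`, which is why it came out as a clean integer rather than some irregular ratio.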

The bug itself was fixed with a single line. But the interesting part isn't the fix, it's what came afterward: so that this bug can never come back, the duration calculation has now moved into its own, isolated, tested building block — with tests that pin down the exact real numbers that exposed the bug, and a test that explicitly labels the old, wrong formula as "quadruple". If someone accidentally falls back to the old calculation a year from now, the test will loudly fail before anything ships. On top of that there's a second test that runs the real audio library against a real audio file — if a future update to that library ever changes the meaning of that length, this will also be caught immediately.
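
The testing pattern described here can be sketched with Python's standard `wave` module playing the role of "the real audio library". This is an analogy under assumptions, not the project's test suite: the pinned numbers are illustrative rather than the real values that exposed the bug, and the WAV file is generated in memory instead of loaded from a fixture.

```python
import io
import wave


def duration_seconds(decoded_bytes: int, sample_rate: int, channels: int, bytes_per_sample: int) -> float:
    """The isolated building block under test."""
    return decoded_bytes / (sample_rate * channels * bytes_per_sample)


def make_silent_wav(seconds: int, sample_rate: int = 8_000, sample_width: int = 4) -> bytes:
    """Produce a real mono WAV file in memory, filled with silence."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(sample_width)
        writer.setframerate(sample_rate)
        writer.writeframes(b"\x00" * (seconds * sample_rate * sample_width))
    return buffer.getvalue()


def test_pinned_numbers() -> None:
    # Illustrative values: 1,764,000 bytes of 4-byte mono samples at 44,100 Hz.
    assert duration_seconds(1_764_000, 44_100, 1, 4) == 10.0


def test_old_formula_is_quadruple() -> None:
    old = 1_764_000 / 44_100  # the old, wrong calculation
    assert old == 4 * duration_seconds(1_764_000, 44_100, 1, 4)


def test_against_real_file() -> None:
    # If the library ever changed what its lengths mean, this would fail.
    with wave.open(io.BytesIO(make_silent_wav(2)), "rb") as reader:
        raw = reader.readframes(reader.getnframes())
        measured = duration_seconds(
            len(raw), reader.getframerate(), reader.getnchannels(), reader.getsampwidth()
        )
        assert measured == reader.getnframes() / reader.getframerate() == 2.0
```

The first two tests guard the formula itself; the third guards the assumption about the library, which is the part that no amount of arithmetic checking would catch.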

That's the unglamorous truth behind "quality": not that no bugs happen — but that a bug, once found, gets locked down so it can't happen a second time. That's exactly what the rest of the test layer that kept growing during this phase is for too: around the new audio pipeline, around publishing, covers, and the read-along.

Hundreds of small edges smoothed

Alongside the big threads ran what runs in every phase — the patient smoothing of small edges, never a headline on its own, but in aggregate the difference between "works" and "feels good":

The legal pages got a visual upgrade to match: the same maintained layout as the rest of the app, with their own illustrative images and cleanly set text blocks — plus a new privacy notice for narrators, matching the growing voice marketplace.
Navigation got tidied up: the subscription moved to where you'd expect it — into the profile menu — and a duplicate entry was removed.
A minor-protection gap was closed: on an author's profile, stories with a higher age rating used to show up even for logged-out visitors — now the same age filtering applies there as everywhere else.
Various stability and cleanup work around the local development environment and error monitoring, so real problems don't get lost in the noise.

Numbers, for the record

Pull requests: around three dozen since the last regular blog post — focused on the audio pipeline and search.
Main topics: segmented, resumable audio generation; gapless playback with chapter skip, scrubbing, and lock-screen controls across every platform; synchronized read-along as a premium feature; pre-generated, instantly playable audio versions; a search rebuilt from the ground up, from German full-text search to Azure AI Search; a notification when narration is finished; plus a lot of quiet quality and test work.
Catalog: unchanged and broad — the thousands of public-domain classics are live, now noticeably more searchable.
Tests: still green in the high three-thousands, specifically expanded around the new audio-duration calculation (against the bug described above), publishing, covers, and read-along — across unit tests, browser tests, and the end-to-end net against the running platform.

What's next

Test read-along more broadly: keep fine-tuning the synchronized highlighting across long chapters, different devices, and voices.
Sharpen search: now that Azure AI Search is live, work on relevance nuances and suggestions.
Expand pre-generated audio versions: more of the curated flagship stories with instantly available audiobooks.
Turn on minor protection: the age engine is still waiting for our data protection officer to approve the verification method.
App.Shared Razor Pages tests: the last red coverage entry, still ongoing in weekly waves.

This phase had two faces: the audible one, an app you can now really listen to and read along with, and the quiet one underneath, which didn't just fix a quadrupling bug but locked it away for good. Both belong together. A nice feature is only worth as much as the trust that it'll still be right tomorrow. If anything catches your attention while listening, reading along, or searching that doesn't feel right (a highlight that falls out of step, a search that doesn't find the obvious, an audiobook that stutters), let me know. These are exactly the spots I want to find while the net still holds.
