Inside the AI polish layer | Hissper

Transcription is the easy part. Here's how Hissper turns raw speech into the writing you actually meant.

Category: Engineering · Author: Daniel Cho · April 11, 2026 · 9 min read

Most dictation tools stop at transcription. You speak, they print. The output reads like a court reporter's transcript — every "um", every false start, every meandering aside. It's literally what you said, which is almost never what you meant.

The polish layer is what closes that gap. It's the part of Hissper that takes a faithful transcript and rewrites it into the email you would have sent if you'd had a careful editor sitting next to you.

Three passes

Polish runs in three passes. First, filler removal: a small classifier strips out the ums, ahs, and trailing connectives. Second, structural rewrite: a larger model reorders fragments, joins half-sentences, and turns parenthetical asides into proper clauses. Third, tone matching: a thin LoRA adapter fine-tuned on the user's own writing nudges word choice and rhythm so the result sounds like them.
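To make the shape of that pipeline concrete, here is a minimal sketch of three passes chained in order. It is an illustration, not Hissper's code: the regex stands in for the filler classifier, `structural_rewrite` is a trivial placeholder for the larger model, and `make_tone_pass` swaps a word-substitution table in for the LoRA adapter. All function names here are hypothetical.

```python
import re
from typing import Callable, Dict, List

# Each pass is modeled as a plain text-to-text function.
Pass = Callable[[str], str]

# Assumption: a regex stands in for the small filler classifier.
FILLERS = re.compile(r"\b(?:um+|uh+|ah+|er+)\b[,.]?\s*", re.IGNORECASE)


def remove_fillers(text: str) -> str:
    """Pass 1 stand-in: drop common filler words and tidy whitespace."""
    return FILLERS.sub("", text).strip()


def structural_rewrite(text: str) -> str:
    """Pass 2 placeholder: a real system would call a larger model here.

    This version only capitalizes the first letter and adds a full stop.
    """
    text = text.strip()
    if not text:
        return text
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text


def make_tone_pass(preferred: Dict[str, str]) -> Pass:
    """Pass 3 placeholder: whole-word substitutions instead of a LoRA adapter."""

    def tone_match(text: str) -> str:
        for generic, personal in preferred.items():
            text = re.sub(rf"\b{re.escape(generic)}\b", personal, text)
        return text

    return tone_match


def polish(text: str, passes: List[Pass]) -> str:
    """Run the passes in order, feeding each one the previous output."""
    for run_pass in passes:
        text = run_pass(text)
    return text


if __name__ == "__main__":
    passes = [remove_fillers, structural_rewrite, make_tone_pass({"ready": "good to go"})]
    print(polish("um so the report is uh ready", passes))
    # prints: So the report is good to go.
```

The ordering matters even in this toy version: the tone pass sees text that has already been cleaned and restructured, so it only has to adjust word choice.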

"The goal isn't to make you sound polished. It's to make you sound like you on your best day."

Latency budget

All three passes have to fit inside a 350 ms latency budget. Above that threshold, streaming text starts to feel laggy. We hit it by running the filler classifier on-device, batching the structural rewrite, and overlapping the tone pass with the cursor commit.
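A quick back-of-the-envelope sketch shows why the overlap matters. The only figure taken from this post is the 350 ms budget; the per-stage latencies below are invented for illustration, and the `Stage` type and helper names are hypothetical. The assumption is that the filler and rewrite passes run back to back, while the tone pass and cursor commit run concurrently, so the pair costs only as much as the slower of the two.

```python
from dataclasses import dataclass
from typing import Iterable

BUDGET_MS = 350.0  # the latency budget stated in the post


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: float


def sequential_ms(stages: Iterable[Stage]) -> float:
    """Total latency if every stage waits for the previous one."""
    return sum(stage.latency_ms for stage in stages)


def overlapped_ms(filler: Stage, rewrite: Stage, tone: Stage, commit: Stage) -> float:
    """Critical path when the tone pass runs concurrently with cursor commit."""
    return filler.latency_ms + rewrite.latency_ms + max(tone.latency_ms, commit.latency_ms)


def within_budget(total_ms: float, budget_ms: float = BUDGET_MS) -> bool:
    return total_ms <= budget_ms


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the arithmetic.
    filler = Stage("filler removal (on-device)", 30.0)
    rewrite = Stage("structural rewrite (batched)", 220.0)
    tone = Stage("tone matching", 90.0)
    commit = Stage("cursor commit", 60.0)

    naive = sequential_ms([filler, rewrite, tone, commit])
    overlapped = overlapped_ms(filler, rewrite, tone, commit)
    print(naive, within_budget(naive))            # 400.0 False
    print(overlapped, within_budget(overlapped))  # 340.0 True
```

With these made-up numbers, running everything in sequence overshoots the budget by 50 ms, while overlapping the last two stages brings the total back under it.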

It's a lot of moving parts for what feels, to the user, like nothing at all happening. Which is exactly the point.
