Wrote something on voice AI UX and I’m honestly not sure the arguments hold up.
The piece is about why voice AI feels broken even when the model is fine. My hunch is that it's mostly UX: perceived latency, interruption handling, confirmation loops, and channel handoffs that drop all context. I'm working from observation and some early production experience, and I'd rather find out I'm wrong from people actually building this than just publish and move on.
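To make the perceived-latency point concrete, one pattern I've seen discussed is masking generation delay with a short backchannel filler when the first response chunk misses a time budget. This is just a sketch of that idea; `generate_reply`, `play_audio`, and the 350 ms budget are hypothetical stand-ins, not any real API:

```python
import asyncio

# Assumed budget before we fill the silence; real systems tune this.
FILLER_BUDGET_S = 0.35

async def respond(user_utterance, generate_reply, play_audio):
    """Play a filler ("mm-hm") if the reply isn't ready within the budget.

    generate_reply and play_audio are hypothetical async callables
    standing in for the LLM/TTS pipeline and the audio output path.
    """
    reply_task = asyncio.create_task(generate_reply(user_utterance))
    try:
        # shield() so the timeout cancels only the wait, not the generation
        reply = await asyncio.wait_for(
            asyncio.shield(reply_task), timeout=FILLER_BUDGET_S)
    except asyncio.TimeoutError:
        await play_audio("mm-hm")  # backchannel filler while we keep waiting
        reply = await reply_task
    await play_audio(reply)
```

Whether fillers actually help or just add uncanny noise is exactly the kind of thing I'd love production data on.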
The backchannel cues argument feels theoretically solid but I haven’t seen it work cleanly at scale. And I genuinely don’t know whether improving TTS quality makes the uncanny valley problem better or worse. My instinct says worse, at least for a while.
Has anyone cracked interruption handling in a way that actually feels natural? That’s the one I keep getting stuck on.
Full piece is on Substack, happy to share the link if that’s allowed here.
submitted by /u/beanbag28