Audio AI UX patterns
Audio patterns for voice input, live transcription, summarization, and voice-driven actions.
Start here
Core patterns for audio UX.
11 patterns
Live Transcript
Real-time text from audio
Voice Visualizer
Feedback for voice mode
Voice Cloning
Clone and use custom voices
Real-time Translation
Live translation during voice conversations
Audio Enhancement
Noise reduction and clarity improvement
Voice Commands
Trigger actions via voice commands
Audio Summarization
Summarize long audio recordings
Activation Boundaries
Explicit starts and stops for always-on agents
Interruptibility
Single gesture to pause or cancel
Voice Confirmation
Spoken approval for high-stakes voice actions
Multi-User Awareness
Identify the speaker and scope actions to their permissions
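The Multi-User Awareness pattern can be sketched as a permission check that runs after speaker identification. This is a minimal TypeScript sketch under stated assumptions: identification happens upstream and returns a speaker id with a confidence score, and `authorizeVoiceCommand`, `SpeakerProfile`, the permission names, and the 0.8 threshold are all hypothetical. An unrecognized or low-confidence voice falls back to a narrow guest scope instead of inheriting the device owner's permissions.

```typescript
// Hypothetical permission names for illustration.
type Permission = "read_calendar" | "send_message" | "make_purchase";

interface SpeakerProfile {
  id: string;
  name: string;
  permissions: Set<Permission>;
}

// Assumed output of an upstream speaker-identification step.
interface SpeakerMatch {
  speakerId: string | null; // null when no enrolled voice matched
  confidence: number;       // 0..1
}

// Scope granted to anyone whose voice is not confidently identified.
const GUEST_PERMISSIONS: ReadonlySet<Permission> = new Set<Permission>(["read_calendar"]);

// Assumed threshold; a real product would tune it against false accepts.
const MIN_CONFIDENCE = 0.8;

type AuthResult =
  | { allowed: true; speaker: string }
  | { allowed: false; speaker: string; reason: string };

function authorizeVoiceCommand(
  match: SpeakerMatch,
  required: Permission,
  profiles: Map<string, SpeakerProfile>,
): AuthResult {
  // Only trust the identification when confidence clears the threshold.
  const profile =
    match.speakerId !== null && match.confidence >= MIN_CONFIDENCE
      ? profiles.get(match.speakerId)
      : undefined;

  // Fall back to guest scope rather than borrowing another user's permissions.
  const speaker = profile ? profile.name : "Guest";
  const granted: ReadonlySet<Permission> = profile ? profile.permissions : GUEST_PERMISSIONS;

  if (granted.has(required)) {
    return { allowed: true, speaker };
  }
  return {
    allowed: false,
    speaker,
    reason: `${speaker} does not have the "${required}" permission`,
  };
}
```

Returning the identified speaker with every decision lets the UI state who it thinks is talking, which makes a misidentification visible before it matters.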
Frequently asked questions
What should I design first for voice AI?
Start with capture feedback, transcript visibility, and a clear handoff between listening, processing, and action, not just a mic icon.
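One way to make the handoff explicit is to model it as a small state machine and surface every state, not only the mic. The sketch below is illustrative: the state names, event names, and the `nextState` and `trace` helpers are hypothetical, and events that do not apply to the current state are ignored rather than raised.

```typescript
// Hypothetical states and events for the listening -> processing -> action handoff.
type VoiceState = "idle" | "listening" | "processing" | "acting";
type VoiceEvent = "mic_on" | "speech_end" | "intent_ready" | "action_done" | "cancel";

// Allowed transitions; any event not listed for a state leaves the state unchanged.
const TRANSITIONS: Record<VoiceState, Partial<Record<VoiceEvent, VoiceState>>> = {
  idle: { mic_on: "listening" },
  listening: { speech_end: "processing", cancel: "idle" },
  processing: { intent_ready: "acting", cancel: "idle" },
  acting: { action_done: "idle", cancel: "idle" },
};

function nextState(state: VoiceState, event: VoiceEvent): VoiceState {
  return TRANSITIONS[state][event] ?? state;
}

// Replay a sequence of events and record each state, so every handoff
// can be shown to the user instead of hidden behind one mic icon.
function trace(events: VoiceEvent[], start: VoiceState = "idle"): VoiceState[] {
  const states: VoiceState[] = [start];
  let current = start;
  for (const event of events) {
    current = nextState(current, event);
    states.push(current);
  }
  return states;
}
```

Because `cancel` is valid from every active state, the same table also gives the Interruptibility pattern a single, predictable exit.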
How do voice input and dictation modes differ?
Voice input often implies conversational turn-taking: the user speaks, then the system responds. Dictation is one-way speech-to-text into a field. Give each mode its own indicators and interruption rules.
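The difference between the two modes can be expressed as per-mode configuration. In this hypothetical sketch, conversational mode ends capture after a short silence so the system can take its turn, while dictation ends only on an explicit stop, on the assumption that users often pause mid-sentence while dictating. `MODE_BEHAVIOR`, `shouldStopCapture`, the indicator wording, and the 1200 ms value are assumptions, not recommended settings.

```typescript
type InputMode = "conversation" | "dictation";

interface ModeBehavior {
  indicatorLabel: string;         // text shown next to the mic
  assistantSpeaks: boolean;       // does the system reply with audio?
  stopOnSilenceMs: number | null; // auto-stop after this much silence; null = explicit stop only
}

// Hypothetical values for illustration.
const MODE_BEHAVIOR: Record<InputMode, ModeBehavior> = {
  conversation: { indicatorLabel: "Listening (your turn)", assistantSpeaks: true, stopOnSilenceMs: 1200 },
  dictation: { indicatorLabel: "Dictating into field", assistantSpeaks: false, stopOnSilenceMs: null },
};

// Decide whether to end audio capture for the current mode.
function shouldStopCapture(mode: InputMode, silenceMs: number, userPressedStop: boolean): boolean {
  if (userPressedStop) return true; // an explicit stop always wins
  const limit = MODE_BEHAVIOR[mode].stopOnSilenceMs;
  return limit !== null && silenceMs >= limit;
}
```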
When is a live transcript required?
For meetings, interviews, accessibility, and any flow where users must correct words in real time. Showing only a delayed transcript hides recognition errors until it is too late to fix them.
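Speech recognizers commonly emit interim hypotheses that are later replaced by a final result; the Web Speech API, for example, marks results with `isFinal`. Below is a minimal sketch of a live transcript that shows interim text, locks in final text, and lets users correct finalized segments while capture continues. `LiveTranscript` and its methods are hypothetical and not tied to any particular recognizer API.

```typescript
interface Segment {
  text: string;
  isFinal: boolean;    // interim hypotheses may still change
  userEdited: boolean; // set when the user corrects a finalized segment
}

class LiveTranscript {
  private segments: Segment[] = [];

  // Apply a recognizer update: an interim result replaces the trailing
  // interim segment; a final result locks that segment in.
  applyResult(text: string, isFinal: boolean): void {
    const last = this.segments[this.segments.length - 1];
    if (last && !last.isFinal) {
      last.text = text;
      last.isFinal = isFinal;
    } else {
      this.segments.push({ text, isFinal, userEdited: false });
    }
  }

  // Let the user fix a finalized segment; later results only touch the
  // trailing interim segment, so the correction is never overwritten.
  correct(index: number, text: string): void {
    const seg = this.segments[index];
    if (!seg || !seg.isFinal) {
      throw new Error("Only finalized segments can be corrected");
    }
    seg.text = text;
    seg.userEdited = true;
  }

  // Interim text is suffixed with "..." so users can tell it may still change.
  render(): string {
    return this.segments.map((s) => (s.isFinal ? s.text : `${s.text}...`)).join(" ");
  }
}
```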
What should voice visualizers communicate?
They should distinguish listening, processing, and speaking states, and surface errors such as low input volume or denied microphone permission. Animation without state labels confuses users.
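A sketch of pairing each visualizer state with a text label, including a low-volume warning derived from the RMS level of recent audio. `VisualizerState`, `visualizerLabel`, the label wording, and the 0.01 RMS threshold are assumptions. Samples are assumed to be floating-point PCM in [-1, 1], the form the Web Audio API's `AnalyserNode.getFloatTimeDomainData` produces.

```typescript
type VisualizerState =
  | { kind: "listening"; rms: number }
  | { kind: "processing" }
  | { kind: "speaking" }
  | { kind: "error"; reason: "mic_denied" | "no_input_device" };

// Root-mean-square level of a block of PCM samples in [-1, 1].
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) {
    sum += samples[i] * samples[i];
  }
  return Math.sqrt(sum / samples.length);
}

const LOW_VOLUME_RMS = 0.01; // assumed threshold; tune per device

// Every state gets a readable label, so the animation is never the only signal.
function visualizerLabel(state: VisualizerState): string {
  switch (state.kind) {
    case "listening":
      return state.rms < LOW_VOLUME_RMS
        ? "Listening (too quiet: speak up or move closer)"
        : "Listening";
    case "processing":
      return "Processing";
    case "speaking":
      return "Speaking (tap to interrupt)";
    case "error":
      return state.reason === "mic_denied"
        ? "Microphone access denied. Enable it in settings."
        : "No microphone found";
  }
}
```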
How do voice-to-action patterns stay safe?
Confirm destructive commands, show what will run before it runs, and offer a text fallback. Acting on ambient voice without confirmation causes costly mistakes.
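A minimal sketch of a safety gate between a recognized command and its execution: low-confidence recognition falls back to text entry, destructive actions require explicit confirmation, and every path that proceeds shows a preview of what will run. `PlannedAction`, `decide`, `isConfirmed`, and the 0.7 confidence threshold are hypothetical, and the sketch assumes an upstream step has already classified each action as destructive or not.

```typescript
interface PlannedAction {
  description: string;  // human-readable summary shown before running
  destructive: boolean; // deletes, sends, pays, or otherwise can't be undone
}

type Decision =
  | { kind: "run"; preview: string }
  | { kind: "confirm"; preview: string; prompt: string }
  | { kind: "text_fallback"; reason: string };

const MIN_ASR_CONFIDENCE = 0.7; // assumed; below this, ask the user to type instead

function decide(action: PlannedAction, asrConfidence: number): Decision {
  if (asrConfidence < MIN_ASR_CONFIDENCE) {
    return { kind: "text_fallback", reason: "I may have misheard. Type or edit the command?" };
  }
  const preview = `Will run: ${action.description}`;
  if (action.destructive) {
    return { kind: "confirm", preview, prompt: `Say "confirm" or tap Confirm to ${action.description}.` };
  }
  return { kind: "run", preview };
}

// Accept only an explicit affirmative; anything else is treated as a cancel.
function isConfirmed(reply: string): boolean {
  const normalized = reply.trim().toLowerCase();
  return normalized === "confirm" || normalized === "yes, confirm";
}
```

`isConfirmed` is deliberately strict: a casual reply such as "yeah sure" does not authorize a destructive action, which makes it less likely that background speech triggers one.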
Which audio patterns support multilingual products?
Real-time translation and language-toggle patterns appear in the catalog. Pair them with a clear display of the source and target languages so users know what was heard.