Audio AI UX patterns

Audio patterns for voice input, live transcription, summarization, and voice-driven actions.

Start here

Core patterns for audio UX.

11 patterns

Frequently asked questions

What should I design first for voice AI?

Start with capture feedback, transcript visibility, and a clear handoff between listening, processing, and action. A mic icon alone is not enough.

How do voice input and dictation modes differ?

Voice input often implies conversational turn-taking. Dictation is one-way speech-to-text into a field, so the two modes need different indicators and interruption rules.
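One way to keep the two modes from blurring together is to make their differences explicit in configuration. The sketch below is illustrative only: the `Mode` names, the `ModeBehavior` fields, the label text, and the 1500 ms silence window are all assumptions, not values from any particular product or library.

```typescript
// Hypothetical mode model; names and values are assumptions for illustration.
type Mode = "conversation" | "dictation";

interface ModeBehavior {
  indicator: string;     // label shown next to the mic control
  bargeIn: boolean;      // may the user interrupt spoken assistant output?
  endOnSilence: boolean; // does a pause end the capture?
}

const MODE_BEHAVIOR: Record<Mode, ModeBehavior> = {
  // Conversation: turn-taking, so a pause hands the turn to the assistant.
  conversation: { indicator: "Listening for your turn", bargeIn: true, endOnSilence: true },
  // Dictation: one-way text entry, so pauses are normal and capture continues.
  dictation: { indicator: "Dictating into field", bargeIn: false, endOnSilence: false },
};

// Assumed end-of-turn silence window, in milliseconds.
const END_OF_TURN_SILENCE_MS = 1500;

// Decide whether capture should stop after a given stretch of silence.
function shouldEndCapture(mode: Mode, silenceMs: number): boolean {
  return MODE_BEHAVIOR[mode].endOnSilence && silenceMs >= END_OF_TURN_SILENCE_MS;
}
```

Keeping the rules in one table means the indicator text and the interruption behavior cannot drift apart when a mode is added or changed.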

When is a live transcript required?

For meetings, interviews, accessibility, and any flow where users must correct words in real time. A transcript that appears only after the session ends hides errors until it is too late to fix them.
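A live transcript usually has to merge provisional text with finalized text so that corrections are not overwritten. The following is a minimal sketch under stated assumptions: the `Segment` shape, the `applyUpdate` and `render` helpers, and the "(interim)" marker are hypothetical and do not mirror any specific speech recognition API.

```typescript
// Hypothetical transcript segment; a real recognizer will have its own shape.
interface Segment {
  id: number;     // stable id for one utterance
  text: string;
  final: boolean; // false while the recognizer may still revise the text
}

// Apply one recognizer update without mutating the existing transcript.
function applyUpdate(transcript: Segment[], update: Segment): Segment[] {
  const existing = transcript.find((s) => s.id === update.id);
  if (!existing) return [...transcript, update];
  // Never let late interim text overwrite a finalized (or user-corrected) segment.
  if (existing.final && !update.final) return transcript;
  return transcript.map((s) => (s.id === update.id ? update : s));
}

// Render with a visible marker on text that may still change.
function render(transcript: Segment[]): string {
  return transcript
    .map((s) => (s.final ? s.text : `${s.text} (interim)`))
    .join(" ");
}
```

Marking interim text visibly tells users which words are still safe to ignore and which are settled enough to correct.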

What should voice visualizers communicate?

They should distinguish the listening, processing, and speaking states, and surface errors such as low volume or denied mic permission. Animation without state labels confuses users.
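The states and errors named above can be modeled so that every state carries a text label alongside its animation. This is a rough sketch, not a reference implementation: the `VoiceState` type, the label strings, and the 0.05 low-volume threshold are assumptions chosen for illustration.

```typescript
// Hypothetical state model for a voice visualizer; names are illustrative.
type VoiceState =
  | { kind: "idle" }
  | { kind: "listening"; inputLevel: number } // assumed normalized mic level, 0 to 1
  | { kind: "processing" }
  | { kind: "speaking" }
  | { kind: "error"; reason: "low-volume" | "mic-denied" };

// Assumed threshold below which the input is treated as too quiet.
const LOW_VOLUME_THRESHOLD = 0.05;

// Map every state to a visible label so the animation is never the only cue.
function stateLabel(state: VoiceState): string {
  switch (state.kind) {
    case "idle":
      return "Tap to speak";
    case "listening":
      return state.inputLevel < LOW_VOLUME_THRESHOLD
        ? "Listening (speak louder)"
        : "Listening";
    case "processing":
      return "Processing";
    case "speaking":
      return "Speaking";
    case "error":
      return state.reason === "mic-denied"
        ? "Microphone access denied"
        : "Input too quiet";
  }
}
```

Because the union is exhaustive, adding a new state without a label becomes a compile-time error rather than a silent, unlabeled animation.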

How do voice-to-action patterns stay safe?

Confirm destructive commands, show what will run, and offer a text fallback. Ambient voice input that executes without confirmation can cause costly mistakes.
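A confirmation gate along these lines might look like the sketch below. It assumes a hypothetical `VoiceCommand` shape whose `destructive` flag is set by an upstream parser; `planCommand` and `shouldRun` are illustrative names, not part of any real framework.

```typescript
// Hypothetical command shape; a real product would define its own.
interface VoiceCommand {
  action: string;       // e.g. "delete"
  target: string;       // e.g. "report.pdf"
  destructive: boolean; // assumed to be set by the command parser
}

type Decision =
  | { kind: "run"; preview: string }      // safe to run, preview still shown
  | { kind: "confirm"; preview: string }; // must wait for explicit confirmation

// Always build a preview of what will run; gate destructive commands.
function planCommand(cmd: VoiceCommand): Decision {
  const preview = `${cmd.action}: ${cmd.target}`;
  return cmd.destructive
    ? { kind: "confirm", preview }
    : { kind: "run", preview };
}

// Run only when no confirmation is needed or the user has confirmed
// (by voice, tap, or typed text as the fallback).
function shouldRun(decision: Decision, userConfirmed: boolean): boolean {
  return decision.kind === "run" || userConfirmed;
}
```

For example, `planCommand({ action: "delete", target: "report.pdf", destructive: true })` yields a `confirm` decision, so nothing runs until `shouldRun` receives an explicit confirmation.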

Which audio patterns support multilingual products?

Real-time translation and language-toggle patterns appear in the catalog. Pair them with a clear display of the source and target languages so users know what was heard.