Performance AI UX patterns
Performance patterns surface latency, progress, rate limits, and resource usage so users can trust that the system is working.
Start here
Core patterns for performance UX.
12 patterns
Token Usage Indicator
API quota display
Processing Time Estimates
Expected wait times
Cost Transparency
Show operation costs
Rate Limit Warnings
API limit alerts
Caching Indicators
Show when cached results are used
Model Selection UI
Let users choose AI model (speed vs quality)
Batch Processing Queue
Queue multiple requests for efficiency
Performance Optimization Tips
AI suggests ways to improve performance
Resource Usage Dashboard
Visual dashboard of compute/memory usage
Hard Budget Ceilings
Enforceable caps that stop the agent
Running Meters
Live token and cost counters during execution
Cross-Session Budget
Spending caps that persist across chats and devices
Frequently asked questions
Why dedicate patterns to performance UX?
Long-running model work breaks flow without honest progress and limits. These patterns prevent silent failure and surprise throttling.
What should users see during long generations?
Elapsed time, staged progress, a cancel control, and optional step detail. Indeterminate spinners alone are insufficient for multi-minute agent jobs.
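As a rough illustration of that answer, the sketch below builds a one-line status from elapsed time, stage position, and a cancel affordance. The JobProgress shape, its field names, and the status wording are hypothetical and not taken from any particular framework.

```python
from dataclasses import dataclass


@dataclass
class JobProgress:
    """Snapshot of a long-running job for display (hypothetical shape)."""
    stages: list[str]          # ordered stage names, e.g. ["Plan", "Search", "Write"]
    current_index: int         # index of the stage that is running now
    elapsed_seconds: float     # wall-clock time since the job started
    cancellable: bool = True   # whether a cancel control should be offered


def format_elapsed(seconds: float) -> str:
    """Render elapsed time as m:ss, dropping fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


def status_line(p: JobProgress) -> str:
    """Build a status line: stage position, stage name, elapsed time."""
    step = f"Step {p.current_index + 1} of {len(p.stages)}"
    stage = p.stages[p.current_index]
    line = f"{step}: {stage} ({format_elapsed(p.elapsed_seconds)} elapsed)"
    if p.cancellable:
        line += " [Cancel]"
    return line
```

A real interface would render the cancel control as a button rather than text; the point is that the status carries more information than a spinner.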
How do cost and token indicators help?
They set expectations before spend accumulates, especially for API-backed tools and team admins. Sudden hard stops without warning feel punitive.
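One way to avoid a sudden hard stop is to pair the ceiling with an earlier soft warning. The sketch below assumes a hypothetical BudgetMeter class and an 80% warning threshold; both are illustrative choices, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class BudgetMeter:
    """Running cost meter with a soft warning and a hard ceiling (hypothetical)."""
    limit_usd: float            # hard ceiling for the session
    warn_fraction: float = 0.8  # assumed threshold for the soft warning
    spent_usd: float = 0.0      # running total shown to the user

    def record(self, cost_usd: float) -> str:
        """Add a charge and return the state the UI should show."""
        self.spent_usd += cost_usd
        if self.spent_usd >= self.limit_usd:
            return "stop"   # ceiling reached: halt further work
        if self.spent_usd >= self.warn_fraction * self.limit_usd:
            return "warn"   # approaching the ceiling: warn before stopping
        return "ok"
```

Because the "warn" state fires first, users see the limit coming and can raise it or wrap up before work is cut off.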
When should rate-limit warnings appear?
Before users hit the wall: soft warnings with reset time, upgrade path, or queue position. Post-hoc errors after lost work destroy trust.
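A soft warning of this kind can be sketched as a function that returns a message only once the remaining quota runs low. The function name, the 20% threshold, and the message wording are assumptions for illustration; real quota data would come from whatever the backing API reports.

```python
from typing import Optional


def rate_limit_notice(
    remaining: int,
    limit: int,
    reset_in_seconds: int,
    warn_below: float = 0.2,  # assumed: warn when 20% or less of quota remains
) -> Optional[str]:
    """Return a user-facing notice, or None while quota is comfortable."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Round the reset time up to whole minutes, with a floor of one minute.
    minutes = max(1, -(-reset_in_seconds // 60))
    if remaining <= 0:
        return f"Limit reached. Requests resume in {minutes} min."
    if remaining / limit <= warn_below:
        return f"{remaining} of {limit} requests left. Resets in {minutes} min."
    return None
```

An upgrade path or queue position could be appended to the same message where the product offers one.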
What is the difference between caching indicators and skeleton loaders?
Caching indicators signal instant or near-instant reuse of prior work. Skeleton loaders cover cold loads. Label cached replies honestly so users know why a response was fast.
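A caching indicator can be as small as a label attached to the reply. The sketch below assumes the application already knows whether a result came from cache and, optionally, how old it is; the function name and label text are hypothetical.

```python
from typing import Optional


def response_label(from_cache: bool, cached_age_seconds: Optional[int] = None) -> str:
    """Return a short label explaining where a reply came from."""
    if not from_cache:
        return "Fresh result"
    if cached_age_seconds is None:
        return "Cached result"  # age unknown, still disclose the cache hit
    if cached_age_seconds < 60:
        return "Cached result (under a minute old)"
    minutes = cached_age_seconds // 60
    return f"Cached result ({minutes} min old)"
```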
Do performance patterns apply to voice and video AI?
Yes. Processing-time estimates and running meters matter for transcription and media jobs, where latency multiplies with media length and processing stages rather than following a single token stream.