Grok composer UX: skills, models & voice input
Updated June 13, 2026
Grok optimizes for time-to-first-query: a nearly empty canvas with no feed and no onboarding carousel. Depth, connectors, and SuperGrok promos all compete for that same focal band, trading calm for monetization pressure whenever the product wants attention.
Calm default

What works
- No starter pills, no tool chips, no mode vocabulary on first load. You can just type.
- Fast is on the bar before the first keystroke. Price-sensitive users see what they are spending on, and everyone sees the model tier in one control.
- + on the left and the model picker on the right keep attaching content separate from choosing which model answers.
What we would push on
- Imagine and Private live top-right, far from the bar. Image gen and privacy modes feel like separate products.
- Mic and waveform sit as plain icons side by side on the right rail. Dictation and voice conversation look the same until you tap.
- Placeholder copy shifts between sessions ("How can I help you today?" versus "What do you want to know?"). It is a small inconsistency, but it stands out on a page with almost no other text.
Business strategy
Radical calm plus a visible Fast tier drives typing-first engagement while signaling the cost tier to price-sensitive users before they spend tokens.
Tradeoff
| Decision | Benefit | Cost |
|---|---|---|
| Fast on the bar before first keystroke | Honest model visibility on an empty landing | Imagine and Private hide top-right; voice icons indistinguishable until tap |
Takeaway
Radical calm with honest model visibility. The bar split works. Voice entry icons and top-right modes need clearer differentiation.
Pattern: Tool Switching in Composer
Pattern: Model Selection UI
Feature discovery

What works
- Connectors banner teaches app wiring at the point of ask, with Dismiss and Connect so users stay in control.
- App icons in the banner preview what connectors mean before you open settings.
- Feature education sits directly under the bar where eyes already landed.
What we would push on
- SuperGrok countdown pill fights the bar for attention on the same screen as the connectors banner.
- Two growth surfaces at once on an otherwise empty page. A lot of noise for a calm default.
- The orange Claim Offer button is the only strong color on the page. It pulls focus from the composer itself.
Business strategy
Inline connectors banner converts workspace integrations at the moment of intent without a separate onboarding flow or settings dig.
Tradeoff
| Decision | Benefit | Cost |
|---|---|---|
| Connectors banner plus SuperGrok promo on empty landing | Feature and monetization surfacing at point of ask | Two growth surfaces fight the calm composer focal band |
Takeaway
Good inline feature surfacing for connectors. Keep upsell chrome out of the composer focal band or sequence promos so they do not stack.
Pattern: Prompt Templates
Pattern: Progressive Disclosure
Attachments

What works
- The image sits inside the composer pill as a small thumbnail with an x to remove it. You see what Grok will read without leaving the bar.
- Attachment does not break the pill shape. The bar still reads as one control.
What we would push on
- Inline preview shows pixels, not filename and type. Harder to scan when you stack multiple files.
- The thumbnail competes with + for space on the left. The bar gets dense once you add skill chips too.
Business strategy
Inline multimodal preview keeps image questions in Grok instead of requiring users to describe uploads in text alone.
Tradeoff
| Decision | Benefit | Cost |
|---|---|---|
| Compact thumb without filename | Preserves pill shape and calm bar | Hard to scan multiple files; pixels not metadata |
Takeaway
Steal the compact inline preview for a single image. Add filename chips when users attach more than one thing.
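The rule in that takeaway can be sketched as a small rendering function. The `Attachment` shape, `ChipView` union and `renderAttachmentChips` are hypothetical names, not Grok's code. The sketch keeps the bare thumbnail when exactly one image is attached. It switches every item to a labeled chip carrying filename and type as soon as a second item arrives or the single item is not an image.

```typescript
// Hypothetical attachment model; field names are illustrative, not Grok's.
interface Attachment {
  fileName: string;
  mimeType: string;
  thumbnailUrl?: string; // present only for images
}

// What the composer pill renders for one attachment.
type ChipView =
  | { kind: "thumb"; thumbnailUrl: string; removeLabel: string }
  | { kind: "labeled"; label: string; thumbnailUrl?: string; removeLabel: string };

// Short type tag from the extension, e.g. "report.pdf" -> "PDF"; falls back to the MIME type.
function typeTag(att: Attachment): string {
  const dot = att.fileName.lastIndexOf(".");
  return dot >= 0 ? att.fileName.slice(dot + 1).toUpperCase() : att.mimeType;
}

// A single image keeps the compact thumb; anything else gets scannable filename chips.
function renderAttachmentChips(atts: Attachment[]): ChipView[] {
  const only = atts.length === 1 ? atts[0] : undefined;
  if (only && only.thumbnailUrl) {
    return [{ kind: "thumb", thumbnailUrl: only.thumbnailUrl, removeLabel: `Remove ${only.fileName}` }];
  }
  return atts.map((att): ChipView => ({
    kind: "labeled",
    label: `${att.fileName} · ${typeTag(att)}`,
    thumbnailUrl: att.thumbnailUrl,
    removeLabel: `Remove ${att.fileName}`,
  }));
}
```

This design keeps the pill shape for the common single-image case. Metadata appears only when scanning actually matters.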
Pattern: Context Chip Management
Ready to send

What works
- Send affordance appears as a filled black arrow once there is content to ship. Clear shippable state.
- Image thumb and text coexist in one pill without pushing model controls off the bar.
- Fast stays visible so model choice does not disappear when the bar gets busy.
What we would push on
- Send icon swaps from waveform to arrow when text appears. The right rail changes meaning as you type.
- No explicit filename on the image chip. You trust the thumb, not a label.
Business strategy
A clear shippable state (filled black send) maximizes completion when multimodal prompts are armed and model tier stays visible.
Tradeoff
| Decision | Benefit | Cost |
|---|---|---|
| Send icon swaps from waveform to arrow when typing | Clear ready-to-send affordance | Right rail changes meaning; send feels like a mode swap |
Takeaway
Strong ready-to-send state. Keep right-rail iconography consistent so send does not feel like a mode swap.
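One way to keep the right rail stable is to derive it from composer state as a fixed set of slots. In this approach, only the enabled state changes as the user types. It is a sketch under assumptions: `ComposerState`, `RailSlot` and `rightRail` are hypothetical names, and Grok's actual behavior is the icon swap described above.

```typescript
// Hypothetical composer state; not Grok's actual model.
interface ComposerState {
  text: string;
  attachmentCount: number;
}

type RailControl = "model" | "mic" | "voiceSession" | "send";

interface RailSlot {
  control: RailControl;
  enabled: boolean;
}

// Something is shippable when there is non-blank text or at least one attachment.
function canSend(s: ComposerState): boolean {
  return s.text.trim().length > 0 || s.attachmentCount > 0;
}

// Same slots in the same order on every render: send is present but disabled
// on an empty bar instead of taking over the voice icon's position.
function rightRail(s: ComposerState): RailSlot[] {
  return [
    { control: "model", enabled: true },
    { control: "mic", enabled: true },
    { control: "voiceSession", enabled: true },
    { control: "send", enabled: canSend(s) },
  ];
}
```

Because the slot order never depends on input, typing arms send without changing what the other icons mean.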
Pattern: Context Chip Management
Skills applied

What works
- Armed skills show as chips inside the bar. Scope is visible before send.
- Same chip grammar as file attachments. One visual system for what will affect the next message.
What we would push on
- Chips say docx and pdf, not the skill name or outcome. Extension labels hide what Grok will actually do.
- Multiple chips crowd the left side of the pill fast once you add an image too.
Business strategy
In-bar skill chips make scope honest before send, reducing failed runs when the wrong capability is armed.
Tradeoff
| Decision | Benefit | Cost |
|---|---|---|
| Chips labeled docx/pdf not skill outcome | Same chip grammar as file attachments | Extension labels hide what Grok will actually do |
Takeaway
Chip persistence is right. Rename chips to match the tooltip copy users saw when they picked the skill.
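Renaming chips does not require a second visual system. A shared chip type can carry the skill's picker title as the primary label and keep the extension as secondary detail. The `Skill` fields, chip union and example title below are hypothetical; the sketch does not reproduce Grok's actual tooltip copy.

```typescript
// Hypothetical skill definition; ids, titles, and extensions are illustrative only.
interface Skill {
  id: string;
  title: string;           // the outcome wording shown in the flyout row and tooltip
  outputExtension: string; // e.g. "docx", what the chip shows today
}

// One chip grammar for everything that will shape the next message.
type ComposerChip =
  | { source: "file"; label: string }
  | { source: "skill"; label: string; detail: string };

// Armed skills are labeled by the job they perform; the extension becomes detail.
function skillChip(skill: Skill): ComposerChip {
  return { source: "skill", label: skill.title, detail: skill.outputExtension };
}

function fileChip(fileName: string): ComposerChip {
  return { source: "file", label: fileName };
}

// Accessible name leads with the same outcome text the user saw when picking the skill.
function chipAriaLabel(chip: ComposerChip): string {
  return chip.source === "skill"
    ? `Skill: ${chip.label} (${chip.detail})`
    : `File: ${chip.label}`;
}
```

For example, `chipAriaLabel(skillChip({ id: "docx", title: "Write a Word document", outputExtension: "docx" }))` yields `"Skill: Write a Word document (docx)"`.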
Pattern: Context Chip Management
Model picker

What works
- Fast, Expert, and Heavy use outcome labels with model version in subtext. Capabilities without jargon.
- The Auto row explains that it chooses between Fast and Expert. An intelligent default is one tap away.
- The Custom instructions row at the bottom shows Concise as the current state. Settings are visible inside the same menu.
What we would push on
- SuperGrok upgrade sits inside the model menu between capability rows and settings. Mixes paywall with model pick.
- Fast is selected by default, not Auto. Power users get transparency; casual users may not discover routing.
- The bar still says Fast while the menu exposes four tiers plus an upgrade row. Naming is consistent, but the ladder is long.
Business strategy
Tier ladder with outcome labels supports upgrade to Heavy and Expert while Auto offers intelligent routing for casual users who do not want to manage tiers.
Tradeoff
| Decision | Benefit | Cost |
|---|---|---|
| SuperGrok upgrade inside model menu | Upgrade visible at model-selection moment | Mixes paywall with capability choice |
Takeaway
Strong tier naming and honest subtext. Pull upgrade into its own row or footer so model choice stays about capability.
Pattern: Model Selection UI
Pattern: Progressive Disclosure
Voice dictation

What works
- Listening replaces the right rail with a waveform and checkmark. Clear in-bar dictation mode.
- You stay on the landing page. No full-screen handoff for a short utterance.
What we would push on
- Mic and waveform look the same on the default bar. Users only learn the difference after tapping.
- Waveform control inside a narrow pill can feel cramped on desktop.
Business strategy
Inline dictation lowers the barrier for mobile and accessibility users (speak, edit, send) without forcing everyone into a full voice session up front.
Tradeoff
| Decision | Benefit | Cost |
|---|---|---|
| Mic and waveform identical on default bar | Clean right rail on calm landing | Users only learn dictation vs conversation after tapping |
Takeaway
Inline dictation works. Differentiate mic versus conversation icons on the default state, not only after activation.
Pattern: Voice Input
Voice session

What works
- The "You may start speaking" prompt sets expectations before the first word. It is a human entry into hands-free mode.
- Stop is prominent and unambiguous. Easy exit from a live session.
- Paperclip still available in voice mode. Attach does not disappear when you switch modalities.
What we would push on
- Composer layout changes completely between text and voice. The bar you learned on the landing page is not the bar in session.
- The Ara · Assistant persona picker only appears in voice mode. Text users may never see persona controls.
Business strategy
Full voice sessions target hands-free use (commutes, workouts) where dictation-to-text is the wrong job and longer session time supports SuperGrok positioning.
Tradeoff
| Decision | Benefit | Cost |
|---|---|---|
| Completely different composer layout in voice vs text | Optimized conversation shell with clear Stop | Bar learned on landing does not carry into session; persona hidden from text users |
Takeaway
Strong conversation shell with a clear Stop. Bridge text and voice layouts so the bar does not feel like two products.
Pattern: Voice Input
Pattern: Model Selection UI
How it fits together
The pattern
- Calm pill first, + for uploads, skills, and connectors.
- Fast on the right rail, separate from attach on the left.
- Chips inside the bar for files and armed skills before send.
Where it varies
- Growth banners and SuperGrok promos stack on an otherwise empty page.
- Skills labeled as file extensions, not outcomes.
- Mic and waveform indistinguishable until activated.
- Text and voice composers use different layouts.
Business strategy
Grok bets on maximum calm and visible model tier. When growth chrome, skill naming, and voice entry lack the same discipline as the default bar, that shell feels unreliable and hurts trust in paid tiers.
Tradeoffs
| Decision | Benefit | Cost |
|---|---|---|
| Fast on the bar before first keystroke | Honest model visibility on an empty landing | Imagine and Private hide top-right; voice icons indistinguishable until tap |
| Connectors banner plus SuperGrok promo on empty landing | Feature and monetization surfacing at point of ask | Two growth surfaces fight the calm composer focal band |
| Equal weight for upload, skills, and connectors in + menu | Scannable four-row first screen | Agent vocabulary beside upload may confuse casual chat users |
| Compact thumb without filename | Preserves pill shape and calm bar | Hard to scan multiple files; pixels not metadata |
| Send icon swaps from waveform to arrow when typing | Clear ready-to-send affordance | Right rail changes meaning; send feels like a mode swap |
| Three-level path to arm one skill | Parent menu stays open; tooltips teach capability | Heavy path for common picks; extension labels not outcomes |
| Chips labeled docx/pdf not skill outcome | Same chip grammar as file attachments | Extension labels hide what Grok will actually do |
| SuperGrok upgrade inside model menu | Upgrade visible at model-selection moment | Mixes paywall with capability choice |
| Mic and waveform identical on default bar | Clean right rail on calm landing | Users only learn dictation vs conversation after tapping |
| Completely different composer layout in voice vs text | Optimized conversation shell with clear Stop | Bar learned on landing does not carry into session; persona hidden from text users |
Takeaway
Grok bets on maximum calm and visible model tier. The + tree and in-bar chips carry scope honestly. Growth chrome, skill naming, and voice entry need the same discipline the default bar already has.
Pattern: Tool Switching in Composer
Calm pill → + for scope → Fast on the right is a reusable shell; growth chrome and voice entry are where Grok diverges.
Pattern: Model Selection UI
Steal this
- Fast visible in the bar before the first keystroke
- Connectors banner with Dismiss and Connect at the point of ask
- Inline image thumb inside the composer pill
- Skill tooltips beside flyout rows
- Custom instructions state shown inside the model menu
Skip this
- Stacking upsell pill and feature banner on the same empty landing
- Skill chips labeled docx instead of the job they perform
- Mic and waveform icons that look identical on default
- SuperGrok upgrade row inside the model capability list
How others design the composer
Same job, different product bets, and what each tradeoff reveals.
Claude uses a card composer with starter pills; Grok keeps the landing page almost empty.
ChatGPT arms modes inside +; Grok puts model tier on the bar and skills behind +.
Gemini splits + and the model picker like Grok, but keeps Google tools in a deeper + tree.
Original gallery pages: Tool Switching in Composer · Multimodal Input · Model Selection · Persona Switcher

