Grok composer UX: skills, models & voice input

Updated June 13, 2026

Grok optimizes for time-to-first-query on a nearly empty canvas with no feed or onboarding carousel. Depth, connectors, and SuperGrok promos all share the composer's focal band, which trades calm for monetization pressure whenever the product wants attention.

Calm default

Empty landing with Fast visible in the bar. + on the left, mic and voice icons on the right.

What works

No starter pills, no tool chips, no mode vocabulary on first load. You can just type.
Fast shows before the first keystroke. Price-sensitive users see what they are spending on; everyone sees model tier in one control.
+ on the left and model on the right keep attach separate from which brain answers.

What we would push on

Imagine and Private live top-right, far from the bar. Image gen and privacy modes feel like separate products.
Mic and waveform sit as plain icons side by side on the right rail. Dictation and voice conversation look the same until you tap.
Placeholder copy shifts between sessions ("How can I help you today" versus "What do you want to know?"). Small inconsistency on a page with almost no other text.

Business strategy

Radical calm plus visible Fast tier drives typing-first engagement while signaling cost tier to price-sensitive users before they spend tokens.

Tradeoff

Decision	Benefit	Cost
Fast on the bar before first keystroke	Honest model visibility on an empty landing	Imagine and Private hide top-right; voice icons indistinguishable until tap

Takeaway

Radical calm with honest model visibility. The bar split works. Voice entry icons and top-right modes need clearer differentiation.

Pattern: Tool Switching in Composer

Pattern: Model Selection UI

Feature discovery

Connectors banner below the bar and SuperGrok countdown pill in the corner.

What works

Connectors banner teaches app wiring at the point of ask, with Dismiss and Connect so users stay in control.
App icons in the banner preview what connectors mean before you open settings.
Feature education sits directly under the bar where eyes already landed.

What we would push on

SuperGrok countdown pill fights the bar for attention on the same screen as the connectors banner.
Two growth surfaces at once on an otherwise empty page. A lot of noise for a calm default.
Orange Claim Offer is the only strong color on the page. It pulls focus from the composer itself.

Business strategy

Inline connectors banner converts workspace integrations at the moment of intent without a separate onboarding flow or settings dig.

Tradeoff

Decision	Benefit	Cost
Connectors banner plus SuperGrok promo on empty landing	Feature and monetization surfacing at point of ask	Two growth surfaces fight the calm composer focal band

Takeaway

Good inline feature surfacing for connectors. Keep upsell chrome out of the composer focal band or sequence promos so they do not stack.
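
One way to read "sequence promos so they do not stack" is as a single-slot rule: at most one growth surface renders per page load. The sketch below is illustrative only. The `Surface` type, the ids, and the priority numbers are assumptions, not anything known about Grok's implementation.

```typescript
// Hypothetical growth surfaces; ids and priorities are illustrative assumptions.
type Surface = {
  id: string;
  kind: "education" | "upsell";
  priority: number; // lower number wins the single slot
  dismissed: boolean;
};

// Return at most one surface so promos never stack near the composer.
function pickSurface(surfaces: Surface[]): Surface | null {
  const eligible = surfaces.filter((s) => !s.dismissed);
  if (eligible.length === 0) return null;
  return eligible.reduce((best, s) => (s.priority < best.priority ? s : best));
}

const surfaces: Surface[] = [
  { id: "connectors-banner", kind: "education", priority: 1, dismissed: false },
  { id: "supergrok-countdown", kind: "upsell", priority: 2, dismissed: false },
];

// First visit: only the connectors banner is shown.
const shown = pickSurface(surfaces);

// After the banner is dismissed, the upsell gets the slot on a later load.
const next = pickSurface(
  surfaces.map((s) => (s.id === "connectors-banner" ? { ...s, dismissed: true } : s)),
);
```

Ranking education above upsell is a design choice in this sketch, not a requirement. The point is that the empty landing never carries two competing surfaces at once.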

Pattern: Prompt Templates

Pattern: Progressive Disclosure

Upload a file, Recent, Skills, and Add connector behind + on the left.

What works

Short first screen: four rows, scannable in one glance.
+ becomes x while open. Clear exit without clicking away.
Upload and connectors stay behind + so the default bar stays clean.

What we would push on

Recent, Skills, and Add connector all chevron into submenus. Three overflow paths with similar weight.
Skills and connectors sit beside upload with equal visual priority. Agent vocabulary in the composer may confuse casual chat users.
Attach still lives inside +, not on the bar. Sending a file is not an edge case.

Business strategy

A short + menu keeps agent vocabulary (skills, connectors) off the default bar while enabling xAI’s connector and skills ecosystem inside chat.

Tradeoff

Decision	Benefit	Cost
Equal weight for upload, skills, and connectors in + menu	Scannable four-row first screen	Agent vocabulary beside upload may confuse casual chat users

Takeaway

Standard + pattern that preserves calm. The nested flyouts are where complexity shows up, not on the bar.

Pattern: Tool Switching in Composer

Attachments

Image thumbnail chip inside the composer pill with dismiss control.

What works

The image sits inside the composer pill as a small thumb with x. You see what Grok will read without leaving the bar.
Attachment does not break the pill shape. The bar still reads as one control.

What we would push on

Inline preview shows pixels, not filename and type. Harder to scan when you stack multiple files.
Thumbnail competes with + for space on the left. Dense once you add skills chips too.

Business strategy

Inline multimodal preview keeps image questions in Grok instead of requiring users to describe uploads in text alone.

Tradeoff

Decision	Benefit	Cost
Compact thumb without filename	Preserves pill shape and calm bar	Hard to scan multiple files; pixels not metadata

Takeaway

Steal compact inline preview for a single image. Add filename chips when users attach more than one thing.
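
The takeaway above can be sketched as a small rendering rule: a lone image keeps the compact thumbnail, and anything else falls back to filename chips. The `Attachment` and `Chip` types are hypothetical and exist only for illustration.

```typescript
// Hypothetical attachment and chip shapes; not Grok's actual data model.
type Attachment = { filename: string; mimeType: string };
type Chip = { kind: "thumbnail" } | { kind: "label"; text: string };

// A single image keeps the compact thumb; multiple files (or any
// non-image file) get filename chips so the stack stays scannable.
function chipsFor(attachments: Attachment[]): Chip[] {
  const first = attachments[0];
  if (
    attachments.length === 1 &&
    first !== undefined &&
    first.mimeType.startsWith("image/")
  ) {
    return [{ kind: "thumbnail" }];
  }
  return attachments.map((a): Chip => ({ kind: "label", text: a.filename }));
}
```

Calling `chipsFor` with one PNG yields a single thumbnail chip. Calling it with a PNG and a PDF yields two label chips carrying the filenames.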

Pattern: Context Chip Management

Ready to send

Image preview, typed prompt, and black send button appear together in the bar.

What works

Send affordance appears as a filled black arrow once there is content to ship. Clear shippable state.
Image thumb and text coexist in one pill without pushing model controls off the bar.
Fast stays visible so model choice does not disappear when the bar gets busy.

What we would push on

Send icon swaps from waveform to arrow when text appears. The right rail changes meaning as you type.
No explicit filename on the image chip. You trust the thumb, not a label.

Business strategy

A clear shippable state (filled black send) maximizes completion when multimodal prompts are armed and model tier stays visible.

Tradeoff

Decision	Benefit	Cost
Send icon swaps from waveform to arrow when typing	Clear ready-to-send affordance	Right rail changes meaning; send feels like a mode swap

Takeaway

Strong ready-to-send state. Keep right-rail iconography consistent so send does not feel like a mode swap.
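
The difference between a swapping rail and a consistent one can be sketched as two small functions over the same composer state. Everything here is an assumption for illustration: the `ComposerState` fields, the rule that an attachment alone counts as content, and the `stableRail` alternative.

```typescript
// Hypothetical composer state; field names are assumptions.
type ComposerState = { text: string; attachmentCount: number };

// Assumption: typed text or at least one attachment counts as content to ship.
function hasContent(state: ComposerState): boolean {
  return state.text.trim().length > 0 || state.attachmentCount > 0;
}

// Swapping rail (the behavior described above): one slot, two meanings.
function swappingRail(state: ComposerState): "waveform" | "send" {
  return hasContent(state) ? "send" : "waveform";
}

// Stable rail (one possible alternative): both slots are always present,
// and only the send slot's enabled flag changes as the user types.
type Slot = { icon: "waveform" | "send"; enabled: boolean };

function stableRail(state: ComposerState): Slot[] {
  return [
    { icon: "waveform", enabled: true },
    { icon: "send", enabled: hasContent(state) },
  ];
}
```

The stable variant costs one extra icon on the empty bar, which cuts against the calm default. That is the tradeoff the table above describes, seen from the other side.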

Pattern: Context Chip Management

Skills submenu

Skills flyout with docx, pdf, pptx, and xlsx rows plus inline tooltip.

What works

Tooltip explains what docx does without opening another screen. Capability copy beside the row.
Create skill and View all skills at the bottom give power users an exit without cluttering the first list.
Parent + menu stays open while the skills flyout is active. Less disorienting than a full page swap.

What we would push on

Three levels deep to pick one skill: +, Skills, then docx. Heavy for a single attach intent.
docx, pdf, pptx, and xlsx read like file types, not outcomes. Users may not know what skill they armed.

Business strategy

Skills flyout exposes doc-generation capabilities inside chat to compete with Copilot-style document workflows without a separate editor.

Tradeoff

Decision	Benefit	Cost
Three-level path to arm one skill	Parent menu stays open; tooltips teach capability	Heavy path for common picks; extension labels not outcomes

Takeaway

Skills belong in +, but label them by job (Edit Word doc) not extension. Flatten the path if these are common picks.
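
Labeling by job could be as simple as a lookup from skill id to outcome copy, shared by the menu row and the chip. In the sketch below only "Edit Word doc" comes from the takeaway above. The other labels and the `chipLabel` helper are hypothetical placeholders.

```typescript
// Map skill ids to outcome labels. Only the docx label comes from the
// article; the remaining labels are hypothetical placeholders.
const SKILL_LABELS: Record<string, string> = {
  docx: "Edit Word doc",
  pdf: "Work with PDF", // hypothetical
  pptx: "Build slides", // hypothetical
  xlsx: "Edit spreadsheet", // hypothetical
};

// Fall back to the raw id so an unmapped skill still renders something.
function chipLabel(skillId: string): string {
  return SKILL_LABELS[skillId] ?? skillId;
}
```

Using one table for both the flyout row and the armed chip keeps the wording a user picked identical to the wording they see before send.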

Pattern: Tool Switching in Composer

Pattern: Progressive Disclosure

Skills applied

docx and pdf skill chips armed inside the bar before send.

What works

Armed skills show as chips inside the bar. Scope is visible before send.
Same chip grammar as file attachments. One visual system for what will affect the next message.

What we would push on

Chips say docx and pdf, not the skill name or outcome. Extension labels hide what Grok will actually do.
Multiple chips crowd the left side of the pill fast once you add an image too.

Business strategy

In-bar skill chips make scope honest before send, reducing failed runs when the wrong capability is armed.

Tradeoff

Decision	Benefit	Cost
Chips labeled docx/pdf not skill outcome	Same chip grammar as file attachments	Extension labels hide what Grok will actually do

Takeaway

Chip persistence is right. Rename chips to match the tooltip copy users saw when they picked the skill.

Pattern: Context Chip Management

Model picker

Fast, Auto, Expert, and Heavy tiers with Grok 4.3 subtext. SuperGrok upgrade and custom instructions at the bottom.

What works

Fast, Expert, and Heavy use outcome labels with model version in subtext. Capabilities without jargon.
Auto row explains it chooses Fast or Expert. Intelligent default is one tap away.
Custom instructions at the bottom shows Concise as current state. Settings visible inside the same menu.

What we would push on

SuperGrok upgrade sits inside the model menu between capability rows and settings. Mixes paywall with model pick.
Fast is selected by default, not Auto. Power users get transparency; casual users may not discover routing.
Bar still says Fast while the menu exposes four tiers plus upgrade. Naming is consistent but the ladder is long.

Business strategy

A tier ladder with outcome labels supports upgrades to Expert and Heavy, while Auto offers intelligent routing for casual users who do not want to manage tiers.

Tradeoff

Decision	Benefit	Cost
SuperGrok upgrade inside model menu	Upgrade visible at model-selection moment	Mixes paywall with capability choice

Takeaway

Strong tier naming and honest subtext. Pull upgrade into its own row or footer so model choice stays about capability.

Pattern: Model Selection UI

Pattern: Progressive Disclosure

Voice dictation

Inline listening state with waveform control and confirm button inside the bar.

What works

Listening replaces the right rail with a waveform and checkmark. Clear in-bar dictation mode.
You stay on the landing page. No full-screen handoff for a short utterance.

What we would push on

Mic and waveform look the same on the default bar. Users only learn the difference after tapping.
Waveform control inside a narrow pill can feel cramped on desktop.

Business strategy

Inline dictation lowers the barrier for mobile and accessibility users (speak, edit, send) without forcing everyone into a full voice session up front.

Tradeoff

Decision	Benefit	Cost
Mic and waveform identical on default bar	Clean right rail on calm landing	Users only learn dictation vs conversation after tapping

Takeaway

Inline dictation works. Differentiate mic versus conversation icons on the default state, not only after activation.

Pattern: Voice Input

Voice session

Full voice mode with "You may start speaking," Stop, and the Ara · Assistant persona in the bar.

What works

"You may start speaking" sets expectations before the first word. Human entry to hands-free mode.
Stop is prominent and unambiguous. Easy exit from a live session.
Paperclip still available in voice mode. Attach does not disappear when you switch modalities.

What we would push on

Composer layout changes completely between text and voice. The bar you learned on the landing page is not the bar in session.
Ara · Assistant persona picker only shows in voice mode. Text users may never see persona controls.

Business strategy

Full voice sessions target hands-free use (commutes, workouts) where dictation-to-text is the wrong job. Longer session time also supports SuperGrok positioning.

Tradeoff

Decision	Benefit	Cost
Completely different composer layout in voice vs text	Optimized conversation shell with clear Stop	Bar learned on landing does not carry into session; persona hidden from text users

Takeaway

Strong conversation shell with a clear Stop. Bridge text and voice layouts so the bar does not feel like two products.

Pattern: Voice Input

Pattern: Model Selection UI

How it fits together

The pattern

Calm pill first, + for uploads, skills, and connectors.
Fast on the right rail, separate from attach on the left.
Chips inside the bar for files and armed skills before send.

Where it varies

Growth banners and SuperGrok promos stack on an otherwise empty page.
Skills labeled as file extensions, not outcomes.
Mic and waveform indistinguishable until activated.
Text and voice composers use different layouts.

Business strategy

Grok bets on maximum calm and visible model tier. When growth chrome, skill naming, and voice entry lack the same discipline as the default bar, that shell feels unreliable and hurts trust in paid tiers.

Tradeoffs

Decision	Benefit	Cost
Fast on the bar before first keystroke	Honest model visibility on an empty landing	Imagine and Private hide top-right; voice icons indistinguishable until tap
Connectors banner plus SuperGrok promo on empty landing	Feature and monetization surfacing at point of ask	Two growth surfaces fight the calm composer focal band
Equal weight for upload, skills, and connectors in + menu	Scannable four-row first screen	Agent vocabulary beside upload may confuse casual chat users
Compact thumb without filename	Preserves pill shape and calm bar	Hard to scan multiple files; pixels not metadata
Send icon swaps from waveform to arrow when typing	Clear ready-to-send affordance	Right rail changes meaning; send feels like a mode swap
Three-level path to arm one skill	Parent menu stays open; tooltips teach capability	Heavy path for common picks; extension labels not outcomes
Chips labeled docx/pdf not skill outcome	Same chip grammar as file attachments	Extension labels hide what Grok will actually do
SuperGrok upgrade inside model menu	Upgrade visible at model-selection moment	Mixes paywall with capability choice
Mic and waveform identical on default bar	Clean right rail on calm landing	Users only learn dictation vs conversation after tapping
Completely different composer layout in voice vs text	Optimized conversation shell with clear Stop	Bar learned on landing does not carry into session; persona hidden from text users

Takeaway

Grok bets on maximum calm and visible model tier. The + tree and in-bar chips carry scope honestly. Growth chrome, skill naming, and voice entry need the same discipline the default bar already has.

Pattern: Tool Switching in Composer

Calm pill → + for scope → Fast on the right is a reusable shell; growth chrome and voice entry are where Grok diverges.

Pattern: Model Selection UI