Grok composer UX: skills, models & voice input

Updated June 13, 2026

Grok optimizes for time-to-first-query: a nearly empty canvas with no feed or onboarding carousel. Depth controls, connectors, and SuperGrok promos all share the composer's focal band, so calm gives way to monetization pressure whenever the product wants attention.

Calm default

Empty landing with Fast visible in the bar. + on the left, mic and voice icons on the right.

What works

  • No starter pills, no tool chips, no mode vocabulary on first load. You can just type.
  • Fast shows before the first keystroke. Price-sensitive users see what they are spending on; everyone sees model tier in one control.
  • + on the left and model on the right keep attach separate from which brain answers.

What we would push on

  • Imagine and Private live top-right, far from the bar. Image gen and privacy modes feel like separate products.
  • Mic and waveform sit as plain icons side by side on the right rail. Dictation and voice conversation look the same until you tap.
  • Placeholder copy shifts between sessions ("How can I help you today" versus "What do you want to know?"). Small inconsistency on a page with almost no other text.

Business strategy

Radical calm plus visible Fast tier drives typing-first engagement while signaling cost tier to price-sensitive users before they spend tokens.

Tradeoff

Decision | Benefit | Cost
Fast on the bar before first keystroke | Honest model visibility on an empty landing | Imagine and Private hide top-right; voice icons indistinguishable until tap

Takeaway

Radical calm with honest model visibility. The bar split works. Voice entry icons and top-right modes need clearer differentiation.

Feature discovery

Connectors banner below the bar and SuperGrok countdown pill in the corner.

What works

  • Connectors banner teaches app wiring at the point of ask, with Dismiss and Connect so users stay in control.
  • App icons in the banner preview what connectors mean before you open settings.
  • Feature education sits directly under the bar where eyes already landed.

What we would push on

  • SuperGrok countdown pill fights the bar for attention on the same screen as the connectors banner.
  • Two growth surfaces at once on an otherwise empty page. A lot of noise for a calm default.
  • Orange Claim Offer is the only strong color on the page. It pulls focus from the composer itself.

Business strategy

Inline connectors banner converts workspace integrations at the moment of intent without a separate onboarding flow or settings dig.

Tradeoff

Decision | Benefit | Cost
Connectors banner plus SuperGrok promo on empty landing | Feature and monetization surfacing at point of ask | Two growth surfaces fight the calm composer focal band

Takeaway

Good inline feature surfacing for connectors. Keep upsell chrome out of the composer focal band or sequence promos so they do not stack.
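
One way to sequence promos so they do not stack is to let the landing page render at most one growth surface at a time. The sketch below assumes a hypothetical GrowthSurface shape with a priority and a dismissed flag; none of these names come from Grok, and the ranking rule is an illustration only.

```typescript
// Hypothetical types and names; a sketch, not Grok's implementation.
type GrowthSurface = {
  id: string; // e.g. "connectors-banner" or "supergrok-countdown"
  priority: number; // assumed ranking: higher wins when several are eligible
  dismissed: boolean; // true once the user has closed this surface
};

// Return at most one surface so promos are sequenced rather than stacked.
function pickGrowthSurface(candidates: GrowthSurface[]): GrowthSurface | null {
  const eligible = candidates.filter((s) => !s.dismissed);
  if (eligible.length === 0) return null;
  return eligible.reduce((best, s) => (s.priority > best.priority ? s : best));
}

const landing: GrowthSurface[] = [
  { id: "connectors-banner", priority: 2, dismissed: false },
  { id: "supergrok-countdown", priority: 1, dismissed: false },
];
const shown = pickGrowthSurface(landing);
console.log(shown ? shown.id : "none");
```

With this rule the countdown pill would only appear after the connectors banner is dismissed, which keeps one strong color on the page at a time.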

+ menu

Upload a file, Recent, Skills, and Add connector behind + on the left.

What works

  • Short first screen: four rows, scannable in one glance.
  • + becomes x while open. Clear exit without clicking away.
  • Upload and connectors stay behind + so the default bar stays clean.

What we would push on

  • Recent, Skills, and Add connector all chevron into submenus. Three overflow paths with similar weight.
  • Skills and connectors sit beside upload with equal visual priority. Agent vocabulary in the composer may confuse casual chat users.
  • Attach still lives inside +, not on the bar. Sending a file is not an edge case.

Business strategy

A short + menu keeps agent vocabulary (skills, connectors) off the default bar while enabling xAI’s connector and skills ecosystem inside chat.

Tradeoff

Decision | Benefit | Cost
Equal weight for upload, skills, and connectors in + menu | Scannable four-row first screen | Agent vocabulary beside upload may confuse casual chat users

Takeaway

Standard + pattern that preserves calm. The nested flyouts are where complexity shows up, not on the bar.

Attachments

Image thumbnail chip inside the composer pill with dismiss control.

What works

  • The image sits inside the composer pill as a small thumb with x. You see what Grok will read without leaving the bar.
  • Attachment does not break the pill shape. The bar still reads as one control.

What we would push on

  • Inline preview shows pixels, not filename and type. Harder to scan when you stack multiple files.
  • Thumbnail competes with + for space on the left. Dense once you add skills chips too.

Business strategy

Inline multimodal preview keeps image questions in Grok instead of requiring users to describe uploads in text alone.

Tradeoff

Decision | Benefit | Cost
Compact thumb without filename | Preserves pill shape and calm bar | Hard to scan multiple files; pixels not metadata

Takeaway

Steal compact inline preview for a single image. Add filename chips when users attach more than one thing.
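
The takeaway can be read as a small presentation rule: one image keeps the compact thumb, anything more gets filename chips. A minimal sketch, assuming hypothetical Attachment and Chip types; treating a single non-image file as a filename chip is an added assumption, not something the review observed.

```typescript
// Hypothetical types; a sketch of the suggestion, not Grok's implementation.
type Attachment = { filename: string; mimeType: string };

type Chip =
  | { kind: "thumbnail"; filename: string } // pixels only, as in the current bar
  | { kind: "label"; text: string }; // filename chip for scanning several files

function attachmentChips(files: Attachment[]): Chip[] {
  // A lone image keeps the compact thumb that preserves the pill shape.
  const only = files.length === 1 ? files[0] : undefined;
  if (only !== undefined && only.mimeType.startsWith("image/")) {
    return [{ kind: "thumbnail", filename: only.filename }];
  }
  // Assumption: several files, or a single non-image, get a readable label.
  return files.map((f): Chip => ({ kind: "label", text: f.filename }));
}

console.log(attachmentChips([{ filename: "chart.png", mimeType: "image/png" }]));
```

Keeping the filename on the thumbnail variant means a label can still be exposed on hover or to assistive technology without changing the pill.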

Ready to send

Image preview, typed prompt, and black send button appear together in the bar.

What works

  • Send affordance appears as a filled black arrow once there is content to ship. Clear shippable state.
  • Image thumb and text coexist in one pill without pushing model controls off the bar.
  • Fast stays visible so model choice does not disappear when the bar gets busy.

What we would push on

  • The right-rail icon swaps from waveform to send arrow when text appears, so the same slot changes meaning as you type.
  • No explicit filename on the image chip. You trust the thumb, not a label.

Business strategy

A clear shippable state (filled black send) maximizes completion when multimodal prompts are armed and model tier stays visible.

Tradeoff

Decision | Benefit | Cost
Send icon swaps from waveform to arrow when typing | Clear ready-to-send affordance | Right rail changes meaning; send feels like a mode swap

Takeaway

Strong ready-to-send state. Keep right-rail iconography consistent so send does not feel like a mode swap.
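
One possible way to keep the right rail consistent is to give each control a fixed slot and toggle only whether send is enabled. The sketch below is an alternative for illustration, not a description of Grok's code; ComposerState, RailControl, and the rule for what counts as content are all assumptions.

```typescript
// Hypothetical names; a sketch of one alternative, not Grok's implementation.
type ComposerState = { text: string; attachmentCount: number };
type RailIcon = "mic" | "waveform" | "send";
type RailControl = { icon: RailIcon; enabled: boolean };

// Assumption: "content to ship" means non-blank text or at least one attachment.
function hasContent(state: ComposerState): boolean {
  return state.text.trim().length > 0 || state.attachmentCount > 0;
}

// Every slot keeps its icon; only the send slot's enabled flag follows the draft.
function rightRail(state: ComposerState): RailControl[] {
  return [
    { icon: "mic", enabled: true },
    { icon: "waveform", enabled: true },
    { icon: "send", enabled: hasContent(state) },
  ];
}

console.log(rightRail({ text: "", attachmentCount: 0 }));
console.log(rightRail({ text: "What is in this image?", attachmentCount: 1 }));
```

The cost of this variant is a third icon on a calm bar, so it trades some of the empty-state minimalism for a rail whose meaning never shifts.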

Skills submenu

Skills flyout with docx, pdf, pptx, and xlsx rows plus inline tooltip.

What works

  • Tooltip explains what docx does without opening another screen. Capability copy beside the row.
  • Create skill and View all skills at the bottom give power users an exit without cluttering the first list.
  • Parent + menu stays open while the skills flyout is active. Less disorienting than a full page swap.

What we would push on

  • Three levels deep to pick one skill: +, Skills, then docx. Heavy for a single attach intent.
  • docx, pdf, pptx, and xlsx read like file types, not outcomes. Users may not know what skill they armed.

Business strategy

Skills flyout exposes doc-generation capabilities inside chat to compete with Copilot-style document workflows without a separate editor.

Tradeoff

Decision | Benefit | Cost
Three-level path to arm one skill | Parent menu stays open; tooltips teach capability | Heavy path for common picks; extension labels not outcomes

Takeaway

Skills belong in +, but label them by job (Edit Word doc) not extension. Flatten the path if these are common picks.

Skills applied

docx and pdf skill chips armed inside the bar before send.

What works

  • Armed skills show as chips inside the bar. Scope is visible before send.
  • Same chip grammar as file attachments. One visual system for what will affect the next message.

What we would push on

  • Chips say docx and pdf, not the skill name or outcome. Extension labels hide what Grok will actually do.
  • Multiple chips crowd the left side of the pill fast once you add an image too.

Business strategy

In-bar skill chips make scope honest before send, reducing failed runs when the wrong capability is armed.

Tradeoff

Decision | Benefit | Cost
Chips labeled docx/pdf not skill outcome | Same chip grammar as file attachments | Extension labels hide what Grok will actually do

Takeaway

Chip persistence is right. Rename chips to match the tooltip copy users saw when they picked the skill.
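
Renaming chips is mostly a matter of using one piece of copy in both places. A minimal sketch, assuming a hypothetical Skill shape where jobLabel holds the outcome wording; "Edit Word doc" is the example label suggested in the skills submenu takeaway, and the fallback to the extension id is an added assumption.

```typescript
// Hypothetical data shape; a sketch, not Grok's implementation.
type Skill = {
  id: string; // extension-style id shown today, e.g. "docx"
  jobLabel: string; // copy that states the outcome; may be empty if unwritten
};

// Use the job label for both the flyout row and the armed chip; fall back to the id.
function chipLabel(skill: Skill): string {
  const label = skill.jobLabel.trim();
  return label.length > 0 ? label : skill.id;
}

const armed: Skill[] = [
  { id: "docx", jobLabel: "Edit Word doc" }, // example label from the review's suggestion
  { id: "pdf", jobLabel: "" }, // no copy yet, so the chip falls back to "pdf"
];
console.log(armed.map((s) => chipLabel(s)));
```

Because the chip and the flyout read from the same field, the label users saw when picking the skill is the label they see before send.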

Model picker

Fast, Auto, Expert, and Heavy tiers with Grok 4.3 subtext. SuperGrok upgrade and custom instructions at the bottom.

What works

  • Fast, Expert, and Heavy use outcome labels with model version in subtext. Capabilities without jargon.
  • Auto row explains it chooses Fast or Expert. Intelligent default is one tap away.
  • Custom instructions at the bottom shows Concise as current state. Settings visible inside the same menu.

What we would push on

  • SuperGrok upgrade sits inside the model menu between capability rows and settings. Mixes paywall with model pick.
  • Fast is selected by default, not Auto. Power users get transparency; casual users may not discover routing.
  • Bar still says Fast while the menu exposes four tiers plus upgrade. Naming is consistent but the ladder is long.

Business strategy

Tier ladder with outcome labels supports upgrade to Heavy and Expert while Auto offers intelligent routing for casual users who do not want to manage tiers.

Tradeoff

Decision | Benefit | Cost
SuperGrok upgrade inside model menu | Upgrade visible at model-selection moment | Mixes paywall with capability choice

Takeaway

Strong tier naming and honest subtext. Pull upgrade into its own row or footer so model choice stays about capability.

Voice dictation

Inline listening state with waveform control and confirm button inside the bar.

What works

  • Listening replaces the right rail with a waveform and checkmark. Clear in-bar dictation mode.
  • You stay on the landing page. No full-screen handoff for a short utterance.

What we would push on

  • Mic and waveform look the same on the default bar. Users only learn the difference after tapping.
  • Waveform control inside a narrow pill can feel cramped on desktop.

Business strategy

Inline dictation lowers the barrier for mobile and accessibility users (speak, edit, send) without forcing everyone into a full voice session up front.

Tradeoff

Decision | Benefit | Cost
Mic and waveform identical on default bar | Clean right rail on calm landing | Users only learn dictation vs conversation after tapping

Takeaway

Inline dictation works. Differentiate mic versus conversation icons on the default state, not only after activation.

Pattern: Voice Input

Voice session

Full voice mode with You may start speaking, Stop, and Ara · Assistant persona in the bar.

What works

  • You may start speaking sets expectations before the first word. Human entry to hands-free mode.
  • Stop is prominent and unambiguous. Easy exit from a live session.
  • Paperclip still available in voice mode. Attach does not disappear when you switch modalities.

What we would push on

  • Composer layout changes completely between text and voice. The bar you learned on the landing page is not the bar in session.
  • Ara · Assistant persona picker only shows in voice mode. Text users may never see persona controls.

Business strategy

Full voice sessions target hands-free use (commutes, workouts) where dictation-to-text is the wrong job and longer session time supports SuperGrok positioning.

Tradeoff

Decision | Benefit | Cost
Completely different composer layout in voice vs text | Optimized conversation shell with clear Stop | Bar learned on landing does not carry into session; persona hidden from text users

Takeaway

Strong conversation shell with a clear Stop. Bridge text and voice layouts so the bar does not feel like two products.

How it fits together

The pattern

  • Calm pill first, + for uploads, skills, and connectors.
  • Fast on the right rail, separate from attach on the left.
  • Chips inside the bar for files and armed skills before send.

Where it varies

  • Growth banners and SuperGrok promos stack on an otherwise empty page.
  • Skills labeled as file extensions, not outcomes.
  • Mic and waveform indistinguishable until activated.
  • Text and voice composers use different layouts.

Business strategy

Grok bets on maximum calm and visible model tier. When growth chrome, skill naming, and voice entry lack the discipline of the default bar, the shell feels unreliable, which hurts trust in paid tiers.

Tradeoffs

Decision | Benefit | Cost
Fast on the bar before first keystroke | Honest model visibility on an empty landing | Imagine and Private hide top-right; voice icons indistinguishable until tap
Connectors banner plus SuperGrok promo on empty landing | Feature and monetization surfacing at point of ask | Two growth surfaces fight the calm composer focal band
Equal weight for upload, skills, and connectors in + menu | Scannable four-row first screen | Agent vocabulary beside upload may confuse casual chat users
Compact thumb without filename | Preserves pill shape and calm bar | Hard to scan multiple files; pixels not metadata
Send icon swaps from waveform to arrow when typing | Clear ready-to-send affordance | Right rail changes meaning; send feels like a mode swap
Three-level path to arm one skill | Parent menu stays open; tooltips teach capability | Heavy path for common picks; extension labels not outcomes
Chips labeled docx/pdf not skill outcome | Same chip grammar as file attachments | Extension labels hide what Grok will actually do
SuperGrok upgrade inside model menu | Upgrade visible at model-selection moment | Mixes paywall with capability choice
Mic and waveform identical on default bar | Clean right rail on calm landing | Users only learn dictation vs conversation after tapping
Completely different composer layout in voice vs text | Optimized conversation shell with clear Stop | Bar learned on landing does not carry into session; persona hidden from text users

Takeaway

Grok bets on maximum calm and visible model tier. The + tree and in-bar chips carry scope honestly. Growth chrome, skill naming, and voice entry need the same discipline the default bar already has.

Pattern: Tool Switching in Composer

Calm pill → + for scope → Fast on the right is a reusable shell; growth chrome and voice entry are where Grok diverges.

Pattern: Model Selection UI

Steal this

  • Fast visible in the bar before the first keystroke
  • Connectors banner with Dismiss and Connect at the point of ask
  • Inline image thumb inside the composer pill
  • Skill tooltips beside flyout rows
  • Custom instructions state shown inside the model menu

Skip this

  • Stacking upsell pill and feature banner on the same empty landing
  • Skill chips labeled docx instead of the job they perform
  • Mic and waveform icons that look identical on default
  • SuperGrok upgrade row inside the model capability list

How others design the composer

Same job, different product bets, and what each tradeoff reveals.