Grok 4.20

grok-4.20

Grok-4.20 is presented as a high-speed xAI reasoning model with strong agentic tool calling, strict prompt adherence, low hallucination, and a very large context window in third-party cards. It is meant for long-form analysis, deep research, and multi-step agent workflows. The strongest description should emphasize speed plus long-context reasoning, not just raw conversational personality.

Total Context

1Mtokens

Max Output

30Ktokens

Released

Mar 9, 2026

Modalities

Grok 4.20 Price

Input PriceOutput PriceCache Read
$1.25/M$2.5/M$0.2/M

Grok 4.20 API

POST /v1beta/models/{model}:generateContent

Grok 4.20 Benchmark

Grok 4.20 0309 (Reasoning)

36.5

/100

Artificial Analysis Intelligence Index

Artificial Analysis broad capability aggregate

Index score

42.2

/100

Artificial Analysis Coding Index

Artificial Analysis software task aggregate

Index score

Knowledge & Reasoning

GPQA

Advanced science problem solving

88.5%

HLE

Broad expert-level exam set

30%

Coding & Engineering

SciCode

Scientific coding challenges

44.7%

Terminal-Bench Hard

Hard terminal task execution

40.9%

Instruction Following & Agent Tasks

IFBench

Prompt constraint adherence

82.9%

AA-LCR

Long-context reasoning

59%

τ²-Bench

Agent workflow tasks

96.5%

Metrics sourced from Artificial Analysis

Media and Discussions

Selected public videos and posts related to this model.

X (Twitter)

View post on X
View post on X
View post on X

Reddit

YouTube

Watch on YouTube
Watch on YouTube
Watch on YouTube

Frequently asked questions about Grok 4.20

Understand what Grok 4.20 is, its best uses, distinguishing strengths, practical tradeoffs, and safe TokenHub integration guidance.

Where does Grok 4.20 sit within its provider’s model family?+

Grok 4.20 is xAI’s high-performance Grok 4.20 model for reasoning, long context, and agentic tool calling. It is a beta model, so validate latency, output consistency, and supported features before production use.

Which production scenarios suit Grok 4.20?+

Best-fit scenarios include reliable execution of multi-step agent workflows, analysis of long documents and datasets, and complex multi-step reasoning. Test representative inputs and define measurable acceptance criteria before production.

What makes Grok 4.20 stand out for analysis of long documents and datasets?+

Key strengths include effective use of tools and function calls, strict adherence to prompts, and configurable reasoning effort. This combination is especially useful for analysis of long documents and datasets.

What tradeoffs should developers consider with Grok 4.20?+

Consider another model when the application needs production behavior that is already fully stabilized, the task is simple enough for a non-reasoning variant, or the workflow cannot include human review for important decisions. Verify important factual, legal, financial, medical, or operational outputs with qualified human review.

How can a team safely start using Grok 4.20 on TokenHub?+

In TokenHub, select the exact model identifier displayed for Grok 4.20, use the endpoint documented for your account, and authenticate with your TokenHub credentials. Check the TokenHub page for the exact Grok identifier, available reasoning controls, tool access, supported inputs, and current beta or release status.