POST /v1/messagesGPT-4.1
gpt-4.1GPT-4.1 is an OpenAI model generation focused on improved coding, instruction following, and long-context performance. Official announcements present it as a stronger developer model than GPT-4o for many programming and instruction-heavy tasks. Its catalog description should highlight practical coding reliability and long-context understanding.
Total Context
1Mtokens
Max Output
32.8Ktokens
Released
Apr 14, 2025
Modalities
GPT-4.1 Price
| Input Price | Output Price | Cache Read |
|---|---|---|
| $2/M | $8/M | $0.5/M |
GPT-4.1 API
GPT-4.1 Benchmark
GPT-4.1
19.4
/100
Artificial Analysis Intelligence Index
Artificial Analysis broad capability aggregate
Index score
21.8
/100
Artificial Analysis Coding Index
Artificial Analysis software task aggregate
Index score
34.7
/100
Artificial Analysis Math Index
Artificial Analysis math reasoning aggregate
Index score
Knowledge & Reasoning
MMLU-Pro
Advanced multi-task knowledge
80.6%
GPQA
Advanced science problem solving
66.6%
HLE
Broad expert-level exam set
4.6%
Coding & Engineering
LiveCodeBench
Live coding problems
45.7%
SciCode
Scientific coding challenges
38.1%
Terminal-Bench Hard
Hard terminal task execution
13.6%
Math
MATH-500
Advanced math problem solving
91.3%
AIME
Competition math problems
43.7%
AIME 2025
Competition math problems
34.7%
Instruction Following & Agent Tasks
IFBench
Prompt constraint adherence
43.0%
AA-LCR
Long-context reasoning
61%
τ²-Bench
Agent workflow tasks
47.1%
Metrics sourced from Artificial Analysis
Frequently asked questions about GPT-4.1
Understand what GPT-4.1 is, its best uses, distinguishing strengths, practical tradeoffs, and safe TokenHub integration guidance.
What is GPT-4.1, and where does it fit in OpenAI’s model lineup?+
GPT-4.1 is a high-capability, non-reasoning GPT model focused on instruction following, tool use, and long-context work. It has been retired from ChatGPT, while API availability may remain; check TokenHub’s current listing.
Which workloads are the best fit for GPT-4.1?+
Best-fit scenarios include working across large codebases, strict instruction following, and tool-enabled application workflows. Test representative inputs and define measurable acceptance criteria before production.
Why might a team select GPT-4.1 over a smaller or older model?+
Key strengths include strong handling of long context, reliable adherence to detailed instructions, and effective use of tools and function calls. This combination is especially useful for strict instruction following.
What should be validated before relying on GPT-4.1?+
Consider another model when the task needs the deepest deliberate reasoning, very low latency is the main requirement, or the workflow cannot include human review for important decisions. Run generated code through tests, security checks, and human review before merging or deployment.
What is the practical TokenHub setup guidance for GPT-4.1?+
In TokenHub, select the exact model identifier displayed for GPT-4.1, use the endpoint documented for your account, and authenticate with your TokenHub credentials. Confirm whether the TokenHub entry exposes the input types, tool behavior, and output controls your application needs.
Media and Discussions
Selected public videos and posts related to this model.
X (Twitter)
Reddit
YouTube