POST /v1/chat/completionsGemini 2.5 Flash
gemini-2.5-flashGemini 2.5 Flash is the balanced price-performance model in the Gemini 2.5 family, combining thinking capability with lower latency and cost. Google materials position it between Pro-level depth and Flash-Lite efficiency. It is suitable for production tasks that need reasoning, multimodal input, and practical throughput.
Total Context
1Mtokens
Max Output
65.5Ktokens
Released
Jun 17, 2025
Modalities
Gemini 2.5 Flash Price
| Input Price | Output Price | Cache Read |
|---|---|---|
| $0.3/M | $2.5/M | $0.03/M |
Gemini 2.5 Flash API
Gemini 2.5 Flash Benchmark
14.1
/100
Artificial Analysis Intelligence Index
Artificial Analysis broad capability aggregate
Index score
17.8
/100
Artificial Analysis Coding Index
Artificial Analysis software task aggregate
Index score
60.3
/100
Artificial Analysis Math Index
Artificial Analysis math reasoning aggregate
Index score
Knowledge & Reasoning
MMLU-Pro
Advanced multi-task knowledge
80.9%
GPQA
Advanced science problem solving
68.3%
HLE
Broad expert-level exam set
5.1%
Coding & Engineering
LiveCodeBench
Live coding problems
49.5%
SciCode
Scientific coding challenges
29.1%
Terminal-Bench Hard
Hard terminal task execution
12.1%
Math
MATH-500
Advanced math problem solving
93.2%
AIME
Competition math problems
50%
AIME 2025
Competition math problems
60.3%
Instruction Following & Agent Tasks
IFBench
Prompt constraint adherence
39.0%
AA-LCR
Long-context reasoning
45.9%
τ²-Bench
Agent workflow tasks
14.9%
Metrics sourced from Artificial Analysis
Frequently asked questions about Gemini 2.5 Flash
Understand what Gemini 2.5 Flash is, its best uses, distinguishing strengths, practical tradeoffs, and safe TokenHub integration guidance.
How should developers understand the role of Gemini 2.5 Flash?+
Gemini 2.5 Flash is Google’s balanced Gemini 2.5 Flash model for high-volume, low-latency tasks that still benefit from thinking. It remains a defined model generation, but newer models in the same family may be preferable for new evaluations.
When does Gemini 2.5 Flash deliver the most practical value?+
Best-fit scenarios include high-volume application requests, reliable execution of multi-step agent workflows, and analysis of text and visual inputs. Test representative inputs and define measurable acceptance criteria before production.
What are the most useful characteristics of Gemini 2.5 Flash?+
Key strengths include a strong balance of quality, speed, and cost, fast response times, and strong reasoning on difficult problems. This combination is especially useful for reliable execution of multi-step agent workflows.
What are the practical limits of Gemini 2.5 Flash?+
Consider another model when the task needs the strongest Pro-tier reasoning, the project can adopt a newer Gemini generation, or the workflow cannot include human review for important decisions. Verify important factual, legal, financial, medical, or operational outputs with qualified human review.
How should developers call Gemini 2.5 Flash through TokenHub?+
In TokenHub, select the exact model identifier displayed for Gemini 2.5 Flash, use the endpoint documented for your account, and authenticate with your TokenHub credentials. Confirm the TokenHub-exposed input types, tools, grounding options, and model lifecycle rather than assuming full Gemini API parity.
Media and Discussions
Selected public videos and posts related to this model.
X (Twitter)
Reddit
YouTube