Gemini 2.5 Flash

gemini-2.5-flash

Gemini 2.5 Flash is the balanced price-performance model in the Gemini 2.5 family, combining thinking capability with lower latency and cost. Google materials position it between Pro-level depth and Flash-Lite efficiency. It is suitable for production tasks that need reasoning, multimodal input, and practical throughput.

Total Context

1Mtokens

Max Output

65.5Ktokens

Released

Jun 17, 2025

Modalities

Gemini 2.5 Flash Price

Input PriceOutput PriceCache Read
$0.3/M$2.5/M$0.03/M

Gemini 2.5 Flash API

POST /v1/chat/completions

Gemini 2.5 Flash Benchmark

14.1

/100

Artificial Analysis Intelligence Index

Artificial Analysis broad capability aggregate

Index score

17.8

/100

Artificial Analysis Coding Index

Artificial Analysis software task aggregate

Index score

60.3

/100

Artificial Analysis Math Index

Artificial Analysis math reasoning aggregate

Index score

Knowledge & Reasoning

MMLU-Pro

Advanced multi-task knowledge

80.9%

GPQA

Advanced science problem solving

68.3%

HLE

Broad expert-level exam set

5.1%

Coding & Engineering

LiveCodeBench

Live coding problems

49.5%

SciCode

Scientific coding challenges

29.1%

Terminal-Bench Hard

Hard terminal task execution

12.1%

Math

MATH-500

Advanced math problem solving

93.2%

AIME

Competition math problems

50%

AIME 2025

Competition math problems

60.3%

Instruction Following & Agent Tasks

IFBench

Prompt constraint adherence

39.0%

AA-LCR

Long-context reasoning

45.9%

τ²-Bench

Agent workflow tasks

14.9%

Metrics sourced from Artificial Analysis

Media and Discussions

Selected public videos and posts related to this model.

X (Twitter)

View post on X
View post on X
View post on X

Reddit

YouTube

Watch on YouTube
Watch on YouTube
Watch on YouTube

Frequently asked questions about Gemini 2.5 Flash

Understand what Gemini 2.5 Flash is, its best uses, distinguishing strengths, practical tradeoffs, and safe TokenHub integration guidance.

How should developers understand the role of Gemini 2.5 Flash?+

Gemini 2.5 Flash is Google’s balanced Gemini 2.5 Flash model for high-volume, low-latency tasks that still benefit from thinking. It remains a defined model generation, but newer models in the same family may be preferable for new evaluations.

When does Gemini 2.5 Flash deliver the most practical value?+

Best-fit scenarios include high-volume application requests, reliable execution of multi-step agent workflows, and analysis of text and visual inputs. Test representative inputs and define measurable acceptance criteria before production.

What are the most useful characteristics of Gemini 2.5 Flash?+

Key strengths include a strong balance of quality, speed, and cost, fast response times, and strong reasoning on difficult problems. This combination is especially useful for reliable execution of multi-step agent workflows.

What are the practical limits of Gemini 2.5 Flash?+

Consider another model when the task needs the strongest Pro-tier reasoning, the project can adopt a newer Gemini generation, or the workflow cannot include human review for important decisions. Verify important factual, legal, financial, medical, or operational outputs with qualified human review.

How should developers call Gemini 2.5 Flash through TokenHub?+

In TokenHub, select the exact model identifier displayed for Gemini 2.5 Flash, use the endpoint documented for your account, and authenticate with your TokenHub credentials. Confirm the TokenHub-exposed input types, tools, grounding options, and model lifecycle rather than assuming full Gemini API parity.