POST /v1/chat/completionsGemini 2.5 Flash-Lite
gemini-2.5-flash-liteGemini 2.5 Flash-Lite is Google’s fastest and most budget-friendly Gemini 2.5 option. Official docs highlight low latency, low cost, multimodal support, thinking budgets, and tool integrations such as grounding and code execution. It is best described for classification, translation, routing, extraction, and high-scale workloads.
Total Context
1Mtokens
Max Output
65.5Ktokens
Released
Jun 17, 2025
Modalities
Gemini 2.5 Flash-Lite Price
| Input Price | Output Price | Cache Read |
|---|---|---|
| $0.1/M | $0.4/M | $0.01/M |
Gemini 2.5 Flash-Lite API
Gemini 2.5 Flash-Lite Benchmark
11.4
/100
Artificial Analysis Intelligence Index
Artificial Analysis broad capability aggregate
Index score
9.5
/100
Artificial Analysis Coding Index
Artificial Analysis software task aggregate
Index score
53.3
/100
Artificial Analysis Math Index
Artificial Analysis math reasoning aggregate
Index score
Knowledge & Reasoning
MMLU-Pro
Advanced multi-task knowledge
75.9%
GPQA
Advanced science problem solving
62.5%
HLE
Broad expert-level exam set
6.4%
Coding & Engineering
LiveCodeBench
Live coding problems
59.3%
SciCode
Scientific coding challenges
19.3%
Terminal-Bench Hard
Hard terminal task execution
4.5%
Math
MATH-500
Advanced math problem solving
96.9%
AIME
Competition math problems
70.3%
AIME 2025
Competition math problems
53.3%
Instruction Following & Agent Tasks
IFBench
Prompt constraint adherence
49.9%
AA-LCR
Long-context reasoning
51.3%
τ²-Bench
Agent workflow tasks
18.4%
Metrics sourced from Artificial Analysis
Frequently asked questions about Gemini 2.5 Flash-Lite
Understand what Gemini 2.5 Flash-Lite is, its best uses, distinguishing strengths, practical tradeoffs, and safe TokenHub integration guidance.
What kind of model is Gemini 2.5 Flash-Lite?+
Gemini 2.5 Flash-Lite is Google’s most economical Gemini 2.5 model for simple, high-frequency multimodal processing. It remains a defined model generation, but newer models in the same family may be preferable for new evaluations.
What should teams use Gemini 2.5 Flash-Lite for?+
Best-fit scenarios include large-scale classification and routing, simple structured data extraction, and high-volume translation. Test representative inputs and define measurable acceptance criteria before production.
Where does Gemini 2.5 Flash-Lite have a clear technical advantage?+
Key strengths include cost-efficient scaling, fast response times, and support for varied multimodal inputs. This combination is especially useful for simple structured data extraction.
When should a team choose another model instead of Gemini 2.5 Flash-Lite?+
Consider another model when the workload involves difficult multi-step reasoning, the project can adopt a newer Gemini generation, or the workflow cannot include human review for important decisions. Verify important factual, legal, financial, medical, or operational outputs with qualified human review.
What should be checked before integrating Gemini 2.5 Flash-Lite with TokenHub?+
In TokenHub, select the exact model identifier displayed for Gemini 2.5 Flash-Lite, use the endpoint documented for your account, and authenticate with your TokenHub credentials. Confirm the TokenHub-exposed input types, tools, grounding options, and model lifecycle rather than assuming full Gemini API parity.
Media and Discussions
Selected public videos and posts related to this model.
X (Twitter)
Reddit
YouTube