Gemini 2.5 Flash-Lite

gemini-2.5-flash-lite

Gemini 2.5 Flash-Lite is Google’s fastest and most budget-friendly Gemini 2.5 option. Official docs highlight low latency, low cost, multimodal support, thinking budgets, and tool integrations such as grounding and code execution. It is best described for classification, translation, routing, extraction, and high-scale workloads.

Total Context

1Mtokens

Max Output

65.5Ktokens

Released

Jun 17, 2025

Modalities

Gemini 2.5 Flash-Lite Price

Input PriceOutput PriceCache Read
$0.1/M$0.4/M$0.01/M

Gemini 2.5 Flash-Lite API

POST /v1/chat/completions

Gemini 2.5 Flash-Lite Benchmark

11.4

/100

Artificial Analysis Intelligence Index

Artificial Analysis broad capability aggregate

Index score

9.5

/100

Artificial Analysis Coding Index

Artificial Analysis software task aggregate

Index score

53.3

/100

Artificial Analysis Math Index

Artificial Analysis math reasoning aggregate

Index score

Knowledge & Reasoning

MMLU-Pro

Advanced multi-task knowledge

75.9%

GPQA

Advanced science problem solving

62.5%

HLE

Broad expert-level exam set

6.4%

Coding & Engineering

LiveCodeBench

Live coding problems

59.3%

SciCode

Scientific coding challenges

19.3%

Terminal-Bench Hard

Hard terminal task execution

4.5%

Math

MATH-500

Advanced math problem solving

96.9%

AIME

Competition math problems

70.3%

AIME 2025

Competition math problems

53.3%

Instruction Following & Agent Tasks

IFBench

Prompt constraint adherence

49.9%

AA-LCR

Long-context reasoning

51.3%

τ²-Bench

Agent workflow tasks

18.4%

Metrics sourced from Artificial Analysis

Media and Discussions

Selected public videos and posts related to this model.

X (Twitter)

View post on X
View post on X
View post on X

Reddit

YouTube

Watch on YouTube
Watch on YouTube
Watch on YouTube

Frequently asked questions about Gemini 2.5 Flash-Lite

Understand what Gemini 2.5 Flash-Lite is, its best uses, distinguishing strengths, practical tradeoffs, and safe TokenHub integration guidance.

What kind of model is Gemini 2.5 Flash-Lite?+

Gemini 2.5 Flash-Lite is Google’s most economical Gemini 2.5 model for simple, high-frequency multimodal processing. It remains a defined model generation, but newer models in the same family may be preferable for new evaluations.

What should teams use Gemini 2.5 Flash-Lite for?+

Best-fit scenarios include large-scale classification and routing, simple structured data extraction, and high-volume translation. Test representative inputs and define measurable acceptance criteria before production.

Where does Gemini 2.5 Flash-Lite have a clear technical advantage?+

Key strengths include cost-efficient scaling, fast response times, and support for varied multimodal inputs. This combination is especially useful for simple structured data extraction.

When should a team choose another model instead of Gemini 2.5 Flash-Lite?+

Consider another model when the workload involves difficult multi-step reasoning, the project can adopt a newer Gemini generation, or the workflow cannot include human review for important decisions. Verify important factual, legal, financial, medical, or operational outputs with qualified human review.

What should be checked before integrating Gemini 2.5 Flash-Lite with TokenHub?+

In TokenHub, select the exact model identifier displayed for Gemini 2.5 Flash-Lite, use the endpoint documented for your account, and authenticate with your TokenHub credentials. Confirm the TokenHub-exposed input types, tools, grounding options, and model lifecycle rather than assuming full Gemini API parity.