POST /v1/chat/completionsGemini 3.1 Flash-Lite
gemini-3.1-flash-liteGemini 3.1 Flash Lite is the high-efficiency multimodal model in the Gemini 3.1 family. Model cards describe low latency, high-volume use, support for text, image, video, audio, and PDFs, and lightweight agent tasks. It is best framed for extraction, classification, routing, and production-scale multimodal workloads.
Total Context
1Mtokens
Max Output
65.5Ktokens
Released
May 7, 2026
Modalities
Gemini 3.1 Flash-Lite Price
| Input Price | Output Price | Cache Read | Cache Create 5m |
|---|---|---|---|
| $0.25/M | $1.5/M | $0.025/M | $0.0833/M |
Gemini 3.1 Flash-Lite API
Gemini 3.1 Flash-Lite Benchmark
Gemini 3.1 Flash-Lite
25
/100
Artificial Analysis Intelligence Index
Artificial Analysis broad capability aggregate
Index score
30.1
/100
Artificial Analysis Coding Index
Artificial Analysis software task aggregate
Index score
Knowledge & Reasoning
GPQA
Advanced science problem solving
82.2%
HLE
Broad expert-level exam set
16.2%
Coding & Engineering
SciCode
Scientific coding challenges
41.9%
Terminal-Bench Hard
Hard terminal task execution
24.2%
Instruction Following & Agent Tasks
IFBench
Prompt constraint adherence
77.2%
AA-LCR
Long-context reasoning
65.3%
τ²-Bench
Agent workflow tasks
31.3%
Metrics sourced from Artificial Analysis
Frequently asked questions about Gemini 3.1 Flash-Lite
Understand what Gemini 3.1 Flash-Lite is, its best uses, distinguishing strengths, practical tradeoffs, and safe TokenHub integration guidance.
Where does Gemini 3.1 Flash-Lite sit within its provider’s model family?+
Gemini 3.1 Flash-Lite is Google’s low-latency, cost-efficient Gemini 3-series model for frequent lightweight multimodal tasks. It is a current public model in its provider’s documentation, though availability can vary by platform.
Which production scenarios suit Gemini 3.1 Flash-Lite?+
Best-fit scenarios include large-scale classification and routing, simple structured data extraction, and high-volume translation. Test representative inputs and define measurable acceptance criteria before production.
What makes Gemini 3.1 Flash-Lite stand out for simple structured data extraction?+
Key strengths include fast response times, cost-efficient scaling, and support for varied multimodal inputs. This combination is especially useful for simple structured data extraction.
What tradeoffs should developers consider with Gemini 3.1 Flash-Lite?+
Consider another model when the task needs the strongest Pro-tier reasoning, the workload requires nuanced long-form generation or difficult reasoning, or the workflow cannot include human review for important decisions. Verify important factual, legal, financial, medical, or operational outputs with qualified human review.
How can a team safely start using Gemini 3.1 Flash-Lite on TokenHub?+
In TokenHub, select the exact model identifier displayed for Gemini 3.1 Flash-Lite, use the endpoint documented for your account, and authenticate with your TokenHub credentials. Confirm the TokenHub-exposed input types, tools, grounding options, and model lifecycle rather than assuming full Gemini API parity.
Media and Discussions
Selected public videos and posts related to this model.
X (Twitter)
Reddit
YouTube