Gemini 3.1 Flash-Lite

gemini-3.1-flash-lite

Gemini 3.1 Flash Lite is the high-efficiency multimodal model in the Gemini 3.1 family. Model cards describe low latency, high-volume use, support for text, image, video, audio, and PDFs, and lightweight agent tasks. It is best framed for extraction, classification, routing, and production-scale multimodal workloads.

Total Context

1Mtokens

Max Output

65.5Ktokens

Released

May 7, 2026

Modalities

Gemini 3.1 Flash-Lite Price

Input PriceOutput PriceCache ReadCache Create 5m
$0.25/M$1.5/M$0.025/M$0.0833/M

Gemini 3.1 Flash-Lite API

openaiPOST /v1/chat/completions

Gemini 3.1 Flash-Lite Benchmark

Gemini 3.1 Flash-Lite

25

/100

Artificial Analysis Intelligence Index

Artificial Analysis broad capability aggregate

Index score

30.1

/100

Artificial Analysis Coding Index

Artificial Analysis software task aggregate

Index score

Knowledge & Reasoning

GPQA

Advanced science problem solving

82.2%

HLE

Broad expert-level exam set

16.2%

Coding & Engineering

SciCode

Scientific coding challenges

41.9%

Terminal-Bench Hard

Hard terminal task execution

24.2%

Instruction Following & Agent Tasks

IFBench

Prompt constraint adherence

77.2%

AA-LCR

Long-context reasoning

65.3%

τ²-Bench

Agent workflow tasks

31.3%

Metrics sourced from Artificial Analysis

Media and Discussions

Selected public videos and posts related to this model.

X (Twitter)

View post on X
View post on X
View post on X

Reddit

YouTube

Watch on YouTube
Watch on YouTube
Watch on YouTube

Frequently asked questions about Gemini 3.1 Flash-Lite

Understand what Gemini 3.1 Flash-Lite is, its best uses, distinguishing strengths, practical tradeoffs, and safe TokenHub integration guidance.

Where does Gemini 3.1 Flash-Lite sit within its provider’s model family?+

Gemini 3.1 Flash-Lite is Google’s low-latency, cost-efficient Gemini 3-series model for frequent lightweight multimodal tasks. It is a current public model in its provider’s documentation, though availability can vary by platform.

Which production scenarios suit Gemini 3.1 Flash-Lite?+

Best-fit scenarios include large-scale classification and routing, simple structured data extraction, and high-volume translation. Test representative inputs and define measurable acceptance criteria before production.

What makes Gemini 3.1 Flash-Lite stand out for simple structured data extraction?+

Key strengths include fast response times, cost-efficient scaling, and support for varied multimodal inputs. This combination is especially useful for simple structured data extraction.

What tradeoffs should developers consider with Gemini 3.1 Flash-Lite?+

Consider another model when the task needs the strongest Pro-tier reasoning, the workload requires nuanced long-form generation or difficult reasoning, or the workflow cannot include human review for important decisions. Verify important factual, legal, financial, medical, or operational outputs with qualified human review.

How can a team safely start using Gemini 3.1 Flash-Lite on TokenHub?+

In TokenHub, select the exact model identifier displayed for Gemini 3.1 Flash-Lite, use the endpoint documented for your account, and authenticate with your TokenHub credentials. Confirm the TokenHub-exposed input types, tools, grounding options, and model lifecycle rather than assuming full Gemini API parity.