POST /v1/chat/completionsDeepSeek V4 Flash
deepseek-v4-flashDeepSeek V4 Flash keeps the V4 family’s 1M-token context window but uses a lighter MoE configuration, commonly described as 284B total parameters with 13B activated parameters. The emphasis is throughput: fast inference, lower cost per call, and production workloads that still need long-context handling. It is the better fit when the task volume is high and the workload benefits from V4-style long-context architecture without always requiring the deepest reasoning tier.
Total Context
1Mtokens
Max Output
384Ktokens
Released
Apr 24, 2026
Modalities
DeepSeek V4 Flash Price
| Input Price | Output Price | Cache Read |
|---|---|---|
| $0.15/M | $0.3/M | $0.003/M |
DeepSeek V4 Flash API
DeepSeek V4 Flash Benchmark
40.3
/100
Artificial Analysis Intelligence Index
Artificial Analysis broad capability aggregate
Index score
38.7
/100
Artificial Analysis Coding Index
Artificial Analysis software task aggregate
Index score
Knowledge & Reasoning
GPQA
Advanced science problem solving
89.4%
HLE
Broad expert-level exam set
32.1%
Coding & Engineering
SciCode
Scientific coding challenges
44.9%
Terminal-Bench Hard
Hard terminal task execution
35.6%
Instruction Following & Agent Tasks
IFBench
Prompt constraint adherence
79.2%
AA-LCR
Long-context reasoning
63%
τ²-Bench
Agent workflow tasks
95.0%
Metrics sourced from Artificial Analysis
Model Comparison
DeepSeek V4 Flash Article
DeepSeek V4 Flash FAQ
DeepSeek V4 Flash: capabilities, use cases, limits, and TokenHub guidance.
How is DeepSeek V4 Flash positioned?+
DeepSeek V4 Flash is a DeepSeek model for fast, efficient reasoning and agent work.
Where does DeepSeek V4 Flash add value?+
Best for latency-sensitive applications, agent workflows and high-volume requests, especially when speed and cost efficiency is the priority.
What is DeepSeek V4 Flash's practical edge?+
Key strength: a smaller design with faster, more economical inference and switchable thinking and non-thinking modes.
Which constraint matters most?+
It has less headroom on the hardest reasoning and engineering tasks. For maximum answer quality, consider DeepSeek V4 Pro.
How do I integrate DeepSeek V4 Flash safely?+
Use the exact ID shown by TokenHub; follow your account docs and verify current features.
Media and Discussions
Selected public videos and posts related to this model.
X (Twitter)
Reddit
YouTube