Models

Explore AI model pricing, capabilities, endpoints, and vendor coverage from one production catalog.

OpenAI

GPT-5.5

gpt-5.5

GPT-5.5 is described by OpenAI as a smarter frontier model for coding, research, data analysis, and professional knowledge work. The official launch messaging emphasizes higher capability, faster responses than earlier models built for difficult work, and strong performance on tasks involving documents and structured analysis. Its strength is broad professional intelligence rather than a single niche.

Total Context

1.1M

Max Output

128K

Released

Apr 23, 2026

Input: $5 / M tokens
Output: $30 / M tokens
Cache Read: $0.5 / M tokens

DeepSeek

DeepSeek V4 Pro

deepseek-v4-pro

DeepSeek V4 Pro is described as a large-scale Mixture-of-Experts model with 1.6T total parameters and 49B activated parameters, while keeping a 1M-token context window for very large inputs. Its model cards emphasize advanced reasoning, coding, and long-horizon agent workflows rather than simple chat. The Pro variant is the capability-oriented member of the V4 family, making it better suited to full-codebase analysis, large research synthesis, and multi-step automation where depth matters more than the lowest possible latency.

Total Context

1M

Max Output

384K

Released

Apr 24, 2026

Input: $1.8 / M tokens
Output: $3.5 / M tokens
Cache Read: $0.015 / M tokens

DeepSeek

DeepSeek V4 Flash

deepseek-v4-flash

DeepSeek V4 Flash keeps the V4 family’s 1M-token context window but uses a lighter MoE configuration, commonly described as 284B total parameters with 13B activated parameters. The emphasis is throughput: fast inference, lower cost per call, and production workloads that still need long-context handling. It is the better fit when the task volume is high and the workload benefits from V4-style long-context architecture without always requiring the deepest reasoning tier.

Total Context

1M

Max Output

384K

Released

Apr 24, 2026

Input: $0.15 / M tokens
Output: $0.3 / M tokens
Cache Read: $0.003 / M tokens

Alibaba

Qwen3.7 Plus

qwen3.7-plus

Qwen3.7 Plus takes the Qwen3.7 agent-oriented design in a more cost-effective direction. Third-party model cards describe it as supporting text and image input, with stronger vision-language ability and hybrid agent capability for GUI, mobile navigation, and visual-reference tasks. The model is suitable when users need the new Qwen3.7 capability profile without always paying for the Max tier.

Total Context

1M

Max Output

64K

Released

Jun 2, 2026

Input: $0.2857 / M tokens
Output: $1.1429 / M tokens
Cache Read: $0.0571 / M tokens

OpenAI

GPT-4.1

gpt-4.1

GPT-4.1 is an OpenAI model generation focused on improved coding, instruction following, and long-context performance. Official announcements present it as a stronger developer model than GPT-4o for many programming and instruction-heavy tasks. Its main strengths are practical coding reliability and long-context understanding.

Total Context

1M

Max Output

32.8K

Released

Apr 14, 2025

Input: $2 / M tokens
Output: $8 / M tokens
Cache Read: $0.5 / M tokens

OpenAI

GPT-4.1 Mini

gpt-4.1-mini

GPT-4.1 Mini brings the GPT-4.1 family’s coding and instruction-following improvements into a faster, lower-cost form. It is suitable for high-volume developer tools, structured generation, extraction, and product features that do not require the full model. The main distinction is production efficiency while retaining the 4.1 generation’s task discipline.

Total Context

1M

Max Output

32.8K

Released

Apr 14, 2025

Input: $0.4 / M tokens
Output: $1.6 / M tokens
Cache Read: $0.1 / M tokens

OpenAI

GPT-4o

gpt-4o

GPT-4o is OpenAI’s multimodal flagship from the GPT-4o generation, built for text and image input with strong general intelligence. Official docs describe it as a versatile high-intelligence model suitable for a broad range of language and vision tasks. It remains useful where multimodal understanding and natural interaction matter more than the newest reasoning stack.

Total Context

128K

Max Output

16.4K

Released

May 13, 2024

Input: $2.5 / M tokens
Output: $10 / M tokens
Cache Read: $1.25 / M tokens

OpenAI

GPT-4o Mini

gpt-4o-mini

GPT-4o Mini is the fast and affordable small model in the GPT-4o family. OpenAI docs position it for focused tasks with text and image input, structured outputs, fine-tuning, and distillation workflows. It is best understood as a lightweight multimodal production model rather than a reduced copy of GPT-4o.

Total Context

128K

Max Output

16.4K

Released

Jul 18, 2024

Input: $0.15 / M tokens
Output: $0.6 / M tokens
Cache Read: $0.075 / M tokens

OpenAI

GPT-5.3 Chat

gpt-5.3-chat

GPT-5.3 Chat is the API-facing name for GPT-5.3 Instant, the ChatGPT snapshot designed to make everyday conversations smoother and more directly useful. OpenAI describes the update as improving answer accuracy, web-search contextualization, and conversational flow by reducing unnecessary caveats, dead ends, and overly cautious phrasing. It supports text and image input with text output, but OpenAI’s API docs mark it as deprecated in favor of newer GPT models.

Total Context

128K

Max Output

16.4K

Released

Mar 3, 2026

Input: $1.75 / M tokens
Output: $14 / M tokens
Cache Read: $0.175 / M tokens

OpenAI

GPT-5.3 Codex

gpt-5.3-codex

GPT-5.3-Codex is OpenAI’s agentic coding model for Codex and similar development environments. It combines frontier software engineering performance with broader reasoning and professional knowledge, supports configurable reasoning effort, and is listed with a 400K context window and 128K max output tokens. OpenAI positions it as moving Codex beyond writing and reviewing code toward using computers, running terminal workflows, iterating on web apps, and handling real-world engineering tasks over long horizons.

Total Context

400K

Max Output

128K

Released

Feb 5, 2026

Input: $1.75 / M tokens
Output: $14 / M tokens
Cache Read: $0.175 / M tokens

OpenAI

GPT-5.4

gpt-5.4

GPT-5.4 is presented by OpenAI as a capable and efficient frontier model for professional work. Official materials emphasize coding, native computer use, spreadsheet/document/presentation workflows, improved factuality, and a large context window. It is well described as a practical work model that connects reasoning with real productivity tasks.

Total Context

1.1M

Max Output

128K

Released

Mar 5, 2026

Input: $2.5 / M tokens
Output: $15 / M tokens
Cache Read: $0.25 / M tokens

OpenAI

GPT-5.4 Mini

gpt-5.4-mini

GPT-5.4 Mini is the smaller and faster member of the GPT-5.4 family. OpenAI documentation positions mini models for lower-latency workloads while retaining coding, tool use, multimodal reasoning, and strong instruction following. It is a good fit for well-scoped production tasks, subagents, and applications that need many fast calls.

Total Context

400K

Max Output

128K

Released

Mar 17, 2026

Input: $0.75 / M tokens
Output: $4.5 / M tokens
Cache Read: $0.075 / M tokens

OpenAI

GPT-5.4 Nano

gpt-5.4-nano

GPT-5.4 Nano is the lowest-cost, smallest GPT-5.4 option for high-volume simple tasks. OpenAI’s model guidance positions nano models around latency and cost efficiency rather than maximum reasoning depth. It suits classification, routing, extraction, lightweight generation, and other predictable workflows.

Total Context

400K

Max Output

128K

Released

Mar 17, 2026

Input: $0.2 / M tokens
Output: $1.25 / M tokens
Cache Read: $0.02 / M tokens

OpenAI

GPT-5.4 Pro

gpt-5.4-pro

GPT-5.4 Pro is the more precise and higher-quality tier of GPT-5.4, built for demanding professional tasks. OpenAI’s model guidance separates Pro from the standard version by its stronger reasoning and accuracy expectations. It is the choice for difficult analysis, complex code, and high-stakes knowledge work when extra depth is worth the cost.

Total Context

1.1M

Max Output

128K

Released

Mar 5, 2026

Input: $30 / M tokens
Output: $180 / M tokens

OpenAI

GPT-5.5 Pro

gpt-5.5-pro

GPT-5.5 Pro is the higher-compute variant of OpenAI’s GPT-5.5 generation, intended for the hardest professional tasks where answer quality matters more than speed. OpenAI materials describe the Pro tier as thinking harder for greater precision across coding, research, data analysis, and document-heavy work. It is aimed at difficult reasoning and professional-grade deliverables rather than routine chat.

Total Context

1.1M

Max Output

128K

Released

Apr 23, 2026

Input: $30 / M tokens
Output: $180 / M tokens

MiniMax

MiniMax M2.5

MiniMax-M2.5

MiniMax M2.5 is positioned as a real-world productivity model trained in complex digital environments. Official materials describe advances in coding, agentic tool use, search, and office work, extending earlier coding strengths into Word, Excel, and PowerPoint-style tasks. Its unique angle is not raw language fluency, but the ability to operate across messy practical workflows.

Total Context

204.8K

Max Output

131.1K

Released

Feb 12, 2026

Input: $0.3 / M tokens
Output: $1.2 / M tokens
Cache Read: $0.03 / M tokens

MiniMax

MiniMax M2.7

MiniMax-M2.7

MiniMax M2.7 is presented as a productivity and engineering model for autonomous workflows, multi-agent collaboration, live debugging, and document-heavy work. Public descriptions mention root-cause analysis, financial modeling, and full Word/Excel/PowerPoint-style document generation. It is an applied work model, not just a chat or writing model.

Total Context

204.8K

Max Output

131.1K

Released

Mar 18, 2026

Input: $0.3 / M tokens
Output: $1.2 / M tokens
Cache Read: $0.06 / M tokens

MiniMax

MiniMax M3

MiniMax-M3

MiniMax M3 is described as a frontier multimodal foundation model with a 1M-token context window, built for long-horizon agentic work, coding, and tool use; this catalog lists 512K of total context. Model cards highlight MiniMax Sparse Attention and much lower long-context cost compared with earlier generations. It is best understood as a production-oriented multimodal agent model for large context, software tasks, and collaborative workflows.

Total Context

512K

Max Output

128K

Released

Jun 1, 2026

Input: $0.6 / M tokens
Output: $2.4 / M tokens
Cache Read: $0.12 / M tokens

Anthropic

Claude Fable 5

claude-fable-5

Claude Fable 5 is presented as a Mythos-level Claude model for ambitious, long-running projects. Source pages emphasize autonomous knowledge work, software engineering, vision, memory, and the ability to work for extended periods with sub-agents. It is positioned as a project-level collaborator rather than a short-turn assistant.

Total Context

1M

Max Output

128K

Released

Jun 9, 2026

Input: $10 / M tokens
Output: $50 / M tokens
Cache Read: $1 / M tokens

Anthropic

Claude Haiku 4.5

claude-haiku-4.5

Claude Haiku 4.5 is Anthropic’s fast and cost-efficient model with strong coding, computer-use, and agent-task performance for its size. Official materials compare parts of its behavior to earlier Sonnet-level capability while emphasizing speed and price. It is a compact production model for responsive agentic applications.

Total Context

200K

Max Output

64K

Released

Oct 15, 2025

Input: $1 / M tokens
Output: $5 / M tokens
Cache Read: $0.1 / M tokens

Anthropic

Claude Opus 4.5

claude-opus-4.5

Claude Opus 4.5 belongs to the Opus 4 generation of high-capability Claude models, with model cards emphasizing difficult reasoning, coding, and agentic work. It sits below newer Opus releases but still represents the premium capability tier of its generation. It is aimed at deep work and reliability rather than generic chat.

Total Context

200K

Max Output

64K

Released

Nov 24, 2025

Input: $5 / M tokens
Output: $25 / M tokens
Cache Read: $0.5 / M tokens

Popular model recommendations

Start with high-signal models from the live catalog, then open a detail page to compare context, endpoints, and effective pricing.

OpenAI

GPT-5.5

GPT-5.5 is described by OpenAI as a smarter frontier model for coding, research, data analysis, and professional knowledge work. The official launch messaging emphasizes higher capability, faster responses than earlier models built for difficult work, and strong performance on tasks involving documents and structured analysis. Its strength is broad professional intelligence rather than a single niche.

Total Context

1.1M

Input Price

$5 / M tokens


DeepSeek

DeepSeek V4 Pro

DeepSeek V4 Pro is described as a large-scale Mixture-of-Experts model with 1.6T total parameters and 49B activated parameters, while keeping a 1M-token context window for very large inputs. Its model cards emphasize advanced reasoning, coding, and long-horizon agent workflows rather than simple chat. The Pro variant is the capability-oriented member of the V4 family, making it better suited to full-codebase analysis, large research synthesis, and multi-step automation where depth matters more than the lowest possible latency.

Total Context

1M

Input Price

$1.8 / M tokens


DeepSeek

DeepSeek V4 Flash

DeepSeek V4 Flash keeps the V4 family’s 1M-token context window but uses a lighter MoE configuration, commonly described as 284B total parameters with 13B activated parameters. The emphasis is throughput: fast inference, lower cost per call, and production workloads that still need long-context handling. It is the better fit when the task volume is high and the workload benefits from V4-style long-context architecture without always requiring the deepest reasoning tier.

Total Context

1M

Input Price

$0.15 / M tokens


Alibaba

Qwen3.7 Plus

Qwen3.7 Plus takes the Qwen3.7 agent-oriented design in a more cost-effective direction. Third-party model cards describe it as supporting text and image input, with stronger vision-language ability and hybrid agent capability for GUI, mobile navigation, and visual-reference tasks. The model is suitable when users need the new Qwen3.7 capability profile without always paying for the Max tier.

Total Context

1M

Input Price

$0.2857 / M tokens


OpenAI

GPT-4.1

GPT-4.1 is an OpenAI model generation focused on improved coding, instruction following, and long-context performance. Official announcements present it as a stronger developer model than GPT-4o for many programming and instruction-heavy tasks. Its main strengths are practical coding reliability and long-context understanding.

Total Context

1M

Input Price

$2 / M tokens


OpenAI

GPT-4.1 Mini

GPT-4.1 Mini brings the GPT-4.1 family’s coding and instruction-following improvements into a faster, lower-cost form. It is suitable for high-volume developer tools, structured generation, extraction, and product features that do not require the full model. The main distinction is production efficiency while retaining the 4.1 generation’s task discipline.

Total Context

1M

Input Price

$0.4 / M tokens



Model catalog FAQ

A quick guide for choosing, comparing, and using models from the TokenHub catalog.

How should I choose a model from this list?


Start with your workload. Use the filters to narrow by provider, tags, endpoint type, and billing group, then compare context size, output limit, modalities, and input or output pricing.
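
The narrowing step described above can be sketched in a few lines of Python. This is only an illustration: the `Model` record, the `shortlist` function, and the thresholds are hypothetical, and the figures are the rounded values shown on the catalog cards (for example 32.8K entered as 32,800), not an official data feed.

```python
from dataclasses import dataclass


@dataclass
class Model:
    model_id: str
    provider: str
    context_tokens: int      # "Total Context", rounded as shown in the catalog
    max_output_tokens: int   # "Max Output", rounded as shown in the catalog
    input_price: float       # USD per million input tokens
    output_price: float      # USD per million output tokens


# A small hand-copied sample of catalog entries (hypothetical local data).
CATALOG = [
    Model("gpt-4.1", "OpenAI", 1_000_000, 32_800, 2.0, 8.0),
    Model("gpt-4.1-mini", "OpenAI", 1_000_000, 32_800, 0.4, 1.6),
    Model("gpt-4o-mini", "OpenAI", 128_000, 16_400, 0.15, 0.6),
    Model("deepseek-v4-flash", "DeepSeek", 1_000_000, 384_000, 0.15, 0.3),
    Model("claude-haiku-4.5", "Anthropic", 200_000, 64_000, 1.0, 5.0),
]


def shortlist(models, min_context, min_output, max_input_price):
    """Keep models that meet the limits, cheapest input price first."""
    matches = [
        m for m in models
        if m.context_tokens >= min_context
        and m.max_output_tokens >= min_output
        and m.input_price <= max_input_price
    ]
    return sorted(matches, key=lambda m: (m.input_price, m.output_price))


picks = shortlist(CATALOG, min_context=500_000, min_output=32_000,
                  max_input_price=1.0)
print([m.model_id for m in picks])  # ['deepseek-v4-flash', 'gpt-4.1-mini']
```

Sorting by input price first is one reasonable default for prompt-heavy workloads; an output-heavy workload would likely sort on output price instead.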

What does effective price mean?


Effective price applies the active billing group ratio to the model pricing data. It helps you estimate the real input, output, or per-request cost for the group you are using.
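
As a rough sketch of that arithmetic, the example below assumes the billing group ratio is a plain multiplier on the listed per-million-token prices, and that cached input tokens are billed at the Cache Read rate instead of the Input rate. The function name, its parameters, and the 0.8 ratio are hypothetical and are not taken from TokenHub.

```python
def effective_cost(input_tokens, output_tokens, input_price, output_price,
                   group_ratio=1.0, cached_tokens=0, cache_read_price=None):
    """Estimate the USD cost of one request.

    Prices are USD per million tokens, as listed in the catalog.
    group_ratio is an assumed billing-group multiplier (1.0 = list price).
    cached_tokens is the share of input_tokens billed at the cache-read rate.
    """
    if cache_read_price is None:
        # No cache-read price listed: bill cached tokens as normal input.
        cache_read_price = input_price
    fresh_tokens = input_tokens - cached_tokens
    base = (fresh_tokens * input_price
            + cached_tokens * cache_read_price
            + output_tokens * output_price) / 1_000_000
    return base * group_ratio


# GPT-4.1 list prices from the catalog: $2 input, $8 output, $0.5 cache read.
cost = effective_cost(100_000, 2_000, 2.0, 8.0, group_ratio=0.8,
                      cached_tokens=50_000, cache_read_price=0.5)
print(f"${cost:.4f}")  # $0.1128
```

Here the list-price cost is $0.141 (50K fresh input, 50K cached input, 2K output), and the assumed 0.8 ratio brings it to about $0.1128.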

Can I use these models through API endpoints?


Yes. Open a model detail page to see the supported endpoint types and documentation links. Availability can differ by model, provider, and current routing configuration.

Why do context window and max output matter?


The context window controls how much prompt and conversation history a model can read. Max output controls how much text it can generate in one response, which matters for long-form writing, coding, and document tasks.
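
A quick pre-flight check can make the two limits concrete. This sketch assumes that generated tokens count against the total context alongside the prompt, which is common but not confirmed for every model in the catalog; `fits_limits` is a hypothetical helper, and the GPT-4o figures are the rounded catalog values (128K and 16.4K).

```python
def fits_limits(prompt_tokens, requested_output, context_window, max_output):
    """Return True if a request stays within both catalog limits.

    Assumes prompt and generated tokens share the same context window.
    """
    if requested_output > max_output:
        return False  # the model cannot generate this much in one response
    return prompt_tokens + requested_output <= context_window


# Rounded GPT-4o limits from the catalog.
GPT_4O_CONTEXT = 128_000
GPT_4O_MAX_OUTPUT = 16_400

print(fits_limits(120_000, 8_000, GPT_4O_CONTEXT, GPT_4O_MAX_OUTPUT))   # True
print(fits_limits(120_000, 10_000, GPT_4O_CONTEXT, GPT_4O_MAX_OUTPUT))  # False
print(fits_limits(1_000, 20_000, GPT_4O_CONTEXT, GPT_4O_MAX_OUTPUT))    # False
```

The second call fails on the shared context budget and the third on the output cap, which is why both numbers are worth checking before sending a long document or asking for a long response.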