Compare AI Model Pricing and API Costs
Understanding the cost of generative AI models is critical for production workloads. This tool tracks real-time pricing for the leading API providers, comparing cost per million tokens (input/output), context lengths, and model specialties.
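As a quick sanity check on the per-million-token rates listed below, the cost of a single request can be estimated like this (the rates in the example are the Sonnet-class figures from the table; real bills may also include caching or batch discounts not modeled here):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Estimate one API call's cost. Rates are in dollars per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: 10,000 input tokens and 2,000 output tokens
# at $3/M input and $15/M output.
cost = request_cost(10_000, 2_000, 3.0, 15.0)
print(f"${cost:.4f}")  # → $0.0600
```

Because output rates are typically 3-5x input rates, the input:output ratio of your workload matters as much as the headline price.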
Aion-1.0-Mini (AionLabs)
Aion-1.0-Mini is a 32B-parameter distilled version of DeepSeek-R1, designed for strong performance in reasoning domains such as mathematics, coding, and logic. It is a modified variant of a FuseAI model that outperforms R1-Distill-Qwen-32B and R1-Distill-Llama-70B.
- Type: Text
- Context Window: 131,000 tokens
- Cost: $0.7/M input, $1.4/M output
- Release Date: Feb 5, 2025
Aion-RP 1.0 (8B) (AionLabs)
Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto. It is a fine-tuned base model rather than an instruct model, designed to produce more natural and varied writing.
- Type: Text
- Context Window: 33,000 tokens
- Cost: $0.8/M input, $1.6/M output
- Release Date: Feb 5, 2025
Aion-1.0 (AionLabs)
Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree of Thoughts (ToT) and Mixture of Experts (MoE). It is AionLabs' most powerful reasoning model.
- Type: Text
- Context Window: 131,000 tokens
- Cost: $4/M input, $8/M output
- Release Date: Feb 5, 2025
Goliath 120B (alpindale)
A large LLM created by combining two fine-tuned Llama 70B models into one 120B model; it merges Xwin and Euryale.
- Type: Text
- Context Window: 6,000 tokens
- Cost: $3.75/M input, $7.5/M output
- Release Date: Nov 10, 2023
Anthropic Claude Haiku Latest (Anthropic)
This model always redirects to the latest model in the Anthropic Claude Haiku family.
- Type: Text
- Context Window: 200,000 tokens
- Cost: $1/M input, $5/M output
- Release Date: 2025
Anthropic Claude Sonnet Latest (Anthropic)
This model always redirects to the latest model in the Anthropic Claude Sonnet family.
- Type: Text
- Context Window: 1,000,000 tokens
- Cost: $3/M input, $15/M output
- Release Date: 2025
Claude Opus 4.6 (Anthropic)
Opus 4.6 is Anthropic's strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts.
- Type: Text
- Context Window: 1,000,000 tokens
- Cost: $5/M input, $25/M output
- Release Date: Feb 4, 2026
Claude Opus 4.6 (Fast) (Anthropic)
A fast-mode variant of Opus 4.6 with identical capabilities and higher output speed, at a 6x price premium.
- Type: Text
- Context Window: 1,000,000 tokens
- Cost: $30/M input, $150/M output
- Release Date: 2025
Claude Opus 4.7 (Anthropic)
Opus 4.7 is the next generation of Anthropic's Opus family, built for long-running, asynchronous agents. Building on the coding and agentic strengths of Opus 4.6, it delivers stronger performance on complex, multi-step tasks and more reliable agentic execution across extended workflows.
- Type: Text
- Context Window: 1,000,000 tokens
- Cost: $5/M input, $25/M output
- Release Date: Apr 16, 2026
Claude Opus Latest (Anthropic)
This model always redirects to the latest model in the Claude Opus family.
- Type: Text
- Context Window: 1,000,000 tokens
- Cost: $5/M input, $25/M output
- Release Date: 2025
Claude Sonnet 4.6 (Anthropic)
Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with memory, polished document creation, and confident computer use for web QA and workflow automation.
- Type: Text
- Context Window: 1,000,000 tokens
- Cost: $3/M input, $15/M output
- Release Date: Feb 17, 2026
Claude Sonnet 4.5 (Anthropic)
Claude Sonnet 4.5 is Anthropic's most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with improvements across system design, code security, and specification adherence.
- Type: Text
- Context Window: 1,000,000 tokens
- Cost: $3/M input, $15/M output
- Release Date: Sep 29, 2025
Claude Haiku 4.5 (Anthropic)
Claude Haiku 4.5 is Anthropic's fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4's performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications.
- Type: Text
- Context Window: 200,000 tokens
- Cost: $1/M input, $5/M output
- Release Date: Oct 15, 2025
CoBuddy (free) (Baidu)
CoBuddy is a code generation model from Baidu, optimized for coding tasks and AI Agent workflows. It features high inference throughput and low end-to-end latency.
- Type: Text
- Context Window: 131,072 tokens
- Cost: $0/M input, $0/M output
- Release Date: 2025
Qianfan-OCR-Fast (free) (Baidu)
Qianfan-OCR-Fast is a domain-specific multimodal large model purpose-built for OCR, leveraging specialized OCR training data while preserving versatile multimodal intelligence.
- Type: Text
- Context Window: 65,536 tokens
- Cost: $0/M input, $0/M output
- Release Date: 2025
ERNIE 4.5 VL 424B A47B (Baidu)
ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu's ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It supports both thinking and non-thinking inference modes.
- Type: Text
- Context Window: 123,000 tokens
- Cost: $0.42/M input, $1.25/M output
- Release Date: Jun 30, 2025
Amazon Nova Premier 1.0 (Amazon)
Amazon Nova Premier is the most capable of Amazon's multimodal models for complex reasoning tasks and for use as the best teacher for distilling custom models.
- Type: Text
- Context Window: 1,000,000 tokens
- Cost: $2.5/M input, $12.5/M output
- Release Date: Nov 1, 2025
FLUX.2 Flex (Black Forest Labs)
FLUX.2 [flex] excels at rendering complex text, typography, and fine details, and supports multi-reference editing in the same unified architecture.
- Type: Image
- Context Window: 67,000 tokens
- Cost: $0.06 per megapixel
- Release Date: Nov 25, 2025
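Per-megapixel pricing like FLUX.2 [flex]'s scales with output resolution rather than token counts. A minimal sketch, assuming the $0.06/MP rate above is applied linearly to pixel count:

```python
def image_cost(width: int, height: int, rate_per_megapixel: float = 0.06) -> float:
    """Approximate cost of one generated image, billed per megapixel."""
    megapixels = (width * height) / 1_000_000
    return megapixels * rate_per_megapixel

# A 1024x1024 image is ~1.05 MP, so roughly $0.063:
print(f"${image_cost(1024, 1024):.4f}")  # → $0.0629
```

This makes high-resolution outputs proportionally more expensive, unlike flat per-image pricing (e.g. Recraft's $0.04 per image regardless of aspect ratio).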
Recraft V3 (Recraft)
Recraft V3 is an image generation model from Recraft. It supports text and image inputs with image output at ~1K resolution across multiple aspect ratios.
- Type: Image
- Context Window: 66,000 tokens
- Cost: $0.04 per image
- Release Date: May 8, 2026
DeepSeek V3.2 (DeepSeek)
DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA) for efficient long-context processing.
- Type: Text
- Context Window: 131,000 tokens
- Cost: $0.252/M input, $0.378/M output
- Release Date: Dec 1, 2025
DeepSeek V4 Flash (DeepSeek)
DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and high-throughput workloads.
- Type: Text
- Context Window: 1,050,000 tokens
- Cost: $0.14/M input, $0.28/M output
- Release Date: Apr 24, 2026
DeepSeek V4 Pro (DeepSeek)
DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporting a 1M-token context window. It is designed for advanced reasoning and coding.
- Type: Text
- Context Window: 1,050,000 tokens
- Cost: $0.435/M input, $0.87/M output
- Release Date: Apr 24, 2026
Gemini 2.0 Flash (Google)
Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5. It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling.
- Type: Text
- Context Window: 1,000,000 tokens
- Cost: $0.1/M input, $0.4/M output
- Release Date: Feb 5, 2025
Gemini 2.5 Flash (Google)
Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in thinking capabilities.
- Type: Text
- Context Window: 1,050,000 tokens
- Cost: $0.3/M input, $2.5/M output
- Release Date: Jun 17, 2025
Gemini 2.5 Flash Lite (Google)
Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput and faster token generation.
- Type: Text
- Context Window: 1,050,000 tokens
- Cost: $0.1/M input, $0.4/M output
- Release Date: Jul 22, 2025
Gemini 2.5 Pro (Google)
Gemini 2.5 Pro is Google's state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs thinking capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling.
- Type: Text
- Context Window: 1,050,000 tokens
- Cost: $1.25/M input, $10/M output
- Release Date: Jun 17, 2025
Gemini 3 Flash Preview (Google)
Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool-use performance with substantially lower latency than larger Gemini variants.
- Type: Text
- Context Window: 1,050,000 tokens
- Cost: $0.5/M input, $3/M output
- Release Date: Dec 17, 2025
Gemini 3.1 Pro Preview (Google)
Gemini 3.1 Pro Preview is Google's frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows.
- Type: Text
- Context Window: 1,050,000 tokens
- Cost: $2/M input, $12/M output
- Release Date: Feb 19, 2026
Gemini 3.1 Flash Lite (Google)
Gemini 3.1 Flash Lite is Google's GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs.
- Type: Text
- Context Window: 1,048,576 tokens
- Cost: $0.25/M input, $1.5/M output
- Release Date: 2025
Gemini 3.1 Flash Lite Preview (Google)
Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality.
- Type: Text
- Context Window: 1,050,000 tokens
- Cost: $0.25/M input, $1.5/M output
- Release Date: Mar 3, 2026
Google Gemini Flash Latest (Google)
This model always redirects to the latest model in the Google Gemini Flash family.
- Type: Text
- Context Window: 1,048,576 tokens
- Cost: $0.5/M input, $3/M output
- Release Date: 2025
Google Gemini Pro Latest (Google)
This model always redirects to the latest model in the Google Gemini Pro family.
- Type: Text
- Context Window: 1,048,576 tokens
- Cost: $2/M input, $12/M output
- Release Date: 2025
Gemma 4 31B (Google)
Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function calling, and multilingual support across 140+ languages.
- Type: Text
- Context Window: 262,000 tokens
- Cost: $0.13/M input, $0.38/M output
- Release Date: Apr 2, 2026
Gemma 4 26B A4B (Google)
Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference.
- Type: Text
- Context Window: 262,000 tokens
- Cost: $0.06/M input, $0.33/M output
- Release Date: Apr 3, 2026
Lyria 3 Pro Preview (Google)
Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz stereo audio from text prompts or from images.
- Type: Audio
- Context Window: 1,050,000 tokens
- Cost: $0.08 per song
- Release Date: Mar 31, 2026
Granite 4.1 8B (IBM)
Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from IBM, part of the Granite 4.1 family. It supports a 131K-token context window.
- Type: Text
- Context Window: 131,072 tokens
- Cost: $0.05/M input, $0.1/M output
- Release Date: 2025
Ling-2.6-1T (InclusionAI)
Ling-2.6-1T is an instant (instruct) model from inclusionAI and the company's trillion-parameter flagship, designed for real-world agents that require fast execution and high efficiency.
- Type: Text
- Context Window: 262,144 tokens
- Cost: $0.3/M input, $2.5/M output
- Release Date: 2025
Ling-2.6-flash (InclusionAI)
Ling-2.6-flash is an instant (instruct) model from inclusionAI with 104B total parameters and 7.4B active parameters, designed for real-world agents that require fast responses.
- Type: Text
- Context Window: 262,144 tokens
- Cost: $0.08/M input, $0.24/M output
- Release Date: 2025
Ring-2.6-1T (free) (InclusionAI)
Ring-2.6-1T is a 1T-parameter-scale thinking model with 63B active parameters, built for real-world agent workflows that require both strong capability and operational efficiency.
- Type: Text
- Context Window: 262,144 tokens
- Cost: $0/M input, $0/M output
- Release Date: 2025
Weaver (alpha) (Mancer)
An attempt to recreate Claude-style verbosity, but don't expect the same level of coherence or memory. Meant for use in roleplay/narrative situations.
- Type: Text
- Context Window: 8,000 tokens
- Cost: $0.75/M input, $1/M output
- Release Date: Aug 2, 2023
MiniMax M2.7 (MiniMax)
MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. It integrates advanced agentic capabilities through multi-agent collaboration.
- Type: Text
- Context Window: 197,000 tokens
- Cost: $0.299/M input, $1.2/M output
- Release Date: Mar 18, 2026
MiniMax M2.5 (MiniMax)
MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1 to extend into general office work.
- Type: Text
- Context Window: 197,000 tokens
- Cost: $0.15/M input, $1.15/M output
- Release Date: Feb 12, 2026
Mistral Medium 3.5 (Mistral AI)
Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows and coding.
- Type: Text
- Context Window: 262,144 tokens
- Cost: $1.5/M input, $7.5/M output
- Release Date: 2025
Mistral Nemo (Mistral AI)
A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual and supports function calling.
- Type: Text
- Context Window: 131,000 tokens
- Cost: $0.02/M input, $0.03/M output
- Release Date: Jul 19, 2024
Mistral Saba (Mistral AI)
Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance.
- Type: Text
- Context Window: 33,000 tokens
- Cost: $0.2/M input, $0.6/M output
- Release Date: Feb 17, 2025
Kimi K2.6 (MoonshotAI)
Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon coding, coding-driven UI/UX generation, and multi-agent orchestration.
- Type: Text
- Context Window: 262,000 tokens
- Cost: $0.75/M input, $3.5/M output
- Release Date: Apr 20, 2026
Kimi K2.5 (MoonshotAI)
Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm.
- Type: Text
- Context Window: 262,000 tokens
- Cost: $0.44/M input, $2/M output
- Release Date: Jan 27, 2026
MoonshotAI Kimi Latest (MoonshotAI)
This model always redirects to the latest model in the MoonshotAI Kimi family.
- Type: Text
- Context Window: 262,144 tokens
- Cost: $0.75/M input, $3.5/M output
- Release Date: 2025
Nemotron 3 Nano 30B A3B (NVIDIA)
NVIDIA Nemotron 3 Nano 30B A3B is a small MoE language model offering high compute efficiency and accuracy for developers building specialized agentic AI systems.
- Type: Text
- Context Window: 256,000 tokens
- Cost: $0/M input, $0/M output
- Release Date: Dec 14, 2025
Nemotron 3 Nano Omni (free) (NVIDIA)
NVIDIA Nemotron 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems.
- Type: Text
- Context Window: 256,000 tokens
- Cost: $0/M input, $0/M output
- Release Date: 2025
Nemotron 3 Super (NVIDIA)
NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications.
- Type: Text
- Context Window: 262,000 tokens
- Cost: $0/M input, $0/M output
- Release Date: Mar 11, 2026
Hunyuan A13B Instruct (Tencent)
Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought.
- Type: Text
- Context Window: 131,000 tokens
- Cost: $0.14/M input, $0.57/M output
- Release Date: Jul 8, 2025
GPT Audio (OpenAI)
The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency.
- Type: Audio
- Context Window: 128,000 tokens
- Cost: $2.5/M input, $10/M output
- Release Date: Jan 20, 2026
GPT Chat Latest (OpenAI)
GPT Chat Latest points to OpenAI's stable API alias chat-latest that always resolves to the latest Instant chat model used in ChatGPT.
- Type: Text
- Context Window: 400,000 tokens
- Cost: $5/M input, $30/M output
- Release Date: 2025
GPT-3.5 Turbo 16k (OpenAI)
This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost.
- Type: Text
- Context Window: 16,000 tokens
- Cost: $3/M input, $4/M output
- Release Date: Aug 28, 2023
GPT-3.5 Turbo Instruct (OpenAI)
This model is a variant of GPT-3.5 Turbo tuned for instruction-following prompts, omitting chat-related optimizations.
- Type: Text
- Context Window: 4,000 tokens
- Cost: $1.5/M input, $2/M output
- Release Date: Sep 28, 2023
GPT-4o Mini (OpenAI)
GPT-4o mini is OpenAI's follow-up to GPT-4o (Omni), supporting both text and image inputs with text outputs. As their most advanced small model, it is many times more affordable than GPT-4o.
- Type: Text
- Context Window: 128,000 tokens
- Cost: $0.15/M input, $0.6/M output
- Release Date: Jul 18, 2024
GPT-4o Mini TTS (OpenAI)
GPT-4o Mini TTS is OpenAI's cost-efficient text-to-speech model. It converts text input into natural-sounding audio output, supporting a variety of voices and tones.
- Type: Audio
- Context Window: 4,000 tokens
- Cost: $0.6 per 1M characters
- Release Date: Apr 19, 2026
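TTS pricing here is quoted per million input characters rather than per token, so cost tracks text length directly. A sketch of the conversion, assuming the $0.6 per 1M characters rate is applied linearly:

```python
def tts_cost(text: str, rate_per_million_chars: float = 0.6) -> float:
    """Cost of synthesizing `text`, billed per million input characters."""
    return len(text) * rate_per_million_chars / 1_000_000

# A ~500-character paragraph costs a fraction of a cent:
sample = "x" * 500
print(f"${tts_cost(sample):.6f}")  # → $0.000300
```

Character-based billing avoids tokenizer dependence, which makes TTS costs easier to predict from raw text than token-billed models.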
GPT-4o Mini Transcribe (OpenAI)
GPT-4o Mini Transcribe is OpenAI's smaller, cost-efficient speech-to-text model built on GPT-4o Mini audio capabilities.
- Type: Text
- Context Window: 128,000 tokens
- Cost: $1.25/M input, $5/M output
- Release Date: May 1, 2026
GPT-5.4 Image 2 (OpenAI)
GPT-5.4 Image 2 combines OpenAI's GPT-5.4 model with state-of-the-art image generation capabilities from GPT Image 2.
- Type: Text
- Context Window: 272,000 tokens
- Cost: $8/M input, $15/M output
- Release Date: 2025
GPT-5.4 (OpenAI)
GPT-5.4 is OpenAI's latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window.
- Type: Text
- Context Window: 1,050,000 tokens
- Cost: $2.5/M input, $15/M output
- Release Date: Mar 5, 2026
GPT-5.4 Mini (OpenAI)
GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads.
- Type: Text
- Context Window: 400,000 tokens
- Cost: $0.75/M input, $4.5/M output
- Release Date: Mar 17, 2026
GPT-5.4 Nano (OpenAI)
GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and high-volume tasks.
- Type: Text
- Context Window: 400,000 tokens
- Cost: $0.2/M input, $1.25/M output
- Release Date: Mar 17, 2026
GPT-5.5 (OpenAI)
GPT-5.5 is OpenAI's frontier model designed for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and improved token efficiency.
- Type: Text
- Context Window: 1,050,000 tokens
- Cost: $5/M input, $30/M output
- Release Date: 2025
GPT-5.5 Pro (OpenAI)
GPT-5.5 Pro is OpenAI's high-capability model optimized for deep reasoning and accuracy on complex, high-stakes workloads.
- Type: Text
- Context Window: 1,050,000 tokens
- Cost: $30/M input, $180/M output
- Release Date: 2025
GPT-5 Mini (OpenAI)
GPT-5 Mini is a compact version of GPT-5, designed to handle lighter-weight reasoning tasks.
- Type: Text
- Context Window: 400,000 tokens
- Cost: $0.25/M input, $2/M output
- Release Date: Aug 7, 2025
GPT-5 Nano (OpenAI)
GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments.
- Type: Text
- Context Window: 400,000 tokens
- Cost: $0.05/M input, $0.4/M output
- Release Date: Aug 7, 2025
GPT-5.3 Codex (OpenAI)
GPT-5.3-Codex is OpenAI's most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with broader reasoning capabilities.
- Type: Text
- Context Window: 400,000 tokens
- Cost: $1.75/M input, $14/M output
- Release Date: Feb 24, 2026
GPT-4.1 Mini (OpenAI)
GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost.
- Type: Text
- Context Window: 1,050,000 tokens
- Cost: $0.4/M input, $1.6/M output
- Release Date: Apr 14, 2025
gpt-oss-120b (OpenAI)
gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases.
- Type: Text
- Context Window: 131,000 tokens
- Cost: $0.039/M input, $0.18/M output
- Release Date: Aug 5, 2025
gpt-oss-20b (OpenAI)
gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license.
- Type: Text
- Context Window: 131,000 tokens
- Cost: $0.03/M input, $0.14/M output
- Release Date: Aug 5, 2025
o1 (OpenAI)
o1 is an OpenAI reasoning model family designed to spend more time thinking before responding. The o1 series is trained with large-scale reinforcement learning to reason using chain of thought.
- Type: Text
- Context Window: 200,000 tokens
- Cost: $15/M input, $60/M output
- Release Date: Dec 17, 2024
o1 Pro (OpenAI)
The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers.
- Type: Text
- Context Window: 200,000 tokens
- Cost: $150/M input, $600/M output
- Release Date: Mar 20, 2025
o3 Pro (OpenAI)
The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently better answers.
- Type: Text
- Context Window: 200,000 tokens
- Cost: $20/M input, $80/M output
- Release Date: Jun 11, 2025
OpenAI GPT Latest (OpenAI)
This model always redirects to the latest model in the OpenAI GPT family.
- Type: Text
- Context Window: 1,050,000 tokens
- Cost: $5/M input, $30/M output
- Release Date: 2025
OpenAI GPT Mini Latest (OpenAI)
This model always redirects to the latest model in the OpenAI GPT Mini family.
- Type: Text
- Context Window: 400,000 tokens
- Cost: $0.75/M input, $4.5/M output
- Release Date: 2025
Owl Alpha (OpenRouter)
Owl Alpha is a high-performance foundation model designed for agentic workloads. It natively supports tool use and long-context tasks, with strong performance in code generation.
- Type: Text
- Context Window: 1,048,576 tokens
- Cost: $0/M input, $0/M output
- Release Date: 2025
Inflection 3 Productivity (Inflection)
Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines.
- Type: Text
- Context Window: 8,000 tokens
- Cost: $2.5/M input, $10/M output
- Release Date: Oct 11, 2024
Laguna M.1 (free) (Poolside)
Laguna M.1 is the flagship coding agent model from Poolside, optimized for complex software engineering tasks.
- Type: Text
- Context Window: 131,072 tokens
- Cost: $0/M input, $0/M output
- Release Date: 2025
Laguna XS.2 (free) (Poolside)
Laguna XS.2 is the second-generation model in the XS size class from Poolside, their efficient coding agent series.
- Type: Text
- Context Window: 131,072 tokens
- Cost: $0/M input, $0/M output
- Release Date: 2025
Qwen Plus 0728 (Qwen)
Qwen Plus 0728, based on the Qwen3 foundation model, is a 1M-token-context hybrid reasoning model balancing performance, speed, and cost.
- Type: Text
- Context Window: 1,000,000 tokens
- Cost: $0.26/M input, $0.78/M output
- Release Date: Sep 8, 2025
Qwen3.5 Plus 2026-04-20 (Qwen)
Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It accepts text, image, and video input and produces text output, with a 1M token context window.
- Type: Text
- Context Window: 1,000,000 tokens
- Cost: $0.4/M input, $2.4/M output
- Release Date: 2025
Qwen3.6 27B (Qwen)
Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026.
- Type: Text
- Context Window: 262,144 tokens
- Cost: $0.32/M input, $3.2/M output
- Release Date: 2025
Qwen3.6 35B A3B (Qwen)
Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion active parameters per token.
- Type: Text
- Context Window: 262,144 tokens
- Cost: $0.15/M input, $1/M output
- Release Date: 2025
Qwen3.6 Flash (Qwen)
Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It supports text, image, and video input with a 1M token context window.
- Type: Text
- Context Window: 1,000,000 tokens
- Cost: $0.25/M input, $1.5/M output
- Release Date: 2025
Qwen3.6 Max Preview (Qwen)
Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse mixture-of-experts architecture.
- Type: Text
- Context Window: 262,144 tokens
- Cost: $1.04/M input, $6.24/M output
- Release Date: 2025
Qwen3.6 Plus (Qwen)
Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing.
- Type: Text
- Context Window: 1,000,000 tokens
- Cost: $0.325/M input, $1.95/M output
- Release Date: Apr 2, 2026
Qwen3.5 Flash (Qwen)
The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model.
- Type: Text
- Context Window: 1,000,000 tokens
- Cost: $0.065/M input, $0.26/M output
- Release Date: Feb 26, 2026
Qwen3 235B A22B Instruct 2507 (Qwen)
Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture.
- Type: Text
- Context Window: 262,000 tokens
- Cost: $0.071/M input, $0.1/M output
- Release Date: Jul 21, 2025
Qwen3.5 397B A17B (Qwen)
Qwen3.5 397B-A17B is a native vision-language model in the Qwen3.5 series, built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts design.
- Type: Text
- Context Window: 262,000 tokens
- Cost: $0.39/M input, $2.34/M output
- Release Date: Feb 16, 2026
Qwen3 Embedding 8B (Qwen)
The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks.
- Type: Embeddings
- Context Window: 32,000 tokens
- Cost: $0.01/M input, $0/M output
- Release Date: Oct 29, 2025
Recraft V4 (Recraft)
Recraft V4 is an image generation model from Recraft. It supports text and image inputs with image output at ~1K resolution across multiple aspect ratios.
- Type: Image
- Context Window: 66,000 tokens
- Cost: $0.04 per image
- Release Date: May 8, 2026
Recraft V4 Pro (Recraft)
Recraft V4 Pro is an image generation model from Recraft. It supports text and image inputs with image output at ~2K resolution.
- Type: Image
- Context Window: 66,000 tokens
- Cost: $0.25 per image
- Release Date: May 8, 2026
Relace Search (Relace)
The relace-search model runs 4-12 view_file and grep tool calls in parallel to explore a codebase and return the files relevant to the user's request.
- Type: Text
- Context Window: 256,000 tokens
- Cost: $1/M input, $3/M output
- Release Date: Dec 8, 2025
Riverflow V2 Max Preview (Sourceful)
Riverflow V2 Max Preview is the most powerful variant of Sourceful's Riverflow V2 preview lineup.
- Type: Image
- Context Window: 8,000 tokens
- Cost: $0.075 per image
- Release Date: Dec 9, 2025
Riverflow V2 Pro (Sourceful)
Riverflow V2 Pro is the most powerful variant of Sourceful's Riverflow 2.0 lineup, best for top-tier control and perfect text rendering.
- Type: Image
- Context Window: 8,000 tokens
- Cost: $0.15 per image
- Release Date: Feb 2, 2026
Riverflow V2 Fast Preview (Sourceful)
Riverflow V2 Fast Preview is the fastest variant of Sourceful's Riverflow V2 preview lineup.
- Type: Image
- Context Window: 8,000 tokens
- Cost: $0.03 per image
- Release Date: Dec 9, 2025
Riverflow V2 Standard Preview (Sourceful)
Riverflow V2 Standard Preview is the standard variant of Sourceful's Riverflow V2 preview lineup.
- Type: Image
- Context Window: 8,000 tokens
- Cost: $0.035 per image
- Release Date: Dec 9, 2025
Step 3.5 Flash (StepFun)
Step 3.5 Flash is StepFun's most capable open-source foundation model, built on a sparse Mixture-of-Experts (MoE) architecture.
- Type: Text
- Context Window: 262,000 tokens
- Cost: $0.1/M input, $0.3/M output
- Release Date: Jan 30, 2026
Hy3 preview (Tencent)
Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use.
- Type: Text
- Context Window: 262,144 tokens
- Cost: $0.066/M input, $0.26/M output
- Release Date: 2025
Switchpoint Router (Switchpoint)
Switchpoint AI's router instantly analyzes your request and directs it to the optimal AI from an ever-evolving library.
- Type: Text
- Context Window: 131,000 tokens
- Cost: $0.85/M input, $3.4/M output
- Release Date: Jul 12, 2025
Grok 4.1 Fast (xAI)
Grok 4.1 Fast is xAI's best agentic tool-calling model, shining in real-world use cases like customer support and deep research.
- Type: Text
- Context Window: 2,000,000 tokens
- Cost: $0.2/M input, $0.5/M output
- Release Date: Nov 20, 2025
Grok 4 Fast (xAI)
Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window.
- Type: Text
- Context Window: 2,000,000 tokens
- Cost: $0.2/M input, $0.5/M output
- Release Date: Sep 19, 2025
Grok 4.3 (xAI)
Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output.
- Type: Text
- Context Window: 1,000,000 tokens
- Cost: $1.25/M input, $2.5/M output
- Release Date: 2025
Grok 3 Mini Beta (xAI)
Grok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding.
- Type: Text
- Context Window: 131,000 tokens
- Cost: $0.3/M input, $0.5/M output
- Release Date: Apr 10, 2025
Grok 3 Beta (xAI)
Grok 3 is xAI's flagship model for enterprise use cases, excelling at data extraction, coding, and text summarization.
- Type: Text
- Context Window: 131,000 tokens
- Cost: $3/M input, $15/M output
- Release Date: Apr 10, 2025
MiMo-V2.5 (Xiaomi)
MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost.
- Type: Text
- Context Window: 1,048,576 tokens
- Cost: $0.4/M input, $2/M output
- Release Date: 2025
MiMo-V2.5-Pro (Xiaomi)
MiMo-V2.5-Pro is Xiaomi's flagship model, delivering strong performance in general agentic capabilities, complex software engineering, and long-horizon tasks.
- Type: Text
- Context Window: 1,048,576 tokens
- Cost: $1/M input, $3/M output
- Release Date: 2025
MiMo-V2-Flash (Xiaomi)
MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-of-Experts model with 309B total parameters and 15B active parameters.
- Type: Text
- Context Window: 262,000 tokens
- Cost: $0.1/M input, $0.3/M output
- Release Date: Dec 14, 2025
GLM 4.5 Air (Z.ai)
GLM-4.5-Air is the lightweight variant of Z.ai's flagship model family, also purpose-built for agent-centric applications.
- Type: Text
- Context Window: 131,000 tokens
- Cost: $0.13/M input, $0.85/M output
- Release Date: Jul 26, 2025
GLM 5.1 (Z.ai)
GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks.
- Type: Text
- Context Window: 203,000 tokens
- Cost: $1.05/M input, $3.5/M output
- Release Date: Apr 7, 2026
GLM 5 (Z.ai)
GLM-5 is Z.ai's flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows.
- Type: Text
- Context Window: 203,000 tokens
- Cost: $0.6/M input, $1.92/M output
- Release Date: Feb 11, 2026
GLM 5 Turbo (Z.ai)
GLM-5 Turbo is a new model from Z.ai designed for fast inference and strong performance in agent-driven environments.
- Type: Text
- Context Window: 203,000 tokens
- Cost: $1.2/M input, $4/M output
- Release Date: Mar 15, 2026
GLM 4.7 (Z.ai)
GLM-4.7 is a flagship model from Z.ai, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning and execution.
- Type: Text
- Context Window: 203,000 tokens
- Cost: $0.4/M input, $1.75/M output
- Release Date: Dec 22, 2025
all-mpnet-base-v2 (Sentence Transformers)
The all-mpnet-base-v2 embedding model encodes sentences and short paragraphs into a 768-dimensional dense vector space.
- Type: Embeddings
- Context Window: 512 tokens
- Cost: $0.005/M input, $0/M output
- Release Date: Nov 18, 2025
all-MiniLM-L12-v2 (Sentence Transformers)
The all-MiniLM-L12-v2 embedding model maps sentences and short paragraphs into a 384-dimensional dense vector space.
- Type: Embeddings
- Context Window: 512 tokens
- Cost: $0.005/M input, $0/M output
- Release Date: Nov 18, 2025
paraphrase-MiniLM-L6-v2 (Sentence Transformers)
The paraphrase-MiniLM-L6-v2 embedding model converts sentences and short paragraphs into a 384-dimensional dense vector space.
- Type: Embeddings
- Context Window: 512 tokens
- Cost: $0.005/M input, $0/M output
- Release Date: Nov 18, 2025
multi-qa-mpnet-base-dot-v1 (Sentence Transformers)
The multi-qa-mpnet-base-dot-v1 embedding model transforms sentences and short paragraphs into a 768-dimensional dense vector space.
- Type: Embeddings
- Context Window: 512 tokens
- Cost: $0.005/M input, $0/M output
- Release Date: Nov 18, 2025
E5-Base-v2 (Intfloat)
The e5-base-v2 embedding model encodes English sentences and paragraphs into a 768-dimensional dense vector space.
- Type: Embeddings
- Context Window: 512 tokens
- Cost: $0.005/M input, $0/M output
- Release Date: Nov 18, 2025
Voxtral Mini TTS (Mistral AI)
Voxtral Mini TTS is Mistral's text-to-speech model featuring zero-shot voice cloning and multilingual support.
- Type: Speech
- Context Window: 4,000 tokens
- Cost: $16 per 1M characters
- Release Date: Apr 19, 2026
Orpheus 3B (Canopy Labs)
Orpheus 3B is an English text-to-speech model from Canopy Labs, fine-tuned for natural prosody and expressive delivery.
- Type: Audio
- Context Window: 4,000 tokens
- Cost: $7 per 1M characters
- Release Date: Apr 24, 2026
CSM 1B (Sesame)
CSM 1B is a conversational speech model from Sesame. It accepts text input and produces English speech output.
- Type: Audio
- Context Window: 4,000 tokens
- Cost: $7 per 1M characters
- Release Date: Apr 24, 2026
Zonos v0.1 Hybrid (Zyphra)
Zonos v0.1 Hybrid is a text-to-speech model from Zyphra built on a hybrid architecture. It produces English speech output.
- Type: Audio
- Context Window: 4,000 tokens
- Cost: $7 per 1M characters
- Release Date: Apr 24, 2026
Zonos v0.1 Transformer (Zyphra)
Zonos v0.1 Transformer is a text-to-speech model from Zyphra built on a pure transformer architecture.
- Type: Audio
- Context Window: 4,000 tokens
- Cost: $7 per 1M characters
- Release Date: Apr 24, 2026
Llama 3 Euryale 70B v2.1 (Sao10k)
Euryale 70B v2.1 is a model focused on creative roleplay, with improved prompt adherence and better anatomical and spatial awareness.
- Type: Text
- Context Window: 8,000 tokens
- Cost: $1.48/M input, $1.48/M output
- Release Date: Jun 18, 2024
CodeLLaMa 7B Instruct Solidity (AlfredPros)
A fine-tuned 7-billion-parameter Code Llama - Instruct model for generating Solidity smart contracts.
- Type: Text
- Context Window: 4,000 tokens
- Cost: $0.8/M input, $1.2/M output
- Release Date: Apr 14, 2025
Hermes 4 405B (Nous AI)
Hermes 4 405B is Nous AI's flagship 405B-parameter reasoning model, supporting instant, hybrid, and deep reasoning modes.
- Type: Text
- Context Window: 131,000 tokens
- Cost: $1/M input, $3/M output
- Release Date: 2025
Hermes 4 70B (Nous AI)
Hermes 4 70B offers the best cost/performance ratio in the Hermes family. Supports hybrid reasoning with variable output token counts.
- Type: Text
- Context Window: 131,000 tokens
- Cost: $0.13/M input, $0.4/M output
- Release Date: 2025
Hermes 3 Llama 3.1 70B (Nous AI)
Hermes 3, based on Llama 3.1 70B, is a generalist model optimized for agentic workflows and OpenAI-compatible endpoints.
- Type: Text
- Context Window: 65,000 tokens
- Cost: $0.3/M input, $0.3/M output
- Release Date: 2025
Hermes 4.3 36B (Nous AI)
Hermes 4.3 36B is a fast, lightweight model for simple tasks. Best for high-volume, low-latency use cases.
- Type: Text
- Context Window: 32,000 tokens
- Cost: $0.1/M input, $0.39/M output
- Release Date: 2025
Frequently Asked Questions
How often is the AI pricing data updated?
Our data is updated regularly to reflect the latest pricing changes across all major providers including OpenAI, Anthropic, Google, DeepSeek, and specialized inference hosts.
What is the difference between input and output token costs?
Input tokens (your prompt) are typically 3-5x cheaper than output tokens (the model's response). When calculating costs for RAG systems, input tokens usually dominate the budget.
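The per-request cost calculation above can be sketched in a few lines. The helper below is a hypothetical illustration (not any provider's SDK): it converts per-million-token prices into a dollar estimate, using the Aion-1.0-Mini rates listed on this page ($0.7/M input, $1.4/M output) and a RAG-style request where the retrieved context dwarfs the generated answer.

```python
def cost_usd(input_tokens: int, output_tokens: int,
             price_in: float, price_out: float) -> float:
    """Estimate API cost in USD given per-million-token prices."""
    return (input_tokens / 1_000_000) * price_in \
         + (output_tokens / 1_000_000) * price_out

# A RAG request: 20k tokens of retrieved context, a 500-token answer,
# priced at Aion-1.0-Mini rates ($0.7/M input, $1.4/M output).
rag_cost = cost_usd(input_tokens=20_000, output_tokens=500,
                    price_in=0.7, price_out=1.4)
print(f"${rag_cost:.4f}")  # input side contributes $0.0140, output only $0.0007
```

Even though output tokens cost twice as much per token here, the input side dominates the bill because RAG prompts are typically 10-100x longer than the responses they produce.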
What is the context window and cost of Aion-1.0-Mini?
Aion-1.0-Mini has a context window of 131,000 tokens. The pricing is $0.7 per 1M input tokens and $1.4 per 1M output tokens.
What is the context window and cost of Aion-RP 1.0 (8B)?
Aion-RP 1.0 (8B) has a context window of 33,000 tokens. The pricing is $0.8 per 1M input tokens and $1.6 per 1M output tokens.
What is the context window and cost of Aion-1.0?
Aion-1.0 has a context window of 131,000 tokens. The pricing is $4 per 1M input tokens and $8 per 1M output tokens.
What is the context window and cost of Goliath 120B?
Goliath 120B has a context window of 6,000 tokens. The pricing is $3.75 per 1M input tokens and $7.5 per 1M output tokens.
What is the context window and cost of Anthropic Claude Haiku Latest?
Anthropic Claude Haiku Latest has a context window of 200,000 tokens. The pricing is $1 per 1M input tokens and $5 per 1M output tokens.