AI Model Pricing Calculator

Compare costs across 20+ frontier and open-weight models including text, voice, and multimodal models.

Compare AI Model API Costs

Understanding the cost of generative AI models is critical for production workloads. This tool tracks real-time pricing for the leading API providers, comparing cost per million tokens (input/output), context lengths, and model specialties.
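The per-request cost formula behind any such comparison is simple enough to sketch. This is a minimal illustration; the prices in the example are placeholders, not any specific provider's:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one API call given per-million-token prices."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# Example: a 10K-token prompt and a 1K-token completion at $3/$15 per 1M tokens.
print(round(request_cost(10_000, 1_000, 3.0, 15.0), 4))  # 0.045
```

Multiply by expected request volume to estimate a monthly budget for any model in the table below.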

Aion-1.0-Mini (AionLabs)

Aion-1.0-Mini is a 32B-parameter distilled version of the DeepSeek-R1 model, designed for strong performance in reasoning domains such as mathematics, coding, and logic. It is a modified variant of a FuseAI model that outperforms R1-Distill-Qwen-32B and R1-Distill-Llama-70B.

Aion-RP 1.0 (8B) (AionLabs)

Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto. It is a fine-tuned base model rather than an instruct model, designed to produce more natural and varied writing.

Aion-1.0 (AionLabs)

Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree of Thoughts (ToT) and Mixture of Experts (MoE). It is Aion Lab's most powerful reasoning model.

Goliath 120B (alpindale)

A large LLM created by merging two fine-tuned Llama 70B models, Xwin and Euryale, into a single 120B model.

Anthropic Claude Haiku Latest (Anthropic)

This model always redirects to the latest model in the Anthropic Claude Haiku family.

Anthropic Claude Sonnet Latest (Anthropic)

This model always redirects to the latest model in the Anthropic Claude Sonnet family.

Claude Opus 4.6 (Anthropic)

Opus 4.6 is Anthropic's strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts.

Claude Opus 4.6 (Fast) (Anthropic)

Fast-mode variant of Opus 4.6: identical capabilities with higher output speed, at a 6x pricing premium.

Claude Opus 4.7 (Anthropic)

Opus 4.7 is the next generation of Anthropic's Opus family, built for long-running, asynchronous agents. Building on the coding and agentic strengths of Opus 4.6, it delivers stronger performance on complex, multi-step tasks and more reliable agentic execution across extended workflows.

Claude Opus Latest (Anthropic)

This model always redirects to the latest model in the Claude Opus family.

Claude Sonnet 4.6 (Anthropic)

Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with memory, polished document creation, and confident computer use for web QA and workflow automation.

Claude Sonnet 4.5 (Anthropic)

Claude Sonnet 4.5 is Anthropic's most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with improvements across system design, code security, and specification adherence.

Claude Haiku 4.5 (Anthropic)

Claude Haiku 4.5 is Anthropic's fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4's performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications.

CoBuddy (free) (Baidu)

CoBuddy is a code generation model from Baidu, optimized for coding tasks and AI Agent workflows. It features high inference throughput and low end-to-end latency.

Qianfan-OCR-Fast (free) (Baidu)

Qianfan-OCR-Fast is a domain-specific multimodal large model purpose-built for OCR, leveraging specialized OCR training data while preserving versatile multimodal intelligence.

ERNIE 4.5 VL 424B A47B (Baidu)

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu's ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It supports both thinking and non-thinking inference modes.

Amazon Nova Premier 1.0 (Amazon)

Amazon Nova Premier is the most capable of Amazon's multimodal models for complex reasoning tasks and for use as the best teacher for distilling custom models.

FLUX.2 Flex (Black Forest Labs)

FLUX.2 [flex] excels at rendering complex text, typography, and fine details, and supports multi-reference editing in the same unified architecture.

Recraft V3 (Recraft)

Recraft V3 is an image generation model from Recraft. It supports text and image inputs with image output at ~1K resolution across multiple aspect ratios.

DeepSeek V3.2 (DeepSeek)

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA) for efficient long-context processing.

DeepSeek V4 Flash (DeepSeek)

DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and high-throughput workloads.

DeepSeek V4 Pro (DeepSeek)

DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporting a 1M-token context window. It is designed for advanced reasoning and coding.

Gemini 2.0 Flash (Google)

Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5. It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling.

Gemini 2.5 Flash (Google)

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in thinking capabilities.

Gemini 2.5 Flash Lite (Google)

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput and faster token generation.

Gemini 2.5 Pro (Google)

Gemini 2.5 Pro is Google's state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs thinking capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling.

Gemini 3 Flash Preview (Google)

Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool-use performance with substantially lower latency than larger Gemini variants.

Gemini 3.1 Pro Preview (Google)

Gemini 3.1 Pro Preview is Google's frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows.

Gemini 3.1 Flash Lite (Google)

Gemini 3.1 Flash Lite is Google's GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs.

Gemini 3.1 Flash Lite Preview (Google)

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality.

Google Gemini Flash Latest (Google)

This model always redirects to the latest model in the Google Gemini Flash family.

Google Gemini Pro Latest (Google)

This model always redirects to the latest model in the Google Gemini Pro family.

Gemma 4 31B (Google)

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function calling, and multilingual support across 140+ languages.

Gemma 4 26B A4B (Google)

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference.

Lyria 3 Pro Preview (Google)

Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz stereo audio from text prompts or from images.

Granite 4.1 8B (IBM)

Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from IBM, part of the Granite 4.1 family. It supports a 131K-token context window.

Ling-2.6-1T (InclusionAI)

Ling-2.6-1T is an instant (instruct) model from inclusionAI and the company's trillion-parameter flagship, designed for real-world agents that require fast execution and high efficiency.

Ling-2.6-flash (InclusionAI)

Ling-2.6-flash is an instant (instruct) model from inclusionAI with 104B total parameters and 7.4B active parameters, designed for real-world agents that require fast responses.

Ring-2.6-1T (free) (InclusionAI)

Ring-2.6-1T is a 1T-parameter-scale thinking model with 63B active parameters, built for real-world agent workflows that require both strong capability and operational efficiency.

Weaver (alpha) (Mancer)

An attempt to recreate Claude-style verbosity, but don't expect the same level of coherence or memory. Meant for use in roleplay/narrative situations.

MiniMax M2.7 (MiniMax)

MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. It integrates advanced agentic capabilities through multi-agent collaboration.

MiniMax M2.5 (MiniMax)

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1 to extend into general office work.

Mistral Medium 3.5 (Mistral AI)

Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows and coding.

Mistral Nemo (Mistral AI)

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual and supports function calling.

Mistral Saba (Mistral AI)

Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance.

Kimi K2.6 (MoonshotAI)

Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon coding, coding-driven UI/UX generation, and multi-agent orchestration.

Kimi K2.5 (MoonshotAI)

Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm.

MoonshotAI Kimi Latest (MoonshotAI)

This model always redirects to the latest model in the MoonshotAI Kimi family.

Nemotron 3 Nano 30B A3B (NVIDIA)

NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems.

Nemotron 3 Nano Omni (free) (NVIDIA)

NVIDIA Nemotron 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems.

Nemotron 3 Super (NVIDIA)

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications.

Hunyuan A13B Instruct (Tencent)

Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought.

GPT Audio (OpenAI)

The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency.

GPT Chat Latest (OpenAI)

GPT Chat Latest points to OpenAI's stable API alias chat-latest that always resolves to the latest Instant chat model used in ChatGPT.

GPT-3.5 Turbo 16k (OpenAI)

This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost.

GPT-3.5 Turbo Instruct (OpenAI)

This model is a variant of GPT-3.5 Turbo tuned for instructional prompts, omitting chat-related optimizations.

GPT-4o Mini (OpenAI)

GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many times more affordable than frontier models.

GPT-4o Mini TTS (OpenAI)

GPT-4o Mini TTS is OpenAI's cost-efficient text-to-speech model. It converts text input into natural-sounding audio output, supporting a variety of voices and tones.

GPT-4o Mini Transcribe (OpenAI)

GPT-4o Mini Transcribe is OpenAI's smaller, cost-efficient speech-to-text model built on GPT-4o Mini audio capabilities.

GPT-5.4 Image 2 (OpenAI)

GPT-5.4 Image 2 combines OpenAI's GPT-5.4 model with state-of-the-art image generation capabilities from GPT Image 2.

GPT-5.4 (OpenAI)

GPT-5.4 is OpenAI's latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window.

GPT-5.4 Mini (OpenAI)

GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads.

GPT-5.4 Nano (OpenAI)

GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and high-volume tasks.

GPT-5.5 (OpenAI)

GPT-5.5 is OpenAI's frontier model designed for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and improved token efficiency.

GPT-5.5 Pro (OpenAI)

GPT-5.5 Pro is OpenAI's high-capability model optimized for deep reasoning and accuracy on complex, high-stakes workloads.

GPT-5 Mini (OpenAI)

GPT-5 Mini is a compact version of GPT-5, designed to handle lighter-weight reasoning tasks.

GPT-5 Nano (OpenAI)

GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments.

GPT-5.3 Codex (OpenAI)

GPT-5.3-Codex is OpenAI's most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with broader reasoning capabilities.

GPT-4.1 Mini (OpenAI)

GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost.

gpt-oss-120b (OpenAI)

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases.

gpt-oss-20b (OpenAI)

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license.

o1 (OpenAI)

o1 is an OpenAI model family designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought.

o1 Pro (OpenAI)

The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers.

o3 Pro (OpenAI)

The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently better answers.

OpenAI GPT Latest (OpenAI)

This model always redirects to the latest model in the OpenAI GPT family.

OpenAI GPT Mini Latest (OpenAI)

This model always redirects to the latest model in the OpenAI GPT Mini family.

Owl Alpha (OpenRouter)

Owl Alpha is a high-performance foundation model designed for agentic workloads. It natively supports tool use and long-context tasks, with strong performance in code generation.

Inflection 3 Productivity (Inflection)

Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines.

Laguna M.1 (free) (Poolside)

Laguna M.1 is the flagship coding agent model from Poolside, optimized for complex software engineering tasks.

Laguna XS.2 (free) (Poolside)

Laguna XS.2 is the second-generation model in the XS size class from Poolside, their efficient coding agent series.

Qwen Plus 0728 (Qwen)

Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning model with a balanced performance, speed, and cost combination.

Qwen3.5 Plus 2026-04-20 (Qwen)

Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It accepts text, image, and video input and produces text output, with a 1M token context window.

Qwen3.6 27B (Qwen)

Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026.

Qwen3.6 35B A3B (Qwen)

Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion active parameters per token.

Qwen3.6 Flash (Qwen)

Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It supports text, image, and video input with a 1M token context window.

Qwen3.6 Max Preview (Qwen)

Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse mixture-of-experts architecture.

Qwen3.6 Plus (Qwen)

Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing.

Qwen3.5 Flash (Qwen)

The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model.

Qwen3 235B A22B Instruct 2507 (Qwen)

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture.

Qwen3.5 397B A17B (Qwen)

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model.

Qwen3 Embedding 8B (Qwen)

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks.

Recraft V4 (Recraft)

Recraft V4 is an image generation model from Recraft. It supports text and image inputs with image output at ~1K resolution across multiple aspect ratios.

Recraft V4 Pro (Recraft)

Recraft V4 Pro is an image generation model from Recraft. It supports text and image inputs with image output at ~2K resolution.

Relace Search (Relace)

The relace-search model uses 4-12 view_file and grep tools in parallel to explore a codebase and return relevant files to the user request.

Riverflow V2 Max Preview (Sourceful)

Riverflow V2 Max Preview is the most powerful variant of Sourceful's Riverflow V2 preview lineup.

Riverflow V2 Pro (Sourceful)

Riverflow V2 Pro is the most powerful variant of Sourceful's Riverflow 2.0 lineup, best for top-tier control and perfect text rendering.

Riverflow V2 Fast Preview (Sourceful)

Riverflow V2 Fast Preview is the fastest variant of Sourceful's Riverflow V2 preview lineup.

Riverflow V2 Standard Preview (Sourceful)

Riverflow V2 Standard Preview is the standard variant of Sourceful's Riverflow V2 preview lineup.

Step 3.5 Flash (StepFun)

Step 3.5 Flash is StepFun's most capable open-source foundation model, built on a sparse Mixture-of-Experts (MoE) architecture.

Hy3 preview (Tencent)

Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use.

Switchpoint Router (Switchpoint)

Switchpoint AI's router instantly analyzes your request and directs it to the optimal AI from an ever-evolving library.

Grok 4.1 Fast (xAI)

Grok 4.1 Fast is xAI's best agentic tool calling model that shines in real-world use cases like customer support and deep research.

Grok 4 Fast (xAI)

Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window.

Grok 4.3 (xAI)

Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output.

Grok 3 Mini Beta (xAI)

Grok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding.

Grok 3 Beta (xAI)

Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization.

MiMo-V2.5 (Xiaomi)

MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost.

MiMo-V2.5-Pro (Xiaomi)

MiMo-V2.5-Pro is Xiaomi's flagship model, delivering strong performance in general agentic capabilities, complex software engineering, and long-horizon tasks.

MiMo-V2-Flash (Xiaomi)

MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-of-Experts model with 309B total parameters and 15B active parameters.

GLM 4.5 Air (Z.ai)

GLM-4.5-Air is the lightweight variant of Z.ai's latest flagship model family, also purpose-built for agent-centric applications.

GLM 5.1 (Z.ai)

GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks.

GLM 5 (Z.ai)

GLM-5 is Z.ai's flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows.

GLM 5 Turbo (Z.ai)

GLM-5 Turbo is a new model from Z.ai designed for fast inference and strong performance in agent-driven environments.

GLM 4.7 (Z.ai)

GLM-4.7 is Z.ai's latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning/execution.

all-mpnet-base-v2 (Sentence Transformers)

The all-mpnet-base-v2 embedding model encodes sentences and short paragraphs into a 768-dimensional dense vector space.

all-MiniLM-L12-v2 (Sentence Transformers)

The all-MiniLM-L12-v2 embedding model maps sentences and short paragraphs into a 384-dimensional dense vector space.

paraphrase-MiniLM-L6-v2 (Sentence Transformers)

The paraphrase-MiniLM-L6-v2 embedding model converts sentences and short paragraphs into a 384-dimensional dense vector space.

multi-qa-mpnet-base-dot-v1 (Sentence Transformers)

The multi-qa-mpnet-base-dot-v1 embedding model transforms sentences and short paragraphs into a 768-dimensional dense vector space.

E5-Base-v2 (Intfloat)

The e5-base-v2 embedding model encodes English sentences and paragraphs into a 768-dimensional dense vector space.

Voxtral Mini TTS (Mistral AI)

Voxtral Mini TTS is Mistral's text-to-speech model featuring zero-shot voice cloning and multilingual support.

Orpheus 3B (Canopy Labs)

Orpheus 3B is an English text-to-speech model from Canopy Labs, fine-tuned for natural prosody and expressive delivery.

CSM 1B (Sesame)

CSM 1B is a conversational speech model from Sesame. It accepts text input and produces English speech output.

Zonos v0.1 Hybrid (Zyphra)

Zonos v0.1 Hybrid is a text-to-speech model from Zyphra built on a hybrid architecture. It produces English speech output.

Zonos v0.1 Transformer (Zyphra)

Zonos v0.1 Transformer is a text-to-speech model from Zyphra built on a pure transformer architecture.

Llama 3 Euryale 70B v2.1 (Sao10k)

Euryale 70B v2.1 is a model focused on creative roleplay, with better prompt adherence and improved anatomy and spatial awareness.

CodeLLaMa 7B Instruct Solidity (AlfredPros)

A fine-tuned 7-billion-parameter CodeLlama-Instruct model for generating Solidity smart contracts.

Hermes 4 405B (Nous AI)

Hermes 4 405B is Nous AI's flagship reasoning model with 405B parameters and hybrid reasoning support. Supports instant, hybrid, and deep reasoning modes.

Hermes 4 70B (Nous AI)

Hermes 4 70B offers the best cost/performance ratio in the Hermes family. Supports hybrid reasoning with variable output token counts.

Hermes 3 Llama 3.1 70B (Nous AI)

Hermes 3 based on Llama 3.1 70B. Generalist model optimized for agentic workflows and OpenAI-compatible endpoints.

Hermes 4.3 36B (Nous AI)

Hermes 4.3 36B is a fast, lightweight model for simple tasks. Best for high-volume, low-latency use cases.

Frequently Asked Questions

How often is the AI pricing data updated?

Our data is updated regularly to reflect the latest pricing changes across all major providers including OpenAI, Anthropic, Google, DeepSeek, and specialized inference hosts.

What is the difference between input and output token costs?

Input tokens (your prompt) are typically 3-5x cheaper than output tokens (the model's response). When calculating costs for RAG systems, input tokens usually dominate the budget.
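A quick sketch of that asymmetry, using illustrative (not provider-specific) prices, shows why a context-heavy RAG workload is input-dominated even when output tokens cost 4x more:

```python
INPUT_PRICE = 0.50    # USD per 1M input tokens (hypothetical)
OUTPUT_PRICE = 2.00   # USD per 1M output tokens (hypothetical, 4x input)

def rag_cost(context_tokens: int, answer_tokens: int, requests: int):
    """Split total spend into input vs output cost for a fixed workload."""
    input_cost = context_tokens * requests / 1e6 * INPUT_PRICE
    output_cost = answer_tokens * requests / 1e6 * OUTPUT_PRICE
    return input_cost, output_cost

# 8K tokens of retrieved context per request, 300-token answers, 10K requests/day.
inp, out = rag_cost(8_000, 300, 10_000)
print(f"input ${inp:.2f}/day vs output ${out:.2f}/day")  # input $40.00/day vs output $6.00/day
```

Despite the 4x output premium, the retrieved context accounts for the large majority of daily spend.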

What is the context window and cost of Aion-1.0-Mini?

Aion-1.0-Mini has a context window of 131,000 tokens. The pricing is $0.70 per 1M input tokens and $1.40 per 1M output tokens.

What is the context window and cost of Aion-RP 1.0 (8B)?

Aion-RP 1.0 (8B) has a context window of 33,000 tokens. The pricing is $0.80 per 1M input tokens and $1.60 per 1M output tokens.

What is the context window and cost of Aion-1.0?

Aion-1.0 has a context window of 131,000 tokens. The pricing is $4 per 1M input tokens and $8 per 1M output tokens.

What is the context window and cost of Goliath 120B?

Goliath 120B has a context window of 6,000 tokens. The pricing is $3.75 per 1M input tokens and $7.50 per 1M output tokens.

What is the context window and cost of Anthropic Claude Haiku Latest?

Anthropic Claude Haiku Latest has a context window of 200,000 tokens. The pricing is $1 per 1M input tokens and $5 per 1M output tokens.
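The per-million-token prices quoted in this FAQ can be plugged straight into a small ranking sketch. The workload here (1M input plus 1M output tokens) is an arbitrary example, and the price table simply transcribes the answers above:

```python
PRICING = {  # model: (input $/1M tokens, output $/1M tokens), from the FAQ above
    "Aion-1.0-Mini": (0.7, 1.4),
    "Aion-RP 1.0 (8B)": (0.8, 1.6),
    "Aion-1.0": (4.0, 8.0),
    "Goliath 120B": (3.75, 7.5),
    "Claude Haiku Latest": (1.0, 5.0),
}

def workload_cost(inp_m: float, out_m: float):
    """Rank models cheapest-first for a workload of inp_m million input
    and out_m million output tokens."""
    costs = {m: inp * inp_m + out * out_m for m, (inp, out) in PRICING.items()}
    return sorted(costs.items(), key=lambda kv: kv[1])

for model, cost in workload_cost(1, 1):
    print(f"{model}: ${cost:.2f}")
```

Adjusting the input/output ratio changes the ranking, which is why a calculator matters: a chat-heavy workload and a generation-heavy workload can favor different models at the same list prices.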