AI Model Pricing Calculator

Compare costs across 20+ frontier and open-weight models including text, voice, and multimodal models.

Compare AI Model API Costs

Understanding the cost of generative AI models is critical for production workloads. This tool tracks real-time pricing for the leading API providers, comparing cost per million tokens (input/output), context lengths, and model specialties.
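The per-request cost formula behind any such comparison is simple enough to sketch. This is a minimal illustration; the prices in the example are placeholders, not any specific provider's:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one API call given per-million-token prices."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# Example: a 10K-token prompt and a 1K-token completion at $3/$15 per 1M tokens.
print(round(request_cost(10_000, 1_000, 3.0, 15.0), 4))  # 0.045
```

Multiply by expected request volume to estimate a monthly budget for any model in the table below.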

Aion-1.0-Mini (AionLabs)

Aion-1.0-Mini is a 32B-parameter distilled version of the DeepSeek-R1 model, designed for strong performance in reasoning domains such as mathematics, coding, and logic. It is a modified variant of a FuseAI model that outperforms R1-Distill-Qwen-32B and R1-Distill-Llama-70B.

Aion-RP 1.0 (8B) (AionLabs)

Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto. It is a fine-tuned base model rather than an instruct model, designed to produce more natural and varied writing.

Aion-1.0 (AionLabs)

Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree of Thoughts (ToT) and Mixture of Experts (MoE). It is Aion Lab's most powerful reasoning model.

Goliath 120B (alpindale)

A large LLM created by merging two fine-tuned Llama 70B models, Xwin and Euryale, into a single 120B model.

Anthropic Claude Haiku Latest (Anthropic)

This model always redirects to the latest model in the Anthropic Claude Haiku family.

Anthropic Claude Sonnet Latest (Anthropic)

This model always redirects to the latest model in the Anthropic Claude Sonnet family.

Claude Opus 4.6 (Anthropic)

Opus 4.6 is Anthropic's strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts.

Claude Opus 4.6 (Fast) (Anthropic)

Fast-mode variant of Opus 4.6: identical capabilities with higher output speed, at a 6x pricing premium.

Claude Opus 4.7 (Anthropic)

Opus 4.7 is the next generation of Anthropic's Opus family, built for long-running, asynchronous agents. Building on the coding and agentic strengths of Opus 4.6, it delivers stronger performance on complex, multi-step tasks and more reliable agentic execution across extended workflows.

Claude Opus Latest (Anthropic)

This model always redirects to the latest model in the Claude Opus family.

Claude Sonnet 4.6 (Anthropic)

Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with memory, polished document creation, and confident computer use for web QA and workflow automation.

Claude Sonnet 4.5 (Anthropic)

Claude Sonnet 4.5 is Anthropic's most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with improvements across system design, code security, and specification adherence.

Claude Haiku 4.5 (Anthropic)

Claude Haiku 4.5 is Anthropic's fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4's performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications.

CoBuddy (free) (Baidu)

CoBuddy is a code generation model from Baidu, optimized for coding tasks and AI Agent workflows. It features high inference throughput and low end-to-end latency.

Qianfan-OCR-Fast (free) (Baidu)

Qianfan-OCR-Fast is a domain-specific multimodal large model purpose-built for OCR, leveraging specialized OCR training data while preserving versatile multimodal intelligence.

ERNIE 4.5 VL 424B A47B (Baidu)

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu's ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It supports both thinking and non-thinking inference modes.

Amazon Nova Premier 1.0 (Amazon)

Amazon Nova Premier is the most capable of Amazon's multimodal models for complex reasoning tasks and for use as the best teacher for distilling custom models.

FLUX.2 Flex (Black Forest Labs)

FLUX.2 [flex] excels at rendering complex text, typography, and fine details, and supports multi-reference editing in the same unified architecture.

Recraft V3 (Recraft)

Recraft V3 is an image generation model from Recraft. It supports text and image inputs with image output at ~1K resolution across multiple aspect ratios.

DeepSeek V3.2 (DeepSeek)

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA) for efficient long-context processing.

DeepSeek V4 Flash (DeepSeek)

DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and high-throughput workloads.

DeepSeek V4 Pro (DeepSeek)

DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporting a 1M-token context window. It is designed for advanced reasoning and coding.

Gemini 2.0 Flash (Google)

Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5. It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling.

Gemini 2.5 Flash (Google)

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in thinking capabilities.

Gemini 2.5 Flash Lite (Google)

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput and faster token generation.

Gemini 2.5 Pro (Google)

Gemini 2.5 Pro is Google's state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs thinking capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling.

Gemini 3 Flash Preview (Google)

Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool-use performance with substantially lower latency than larger Gemini variants.

Gemini 3.1 Pro Preview (Google)

Gemini 3.1 Pro Preview is Google's frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows.

Gemini 3.1 Flash Lite (Google)

Gemini 3.1 Flash Lite is Google's GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs.

Gemini 3.1 Flash Lite Preview (Google)

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality.

Google Gemini Flash Latest (Google)

This model always redirects to the latest model in the Google Gemini Flash family.

Google Gemini Pro Latest (Google)

This model always redirects to the latest model in the Google Gemini Pro family.

Gemma 4 31B (Google)

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function calling, and multilingual support across 140+ languages.

Gemma 4 26B A4B (Google)

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference.

Lyria 3 Pro Preview (Google)

Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz stereo audio from text prompts or from images.

Granite 4.1 8B (IBM)

Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from IBM, part of the Granite 4.1 family. It supports a 131K-token context window.

Ling-2.6-1T (InclusionAI)

Ling-2.6-1T is an instant (instruct) model from inclusionAI and the company's trillion-parameter flagship, designed for real-world agents that require fast execution and high efficiency.

Ling-2.6-flash (InclusionAI)

Ling-2.6-flash is an instant (instruct) model from inclusionAI with 104B total parameters and 7.4B active parameters, designed for real-world agents that require fast responses.

Ring-2.6-1T (free) (InclusionAI)

Ring-2.6-1T is a 1T-parameter-scale thinking model with 63B active parameters, built for real-world agent workflows that require both strong capability and operational efficiency.

Weaver (alpha) (Mancer)

An attempt to recreate Claude-style verbosity, but don't expect the same level of coherence or memory. Meant for use in roleplay/narrative situations.

MiniMax M2.7 (MiniMax)

MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. It integrates advanced agentic capabilities through multi-agent collaboration.

MiniMax M2.5 (MiniMax)

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1 to extend into general office work.

Mistral Medium 3.5 (Mistral AI)

Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows and coding.

Mistral Nemo (Mistral AI)

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual and supports function calling.

Mistral Saba (Mistral AI)

Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance.

Kimi K2.6 (MoonshotAI)

Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon coding, coding-driven UI/UX generation, and multi-agent orchestration.

Kimi K2.5 (MoonshotAI)

Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm.

MoonshotAI Kimi Latest (MoonshotAI)

This model always redirects to the latest model in the MoonshotAI Kimi family.

Nemotron 3 Nano 30B A3B (NVIDIA)

NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems.

Nemotron 3 Nano Omni (free) (NVIDIA)

NVIDIA Nemotron 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems.

Nemotron 3 Super (NVIDIA)

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications.

Hunyuan A13B Instruct (Tencent)

Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought.

GPT Audio (OpenAI)

The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency.

GPT Chat Latest (OpenAI)

GPT Chat Latest points to OpenAI's stable API alias chat-latest that always resolves to the latest Instant chat model used in ChatGPT.

GPT-3.5 Turbo 16k (OpenAI)

This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost.

GPT-3.5 Turbo Instruct (OpenAI)

This model is a variant of GPT-3.5 Turbo tuned for instructional prompts, omitting chat-related optimizations.

GPT-4o Mini (OpenAI)

GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many times more affordable than frontier models.

GPT-4o Mini TTS (OpenAI)

GPT-4o Mini TTS is OpenAI's cost-efficient text-to-speech model. It converts text input into natural-sounding audio output, supporting a variety of voices and tones.

GPT-4o Mini Transcribe (OpenAI)

GPT-4o Mini Transcribe is OpenAI's smaller, cost-efficient speech-to-text model built on GPT-4o Mini audio capabilities.

GPT-5.4 Image 2 (OpenAI)

GPT-5.4 Image 2 combines OpenAI's GPT-5.4 model with state-of-the-art image generation capabilities from GPT Image 2.

GPT-5.4 (OpenAI)

GPT-5.4 is OpenAI's latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window.

GPT-5.4 Mini (OpenAI)

GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads.

GPT-5.4 Nano (OpenAI)

GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and high-volume tasks.

GPT-5.5 (OpenAI)

GPT-5.5 is OpenAI's frontier model designed for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and improved token efficiency.

GPT-5.5 Pro (OpenAI)

GPT-5.5 Pro is OpenAI's high-capability model optimized for deep reasoning and accuracy on complex, high-stakes workloads.

GPT-5 Mini (OpenAI)

GPT-5 Mini is a compact version of GPT-5, designed to handle lighter-weight reasoning tasks.

GPT-5 Nano (OpenAI)

GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments.

GPT-5.3 Codex (OpenAI)

GPT-5.3-Codex is OpenAI's most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with broader reasoning capabilities.

GPT-4.1 Mini (OpenAI)

GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost.

gpt-oss-120b (OpenAI)

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases.

gpt-oss-20b (OpenAI)

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license.

o1 (OpenAI)

o1 is an OpenAI model family designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought.

o1 Pro (OpenAI)

The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers.

o3 Pro (OpenAI)

The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently better answers.

OpenAI GPT Latest (OpenAI)

This model always redirects to the latest model in the OpenAI GPT family.

OpenAI GPT Mini Latest (OpenAI)

This model always redirects to the latest model in the OpenAI GPT Mini family.

Owl Alpha (OpenRouter)

Owl Alpha is a high-performance foundation model designed for agentic workloads. It natively supports tool use and long-context tasks, with strong performance in code generation.

Inflection 3 Productivity (Inflection)

Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines.

Laguna M.1 (free) (Poolside)

Laguna M.1 is the flagship coding agent model from Poolside, optimized for complex software engineering tasks.

Laguna XS.2 (free) (Poolside)

Laguna XS.2 is the second-generation model in the XS size class from Poolside, their efficient coding agent series.

Qwen Plus 0728 (Qwen)

Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning model with a balanced performance, speed, and cost combination.

Qwen3.5 Plus 2026-04-20 (Qwen)

Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It accepts text, image, and video input and produces text output, with a 1M token context window.

Qwen3.6 27B (Qwen)

Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026.

Qwen3.6 35B A3B (Qwen)

Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion active parameters per token.

Qwen3.6 Flash (Qwen)

Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It supports text, image, and video input with a 1M token context window.

Qwen3.6 Max Preview (Qwen)

Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse mixture-of-experts architecture.

Qwen3.6 Plus (Qwen)

Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing.

Qwen3.5 Flash (Qwen)

The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model.

Qwen3 235B A22B Instruct 2507 (Qwen)

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture.

Qwen3.5 397B A17B (Qwen)

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model.

Qwen3 Embedding 8B (Qwen)

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks.

Recraft V4 (Recraft)

Recraft V4 is an image generation model from Recraft. It supports text and image inputs with image output at ~1K resolution across multiple aspect ratios.

Recraft V4 Pro (Recraft)

Recraft V4 Pro is an image generation model from Recraft. It supports text and image inputs with image output at ~2K resolution.

Relace Search (Relace)

The relace-search model uses 4-12 view_file and grep tools in parallel to explore a codebase and return relevant files to the user request.

Riverflow V2 Max Preview (Sourceful)

Riverflow V2 Max Preview is the most powerful variant of Sourceful's Riverflow V2 preview lineup.

Riverflow V2 Pro (Sourceful)

Riverflow V2 Pro is the most powerful variant of Sourceful's Riverflow 2.0 lineup, best for top-tier control and perfect text rendering.

Riverflow V2 Fast Preview (Sourceful)

Riverflow V2 Fast Preview is the fastest variant of Sourceful's Riverflow V2 preview lineup.

Riverflow V2 Standard Preview (Sourceful)

Riverflow V2 Standard Preview is the standard variant of Sourceful's Riverflow V2 preview lineup.

Step 3.5 Flash (StepFun)

Step 3.5 Flash is StepFun's most capable open-source foundation model, built on a sparse Mixture-of-Experts (MoE) architecture.

Hy3 preview (Tencent)

Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use.

Switchpoint Router (Switchpoint)

Switchpoint AI's router instantly analyzes your request and directs it to the optimal AI from an ever-evolving library.

Grok 4.1 Fast (xAI)

Grok 4.1 Fast is xAI's best agentic tool calling model that shines in real-world use cases like customer support and deep research.

Grok 4 Fast (xAI)

Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window.

Grok 4.3 (xAI)

Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output.

Grok 3 Mini Beta (xAI)

Grok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding.

Grok 3 Beta (xAI)

Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization.

MiMo-V2.5 (Xiaomi)

MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost.

MiMo-V2.5-Pro (Xiaomi)

MiMo-V2.5-Pro is Xiaomi's flagship model, delivering strong performance in general agentic capabilities, complex software engineering, and long-horizon tasks.

MiMo-V2-Flash (Xiaomi)

MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-of-Experts model with 309B total parameters and 15B active parameters.

GLM 4.5 Air (Z.ai)

GLM-4.5-Air is the lightweight variant of Z.ai's latest flagship model family, also purpose-built for agent-centric applications.

GLM 5.1 (Z.ai)

GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks.

GLM 5 (Z.ai)

GLM-5 is Z.ai's flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows.

GLM 5 Turbo (Z.ai)

GLM-5 Turbo is a new model from Z.ai designed for fast inference and strong performance in agent-driven environments.

GLM 4.7 (Z.ai)

GLM-4.7 is Z.ai's latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning/execution.

all-mpnet-base-v2 (Sentence Transformers)

The all-mpnet-base-v2 embedding model encodes sentences and short paragraphs into a 768-dimensional dense vector space.

all-MiniLM-L12-v2 (Sentence Transformers)

The all-MiniLM-L12-v2 embedding model maps sentences and short paragraphs into a 384-dimensional dense vector space.

paraphrase-MiniLM-L6-v2 (Sentence Transformers)

The paraphrase-MiniLM-L6-v2 embedding model converts sentences and short paragraphs into a 384-dimensional dense vector space.

multi-qa-mpnet-base-dot-v1 (Sentence Transformers)

The multi-qa-mpnet-base-dot-v1 embedding model transforms sentences and short paragraphs into a 768-dimensional dense vector space.

E5-Base-v2 (Intfloat)

The e5-base-v2 embedding model encodes English sentences and paragraphs into a 768-dimensional dense vector space.

Voxtral Mini TTS (Mistral AI)

Voxtral Mini TTS is Mistral's text-to-speech model featuring zero-shot voice cloning and multilingual support.

Orpheus 3B (Canopy Labs)

Orpheus 3B is an English text-to-speech model from Canopy Labs, fine-tuned for natural prosody and expressive delivery.

CSM 1B (Sesame)

CSM 1B is a conversational speech model from Sesame. It accepts text input and produces English speech output.

Zonos v0.1 Hybrid (Zyphra)

Zonos v0.1 Hybrid is a text-to-speech model from Zyphra built on a hybrid architecture. It produces English speech output.

Zonos v0.1 Transformer (Zyphra)

Zonos v0.1 Transformer is a text-to-speech model from Zyphra built on a pure transformer architecture.

Llama 3 Euryale 70B v2.1 (Sao10k)

Euryale 70B v2.1 is a model focused on creative roleplay, with better prompt adherence and improved anatomy and spatial awareness.

CodeLLaMa 7B Instruct Solidity (AlfredPros)

A fine-tuned 7-billion-parameter CodeLlama-Instruct model for generating Solidity smart contracts.

Hermes 4 405B (Nous AI)

Hermes 4 405B is Nous AI's flagship reasoning model with 405B parameters and hybrid reasoning support. Supports instant, hybrid, and deep reasoning modes.

Hermes 4 70B (Nous AI)

Hermes 4 70B offers the best cost/performance ratio in the Hermes family. Supports hybrid reasoning with variable output token counts.

Hermes 3 Llama 3.1 70B (Nous AI)

Hermes 3 based on Llama 3.1 70B. Generalist model optimized for agentic workflows and OpenAI-compatible endpoints.

Hermes 4.3 36B (Nous AI)

Hermes 4.3 36B is a fast, lightweight model for simple tasks. Best for high-volume, low-latency use cases.

Frequently Asked Questions

How often is the AI pricing data updated?

Our data is updated regularly to reflect the latest pricing changes across all major providers including OpenAI, Anthropic, Google, DeepSeek, and specialized inference hosts.

What is the difference between input and output token costs?

Input tokens (your prompt) are typically 3-5x cheaper than output tokens (the model's response). When calculating costs for RAG systems, input tokens usually dominate the budget.
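A quick sketch of that asymmetry, using illustrative (not provider-specific) prices, shows why a context-heavy RAG workload is input-dominated even when output tokens cost 4x more:

```python
INPUT_PRICE = 0.50    # USD per 1M input tokens (hypothetical)
OUTPUT_PRICE = 2.00   # USD per 1M output tokens (hypothetical, 4x input)

def rag_cost(context_tokens: int, answer_tokens: int, requests: int):
    """Split total spend into input vs output cost for a fixed workload."""
    input_cost = context_tokens * requests / 1e6 * INPUT_PRICE
    output_cost = answer_tokens * requests / 1e6 * OUTPUT_PRICE
    return input_cost, output_cost

# 8K tokens of retrieved context per request, 300-token answers, 10K requests/day.
inp, out = rag_cost(8_000, 300, 10_000)
print(f"input ${inp:.2f}/day vs output ${out:.2f}/day")  # input $40.00/day vs output $6.00/day
```

Despite the 4x output premium, the retrieved context accounts for the large majority of daily spend.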

What is the context window and cost of Aion-1.0-Mini?

Aion-1.0-Mini has a context window of 131,000 tokens. The pricing is $0.70 per 1M input tokens and $1.40 per 1M output tokens.

What is the context window and cost of Aion-RP 1.0 (8B)?

Aion-RP 1.0 (8B) has a context window of 33,000 tokens. The pricing is $0.80 per 1M input tokens and $1.60 per 1M output tokens.

What is the context window and cost of Aion-1.0?

Aion-1.0 has a context window of 131,000 tokens. The pricing is $4 per 1M input tokens and $8 per 1M output tokens.

What is the context window and cost of Goliath 120B?

Goliath 120B has a context window of 6,000 tokens. The pricing is $3.75 per 1M input tokens and $7.50 per 1M output tokens.

What is the context window and cost of Anthropic Claude Haiku Latest?

Anthropic Claude Haiku Latest has a context window of 200,000 tokens. The pricing is $1 per 1M input tokens and $5 per 1M output tokens.
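The per-million-token prices quoted in this FAQ can be plugged straight into a small ranking sketch. The workload here (1M input plus 1M output tokens) is an arbitrary example, and the price table simply transcribes the answers above:

```python
PRICING = {  # model: (input $/1M tokens, output $/1M tokens), from the FAQ above
    "Aion-1.0-Mini": (0.7, 1.4),
    "Aion-RP 1.0 (8B)": (0.8, 1.6),
    "Aion-1.0": (4.0, 8.0),
    "Goliath 120B": (3.75, 7.5),
    "Claude Haiku Latest": (1.0, 5.0),
}

def workload_cost(inp_m: float, out_m: float):
    """Rank models cheapest-first for a workload of inp_m million input
    and out_m million output tokens."""
    costs = {m: inp * inp_m + out * out_m for m, (inp, out) in PRICING.items()}
    return sorted(costs.items(), key=lambda kv: kv[1])

for model, cost in workload_cost(1, 1):
    print(f"{model}: ${cost:.2f}")
```

Adjusting the input/output ratio changes the ranking, which is why a calculator matters: a chat-heavy workload and a generation-heavy workload can favor different models at the same list prices.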