{"data":[{"company":"DeepSeek","name":"DeepSeek-V4-Pro（限时2.5折）","api_name":"deepseek/deepseek-v4-pro","description":"DeepSeek V4 Pro 是 DeepSeek 推出的大型专家混合模型，它专为高级推理、编码和长远视野代理工作流设计，在知识、数学和软件工程基准测试中表现出色。它基于与 DeepSeek V4 Flash 相同的架构，引入了混合注意力系统，实现高效的长上下文处理，并支持多种推理模式，根据任务平衡速度和深度。它非常适合复杂工作负载，如全代码库分析、多步自动化和大规模信息综合，这些工作在能力和效率上都至关重要。","max_tokens":384000,"context_window":1000000,"supports_prompt_cache":true,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":17142,"completion":34285,"cache":142,"image":0,"request":0},"id":"deepseek/deepseek-v4-pro","support_apis":["/v1/chat/completions","/v1/completions","/v1/messages"]},{"company":"Ali","name":"Qwen3-Max-2026-01-23","api_name":"ali/qwen3-max-2026-01-23","description":"通义千问3系列Max模型，相较2025年9月23日快照，此版本实现思考模式和非思考模式的有效融合，模型整体效果得到全方位的大幅度提升。在思考模式下，同时发布Web搜索、Web信息提取和代码解释器工具能力，使得模型在慢思考的同时，能够通过引入外部工具，以更高的准确性解决更有难度的问题。此版本为2026年1月23日快照。","max_tokens":32000,"context_window":256000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":3571,"completion":14285,"cache":0,"image":0,"request":0},"id":"ali/qwen3-max-2026-01-23","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"MiniMax","name":"MiniMax M2.1","api_name":"minimax/minimax-m2.1","description":" MiniMax M2.1，强大多语言编程实力，全面升级编程体验","max_tokens":131000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":2999,"completion":11999,"cache":299,"image":0,"request":0},"id":"minimax/minimax-m2.1","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"GPT 5.2 Codex","api_name":"openai/gpt-5.2-codex","description":"GPT-5.2-Codex 旨在实现可导向性、前端开发和交互性。\n开发者可以利用 gpt-5.2-codex 进行各种智能、多模态的开发任务。对于寻求更智能、更安全、更集成 AI 协助的开发者来说，该模型为你的 AI 
工具包提供了强大的新工具。","max_tokens":128000,"context_window":400000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":17500,"completion":140000,"cache":1750,"image":0,"request":0},"id":"openai/gpt-5.2-codex","support_apis":["/v1/responses"]},{"company":"Ali","name":"Qwen3-Coder-Plus","api_name":"ali/qwen3-coder-plus","description":"Qwen3-Coder-Plus，这是通义千问系列迄今为止最具代理能力的代码模型。","max_tokens":65536,"context_window":1048576,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":5714,"completion":22857,"cache":2285,"image":0,"request":0},"id":"ali/qwen3-coder-plus","support_apis":["/v1/chat/completions","/v1/messages","/v1/completions","/v1/responses"]},{"company":"MiniMax","name":"MiniMax M2.1 lightning","api_name":"minimax/minimax-m2.1-lightning","description":"MiniMax-M2.1-lightning，强大多语言编程实力，全面升级编程体验，更快的速度。","max_tokens":131000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":2999,"completion":23999,"cache":299,"image":0,"request":0},"id":"minimax/minimax-m2.1-lightning","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"bigmodel","name":"GLM-4.7","api_name":"bigmodel/glm-4.7","description":"最新旗舰模型：通用对话、推理与智能体能力上实现全面升级。回复更简洁自然，写作更具沉浸感，且在工具调用时指令遵循更强，Artifacts 与 Agentic Coding 的前端美感和长程任务完成效率进一步提升","max_tokens":16000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":2857,"completion":11428,"cache":571,"image":0,"request":0},"id":"bigmodel/glm-4.7","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"GPT-5.2","api_name":"openai/gpt-5.2","description":"GPT-5.2 是 GPT-5 系列中最新的前沿级模型，相比 GPT-5.1 提供了更强的代理和长上下文性能。它利用自适应推理动态分配计算，快速响应简单查询，同时在复杂任务上投入更多深度。\nGPT-5.2 
专为广泛任务覆盖而设计，在数学、编程、科学和工具调用工作负载中持续带来进步，提供更连贯的长篇答案和提升的工具使用可靠性。","max_tokens":128000,"context_window":400000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":17500,"completion":140000,"cache":1750,"image":0,"request":0},"id":"openai/gpt-5.2","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"Ali","name":"Qwen3.6-Plus","api_name":"ali/qwen3.6-plus","description":"Qwen3.6原生视觉语言系列Plus模型，展现出与当前顶尖前沿模型相媲美的卓越性能，模型效果相较3.5系列显著提升。模型在Agentic coding、前端编程、Vibe coding等代码能力、多模态万物识别、OCR、物体定位等能力上显著增强。","max_tokens":64000,"context_window":1000000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":2857,"completion":17142,"cache":285,"image":0,"request":0},"id":"ali/qwen3.6-plus","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"Google","name":"Gemini 3.1 Flash Lite Preview","api_name":"google/gemini-3.1-flash-lite-preview","description":"Gemini 3.1 Flash-Lite 是谷歌最具成本效益的 Gemini 模型，针对低延迟、高吞吐量的大型语言模型流量进行了优化。它相比 Gemini 2.0/2.5 Flash-Lite 模型实现了显著的质量提升，在关键能力领域的性能与 Gemini 2.5 Flash 相当。","max_tokens":65535,"context_window":1048576,"supports_prompt_cache":true,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":2500,"completion":15000,"cache":250,"image":0,"request":0},"id":"google/gemini-3.1-flash-lite-preview","support_apis":["/v1/chat/completions","/v1/messages","/v1beta/models/*","/v1/models/*"]},{"company":"OpenAI","name":"GPT 5.3 Codex","api_name":"openai/gpt-5.3-codex","description":"GPT-5.3-Codex 将先进的编码能力、更广泛的推理和专业问题解决整合在一个专为真实工程工作打造的单一模型中。它将 GPT-5.2-Codex 的前沿编码性能与 GPT-5.2 
的推理和专业知识能力整合到一个系统中。这使得体验从优化孤立输出转向支持更长周期的开发工作，尤其适用于仓库庞大、变更跨越多个步骤、且需求在一开始并不完全明确的场景。","max_tokens":128000,"context_window":400000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":17500,"completion":140000,"cache":1750,"image":0,"request":0},"id":"openai/gpt-5.3-codex","support_apis":["/v1/responses"]},{"company":"Tencent","name":"Hunyuan-3-preview","api_name":"tencent/hy3-preview","description":"混元 Hy3 preview 面向 Agent 工作负载设计，采用 295B/21B 激活的 MoE 架构。在同一个模型内提供 no_think（极速响应）、think_low（快速思考）、think_high（深度推理）三档模式，适配从高频交互到复杂工程任务的不同延迟与深度需求。在 SWE-bench Verified 等代码基准上接近当前最强水平，256K 上下文支持跨文件代码重构与长文档分析。适合需要可靠任务完成度、同时对推理成本敏感的开发者。","max_tokens":128000,"context_window":256000,"supports_prompt_cache":true,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":1714,"completion":5714,"cache":571,"image":0,"request":0},"id":"tencent/hy3-preview","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Moonshot","name":"Kimi K2.6","api_name":"moonshot/kimi-k2.6","description":"Kimi K2.6 是 Kimi 最新最智能的模型，Kimi K2.6 的通用 Agent、代码、视觉理解等综合能力得到全面提升，其中在博士级难度的完整版人类最后的考试（Humanity’s Last Exam）、在考察模型真实软件工程能力的 SWE-Bench Pro、评估 Agent 深度检索能力的 DeepSearchQA 等基准测试中均取得行业领先的成绩，同时支持文本、图片与视频输入，思考与非思考模式，对话与 Agent 任务。","max_tokens":32000,"context_window":256000,"supports_prompt_cache":false,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":9285,"completion":38571,"cache":1571,"image":0,"request":0},"id":"moonshot/kimi-k2.6","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Xiaomi","name":"MiMo-V2.5-Pro","api_name":"xiaomi/mimo-v2.5-pro","description":"Xiaomi MiMo-V2.5-Pro 专为现实世界中高强度的 Agent 工作场景而打造。它拥有超过 1T 的总参数量（42B 激活参数），采用创新的混合注意力架构，并支持 1M 超长上下文长度。在强大的模型基座上，我们在更为广泛的 Agent 场景中持续 Scaling 算力，进一步拓展了智能的动作空间，实现了从 Coding 到 Claw 
的重要泛化。","max_tokens":128000,"context_window":1000000,"supports_prompt_cache":false,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":9999,"completion":29999,"cache":1999,"image":0,"request":0},"id":"xiaomi/mimo-v2.5-pro","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"MiniMax","name":"MiniMax M2.7","api_name":"minimax/minimax-m2.7","description":"M2.7 在真实的软件工程中有优异的表现，包括端到端的完整项目交付、分析日志排查 Bug、代码安全、机器学习等。\n在专业办公领域，我们提升了模型在各领域的专业知识和任务交付能力，在 GDPval-AA 的 ELO 得分为 1495，为开源最高。M2.7 对 Office 三件套 Excel/PPT/Word 的复杂编辑能力显著提升，能更好地完成多轮修改和高保真的编辑。\nM2.7具备与复杂环境交互的能力，M2.7 在 40 个复杂 skills (\u003e 2000 Token) 的 case 上，仍能保持 97% 的 skills 遵循率。在OpenClaw的使用中，M2.7相比于M2.5也有了显著的提升，在MMClaw的评测中接近最新的Sonnet 4.6。\nM2.7具备优秀的身份保持能力和情商，除了生产力使用外，也为互动娱乐场景的创新预留了空间。","max_tokens":32000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":2999,"completion":11999,"cache":599,"image":0,"request":0},"id":"minimax/minimax-m2.7","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Moonshot","name":"Kimi K2.5","api_name":"moonshot/kimi-k2.5","description":"Kimi K2.5 是 Kimi 迄今最智能的模型，在 Agent、代码、视觉理解及一系列通用智能任务上取得开源 SoTA 表现。同时 Kimi K2.5 也是 Kimi 迄今最全能的模型，原生的多模态架构设计，同时支持视觉与文本输入、思考与非思考模式、对话与 Agent 任务。\nKimi K2.5 在原有全栈开发与工具生态优势基础上，进一步强化前端代码质量与设计表现力，前端能力实现了跨越式突破，支持从自然语言直接生成功能完整、视觉精美的交互界面，并能精准处理动态布局、滚动动画等复杂效果。","max_tokens":32000,"context_window":256000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":5714,"completion":29999,"cache":999,"image":0,"request":0},"id":"moonshot/kimi-k2.5","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"StepFun","name":"Step 3.5 Flash-2603","api_name":"stepfun/step-3.5-flash-2603","description":"基于 Step 3.5 Flash 针对高频 Agent 场景优化，在保留旗舰推理与工具调用能力的同时，进一步提升 Token 效率与推理速度，并支持切换低推理模式以降低消耗。对 Coding 与 Agent 
框架兼容性也做了专项优化。","max_tokens":256000,"context_window":256000,"supports_prompt_cache":false,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":999,"completion":2999,"cache":199,"image":0,"request":0},"id":"stepfun/step-3.5-flash-2603","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Ali","name":"Qwen3.6-Max-Preview","api_name":"ali/qwen3.6-max-preview","description":"Qwen3.6系列中规模最大、综合能力最强的Max模型Preview版本，当前开放纯文本模型能力供体验。相较于此前发布的Qwen3-Max和Qwen3.6-Plus，本模型在vibe coding能力上进一步提升、coding agent执行更加高效、前端编程开发能力显著提升；长尾知识能力进一步升级。","max_tokens":64000,"context_window":256000,"supports_prompt_cache":false,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":12857,"completion":77142,"cache":1285,"image":0,"request":0},"id":"ali/qwen3.6-max-preview","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"Anthropic","name":"Claude Opus 4.7","api_name":"anthropic/claude-opus-4.7","description":"Opus 4.7 是 Anthropic Opus 系列的下一代产品，专为长时间运行的异步代理而设计。它在 Opus 4.6 的编码和代理功能优势基础上，显著提升了复杂多步骤任务的性能，并在扩展工作流中实现了更可靠的代理执行。对于任务随时间展开的异步代理管道（例如大型代码库、多阶段调试和端到端项目编排），Opus 4.7 尤为有效。","max_tokens":128000,"context_window":1000000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":50000,"completion":250000,"cache":5000,"image":0,"request":0},"id":"anthropic/claude-opus-4.7","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"StepFun","name":"Step 3.5 Flash","api_name":"stepfun/step-3.5-flash","description":"阶跃星辰的旗舰语言推理模型。该模型具备顶尖推理能力与快速可靠的执行能力。能够完成对复杂任务的分解、计划，可快速可靠地调用工具执行任务，胜任逻辑推理、数学、软件工程、深度研究等各种复杂任务。上下文长度为256K。","max_tokens":256000,"context_window":256000,"supports_prompt_cache":false,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":999,"completion":2999,"cache":199,"image":0,"request":0},"id":"stepfun/step-3.5-flash","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Anthropic","name":"Claude Sonnet 
4.5 Thinking","api_name":"anthropic/claude-sonnet-4.5:thinking","description":"Claude Sonnet 4.5 是 Anthropic 迄今为止最先进的 Sonnet 模型，针对真实代理和编码工作流程进行了优化。它在 SWE-bench Verified 等编码基准测试中展现出顶尖性能，并在系统设计、代码安全性和规范遵循性方面均有所改进。该模型旨在实现扩展自主操作，保持跨会话的任务连续性，并提供基于事实的进度跟踪。\n\nSonnet 4.5 还引入了更强大的代理功能，包括改进的工具编排、推测并行执行以及更高效的上下文和内存管理。凭借增强的上下文跟踪和跨工具调用的令牌使用感知功能，它尤其适用于多上下文和长时间运行的工作流。用例涵盖软件工程、网络安全、财务分析、研究代理以及其他需要持续推理和工具使用的领域。","max_tokens":64000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"image+text","output":"text","tokenizer":""},"pricing":{"prompt":30000,"completion":150000,"cache":3000,"image":0,"request":0},"id":"anthropic/claude-sonnet-4.5:thinking","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Xiaomi","name":"MiMo-V2.5","api_name":"xiaomi/mimo-v2.5","description":"MiMo-V2.5 是小米原生的全模态模型。它以约一半的推理成本实现专业级代理性能，同时在图像和视频理解任务中的多模态感知上超越 MiMo-V2-Omni。其 100 万上下文窗口支持完整文档、扩展对话和复杂任务上下文，非常适合与代理框架集成，在需要强大推理、丰富感知和成本效益的环境中。","max_tokens":128000,"context_window":1000000,"supports_prompt_cache":false,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":3999,"completion":19999,"cache":799,"image":0,"request":0},"id":"xiaomi/mimo-v2.5","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Google","name":"Gemini 3.1 Pro Preview","api_name":"google/gemini-3.1-pro-preview","description":"Gemini 3.1 Pro Preview 专为高级开发和智能体系统而设计，在提升令牌效率的同时，增强了长期稳定性、工具编排能力。它引入了一种新的中间思维模式，以更好地平衡成本、速度和性能。该模型在智能体编码、结构化规划、多模态分析和工作流自动化方面表现出色，因此非常适合自主智能体、金融建模、电子表格自动化和高上下文企业任务。","max_tokens":65535,"context_window":1048576,"supports_prompt_cache":true,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":20000,"completion":120000,"cache":2000,"image":0,"request":0},"id":"google/gemini-3.1-pro-preview","support_apis":["/v1/chat/completions","/v1/messages","/v1beta/models/*","/v1/models/*"]},{"company":"Anthropic","name":"Claude Haiku 4.5 Thinking","api_name":"anthropic/claude-haiku-4.5:thinking","description":"Claude Haiku 4.5 
为各种用例提供近乎前沿的性能，并脱颖而出，成为最佳编码和代理模型之一——其速度和成本恰到好处，能够为免费产品和海量用户体验提供支持。用例：为免费层级用户体验提供支持：Claude Haiku 4.5 以合理的成本和速度提供近乎前沿的性能，使为免费代理产品和代理用例提供支持在经济上可行。","max_tokens":64000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":10000,"completion":50000,"cache":1000,"image":0,"request":0},"id":"anthropic/claude-haiku-4.5:thinking","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Anthropic","name":"Claude Opus 4.5","api_name":"anthropic/claude-opus-4.5","description":"Anthropic 最智能模型的下一代产品——Claude Opus 4.5，在编码、代理、计算机使用以及企业工作流方面处于行业领先地位。应用场景如下：\n编码：Opus 4.5 能够自信地在几小时内完成原本需要数天的软件开发项目，它可以独立工作，凭借深厚的技术功底和出色的品味，打造高效且简洁的解决方案。它在各种编程语言上的性能都有所提升，在规划和架构选择方面表现更优，是企业开发者的理想模型。\n代理：Claude Opus 4.5 结合我们先进的工具使用能力，能打造出功能更强大、具备新行为模式的代理。\n计算机使用：作为我们目前最出色的计算机使用模型，Claude Opus 4.5 能以自信、一致的方式应对新场景，实现更接近人类的浏览体验，从而优化网络问答、工作流自动化以及高级用户体验。\n企业工作流：Opus 4.5 可以为代理提供支持，助力其从头到尾管理庞大的专业项目。它能更好地利用内存来保持文件间的上下文连贯性和一致性，同时在创建电子表格、幻灯片和文档方面实现了显著的飞跃式改进。\n财务分析：Opus 4.5 能够整合复杂信息系统中的各类数据，包括监管文件、市场报告和内部数据，从而实现复杂的预测建模和主动合规。\n网络安全：Opus 4.5 为安全工作流带来专业级别的分析能力，它能关联日志、漏洞数据库和威胁情报，实现主动威胁检测和自动化事件响应。","max_tokens":64000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":50000,"completion":250000,"cache":5000,"image":0,"request":0},"id":"anthropic/claude-opus-4.5","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"DeepSeek","name":"Deepseek V3.2","api_name":"deepseek/deepseek-v3.2","description":"DeepSeek-V3.2 是一款大型语言模型，旨在兼顾高计算效率、强大的推理能力以及智能体工具使用性能。它引入了 DeepSeek 稀疏注意力机制 (DSA)，这是一种细粒度的稀疏注意力机制，能够在长上下文场景下降低训练和推理成本，同时保持模型质量。可扩展的强化学习后训练框架进一步提升了推理能力，其性能已达到 GPT-5 的水平，并在 2025 年国际数学奥林匹克竞赛 (IMO) 和国际信息学奥林匹克竞赛 (IOI) 中荣获金奖。V3.2 
还采用了大规模智能体任务合成流程，将推理更好地融入工具使用场景，从而提升模型在交互式环境中的适应性和泛化能力。","max_tokens":32000,"context_window":128000,"supports_prompt_cache":true,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":2857,"completion":4285,"cache":285,"image":0,"request":0},"id":"deepseek/deepseek-v3.2","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"GPT-5.4-Nano","api_name":"openai/gpt-5.4-nano","description":"GPT-5.4-nano 是一款轻量化、超高效模型，专为大规模低延迟、低成本任务设计。GPT-5.4-nano 专为专业知识工作设计，包括文档和电子表格的创建、编码、数据分析、代理式工作流以及软件自动化。","max_tokens":128000,"context_window":400000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":2000,"completion":12500,"cache":200,"image":0,"request":0},"id":"openai/gpt-5.4-nano","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"DeepSeek","name":"DeepSeek V3.2 Thinking","api_name":"deepseek/deepseek-v3.2-think","description":"DeepSeek-V3.2 是一款大型语言模型，旨在兼顾高计算效率、强大的推理能力以及智能体工具使用性能。它引入了 DeepSeek 稀疏注意力机制 (DSA)，这是一种细粒度的稀疏注意力机制，能够在长上下文场景下降低训练和推理成本，同时保持模型质量。可扩展的强化学习后训练框架进一步提升了推理能力，其性能已达到 GPT-5 的水平，并在 2025 年国际数学奥林匹克竞赛 (IMO) 和国际信息学奥林匹克竞赛 (IOI) 中荣获金奖。V3.2 
还采用了大规模智能体任务合成流程，将推理更好地融入工具使用场景，从而提升模型在交互式环境中的适应性和泛化能力。","max_tokens":64000,"context_window":128000,"supports_prompt_cache":true,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":2857,"completion":4285,"cache":285,"image":0,"request":0},"id":"deepseek/deepseek-v3.2-think","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Ali","name":"Qwen3.5-397B-A17B","api_name":"ali/qwen3.5-397b-a17b","description":"Qwen3.5系列397B-A17B原生视觉语言模型，基于混合架构设计，融合了线性注意力机制与稀疏混合专家模型，实现了更高的推理效率。在语言理解、逻辑推理、代码生成、智能体任务、图像理解、视频理解、图形用户界面（GUI）等多种任务中，均展现出与当前顶尖前沿模型相媲美的卓越性能。具备强大的代码生成与智能体能力，对于各类智能体场景具有良好的泛化性。","max_tokens":64000,"context_window":256000,"supports_prompt_cache":false,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":1714,"completion":10285,"cache":0,"image":0,"request":0},"id":"ali/qwen3.5-397b-a17b","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"MiniMax","name":"MiniMax M2.5","api_name":"minimax/minimax-m2.5","description":"在 MiniMax 真实业务场景中，整体任务的 30% 由 M2.5 自主完成，覆盖研发、产品、销售、HR、财务等职能，且渗透率仍在持续上升。其中，在编程场景表现尤为突出，M2.5 生成的代码已占新提交代码的 80%。","max_tokens":131000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":2999,"completion":11999,"cache":299,"image":0,"request":0},"id":"minimax/minimax-m2.5","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"GPT-5.5","api_name":"openai/gpt-5.5","description":"GPT-5.5 是 OpenAI 为复杂专业工作负载设计的前沿模型，基于 GPT-5.4，具备更强的推理能力、更高的可靠性，处理困难任务。它配备了 1M上下文窗口（922K 输入，128K 输出），支持文本和图像输入，支持在单一系统内实现大规模推理、编码和多模态工作流。","max_tokens":128000,"context_window":400000,"supports_prompt_cache":true,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":50000,"completion":300000,"cache":5000,"image":0,"request":0},"id":"openai/gpt-5.5","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"Anthropic","name":"Claude Opus 
4.6","api_name":"anthropic/claude-opus-4.6","description":"Opus 4.6 是 Anthropic 最强的代码生成和长期专业任务模型。它专为跨整个工作流运行的代理而构建，而不是单个提示，这使得它在大型代码库、复杂的重构和随着时间推移展开的多步骤调试方面特别有效。该模型比之前的模型表现出更深入的上下文理解、更强的问题分解能力，以及在难度较高的工程任务中更高的可靠性。","max_tokens":128000,"context_window":1000000,"supports_prompt_cache":false,"architecture":{"input":"","output":"text","tokenizer":""},"pricing":{"prompt":50000,"completion":250000,"cache":5000,"image":0,"request":0},"id":"anthropic/claude-opus-4.6","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Anthropic","name":"Claude Haiku 4.5","api_name":"anthropic/claude-haiku-4.5","description":"Claude Haiku 4.5 为各种用例提供近乎前沿的性能，并脱颖而出，成为最佳编码和代理模型之一——其速度和成本恰到好处，能够为免费产品和海量用户体验提供支持。用例：为免费层级用户体验提供支持：Claude Haiku 4.5 以合理的成本和速度提供近乎前沿的性能，使为免费代理产品和代理用例提供支持在经济上可行。","max_tokens":64000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":10000,"completion":50000,"cache":1000,"image":0,"request":0},"id":"anthropic/claude-haiku-4.5","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Ali","name":"Qwen3.5-Flash","api_name":"ali/qwen3.5-flash","description":"Qwen3.5原生视觉语言系列Flash模型，基于混合架构设计，融合了线性注意力机制与稀疏混合专家模型，实现了更高的推理效率。模型效果在纯文本与多模态方面相较3系列均实现飞跃式进步；响应速度快，兼具推理速度和性能。","max_tokens":64000,"context_window":1000000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":285,"completion":2857,"cache":28,"image":0,"request":0},"id":"ali/qwen3.5-flash","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"Ali","name":"Qwen3.5-Plus","api_name":"ali/qwen3.5-plus","description":"Qwen3.5原生视觉语言系列Plus模型，基于混合架构设计，融合了线性注意力机制与稀疏混合专家模型，实现了更高的推理效率。在多项任务评测中，3.5系列均展现出与当前顶尖前沿模型相媲美的卓越性能，模型效果在纯文本与多模态方面相较3系列均实现飞跃式进步。","max_tokens":64000,"context_window":1000000,"supports_prompt_cache":false,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":1142,"completion":6857,"cache"
:0,"image":0,"request":0},"id":"ali/qwen3.5-plus","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"GPT-5.1","api_name":"openai/gpt-5.1","description":"GPT-5.1 是 GPT-5 系列的最新前沿模型，与 GPT-5 相比，它拥有更强大的通用推理能力、更高的指令遵循度以及更自然的对话风格。它采用自适应推理技术动态分配计算资源，能够快速响应简单查询，同时在处理复杂任务时投入更多精力。该模型能够生成更清晰、更贴近实际的解释，减少专业术语的使用，即使是技术性或多步骤问题也更容易理解。\n\nGPT-5.1 旨在覆盖广泛的任务，在数学、编程和结构化分析等工作负载方面均取得了持续的提升，能够提供更连贯的长篇答案，并提高工具使用的可靠性。它还改进了对话对齐方式，能够在不牺牲准确性的前提下，提供更亲切、更直观的回答。GPT-5.1 是 GPT-5 的主要全功能继任者。","max_tokens":128000,"context_window":400000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":12500,"completion":100000,"cache":1250,"image":0,"request":0},"id":"openai/gpt-5.1","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"Streamlake","name":"KAT-Coder-Pro-V1","api_name":"streamlake/kat-coder-pro-v1","description":"专为 Agentic Coding 设计，全面覆盖编程任务与场景，通过大规模智能体强化学习，实现智能行为涌现，在代码编写性能上显著超越同类模型。","max_tokens":32000,"context_window":128000,"supports_prompt_cache":true,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":5714,"completion":22857,"cache":1142,"image":0,"request":0},"id":"streamlake/kat-coder-pro-v1","support_apis":["/v1/chat/completions","/v1/completions","/v1/messages"]},{"company":"Moonshot","name":"Kimi K2 0905","api_name":"moonshot/kimi-k2","description":"( 最新版本 0905) Kimi K2是一款上下文长度256k的模型，具备更强的Agentic Coding能力、更突出的前端代码的美观度和实用性、以及更好的上下文理解能力。","max_tokens":32000,"context_window":256000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":5714,"completion":22857,"cache":1428,"image":0,"request":0},"id":"moonshot/kimi-k2","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"bigmodel","name":"GLM-5-Turbo","api_name":"bigmodel/glm-5-turbo","description":"GLM-5-Turbo 是面向 OpenClaw 龙虾场景深度优化的基座模型。 
其从训练阶段就针对龙虾任务的核心需求进行专项优化，增强如工具调用、指令遵循、定时与持续性任务、长链路执行等核心能力，使其在复杂、动态、长链路的任务中也真正具备可执行性。","max_tokens":128000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":7142,"completion":31428,"cache":1714,"image":0,"request":0},"id":"bigmodel/glm-5-turbo","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Streamlake","name":"KAT-Coder-Air-V1","api_name":"streamlake/kat-coder-air-v1","description":"KAT-Coder 系列模型中的轻量化版本。专为 Agentic Coding 设计，全面覆盖编程任务与场景，通过大规模智能体强化学习，实现智能行为涌现，在代码编写性能上显著超越同类模型。","max_tokens":32000,"context_window":128000,"supports_prompt_cache":true,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":0,"completion":0,"cache":0,"image":0,"request":0},"id":"streamlake/kat-coder-air-v1","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Anthropic","name":"Claude Sonnet 4.6","api_name":"anthropic/claude-sonnet-4.6","description":"Sonnet 4.6 是 Anthropic 迄今为止最强大的 Sonnet 级模型，在编码、代理和专业工作领域表现出色。它擅长迭代开发、复杂的代码库导航、带内存的端到端项目管理、精致的文档创建，以及自信的网页质量保证和工作流程自动化的计算机使用。","max_tokens":64000,"context_window":1000000,"supports_prompt_cache":false,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":30000,"completion":150000,"cache":3000,"image":0,"request":0},"id":"anthropic/claude-sonnet-4.6","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"GPT 5.1 Codex","api_name":"openai/gpt-5.1-codex","description":"（当前仅支持/v1/responses接口）GPT-5.1-Codex 是 GPT-5.1 的一个特殊版本，专为软件工程和编码工作流程而优化。它既适用于交互式开发，也适用于长时间独立执行复杂的工程任务。该模型支持从零开始构建项目、功能开发、调试、大规模重构和代码审查。与 GPT-5.1 相比，Codex 更易于控制，能够更严格地遵循开发者指令，并生成更简洁、更高质量的代码输出。可以通过 `reasoning.effort` 
参数调整推理难度。","max_tokens":128000,"context_window":400000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":12500,"completion":100000,"cache":1250,"image":0,"request":0},"id":"openai/gpt-5.1-codex","support_apis":["/v1/responses"]},{"company":"Anthropic","name":"Claude Sonnet 4.5","api_name":"anthropic/claude-sonnet-4.5","description":"Claude Sonnet 4.5 是 Anthropic 迄今为止最先进的 Sonnet 模型，针对真实代理和编码工作流程进行了优化。它在 SWE-bench Verified 等编码基准测试中展现出顶尖性能，并在系统设计、代码安全性和规范遵循性方面均有所改进。该模型旨在实现扩展自主操作，保持跨会话的任务连续性，并提供基于事实的进度跟踪。\n\nSonnet 4.5 还引入了更强大的代理功能，包括改进的工具编排、推测并行执行以及更高效的上下文和内存管理。凭借增强的上下文跟踪和跨工具调用的令牌使用感知功能，它尤其适用于多上下文和长时间运行的工作流。用例涵盖软件工程、网络安全、财务分析、研究代理以及其他需要持续推理和工具使用的领域。","max_tokens":64000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text+image+video","output":"text","tokenizer":""},"pricing":{"prompt":30000,"completion":150000,"cache":3000,"image":0,"request":0},"id":"anthropic/claude-sonnet-4.5","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"bigmodel","name":"GLM-5.1","api_name":"bigmodel/glm-5.1","description":"GLM-5.1 是智谱最新旗舰模型，代码能力大大增强，长程任务显著提升，能够在单次任务中持续、自主地工作长达 8 小时，完成从规划、执行到迭代优化的完整闭环，交付工程级成果。\n在综合能力与 Coding 能力上，GLM-5.1 整体表现对齐 Claude Opus 4.6，并在长程自主执行、复杂工程优化与真实开发场景中展现出更强的持续工作能力，是构建 Autonomous Agent 与长程 Coding Agent 
的理想基座。","max_tokens":128000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":8571,"completion":34285,"cache":1857,"image":0,"request":0},"id":"bigmodel/glm-5.1","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"DeepSeek","name":"DeepSeek-V4-Flash","api_name":"deepseek/deepseek-v4-flash","description":"DeepSeek-V4-Flash是深度求索公司于2026年4月24日发布的高效经济版大语言模型。它采用混合专家（MoE）架构，总参数量约2840亿，每次推理仅激活约130亿参数，并原生支持高达100万Token的超长上下文。该版本在推理能力和简单Agent任务上接近性能强大的Pro版本，但凭借更小的参数和激活规模，能提供更快捷、低成本的API服务。适合需要快速响应和追求极致性价比的各类轻量级应用场景。","max_tokens":384000,"context_window":1000000,"supports_prompt_cache":true,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":1428,"completion":2857,"cache":285,"image":0,"request":0},"id":"deepseek/deepseek-v4-flash","support_apis":["/v1/chat/completions","/v1/completions","/v1/messages"]},{"company":"OpenAI","name":"GPT-5.4","api_name":"openai/gpt-5.4","description":"GPT-5.4 是 OpenAI 最新的前沿模型，将 Codex 和 GPT 系列整合为一个系统。它配备了 1M+令牌上下文窗口（922K 输入，128K 输出），支持文本和图像输入，支持同一工作流程中的高上下文推理、编码和多模态分析。该模型在编码、文档理解、工具使用和指令执行方面提供了更好的性能。它被设计为通用任务和软件工程的强默认选项，能够生成生产质量的代码，综合多个来源的信息，并以更少的迭代和更高的令牌效率执行复杂的多步工作流程。","max_tokens":128000,"context_window":400000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":25000,"completion":150000,"cache":2500,"image":0,"request":0},"id":"openai/gpt-5.4","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"OpenAI","name":"GPT-5.4-Mini","api_name":"openai/gpt-5.4-mini","description":"GPT-5.4-mini 是一款紧凑、成本效益高的模型，旨在满足高流量日常 AI 工作负载的可靠性能。GPT-5.4-mini 
专为专业知识工作设计，包括文档和电子表格的创建、编码、数据分析、代理式工作流程以及软件自动化。","max_tokens":128000,"context_window":400000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":7500,"completion":45000,"cache":750,"image":0,"request":0},"id":"openai/gpt-5.4-mini","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"OpenAI","name":"GPT-5.3 Chat","api_name":"openai/gpt-5.3-chat","description":"GPT-5.3 Chat 是对 ChatGPT 最常用模型的升级，使日常对话更加流畅、实用且直接。它能提供更准确的答案，更好地提供上下文，显著减少不必要的拒绝、附加条件和过于谨慎的措辞，避免打断对话的流畅性。","max_tokens":128000,"context_window":400000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":17500,"completion":140000,"cache":1750,"image":0,"request":0},"id":"openai/gpt-5.3-chat","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"Google","name":"Gemini 3 Flash Preview","api_name":"google/gemini-3-flash","description":"Gemini 3 Flash Preview 旨在以极高的速度和性价比提供强大的代理功能（接近专业级）。它非常适合进行多轮对话，并能与您的编码代理进行流畅的双向协作。与 2.5 Flash 版本相比，它在各方面都实现了显著提升。\n\nGemini 3 系列是具备思考能力的模型，能够在做出反应之前进行推理，从而提高性能和准确性。","max_tokens":65535,"context_window":1048576,"supports_prompt_cache":true,"architecture":{"input":"text+image+video","output":"text","tokenizer":""},"pricing":{"prompt":5000,"completion":30000,"cache":500,"image":0,"request":0},"id":"google/gemini-3-flash","support_apis":["/v1/chat/completions","/v1/messages","/v1beta/models/*","/v1/models/*"]},{"company":"OpenAI","name":"GPT 5.1 Codex Max","api_name":"openai/gpt-5.1-codex-max","description":"（当前仅支持/v1/responses接口）GPT-5.1-Codex-Max 是 OpenAI 最新的代理编码模型，专为长期运行的高上下文软件开发任务设计。它基于更新版的 5.1 推理堆栈，并基于跨软件工程、数学和研究的代理工作流进行训练。\nGPT-5.1-Codex-Max 
在整个开发生命周期中实现了更快的性能、更优的推理和更高的 Token 效率。","max_tokens":128000,"context_window":400000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":12500,"completion":100000,"cache":1250,"image":0,"request":0},"id":"openai/gpt-5.1-codex-max","support_apis":["/v1/responses"]},{"company":"bigmodel","name":"GLM-5","api_name":"bigmodel/glm-5","description":"GLM-5 是智谱新一代的旗舰基座模型，面向 Agentic Engineering 打造，能够在复杂系统工程与长程 Agent 任务中提供可靠生产力。在 Coding 与 Agent 能力上，GLM-5 取得开源 SOTA 表现，在真实编程场景的使用体感逼近 Claude Opus 4.5，擅长复杂系统工程与长程 Agent 任务，是通用 Agent 助手的理想基座。","max_tokens":128000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":5714,"completion":25714,"cache":1428,"image":0,"request":0},"id":"bigmodel/glm-5","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"xAI","name":"Grok 4 Fast","api_name":"x-ai/grok-4-fast","description":"Grok 4 Fast 是 xAI 最新的多模态模型，具有 SOTA 成本效益和 2M 令牌上下文窗口。它提供非推理和推理两种模式。","max_tokens":30000,"context_window":2000000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":2000,"completion":5000,"cache":500,"image":0,"request":0},"id":"x-ai/grok-4-fast","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"GPT-5-Nano","api_name":"openai/gpt-5-nano","description":"GPT-5-Nano是GPT-5系列中最快速的版本，专为开发者工具、即时交互和超低延迟场景优化。尽管在复杂推理能力上略逊于更大的模型，但它保留了核心的指令跟随与安全特性。","max_tokens":128000,"context_window":400000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":500,"completion":4000,"cache":50,"image":0,"request":0},"id":"openai/gpt-5-nano","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Streamlake","name":"KAT-Coder-Exp-72B-1010","api_name":"streamlake/kat-coder-exp-72b-1010","description":"KAT-Coder-Exp-72B 是 KAT-Coder 系列模型中的 RL 创新实验版本，在软件开发能力评测基准 SWE-Bench 
verified 上取得了 74.6% 的卓越性能，创下开源模型新纪录。专注于 Agentic Coding，目前仅支持 SWE-Agent 脚手架，也可进行简单对话。","max_tokens":32000,"context_window":128000,"supports_prompt_cache":true,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":0,"completion":0,"cache":0,"image":0,"request":0},"id":"streamlake/kat-coder-exp-72b-1010","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Google","name":"Gemini 2.5 Flash","api_name":"google/gemini-2.5-flash","description":"Gemini 2.5 Flash是Google在性价比方面表现最优的模型，兼具全面能力与高效性能。作为首个具备“思维能力”的 Flash 系列模型，Gemini 2.5 Flash支持可视化的思维过程，让用户直观了解模型在生成回答时的推理路径。","max_tokens":65535,"context_window":1048576,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":3000,"completion":25000,"cache":300,"image":0,"request":0},"id":"google/gemini-2.5-flash","support_apis":["/v1/chat/completions","/v1/messages","/v1beta/models/*","/v1/models/*"]},{"company":"OpenAI","name":"GPT-5-Codex","api_name":"openai/gpt-5-codex","description":"GPT-5-Codex 是 GPT-5 的专用版本，针对软件工程和编码工作流程进行了优化。它专为交互式开发会话和复杂工程任务的长时间独立执行而设计。该模型支持从头开始构建项目、功能开发、调试、大规模重构和代码审查。与 GPT-5 相比，Codex 
更具可操作性，严格遵守开发人员的指令，并生成更干净、更高质量的代码输出。","max_tokens":128000,"context_window":400000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":12500,"completion":100000,"cache":1250,"image":0,"request":0},"id":"openai/gpt-5-codex","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"GPT-5","api_name":"openai/gpt-5","description":"GPT-5是OpenAI推出的最新一代先进模型，在推理能力、代码生成质量和用户体验方面实现显著提升。该模型针对复杂任务进行了专项优化，尤其擅长需要逐步推理、精准遵循指令以及对准确性要求严格的高风险场景。","max_tokens":128000,"context_window":400000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":12500,"completion":100000,"cache":1250,"image":0,"request":0},"id":"openai/gpt-5","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"Ali","name":"Qwen-VL-OCR","api_name":"ali/qwen-vl-ocr","description":"通义千问VL-OCR（qwen-vl-ocr），即基于Qwen-VL训练的OCR识别大模型。通过统一模型的方式聚合多种图文识别、解析、处理类任务，提供强大的图文识别能力。","max_tokens":8192,"context_window":32000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":428,"completion":714,"cache":0,"image":0,"request":0},"id":"ali/qwen-vl-ocr","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Ali","name":"Qwen3-Max-Preview","api_name":"ali/qwen3-max-preview","description":"Qwen3-Max-Preview是阿里巴巴旗下通义千问团队发布的最新旗舰大语言模型，是 Qwen3 系列中参数量最大的模型，参数规模超过1万亿。模型在推理、指令跟随、多语言支持和长尾知识覆盖等方面有重大改进，支持超过100种语言，中英文理解能力出色。在数学推理、编程和科学推理等任务中表现出色，能更可靠地遵循复杂指令，减少幻觉，生成更高质量的响应。","max_tokens":32768,"context_window":262144,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":8571,"completion":34285,"cache":1714,"image":0,"request":0},"id":"ali/qwen3-max-preview","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"DeepSeek","name":"DeepSeek-OCR","api_name":"deepseek/deepseek-ocr","description":"DeepSeek-OCR 是 DeepSeek AI 
发布的视觉语言模型，旨在探索视觉-文本压缩边界，专注于文档识别及图像转文本场景的解决方案 。该模型可将长文本渲染为高压缩比图像，在 10 倍无损压缩下能实现 97% 的 OCR 准确率，即使压缩到 20 倍也能保持约 60% 的准确率。","max_tokens":8000,"context_window":8000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":428,"completion":1714,"cache":0,"image":0,"request":0},"id":"deepseek/deepseek-ocr","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"bigmodel","name":"GLM-4.6","api_name":"bigmodel/glm-4.6","description":"GLM-4.6 是智谱最新的旗舰模型，其总参数量 355B，激活参数 32B。GLM-4.6 所有核心能力上均完成了对 GLM-4.5 的超越","max_tokens":128000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":2857,"completion":11428,"cache":571,"image":0,"request":0},"id":"bigmodel/glm-4.6","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Xiaomi","name":"MiMo V2 Flash","api_name":"xiaomi/mimo-v2-flash","description":"这是一个专为极致推理效率自研的总参数 309B（激活 15B）的 MoE 模型，通过 Hybrid 注意力架构创新及多层 MTP 推理加速，在多个 Agent 测评基准上保持进入全球开源模型 Top 2；代码能力超过所有开源模型，比肩标杆闭源模型 Claude 4.5 Sonnet，但推理成本仅为其 2.5%，生成速度提升 2 倍，成功将大模型推理效率推向极致。","max_tokens":64000,"context_window":256000,"supports_prompt_cache":false,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":999,"completion":2999,"cache":99,"image":0,"request":0},"id":"xiaomi/mimo-v2-flash","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"bigmodel","name":"GLM-4.6 Thinking","api_name":"bigmodel/glm-4.6:thinking","description":"GLM-4.6 是智谱最新的旗舰模型，其总参数量 355B，激活参数 32B。GLM-4.6 所有核心能力上均完成了对 GLM-4.5 的超越","max_tokens":128000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":2857,"completion":11428,"cache":571,"image":0,"request":0},"id":"bigmodel/glm-4.6:thinking","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"xAI","name":"Grok 4","api_name":"x-ai/grok-4","description":"Grok 
4是xAI的多模态大型语言模型，目前支持文本模态，视觉、图像生成等功能即将推出。","max_tokens":256000,"context_window":256000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":30000,"completion":150000,"cache":7500,"image":0,"request":0},"id":"x-ai/grok-4","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Ali","name":"Qwen3-Omni-Flash","api_name":"ali/qwen3-omni-flash","description":"Qwen-Omni 模型能够接收文本、图片、音频、视频等多种模态的组合输入，并生成文本或语音形式的回复，提供多种拟人音色，支持多语言和方言的语音输出，可应用于文本创作、视觉识别、语音助手等场景。","max_tokens":16384,"context_window":65536,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":2571,"completion":9857,"cache":0,"image":0,"request":0},"id":"ali/qwen3-omni-flash","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Ali","name":"Qwen3-VL-Plus","api_name":"ali/qwen3-vl-plus","description":"Qwen3系列视觉理解模型，实现思考模式和非思考模式的有效融合，视觉智能体能力在OS World等公开测试集上达到世界顶尖水平。此版本在视觉coding、空间感知、多模态思考等方向全面升级；视觉感知与识别能力大幅提升，支持超长视频理解。","max_tokens":258048,"context_window":262144,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":1428,"completion":14285,"cache":0,"image":0,"request":0},"id":"ali/qwen3-vl-plus","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"bigmodel","name":"GLM-5V-Turbo","api_name":"bigmodel/glm-5v-turbo","description":"GLM-5V-Turbo 是智谱首个多模态 Coding 基座模型，面向视觉编程任务打造。能够原生处理图片、视频、文本等多模态输入，同时擅长长程规划、复杂编程和动作执行；深度适配 Agent 工作流，能够与 Claude Code、OpenClaw 等 Agent 深度协同，完成“看懂环境→规划动作→执行任务”的完整闭环。","max_tokens":128000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text+video+image","output":"text","tokenizer":""},"pricing":{"prompt":7142,"completion":31428,"cache":1714,"image":0,"request":0},"id":"bigmodel/glm-5v-turbo","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"GPT 5.1 Codex 
Mini","api_name":"openai/gpt-5.1-codex-mini","description":"（当前仅支持/v1/responses接口）GPT-5.1-Codex 的小参数快速版本。","max_tokens":100000,"context_window":400000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":2500,"completion":20000,"cache":250,"image":0,"request":0},"id":"openai/gpt-5.1-codex-mini","support_apis":["/v1/responses"]},{"company":"MiniMax","name":"MiniMax M2 ","api_name":"minimax/minimax-m2","description":" MiniMax M2，专为 Agent 和代码而生，仅 Claude Sonnet 8% 价格，2倍速度。","max_tokens":131000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":2999,"completion":11999,"cache":299,"image":0,"request":0},"id":"minimax/minimax-m2","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"ByteDance","name":"Doubao-Seed-1.8","api_name":"bytedance/doubao-seed-1.8","description":"全新面向多模态 Agent 场景定向优化模型。更强Agent能力、升级多模态理解、更灵活的上下文管理。","max_tokens":64000,"context_window":256000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":1142,"completion":11428,"cache":228,"image":0,"request":0},"id":"bytedance/doubao-seed-1.8","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"Ali","name":"Qwen3-Max","api_name":"ali/qwen3-max","description":"Qwen3-Max 是在 Qwen3 系列的基础上构建的更新版本，与 2025 年 1 月版本相比，在推理、指令遵循、多语言支持和长尾知识覆盖方面进行了重大改进。它在数学、编码、逻辑和科学任务中提供更高的准确性，更可靠地遵循复杂的中文和英文指令，减少幻觉，并为开放式问答、写作和对话提供更高质量的回答。该模型支持 100 多种语言，具有更强的翻译和常识推理能力，并针对检索增强生成 （RAG） 和工具调用进行了优化，尽管它不包括专用的“思考”模式。","max_tokens":32768,"context_window":262144,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":8571,"completion":34285,"cache":1714,"image":0,"request":0},"id":"ali/qwen3-max","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Google","name":"Gemini 2.5 
Pro","api_name":"google/gemini-2.5-pro","description":"Gemini 2.5 Pro是一款强大的推理模型，专为解决复杂问题而设计。它具备卓越的理解与分析能力，能够处理来自多种信息源的海量数据，包括文本、音频、图像、视频，甚至是完整的代码库。","max_tokens":65535,"context_window":1048576,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":12500,"completion":100000,"cache":1250,"image":0,"request":0},"id":"google/gemini-2.5-pro","support_apis":["/v1/chat/completions","/v1/messages","/v1beta/models/*","/v1/models/*"]},{"company":"OpenAI","name":"GPT-5.4-Pro","api_name":"openai/gpt-5.4-pro","description":"GPT-5 Pro 是 OpenAI 最先进的模型，在推理、代码质量和用户体验方面进行了重大改进。它针对需要分步推理、遵循指令和高风险用例准确性的复杂任务进行了优化。它支持测试时路由功能和高级提示理解，包括用户指定的意图，例如“认真考虑这个问题”。改进包括减少幻觉、阿谀奉承，以及在编码、写作和健康相关任务中表现更好。","max_tokens":128000,"context_window":1050000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":300000,"completion":1800000,"cache":0,"image":0,"request":0},"id":"openai/gpt-5.4-pro","support_apis":["/v1/responses"]},{"company":"ByteDance","name":"Doubao-Seed-2.0-lite","api_name":"bytedance/doubao-seed-2-0-lite","description":"面向高频企业场景兼顾性能与成本的均衡型模型，综合能力超越上一代Doubao-Seed-1.8。胜任非结构化信息处理、内容创作、搜索推荐、数据分析等生产型工作，支持长上下文、多源信息融合、多步指令执行与高保真结构化输出。在保障稳定效果的同时显著优化成本。","max_tokens":128000,"context_window":256000,"supports_prompt_cache":true,"architecture":{"input":"text+image+video","output":"text","tokenizer":""},"pricing":{"prompt":857,"completion":5142,"cache":171,"image":0,"request":0},"id":"bytedance/doubao-seed-2-0-lite","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"ByteDance","name":"Doubao-Seed-2.0-mini","api_name":"bytedance/doubao-seed-2-0-mini","description":"面向低时延、高并发与成本敏感场景，提供极致的模型推理速度。模型效果与Doubao-Seed-1.6相当。支持256k上下文、4档思考长度和多模态理解，适合成本和速度优先的轻量级任务。","max_tokens":128000,"context_window":256000,"supports_prompt_cache":true,"architecture":{"input":"text+image+video","output":"text","tokenizer":""},"pricing":{"prompt":285,"completion":2857,"cac
he":57,"image":0,"request":0},"id":"bytedance/doubao-seed-2-0-mini","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"xAI","name":"Grok 4.1 Fast","api_name":"x-ai/grok-4.1-fast","description":"Grok 4.1 Fast 是 xAI 最好的代理工具调用模型，在客户支持和深入研究等实际应用场景中表现出色。200 万上下文窗口。","max_tokens":30000,"context_window":2000000,"supports_prompt_cache":false,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":2000,"completion":5000,"cache":500,"image":0,"request":0},"id":"x-ai/grok-4.1-fast","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"ByteDance","name":"Doubao-Seed-2.0-Code","api_name":"bytedance/doubao-seed-2-0-code","description":"面向真实编程环境优化的 Coding 模型，能稳定调用 Claude Code 等常见 IDE 中的工具。模型特别优化了前端能力，在使用常见的前端框架时能有良好表现。模型支持使用 Skills，可以配合多种自定义技能使用。","max_tokens":128000,"context_window":256000,"supports_prompt_cache":true,"architecture":{"input":"text+image+video","output":"text","tokenizer":""},"pricing":{"prompt":4571,"completion":22857,"cache":914,"image":0,"request":0},"id":"bytedance/doubao-seed-2-0-code","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"Xiaomi","name":"MiMo-V2-Pro","api_name":"xiaomi/mimo-v2-pro","description":"Xiaomi MiMo-V2-Pro 专为现实世界中高强度的 Agent 工作场景而打造。它拥有超过 1T 的总参数量（42B 激活参数），采用创新的混合注意力架构，并支持 1M 超长上下文长度。在强大的模型基座上，我们在更为广泛的 Agent 场景中持续 Scaling 算力，进一步拓展了智能的动作空间，实现了从 Coding 到 Claw 的重要泛化。","max_tokens":128000,"context_window":1000000,"supports_prompt_cache":false,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":9999,"completion":29999,"cache":1999,"image":0,"request":0},"id":"xiaomi/mimo-v2-pro","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Xiaomi","name":"MiMo-V2-Omni","api_name":"xiaomi/mimo-v2-omni","description":"MiMo-V2-Omni 专为现实世界中复杂的多模态交互与执行场景而生。我们从底层构建了融合文本、视觉、语音的全模态基座，并以统一架构将“感知”与“行动”深度绑定。这不仅打破了传统模型“重理解、轻执行”的局限，更让模型原生具备了多模态感知、工具调用、函数执行及 GUI 操作能力。MiMo-V2-Omni 可无缝接入各大智能体框架，实现了从理解到操控的跨越，大幅降低了全模态 Agent 
的落地门槛。","max_tokens":128000,"context_window":256000,"supports_prompt_cache":false,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":3999,"completion":19999,"cache":799,"image":0,"request":0},"id":"xiaomi/mimo-v2-omni","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"ByteDance","name":"Doubao-Seed-2.0-pro","api_name":"bytedance/doubao-seed-2-0-pro","description":"旗舰级全能通用模型，面向 Agent 时代的复杂推理与长链路任务执行场景。强调多模态理解、长上下文推理、结构化生成与工具增强执行。复杂指令与多约束执行能力突出，可稳定应对多步复杂规划、复杂图文推理、视频内容理解与高难度分析等场景。","max_tokens":128000,"context_window":256000,"supports_prompt_cache":true,"architecture":{"input":"text+image+video","output":"text","tokenizer":""},"pricing":{"prompt":4571,"completion":22857,"cache":914,"image":0,"request":0},"id":"bytedance/doubao-seed-2-0-pro","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"MiniMax","name":"MiniMax M2.7-highspeed","api_name":"minimax/minimax-m2.7-highspeed","description":"M2.7 在真实的软件工程中有优异的表现，包括端到端的完整项目交付，分析日志排查 Bug、代码安全，机器学习等。\n在专业办公领域，我们提升了模型在各领域的专业知识和任务交付能力，在 GDPval-AA 的ELO得分是1495，为开源最高。M2.7 对 Office 三件套 Excel/PPT/Word 的复杂编辑能力显著提升，能更好地完成多轮修改和高保真的编辑。\nM2.7具备与复杂环境交互的能力，M2.7 在 40 个复杂 skills (\u003e 2000 Token) 的 case 上，仍能保持 97% 的 skills 遵循率。在OpenClaw的使用中，M2.7相比于M2.5也有了显著的提升，在MMClaw的评测中接近最新的Sonnet 4.6。\nM2.7具备优秀的身份保持能力和情商，除了生产力使用外，给互动娱乐场景的创新也准备了空间。","max_tokens":131000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":5999,"completion":23999,"cache":599,"image":0,"request":0},"id":"minimax/minimax-m2.7-highspeed","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Anthropic","name":"Claude Sonnet 4 ( Thinking )","api_name":"anthropic/claude-sonnet-4:thinking","description":"Claude Sonnet 4 ( Thinking 
)，在编程与推理任务中展现出更高的精度与可控性。该模型兼顾能力与计算效率，适用于从日常编码到复杂软件开发等多种场景。","max_tokens":64000,"context_window":200000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":30000,"completion":150000,"cache":3000,"image":0,"request":0},"id":"anthropic/claude-sonnet-4:thinking","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Ali","name":"Qwen3 Next 80B A3B Instruct","api_name":"ali/qwen3-next-80b-a3b-Instruct","description":"Qwen3-Next-80B-A3B-Instruct是Qwen3-Next系列中的一款指令微调对话模型，专为提供快速、稳定的响应而优化。它能够胜任复杂推理、代码生成、知识问答和多语言应用，同时在对齐与格式控制方面保持稳健。与以往的Qwen3 指令模型相比，该版本在超长输入和多轮对话场景下具备更高吞吐与更强稳定性，非常适用于RAG、工具调用以及需要一致性最终答案的智能体工作流。","max_tokens":32768,"context_window":131072,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":1428,"completion":5714,"cache":0,"image":0,"request":0},"id":"ali/qwen3-next-80b-a3b-Instruct","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Ali","name":"Qwen3 Next 80B A3B Thinking","api_name":"ali/qwen3-next-80b-a3b-thinking","description":"Qwen3-Next-80B-A3B-Thinking 
是Qwen3-Next系列中以推理为核心的对话模型，默认输出结构化的“思维链”内容。它专为复杂多步问题而设计，能够处理数学证明、代码生成与调试、逻辑推演以及智能体规划等任务，并在知识、推理、编程、对齐和多语言评测上表现出色。与以往的Qwen3模型相比，该版本在长链式思维下的稳定性与推理阶段的高效扩展上更具优势，且在复杂指令跟随时显著降低了重复或偏离任务的情况。","max_tokens":32768,"context_window":131072,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":1428,"completion":14285,"cache":0,"image":0,"request":0},"id":"ali/qwen3-next-80b-a3b-thinking","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Tencent","name":"Hunyuan-Turbos-Vision","api_name":"tencent/hunyuan-turbos-vision","description":"采用混元MOE结构，是混元最新多模态模型，支持多语种作答，中英文能力均衡。","max_tokens":8000,"context_window":32000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":4285,"completion":12857,"cache":0,"image":0,"request":0},"id":"tencent/hunyuan-turbos-vision","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"GPT-5-Mini","api_name":"openai/gpt-5-mini","description":"GPT-5-Mini是GPT-5的轻量级版本，专为高效处理日常推理任务而设计。它继承了 GPT-5的指令跟随能力和安全优化，同时具备更低的延迟和成本优势。","max_tokens":128000,"context_window":400000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":2500,"completion":20000,"cache":250,"image":0,"request":0},"id":"openai/gpt-5-mini","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"ByteDance","name":"Doubao-Seed-1.6-flash","api_name":"bytedance/doubao-seed-1.6-flash","description":" Doubao-Seed-1.6-flash为豆包大模型1.6系列的极速版本，具有低延迟的显著优势，非常适用于对延迟要求极高的实时交互场景，如客服场景等。","max_tokens":16000,"context_window":256000,"supports_prompt_cache":true,"architecture":{"input":"text+image+video","output":"text","tokenizer":""},"pricing":{"prompt":214,"completion":2142,"cache":42,"image":0,"request":0},"id":"bytedance/doubao-seed-1.6-flash","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Anthropic","name":"Claude Sonnet 
4","api_name":"anthropic/claude-sonnet-4","description":"Claude Sonnet 4是对Sonnet 3.7的全面升级，在编程与推理任务中展现出更高的精度与可控性。该模型兼顾能力与计算效率，适用于从日常编码到复杂软件开发等多种场景。","max_tokens":64000,"context_window":200000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":30000,"completion":150000,"cache":3000,"image":0,"request":0},"id":"anthropic/claude-sonnet-4","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"ByteDance","name":"Doubao-Seed-1.6-thinking","api_name":"bytedance/doubao-seed-1.6:thinking","description":"Doubao-Seed-1.6-thinking，是豆包大模型1.6系列在深度思考方面的强化版本。在代码编写、数学运算、逻辑推理等基础能力上有了进一步的显著提升。","max_tokens":32000,"context_window":256000,"supports_prompt_cache":true,"architecture":{"input":"text+image+video","output":"text","tokenizer":""},"pricing":{"prompt":1142,"completion":11428,"cache":228,"image":0,"request":0},"id":"bytedance/doubao-seed-1.6:thinking","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"xAI","name":"Grok Code Fast 1","api_name":"xai/grok-code-fast-1","description":"Grok Code Fast 1是xAI的首款编程模型，是一款兼具速度、高性价比的推理模型。","max_tokens":10000,"context_window":256000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":2000,"completion":15000,"cache":200,"image":0,"request":0},"id":"xai/grok-code-fast-1","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Anthropic","name":"Claude Opus 4","api_name":"anthropic/claude-opus-4","description":"Claude Opus 4是Anthropic专为构建复杂的人工智能代理而设计的模型，能够在最少的监督下自主推理、规划和执行复杂的任务。该模型在需要扩展上下文、深度推理和自适应执行的软件开发场景中表现卓越。","max_tokens":32000,"context_window":200000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":150000,"completion":750000,"cache":15000,"image":0,"request":0},"id":"anthropic/claude-opus-4","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Anthropic","name":"Claude Opus 
4.1","api_name":"anthropic/claude-opus-4.1","description":"Claude Opus 4.1是Anthropic旗舰模型的更新版本，在编码、推理和代理任务方面均有提升。它在SWE-bench Verified测试中达到了74.5%的准确率，并在多文件代码重构、调试精度和面向细节的推理方面展现出显著提升。","max_tokens":32000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":150000,"completion":750000,"cache":15000,"image":0,"request":0},"id":"anthropic/claude-opus-4.1","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"DeepSeek","name":"DeepSeek V3.1 Think","api_name":"deepseek/deepseek-v3.1-think","description":"DeepSeek-V3.1-Think采用混合推理架构，支持思考与非思考双模式运行，具备更高的思考效率，相比 DeepSeek-R1-0528 能够更快得出答案，并通过Post-Training优化显著增强了工具调用与智能体任务的处理能力。","max_tokens":0,"context_window":0,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":0,"completion":0,"cache":0,"image":0,"request":0},"id":"deepseek/deepseek-v3.1-think","support_apis":null},{"company":"ByteDance","name":"Doubao-Seed-1.6","api_name":"bytedance/doubao-seed-1.6","description":"Doubao-Seed-1.6，这是一个 “All-in-One” 的综合模型，也是国内首个支持256K上下文的思考模型，它能够同时支持thinking、non-thinking、auto三种思考模式。","max_tokens":32000,"context_window":256000,"supports_prompt_cache":true,"architecture":{"input":"text+image+video","output":"text","tokenizer":""},"pricing":{"prompt":1142,"completion":11428,"cache":228,"image":0,"request":0},"id":"bytedance/doubao-seed-1.6","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Ali","name":"Qwen3-Coder-480B-A35B-Instruct","api_name":"ali/qwen3-coder-480b-a35b-instruct","description":"基于Qwen3的代码生成模型，具有强大的Coding Agent能力，代码能力达到开源模型 
SOTA。","max_tokens":65536,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":8571,"completion":34285,"cache":0,"image":0,"request":0},"id":"ali/qwen3-coder-480b-a35b-instruct","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"DeepSeek","name":"DeepSeek V3.1","api_name":"deepseek/deepseek-v3.1","description":"DeepSeek V3.1是DeepSeek的新一代模型，具备混合推理（Think \u0026 Non-Think双模式），显著提升思考速度和任务处理效率，并通过后训练强化工具调用与多步骤智能体能力。","max_tokens":64000,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":5714,"completion":17142,"cache":2285,"image":0,"request":0},"id":"deepseek/deepseek-v3.1","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"bigmodel","name":"GLM-4.5","api_name":"bigmodel/glm-4.5","description":"GLM-4.5是智谱最新旗舰模型，推理、代码、智能体综合能力达到开源模型的SOTA水平，模型上下文长度可达128k。","max_tokens":96000,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":2857,"completion":11428,"cache":571,"image":0,"request":0},"id":"bigmodel/glm-4.5","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"o3","api_name":"openai/o3","description":"o3是一款全面且强大的跨领域模型，在数学、科学、编程和视觉推理任务中树立了新标杆。它在技术写作和遵循指令方面也表现出色。可用于处理涉及文本、代码和图像分析的多步骤问题。","max_tokens":100000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":20000,"completion":80000,"cache":5000,"image":0,"request":0},"id":"openai/o3","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"Moonshot","name":"Kimi-thinking-preview","api_name":"moonshot/kimi-thinking-preview","description":"Kimi-thinking-preview是月之暗面提供的具有多模态推理能力和通用推理能力的多模态思考模型，它最长支持128k上下文，擅长深度推理，帮助解决更多更难的事情。","max_tokens":128000,"context_window":128000,"supports_prompt_
cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":285714,"completion":285714,"cache":0,"image":0,"request":0},"id":"moonshot/kimi-thinking-preview","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Google","name":"Gemini 2.5 Flash Lite","api_name":"google/gemini-2.5-flash-lite","description":"Gemini 2.5 Flash-Lite 是Gemini 2.5 系列中的一个轻量级推理模型，优化了超低延迟和成本效率。与早期的 Flash 模型相比，它提供了更好的吞吐量、更快的 token 生成以及在常见基准测试中更好的性能。可通过reasoning.max_tokens开启思考并控制思维链长度。","max_tokens":64000,"context_window":1000000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":1000,"completion":4000,"cache":250,"image":0,"request":0},"id":"google/gemini-2.5-flash-lite","support_apis":["/v1/chat/completions","/v1/messages","/v1beta/models/*","/v1/models/*"]},{"company":"bigmodel","name":"GLM-4.5 Thinking","api_name":"bigmodel/glm-4.5:thinking","description":"GLM-4.5是智谱最新旗舰模型，推理、代码、智能体综合能力达到开源模型的SOTA水平，模型上下文长度可达128k。","max_tokens":96000,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":2857,"completion":11428,"cache":571,"image":0,"request":0},"id":"bigmodel/glm-4.5:thinking","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Tencent","name":"Hunyuan-Turbo-Vision","api_name":"tencent/hunyuan-turbo-vision","description":"2024年发布的多模态旗舰模型，支持多语种作答，中英文能力均衡。","max_tokens":6000,"context_window":6000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":4285,"completion":12857,"cache":0,"image":0,"request":0},"id":"tencent/hunyuan-turbo-vision","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Anthropic","name":"Claude-3.7-sonnet（Thinking）","api_name":"anthropic/claude-3.7-sonnet:thinking","description":"Claude-3.7-sonnet（Thinking）是Anthropic于 2025 
年推出的新一代思维型大语言模型。该模型首次融合自回归生成与符号推理架构，具备多步骤思维展示、自我纠错能力以及长达 20 万 tokens 的上下文理解能力，标志着AI从简单生成向深度逻辑推理与透明思考的跨越。","max_tokens":64000,"context_window":200000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":30000,"completion":150000,"cache":3000,"image":0,"request":0},"id":"anthropic/claude-3.7-sonnet:thinking","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Anthropic","name":"Claude-3.7-sonnet","api_name":"anthropic/claude-3.7-sonnet","description":"Claude 3.7 Sonnet是首个提供扩展思考功能的Claude模型，可通过仔细、逐步的推理解决复杂问题。","max_tokens":64000,"context_window":200000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":30000,"completion":150000,"cache":3000,"image":0,"request":0},"id":"anthropic/claude-3.7-sonnet","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Tencent","name":"Hunyuan-T1-Vision","api_name":"tencent/hunyuan-t1-vision","description":"采用混元MOE结构，是混元最新多模态模型，支持多语种作答，中英文能力均衡。","max_tokens":8000,"context_window":32000,"supports_prompt_cache":true,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":4285,"completion":12857,"cache":0,"image":0,"request":0},"id":"tencent/hunyuan-t1-vision","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"DeepSeek","name":"DeepSeek-R1-0528","api_name":"deepseek/deepseek-r1-0528","description":"DeepSeek R1-0528是DeepSeek R1模型的小版本升级版。通过增加计算资源和引入后训练阶段的算法优化，DeepSeek R1-0528显著提升了推理与推断的深度和能力。该模型在数学、编程及一般逻辑等多项基准测试中表现优异，整体性能已接近业内领先的 O3和Gemini 2.5 Pro模型，展现出强大的多领域应用潜力。","max_tokens":48000,"context_window":96000,"supports_prompt_cache":true,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":5714,"completion":22857,"cache":0,"image":0,"request":0},"id":"deepseek/deepseek-r1-0528","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"bigmodel","name":"GLM-4.5-Air 
Thinking","api_name":"bigmodel/glm-4.5-air:thinking","description":"GLM-4.5-Air为GLM-4.5的轻量版，兼顾性能与性价比，可灵活切换混合思考模型。","max_tokens":96000,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":1142,"completion":2857,"cache":228,"image":0,"request":0},"id":"bigmodel/glm-4.5-air:thinking","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"bigmodel","name":"GLM-4.5-X","api_name":"bigmodel/glm-4.5-x","description":"GLM-4.5-X为GLM-4.5的极速版，在性能强劲的同时，生成速度可达100tokens/秒。","max_tokens":96000,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":11428,"completion":22857,"cache":2285,"image":0,"request":0},"id":"bigmodel/glm-4.5-x","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"bigmodel","name":"GLM-4.5-Air","api_name":"bigmodel/glm-4.5-air","description":"GLM-4.5-Air为GLM-4.5的轻量版，兼顾性能与性价比，可灵活切换混合思考模型。","max_tokens":96000,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":1142,"completion":2857,"cache":228,"image":0,"request":0},"id":"bigmodel/glm-4.5-air","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"MiniMax","name":"MiniMax-M1","api_name":"minimax/minimax-m1","description":"MiniMax-M1，世界上第一个开源的大规模混合架构的推理模型，适合在复杂场景中使用。","max_tokens":80000,"context_window":1000000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":1142,"completion":11428,"cache":0,"image":0,"request":0},"id":"minimax/minimax-m1","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Ali","name":"Qwen3-235B-A22B-Instruct-2507","api_name":"ali/qwen3-235b-a22b-instruct-2507","description":"Qwen3-235B-A22B-Instruct-2507是Qwen3-235B-A22B-FP8的增强版本，在指令遵循、逻辑推理、文本理解、数学与编程能力、多语言知识覆盖等方面表现显著提升，并在256K长上下文处理和主观任务对齐度上进一步优化，生成内容更高质量、更贴近用户意图。","max_to
kens":32768,"context_window":131072,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":2857,"completion":11428,"cache":1142,"image":0,"request":0},"id":"ali/qwen3-235b-a22b-instruct-2507","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"bigmodel","name":"GLM-4.5-X Thinking","api_name":"bigmodel/glm-4.5-x:thinking","description":"GLM-4.5-X为GLM-4.5的极速版，在性能强劲的同时，生成速度可达100tokens/秒。","max_tokens":96000,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":11428,"completion":22857,"cache":2285,"image":0,"request":0},"id":"bigmodel/glm-4.5-x:thinking","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"bigmodel","name":"GLM-4.5-AirX Thinking","api_name":"bigmodel/glm-4.5-airx:thinking","description":"GLM-4.5-AirX为GLM-4.5-Air的极速版，响应速度更快，专为大规模高速度需求打造。","max_tokens":9600,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":5714,"completion":17142,"cache":1142,"image":0,"request":0},"id":"bigmodel/glm-4.5-airx:thinking","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Ali","name":"Qwen3-235B-A22B-Thinking-2507","api_name":"ali/qwen3-235b-a22b-thinking-2507","description":"基于Qwen3的思考模式开源模型，相较通义千问3-235B-A22B逻辑能力、通用能力、知识增强及创作能力均有大幅提升，适用于高难度强推理场景。","max_tokens":32768,"context_window":131072,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":2857,"completion":28571,"cache":0,"image":0,"request":0},"id":"ali/qwen3-235b-a22b-thinking-2507","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Moonshot","name":"Kimi-latest","api_name":"moonshot/kimi-latest","description":"Kimi-latest是一个最长支持128k上下文的视觉模型，支持图片理解。同时，kimi-latest 模型总是使用Kimi智能助手产品使用最新的Kimi 
大模型版本，可能包含尚未稳定的特性。","max_tokens":128000,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":2857,"completion":14285,"cache":0,"image":0,"request":0},"id":"moonshot/kimi-latest","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Anthropic","name":"Claude 3.5 Haiku","api_name":"anthropic/claude-3.5-haiku","description":"Claude 3.5 Haiku是Anthropic最快且最具成本效益的下一代模型，非常适合速度和经济性重要的应用场景。","max_tokens":8000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":8000,"completion":40000,"cache":800,"image":0,"request":0},"id":"anthropic/claude-3.5-haiku","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Google","name":"Gemini 2.5 Flash Native Audio","api_name":"google/gemini-2.5-flash-live","description":"Gemini 2.5 Flash 的原生音频模型，支持Live API格式调用。","max_tokens":65535,"context_window":1048576,"supports_prompt_cache":true,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":5000,"completion":20000,"cache":5000,"image":0,"request":0},"id":"google/gemini-2.5-flash-live","support_apis":["/ws/gemini-live"]},{"company":"bigmodel","name":"GLM-4.5-AirX","api_name":"bigmodel/glm-4.5-airx","description":"GLM-4.5-AirX为GLM-4.5-Air的极速版，响应速度更快，专为大规模高速度需求打造。","max_tokens":96000,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":5714,"completion":17142,"cache":1142,"image":0,"request":0},"id":"bigmodel/glm-4.5-airx","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"baidu","name":"ERNIE-4.0-Turbo-128K","api_name":"baidu/ernie-4.0-turbo-128k","description":"百度自研的旗舰级超大规模大语言模型，综合效果表现出色，广泛适用于各领域复杂任务场景；支持自动对接百度搜索插件，保障问答信息时效。相较于ERNIE 
4.0在性能表现上更优秀","max_tokens":4000,"context_window":124000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":4285,"completion":12857,"cache":0,"image":0,"request":0},"id":"baidu/ernie-4.0-turbo-128k","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Ali","name":"Qwen-Plus","api_name":"ali/qwen-plus-latest","description":"通义千问系列能力均衡的模型，推理效果、成本和速度介于通义千问-Max和通义千问-Turbo之间，适合中等复杂任务。","max_tokens":16384,"context_window":131072,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":1142,"completion":2857,"cache":457,"image":0,"request":0},"id":"ali/qwen-plus-latest","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"Ali","name":"Qwen-Turbo (Thinking)","api_name":"ali/qwen-turbo-latest:thinking","description":"Qwen-Turbo基于Qwen2.5开发，具有速度快、成本低的特点，适用于处理简单任务。","max_tokens":16384,"context_window":131072,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":428,"completion":4285,"cache":171,"image":0,"request":0},"id":"ali/qwen-turbo-latest:thinking","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Ali","name":"Qwen-Turbo","api_name":"ali/qwen-turbo-latest","description":"Qwen-Turbo基于Qwen2.5开发，是一款支持100万token上下文长度的大模型，具有速度快、成本低的特点，适用于处理简单任务。","max_tokens":16384,"context_window":1000000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":428,"completion":857,"cache":171,"image":0,"request":0},"id":"ali/qwen-turbo-latest","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Ali","name":"Qwen-Plus (Thinking)","api_name":"ali/qwen-plus-latest:thinking","description":"Qwen-Plus 基于 Qwen2.5 基础模型构建，是一款支持 131K 
上下文长度的大模型，在性能、速度与成本之间实现了良好平衡。","max_tokens":16384,"context_window":131072,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":1142,"completion":11428,"cache":457,"image":0,"request":0},"id":"ali/qwen-plus-latest:thinking","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"xAI","name":"Grok 3","api_name":"x-ai/grok-3","description":"Grok 3是Grok的第三代版本，在数据提取、编码和文本摘要等企业用例中表现出色。","max_tokens":131072,"context_window":131072,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":30000,"completion":150000,"cache":7500,"image":0,"request":0},"id":"x-ai/grok-3","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"baidu","name":"ERNIE-4.5-Turbo-128K","api_name":"baidu/ernie-4.5-turbo-128k","description":"模型能力全面提升，更好满足多轮长历史对话处理、长文档理解问答任务。","max_tokens":123000,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":1142,"completion":4571,"cache":285,"image":0,"request":0},"id":"baidu/ernie-4.5-turbo-128k","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"DeepSeek","name":"DeepSeek-V3:latest","api_name":"deepseek/deepseek-v3","description":"DeepSeek-V3是DeepSeek公司发布的一款大型语言模型，以其在数学、编码和中文等任务上的卓越性能而闻名。该模型采用了MoE架构，拥有6710亿参数，每个token激活的参数量为370亿，并且支持128K token的超长上下文窗口。DeepSeek-V3不仅在性能上对标GPT-4o等主流模型，还在推理速度和效率上取得了显著提升。","max_tokens":16000,"context_window":128000,"supports_prompt_cache":true,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":2857,"completion":11428,"cache":0,"image":0,"request":0},"id":"deepseek/deepseek-v3","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"GPT OSS 20B","api_name":"openai/gpt-oss-20b","description":"GPT OSS 
20B是OpenAI发布的开放权重210亿参数模型，它采用MoE架构，每次前向传递有36亿个有效参数，并针对低延迟推理和在消费级或单GPU硬件上的部署进行了优化。该模型采用OpenAI的Harmony响应格式进行训练，并支持推理级别配置、微调和代理功能，包括函数调用、工具使用和结构化输出。","max_tokens":131072,"context_window":131072,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":700,"completion":3000,"cache":0,"image":0,"request":0},"id":"openai/gpt-oss-20b","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"baidu","name":"ERNIE-4.5-Turbo-128K-Preview","api_name":"baidu/ernie-4.5-turbo-preview","description":"模型能力全面提升，更好满足多轮长历史对话处理、长文档理解问答任务。","max_tokens":123000,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":1142,"completion":4571,"cache":285,"image":0,"request":0},"id":"baidu/ernie-4.5-turbo-preview","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"GPT OSS 120B","api_name":"openai/gpt-oss-120b","description":"GPT OSS 120B是OpenAI推出的开放权重、包含1170亿个参数的MoE语言模型，专为高推理、代理和通用生产用例而设计。它每次前向传递可激活51亿个参数，并经过优化，可在具有原生MXFP4量化的单个H100 GPU上运行。","max_tokens":131072,"context_window":131072,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":1000,"completion":5000,"cache":0,"image":0,"request":0},"id":"openai/gpt-oss-120b","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"baidu","name":"ERNIE-4.5-Turbo-32K","api_name":"baidu/ernie-4.5-turbo-32k","description":"文本创作、知识问答等能力提升显著。输出长度及整句时延相较ERNIE 
4.5有所增加。","max_tokens":27000,"context_window":32000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":1142,"completion":4571,"cache":285,"image":0,"request":0},"id":"baidu/ernie-4.5-turbo-32k","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"baidu","name":"ERNIE-4.5-Turbo-VL-Preview","api_name":"baidu/ernie-4.5-turbo-vl-preview","description":"文心一言大模型全新版本，图片理解、创作、翻译、代码等能力显著提升，首次支持32K上下文长度，首Token时延显著降低。","max_tokens":123000,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":4285,"completion":12857,"cache":1071,"image":0,"request":0},"id":"baidu/ernie-4.5-turbo-vl-preview","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Meta.AI","name":"Llama 3.3 70B Instruct","api_name":"meta/llama-3.3-70b-instruct","description":"Meta Llama 3.3 是一款多语言大语言模型，在 70B 参数规模下进行了预训练与指令微调，支持文本输入与输出。该模型专为多语言对话场景优化，具备强大的生成与理解能力。","max_tokens":16000,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":1200,"completion":3000,"cache":0,"image":0,"request":0},"id":"meta/llama-3.3-70b-instruct","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"GPT-4o-2024-11-20","api_name":"openai/gpt-4o-2024-11-20","description":"GPT 
4o是OpenAI智能水平最高、通用性最强的旗舰模型。它支持文本与图像输入，并生成文本输出（包括结构化结果），在大多数任务中表现出色，是目前除o系列专用模型外能力最强的通用模型。","max_tokens":16384,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text+image+video","output":"text","tokenizer":""},"pricing":{"prompt":50000,"completion":150000,"cache":12500,"image":0,"request":0},"id":"openai/gpt-4o-2024-11-20","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"GPT-4o-mini","api_name":"openai/gpt-4o-mini","description":"GPT-4o-mini是一款快速、经济的小型模型，适用于聚焦型任务和微调场景。它支持文本和图像输入，并生成文本输出，在成本和响应速度方面具有显著优势。","max_tokens":16384,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text+audio+image","output":"text+audio+image","tokenizer":""},"pricing":{"prompt":857,"completion":3428,"cache":428,"image":0,"request":0},"id":"openai/gpt-4o-mini","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"ByteDance","name":"Doubao-1.5-pro-256K","api_name":"bytedance/doubao-pro-256k","description":"Doubao-1.5-pro使用MoE 
架构，并通过训练-推理一体化设计，实现模型性能和推理性能之间的极致平衡。","max_tokens":12000,"context_window":256000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":7142,"completion":12857,"cache":0,"image":0,"request":0},"id":"bytedance/doubao-pro-256k","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Ali","name":"Qwen-VL-Plus","api_name":"ali/qwen-vl-plus","description":"通义千问大规模视觉语言模型增强版。大幅提升细节识别能力和文字识别能力，支持超百万像素分辨率和任意长宽比规格的图像。","max_tokens":8192,"context_window":131072,"supports_prompt_cache":false,"architecture":{"input":"text+image","output":"text","tokenizer":""},"pricing":{"prompt":1142,"completion":2857,"cache":228,"image":0,"request":0},"id":"ali/qwen-vl-plus","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Ali","name":"QvQ-72b","api_name":"ali/qvq-72b","description":"QVQ-72B是阿里云通义千问团队开发的多模态推理模型，拥有720亿参数，具备强大的视觉理解和推理能力。该模型在解决数学、物理、科学等领域的复杂推理问题上表现突出，能够处理需要同时理解文本和图像的任务。","max_tokens":8192,"context_window":131072,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":17142,"completion":51428,"cache":0,"image":0,"request":0},"id":"ali/qvq-72b","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"o3-mini-high","api_name":"openai/o3-mini-high","description":"o3-mini-high是 o3-mini 模型在高推理强度设置下的版本，特别擅长STEM领域的推理任务，包括科学、数学和编程。","max_tokens":100000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":11000,"completion":44000,"cache":5500,"image":0,"request":0},"id":"openai/o3-mini-high","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"o3-mini","api_name":"openai/o3-mini","description":"OpenAI o3-mini是OpenAI推出的小型推理模型，在保持与o1-mini 相同的成本与延迟水平下，提供了更高的智能表现。该模型专为复杂推理任务优化，特别适用于科学、数学与编程等领域。同时，o3-mini支持结构化输出、函数调用、Batch 
API等关键开发功能，为构建高效智能应用提供了强大支持。","max_tokens":100000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text+image+audio","output":"text+image+audio","tokenizer":""},"pricing":{"prompt":11000,"completion":44000,"cache":5500,"image":0,"request":0},"id":"openai/o3-mini","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"o1","api_name":"openai/o1","description":"o1模型，具有强大的推理能力，在科学、编程、数学等领域表现出色。它采用了长思维链和自适应计算，能够处理复杂推理任务。","max_tokens":100000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":150000,"completion":600000,"cache":75000,"image":0,"request":0},"id":"openai/o1","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"OpenAI","name":"GPT-4.1","api_name":"openai/gpt-4.1","description":"GPT-4.1是OpenAI推出的一款面向复杂任务的旗舰模型，具备跨领域的问题解决能力，能够高效应对各种复杂场景和挑战。","max_tokens":32768,"context_window":1047576,"supports_prompt_cache":false,"architecture":{"input":"text+image+audio+video","output":"text+image+audio+video","tokenizer":""},"pricing":{"prompt":20000,"completion":80000,"cache":5000,"image":0,"request":0},"id":"openai/gpt-4.1","support_apis":["/v1/chat/completions","/v1/messages","/v1/responses"]},{"company":"OpenAI","name":"GPT-4.1-mini","api_name":"openai/gpt-4.1-mini","description":"GPT-4.1-mini是GPT-4.1的小型版本，兼顾文本生成与理解，适合用于语言教学与分析任务。","max_tokens":32768,"context_window":1047576,"supports_prompt_cache":false,"architecture":{"input":"text+image+audio+video","output":"text+audio+image+video","tokenizer":""},"pricing":{"prompt":4000,"completion":16000,"cache":2000,"image":0,"request":0},"id":"openai/gpt-4.1-mini","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"GPT-4.1-nano","api_name":"openai/gpt-4.1-nano","description":"GPT-4.1-nano是GPT-4.1系列中速度最快且性价比最高的模型，专为高效、低成本的应用场景设计。","max_tokens":32768,"context_window":1047576,"supports_prompt_cache":false,"architecture":{"input"
:"text+audio+image+video","output":"text+audio+image+video","tokenizer":""},"pricing":{"prompt":1000,"completion":4000,"cache":250,"image":0,"request":0},"id":"openai/gpt-4.1-nano","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"bigmodel.cn","name":"GLM-Z1-AirX","api_name":"bigmodel/glm-z1-airx","description":"快速推理模型，推理速度可达 200 tokens/秒，比常规快8倍。","max_tokens":4095,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":7142,"completion":7142,"cache":0,"image":0,"request":0},"id":"bigmodel/glm-z1-airx","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"bigmodel","name":"GLM-4-plus","api_name":"bigmodel/glm-4-plus","description":"GLM-4-Plus是GLM团队发布的基座大模型，使用了大量模型辅助构造高质量合成数据以提升模型性能，利用PPO有效提升模型推理（数学、代码算法题等）表现，更好反映人类偏好。","max_tokens":4095,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":7142,"completion":7142,"cache":0,"image":0,"request":0},"id":"bigmodel/glm-4-plus","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"bigmodel","name":"GLM-Z1-Air","api_name":"bigmodel/glm-z1-air","description":"专为数理与逻辑推理优化的模型，在对齐阶段深度优化了通用能力，具备更强的泛化能力与速度表现。","max_tokens":0,"context_window":0,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":0,"completion":0,"cache":0,"image":0,"request":0},"id":"bigmodel/glm-z1-air","support_apis":null},{"company":"OpenAI","name":"o4-mini","api_name":"openai/o4-mini","description":"OpenAI o4-mini 是 o 系列中的一款紧凑型推理模型，经过优化，能够在保持强大多模态和代理能力的同时，实现快速且成本效益高的性能。它支持工具使用，并在 AIME（使用 Python 达到 99.5% 准确率）和 SWE-bench 等基准测试中表现出色，超越了前代 o3-mini，甚至在某些领域接近 o3 的水平。尽管体积较小，o4-mini 在 STEM 任务、视觉问题解决（如 MathVista、MMMU）和代码编辑方面表现出高准确率。它特别适用于对延迟或成本敏感的高吞吐量场景。得益于其高效的架构和精细的强化学习训练，o4-mini 能够串联使用工具、生成结构化输出，并以最小延迟解决多步骤任务——通常在一分钟内完成。此外，o4-mini 与 o3 
一样，具备图像推理能力，能够将视觉输入（如草图和白板）直接整合到其思维过程中，并在分析中对图像进行调整，如缩放或旋转。这些增强功能现已向 ChatGPT Plus、Pro 和 Team 用户开放。","max_tokens":100000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":11000,"completion":44000,"cache":2750,"image":0,"request":0},"id":"openai/o4-mini","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"O3 Deep Research","api_name":"openai/o3-deep-research","description":"（当前仅支持/v1/responses接口）OpenAI O3 Deep Research模型。","max_tokens":100000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"","output":"","tokenizer":""},"pricing":{"prompt":100000,"completion":400000,"cache":25000,"image":0,"request":0},"id":"openai/o3-deep-research","support_apis":["/v1/responses"]},{"company":"OpenAI","name":"o4-mini-high","api_name":"openai/o4-mini-high","description":"OpenAI o4-mini-high 与 o4-mini 是同一个模型，只是将推理强度设为高。o4-mini 是 o 系列中的一个紧凑型推理模型，优化目标是实现快速、成本效益高的性能，同时保持强大的多模态能力和自主代理（agentic）能力。它支持工具调用，在多个基准测试中展现出强劲的推理和编程表现，例如在 AIME 中使用 Python 达到 99.5% 的成绩，在 SWE-bench 中也优于其前代 o3-mini，甚至在某些领域接近 o3 模型的表现。尽管体积更小，o4-mini 在 STEM 任务（科学、技术、工程、数学）、视觉问题解决（如 MathVista、MMMU）以及代码编辑方面依然表现出色。它特别适用于对延迟和成本要求较高的大吞吐场景。得益于其高效的架构设计和精细化的强化学习训练，o4-mini 能够链式调用工具、生成结构化输出，并在不到一分钟内完成多步复杂任务。","max_tokens":100000,"context_window":200000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":11000,"completion":44000,"cache":2750,"image":0,"request":0},"id":"openai/o4-mini-high","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Meta.AI","name":"Llama 4 Scout","api_name":"meta/llama-4-scout","description":"Llama 4 
Scout适用于长上下文中的检索任务，以及需要对大量信息进行推理处理的任务，例如总结多个大型文档、分析大量用户互动日志以实现个性化，以及跨大型代码库进行推理。","max_tokens":8000,"context_window":128000,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":800,"completion":3000,"cache":0,"image":0,"request":0},"id":"meta/llama-4-scout","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"Ali","name":"Qwen3-235B-A22B","api_name":"ali/qwen3-235b-a22b","description":"Qwen3-235B-A22B 是通义千问研发的一款拥有 2350 亿参数的MoE模型，每次前向传播仅激活 220 亿参数。该模型支持在 “思考” 模式与 “非思考” 模式间无缝切换 —— 前者适用于复杂推理、数学及代码任务，后者则能提升日常对话的效率。它具备强大的推理能力，支持 100 多种语言及方言的多语种处理，在指令遵循和智能体工具调用方面表现出色，原生支持 32K token 的上下文窗口，并可通过基于 YaRN 的扩展技术将上下文长度提升至 131K token。","max_tokens":16384,"context_window":131072,"supports_prompt_cache":false,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":2857,"completion":28571,"cache":0,"image":0,"request":0},"id":"ali/qwen3-235b-a22b","support_apis":["/v1/chat/completions","/v1/messages"]},{"company":"OpenAI","name":"Codex-mini-latest","api_name":"openai/codex-mini-latest","description":"codex-mini-latest 是 o4-mini 的微调版本，专门用于 Codex CLI。对于直接在 API 中使用，我们建议从 gpt-4.1 开始。","max_tokens":100000,"context_window":200000,"supports_prompt_cache":true,"architecture":{"input":"text","output":"text","tokenizer":""},"pricing":{"prompt":15000,"completion":60000,"cache":3750,"image":0,"request":0},"id":"openai/codex-mini-latest","support_apis":["/v1/chat/completions","/v1/messages"]}],"object":"list","success":true}