2026 Complete Guide to Free Overseas LLM API Offers
Free LLM API resources
This list collects large language model services that offer free access or free API credits.
- Free providers
  - OpenRouter
  - Google AI Studio
  - NVIDIA NIM
  - Mistral (La Plateforme)
  - Mistral (Codestral)
  - HuggingFace Inference Providers
  - Vercel AI Gateway
  - OpenCode Zen
  - Cerebras
  - Groq
  - Cohere
  - GitHub Models
  - Cloudflare Workers AI
- Providers with trial credits
  - Fireworks
  - Baseten
  - Nebius
  - Novita
  - AI21
  - Upstage
  - NLP Cloud
  - Alibaba Cloud (International) Model Studio
  - Modal
  - Inference.net
  - Hyperbolic
  - SambaNova Cloud
  - Scaleway Generative APIs
Free providers
OpenRouter
Limits:
20 requests/minute<br>50 requests/day<br>Up to 1,000 requests/day after a lifetime top-up of $10
All models share a common quota.
- Gemma 3 12B Instruct
- Gemma 3 27B Instruct
- Gemma 3 4B Instruct
- Hermes 3 Llama 3.1 405B
- Llama 3.2 3B Instruct
- Llama 3.3 70B Instruct
- baidu/qianfan-ocr-fast:free
- cognitivecomputations/dolphin-mistral-24b-venice-edition:free
- google/gemma-3n-e2b-it:free
- google/gemma-3n-e4b-it:free
- google/gemma-4-26b-a4b-it:free
- google/gemma-4-31b-it:free
- inclusionai/ling-2.6-1t:free
- liquid/lfm-2.5-1.2b-instruct:free
- liquid/lfm-2.5-1.2b-thinking:free
- minimax/minimax-m2.5:free
- nvidia/nemotron-3-nano-30b-a3b:free
- nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free
- nvidia/nemotron-3-super-120b-a12b:free
- nvidia/nemotron-nano-12b-v2-vl:free
- nvidia/nemotron-nano-9b-v2:free
- openai/gpt-oss-120b:free
- openai/gpt-oss-20b:free
- poolside/laguna-m.1:free
- poolside/laguna-xs.2:free
- qwen/qwen3-coder:free
- qwen/qwen3-next-80b-a3b-instruct:free
- tencent/hy3-preview:free
- z-ai/glm-4.5-air:free
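One way to stay inside the OpenRouter free-tier limits listed above is to throttle on the client. The sketch below is a minimal sliding-window limiter. It assumes both the per-minute and per-day quotas can be treated as rolling windows; the provider may instead reset them at fixed times, in which case a rolling window is the stricter of the two. `SlidingWindowLimiter` and `try_acquire` are illustrative names, not part of any OpenRouter SDK, and the clock is passed in explicitly so the behaviour is easy to check.

```python
from collections import deque


class SlidingWindowLimiter:
    """Client-side limiter that enforces several (max_requests, window_seconds) rules at once."""

    def __init__(self, rules):
        # rules: list of (max_requests, window_seconds) pairs
        self.rules = list(rules)
        self.longest = max(window for _, window in self.rules)
        self.sent = deque()  # timestamps of accepted requests, oldest first

    def try_acquire(self, now):
        """Record a request at time `now` (seconds) if every rule allows it."""
        # Requests older than the longest window cannot affect any rule.
        while self.sent and now - self.sent[0] >= self.longest:
            self.sent.popleft()
        for max_requests, window in self.rules:
            recent = sum(1 for t in self.sent if now - t < window)
            if recent >= max_requests:
                return False
        self.sent.append(now)
        return True


# Figures quoted above: 20 requests/minute and 50 requests/day.
# Treating them as rolling windows is an assumption of this sketch.
limiter = SlidingWindowLimiter([(20, 60), (50, 86_400)])
burst = [limiter.try_acquire(now=0.0) for _ in range(25)]
print(burst.count(True))  # 20 accepted, the remaining 5 rejected
```

In real use, `time.monotonic()` would be passed as `now`, and the caller would sleep or queue the request whenever `try_acquire` returns `False`.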
Google AI Studio
Data is used for training when the service is used outside the UK/Switzerland/EEA/EU.
<table><thead><tr><th>Model Name</th><th>Model Limits</th></tr></thead><tbody><tr><td>Gemini 3 Flash</td><td>250,000 tokens/minute<br>20 requests/day<br>5 requests/minute</td></tr><tr><td>Gemini 3.1 Flash-Lite</td><td>250,000 tokens/minute<br>500 requests/day<br>15 requests/minute</td></tr><tr><td>Gemini 2.5 Flash</td><td>250,000 tokens/minute<br>20 requests/day<br>5 requests/minute</td></tr><tr><td>Gemini 2.5 Flash-Lite</td><td>250,000 tokens/minute<br>20 requests/day<br>10 requests/minute</td></tr><tr><td>Gemini 3.1 Flash TTS</td><td>10,000 tokens/minute<br>10 requests/day<br>3 requests/minute</td></tr><tr><td>Gemini 2.5 Flash TTS</td><td>10,000 tokens/minute<br>10 requests/day<br>3 requests/minute</td></tr><tr><td>Gemini Robotics-ER 1.6</td><td>250,000 tokens/minute<br>20 requests/day<br>5 requests/minute</td></tr><tr><td>Gemini Robotics-ER 1.5</td><td>250,000 tokens/minute<br>20 requests/day<br>10 requests/minute</td></tr><tr><td>Gemma 3 27B Instruct</td><td>15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute</td></tr><tr><td>Gemma 3 12B Instruct</td><td>15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute</td></tr><tr><td>Gemma 3 4B Instruct</td><td>15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute</td></tr><tr><td>Gemma 3 1B Instruct</td><td>15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute</td></tr></tbody></table>
NVIDIA NIM
Phone number verification is required. Models tend to have limited context windows.
Limits: 40 requests/minute
Mistral (La Plateforme)
- The free tier (Experiment plan) requires agreeing to have your data used for training.
- Phone number verification is required.
Limits (per model): 1 request/second, 500,000 tokens/minute, 1,000,000,000 tokens/month
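The three Mistral limits interact, and a little arithmetic shows which one binds. The sketch below assumes a 30-day month and uses only the figures quoted above; the variable names are illustrative.

```python
TOKENS_PER_MINUTE = 500_000
TOKENS_PER_MONTH = 1_000_000_000
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month

# Tokens the per-minute limit alone would allow over a whole month.
per_minute_ceiling = TOKENS_PER_MINUTE * MINUTES_PER_MONTH

# Minutes of full-speed use before the monthly quota is exhausted.
minutes_at_full_rate = TOKENS_PER_MONTH / TOKENS_PER_MINUTE

# Average rate that spreads the monthly quota evenly across the month.
sustainable_tokens_per_minute = TOKENS_PER_MONTH / MINUTES_PER_MONTH

print(per_minute_ceiling)                    # 21600000000
print(minutes_at_full_rate)                  # 2000.0
print(round(sustainable_tokens_per_minute))  # 23148
```

Under these assumptions the monthly quota is the binding limit: running at the full 500,000 tokens/minute would use it up in 2,000 minutes (about 33 hours), while a steady workload can average roughly 23,000 tokens/minute.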
Mistral (Codestral)
- Currently free to use
- Monthly subscription based
- Phone number verification is required
Limits: 30 requests/minute, 2,000 requests/day
- Codestral
HuggingFace Inference Providers
HuggingFace Serverless Inference is limited to models smaller than 10GB. Some popular models are supported even if they exceed 10GB.
- Various open models across supported providers
Vercel AI Gateway
Routes to various supported providers.
Limits: $5/month
OpenCode Zen
An AI gateway with a curated set of models.
Free models may use your data for improvement.
- Big Pickle Stealth
- MiniMax M2.5 Free
- Arcee Large Preview Free
Cerebras
<table><thead><tr><th>Model Name</th><th>Model Limits</th></tr></thead><tbody><tr><td>gpt-oss-120b</td><td>30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day</td></tr><tr><td>Llama 3.1 8B</td><td>30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day</td></tr></tbody></table>
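The Cerebras limits are stated over three windows at once, so it helps to work out which window actually caps daily usage. The sketch below scales each window to a per-day ceiling and picks the tightest one. `binding_daily_limit` is a hypothetical helper written for this list, and the figures are the ones in the table above.

```python
SECONDS = {"minute": 60, "hour": 3_600, "day": 86_400}


def binding_daily_limit(limits):
    """Return (window, per_day_ceiling) for the tightest of several windowed limits.

    `limits` maps a window name ("minute", "hour", "day") to the amount
    allowed within one such window.
    """
    scaled = {
        window: amount * (SECONDS["day"] // SECONDS[window])
        for window, amount in limits.items()
    }
    window = min(scaled, key=scaled.get)
    return window, scaled[window]


# Limits from the table above (identical for both listed models).
requests = {"minute": 30, "hour": 900, "day": 14_400}
tokens = {"minute": 60_000, "hour": 1_000_000, "day": 1_000_000}

req_window, req_per_day = binding_daily_limit(requests)
tok_window, tok_per_day = binding_daily_limit(tokens)
avg_tokens_per_request = tok_per_day / req_per_day

print(req_window, req_per_day)           # day 14400
print(tok_window, tok_per_day)           # day 1000000
print(round(avg_tokens_per_request, 1))  # 69.4
```

With these numbers the daily token cap is reached long before the daily request cap for any realistic prompt: using all 14,400 requests would leave only about 69 tokens per request.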
Groq
<table><thead><tr><th>Model Name</th><th>Model Limits</th></tr></thead><tbody><tr><td>Allam 2 7B</td><td>7,000 requests/day<br>6,000 tokens/minute</td></tr><tr><td>Llama 3.1 8B</td><td>14,400 requests/day<br>6,000 tokens/minute</td></tr><tr><td>Llama 3.3 70B</td><td>1,000 requests/day<br>12,000 tokens/minute</td></tr><tr><td>Llama 4 Scout Instruct</td><td>1,000 requests/day<br>30,000 tokens/minute</td></tr><tr><td>Whisper Large v3</td><td>7,200 audio-seconds/minute<br>2,000 requests/day</td></tr><tr><td>Whisper Large v3 Turbo</td><td>7,200 audio-seconds/minute<br>2,000 requests/day</td></tr><tr><td>canopylabs/orpheus-arabic-saudi</td><td></td></tr><tr><td>canopylabs/orpheus-v1-english</td><td></td></tr><tr><td>groq/compound</td><td>250 requests/day<br>70,000 tokens/minute</td></tr><tr><td>groq/compound-mini</td><td>250 requests/day<br>70,000 tokens/minute</td></tr><tr><td>meta-llama/llama-prompt-guard-2-22m</td><td></td></tr><tr><td>meta-llama/llama-prompt-guard-2-86m</td><td></td></tr><tr><td>openai/gpt-oss-120b</td><td>1,000 requests/day<br>8,000 tokens/minute</td></tr><tr><td>openai/gpt-oss-20b</td><td>1,000 requests/day<br>8,000 tokens/minute</td></tr><tr><td>openai/gpt-oss-safeguard-20b</td><td>1,000 requests/day<br>8,000 tokens/minute</td></tr><tr><td>qwen/qwen3-32b</td><td>1,000 requests/day<br>6,000 tokens/minute</td></tr></tbody></table>
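Most Groq rows pair a daily request cap with a tokens/minute budget, and the token budget is usually what a client hits first. A token bucket is one common way to pace against such a budget. The sketch below is a generic implementation, not Groq's own accounting: `TokenBucket` is an illustrative name, the per-request token cost is assumed to be estimated by the caller (for example prompt length plus maximum output), and the 6,000 tokens/minute figure is taken from the Llama 3.1 8B row.

```python
class TokenBucket:
    """Token bucket that refills continuously at capacity / period tokens per second."""

    def __init__(self, capacity, period_seconds, now=0.0):
        self.capacity = capacity
        self.rate = capacity / period_seconds
        self.available = float(capacity)
        self.updated = now

    def _refill(self, now):
        elapsed = max(0.0, now - self.updated)
        self.available = min(self.capacity, self.available + elapsed * self.rate)
        self.updated = now

    def wait_time(self, cost, now):
        """Seconds to wait before a request costing `cost` tokens would fit."""
        if cost > self.capacity:
            raise ValueError("request is larger than the whole per-minute budget")
        self._refill(now)
        return max(0.0, (cost - self.available) / self.rate)

    def consume(self, cost, now):
        """Spend `cost` tokens if they are available; return True on success."""
        self._refill(now)
        if cost > self.available:
            return False
        self.available -= cost
        return True


# 6,000 tokens/minute, as listed for Llama 3.1 8B above.
bucket = TokenBucket(capacity=6_000, period_seconds=60)
first = bucket.consume(4_000, now=0.0)    # fits in the full bucket
second = bucket.consume(4_000, now=0.0)   # rejected, only 2,000 tokens left
delay = bucket.wait_time(4_000, now=0.0)  # 2,000 missing tokens at 100 tokens/second
third = bucket.consume(4_000, now=delay)  # fits once the bucket has refilled
print(first, second, delay, third)        # True False 20.0 True
```

A bucket that starts full can let through up to twice the budget within a single minute after an idle period, so a stricter client might start it empty or use a smaller capacity. The server's 429 responses remain the authoritative signal.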
Cohere
Limits:
20 requests/minute<br>1,000 requests/month
All models share a common monthly quota.
- c4ai-aya-expanse-32b
- c4ai-aya-vision-32b
- command-a-03-2025
- command-a-reasoning-08-2025
- command-a-translate-08-2025
- command-a-vision-07-2025
- command-r-08-2024
- command-r-plus-08-2024
- command-r7b-12-2024
- command-r7b-arabic-02-2025
GitHub Models
Extremely restrictive input/output token limits.
Limits: Depend on the Copilot subscription tier (Free/Pro/Pro+/Business/Enterprise)
- AI21 Jamba 1.5 Large
- Codestral 25.01
- Cohere Command A
- Cohere Command R 08-2024
- Cohere Command R+ 08-2024
- DeepSeek-R1
- DeepSeek-R1-0528
- DeepSeek-V3-0324
- Grok 3
- Grok 3 Mini
- Llama 4 Maverick 17B 128E Instruct FP8
- Llama 4 Scout 17B 16E Instruct
- Llama-3.2-11B-Vision-Instruct
- Llama-3.2-90B-Vision-Instruct
- Llama-3.3-70B-Instruct
- MAI-DS-R1
- Meta-Llama-3.1-405B-Instruct
- Meta-Llama-3.1-8B-Instruct
- Ministral 3B
- Mistral Medium 3 (25.05)
- Mistral Small 3.1
- OpenAI GPT-4.1
- OpenAI GPT-4.1-mini
- OpenAI GPT-4.1-nano
- OpenAI GPT-4o
- OpenAI GPT-4o mini
- OpenAI Text Embedding 3 (large)
- OpenAI Text Embedding 3 (small)
- OpenAI gpt-5
- OpenAI gpt-5-chat (preview)
- OpenAI gpt-5-mini
- OpenAI gpt-5-nano
- OpenAI o1
- OpenAI o1-mini
- OpenAI o1-preview
- OpenAI o3
- OpenAI o3-mini
- OpenAI o4-mini
- Phi-4
- Phi-4-mini-instruct
- Phi-4-mini-reasoning
- Phi-4-multimodal-instruct
- Phi-4-reasoning
Cloudflare Workers AI
- @cf/aisingapore/gemma-sea-lion-v4-27b-it
- @cf/google/gemma-4-26b-a4b-it
- @cf/ibm-granite/granite-4.0-h-micro
- @cf/moonshotai/kimi-k2.5
- @cf/moonshotai/kimi-k2.6
- @cf/nvidia/nemotron-3-120b-a12b
- @cf/openai/gpt-oss-120b
- @cf/openai/gpt-oss-20b
- @cf/qwen/qwen3-30b-a3b-fp8
- @cf/zai-org/glm-4.7-flash
- DeepSeek R1 Distill Qwen 32B
- Deepseek Coder 6.7B Base (AWQ)
- Deepseek Coder 6.7B Instruct (AWQ)
- Deepseek Math 7B Instruct
- Discolm German 7B v1 (AWQ)
- Falcon 7B Instruct
- Gemma 2B Instruct (LoRA)
- Gemma 3 12B Instruct
- Gemma 7B Instruct
- Gemma 7B Instruct (LoRA)
- Hermes 2 Pro Mistral 7B
- Llama 2 13B Chat (AWQ)
- Llama 2 7B Chat (FP16)
- Llama 2 7B Chat (INT8)
- Llama 2 7B Chat (LoRA)
- Llama 3 8B Instruct
- Llama 3 8B Instruct (AWQ)
- Llama 3.1 8B Instruct (AWQ)
- Llama 3.1 8B Instruct (FP8)
- Llama 3.2 11B Vision Instruct
- Llama 3.2 1B Instruct
- Llama 3.2 3B Instruct
- Llama 3.3 70B Instruct (FP8)
- Llama 4 Scout Instruct
- Llama Guard 3 8B
- Mistral 7B Instruct v0.1
- Mistral 7B Instruct v0.1 (AWQ)
- Mistral 7B Instruct v0.2
- Mistral 7B Instruct v0.2 (LoRA)
- Mistral Small 3.1 24B Instruct
- Neural Chat 7B v3.1 (AWQ)
- OpenChat 3.5 0106
- OpenHermes 2.5 Mistral 7B (AWQ)
- Phi-2
- Qwen 1.5 0.5B Chat
- Qwen 1.5 1.8B Chat
- Qwen 1.5 14B Chat (AWQ)
- Qwen 1.5 7B Chat (AWQ)
- Qwen 2.5 Coder 32B Instruct
- Qwen QwQ 32B
- SQLCoder 7B 2
- Starling LM 7B Beta
- TinyLlama 1.1B Chat v1.0
- Una Cybertron 7B v2 (BF16)
- Zephyr 7B Beta (AWQ)
Providers with trial credits
Fireworks
Trial credits: $1
Models: Various open models
Baseten
Trial credits: $30
Nebius
Trial credits: $1
Models: Various open models
Novita
Trial credits: $0.5, valid for 1 year
Models: Various open models
AI21
Trial credits: $10, valid for 3 months
Models: Jamba family
Upstage
Trial credits: $10, valid for 3 months
Models: Solar Pro/Mini
NLP Cloud
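Trial credits: $15
Requirements: Phone number verification
Models: Various open models
Alibaba Cloud (International) Model Studio
Trial credits: 1 million tokens per model
Modal
Trial credits: $5/month on sign-up, $30/month after adding a payment method
Models: Any supported model, billed by compute time
Inference.net
Trial credits: $1, plus $25 for responding to an email survey
Models: Various open models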
Hyperbolic
Trial credits: $1
Models:
- DeepSeek V3 0324
- Llama 3.3 70B Instruct
- deepseek-ai/deepseek-r1-0528
- qwen/qwen3-coder-480b-a35b-instruct
SambaNova Cloud
Trial credits: $5, valid for 3 months
Models:
- Llama 3.3 70B
- Llama-4-Maverick-17B-128E-Instruct
- deepseek-ai/DeepSeek-V3.1
- deepseek-ai/DeepSeek-V3.2
- google/gemma-3-12b-it
- minimaxai/minimax-m2.5
- openai/gpt-oss-120b
Scaleway Generative APIs
Trial credits: 1,000,000 free tokens
Models:
- BGE-Multilingual-Gemma2
- Gemma 3 27B Instruct
- Llama 3.3 70B Instruct
- Pixtral 12B (2409)
- Whisper Large v3
- devstral-2-123b-instruct-2512
- gpt-oss-120b
- holo2-30b-a3b
- mistral-small-3.2-24b-instruct-2506
- qwen3-235b-a22b-instruct-2507
- qwen3-coder-30b-a3b-instruct
- qwen3-embedding-8b
- qwen3.5-397b-a17b
- voxtral-small-24b-2507