Is Qwen Free? Pricing, Models & API Tiers Explained
Qwen pricing spans three tiers: free on your own hardware, a free hosted tier on Groq, and a paid API starting at $0.15 per million input tokens for the open-weight 35B model. This guide covers every Qwen pricing tier, every access path, and every licensing condition for the 2026 model lineup, with no hand-waving about "competitive pricing."
The Short Answer: Yes, Three Ways
Qwen's access model splits into three tiers: genuinely free (local open-weight models), platform-free (Groq's hosted free tier with rate limits), and paid API access starting at $0.15 per million input tokens for the open-weight 35B model up to $2.50 per million for the frontier-tier Qwen3.7-Max.
Which Qwen pricing tier applies depends on your use case. A developer running Qwen3.6-35B-A3B on an RTX 4090 for local coding assistance pays nothing. A team using Qwen3.7-Max for an enterprise agent pipeline pays $2.50 per million input tokens, still well below comparable frontier models. Both paths use the same underlying model architecture; the cost difference comes from who is running the inference.
Free Access: Local, Groq, and What Happened to OAuth
Local Self-Hosting
Every Qwen model under 35 billion parameters carries an Apache 2.0 license. That means commercial use, fine-tuning, redistribution, and integration into proprietary products, with no royalty or usage fee. The Qwen3.5-397B-A17B model is the notable exception: at 397 billion parameters, it is also Apache 2.0, an unusually permissive choice for a model of this capability.
Practical hardware benchmarks from the 2026 model lineup: Qwen3.6-35B-A3B runs at 20-25 tokens per second on an RTX 4090 with 24GB of VRAM (INT4 quantization). The model activates only 3 billion parameters per token, so inference cost tracks the 3B active count, not the 35B total. For 16GB VRAM, Qwen3.5-9B is the mainstream local choice for coding and RAG workflows.
Groq Free Tier
Groq hosts Qwen3-32B on its free tier. No credit card required. The model covers strong coding and math tasks and supports Qwen's switchable thinking/non-thinking modes. Groq does not publish specific request-per-minute limits for free tier accounts, so treat this path as suitable for prototyping and personal projects rather than production pipelines.
The Qwen OAuth Era Is Over
From Qwen's public launch through early 2026, Alibaba ran a free OAuth tier through the Qwen API. At its peak, it offered 1,000 requests per day. Before the shutdown, Alibaba cut the limit to 100 requests per day. The tier closed entirely on April 15, 2026.
The Qwen OAuth free tier was discontinued on April 15, 2026. Any integration built on that authentication path stopped working on that date. There is no announced replacement. If you need a free hosted option, use the Groq free tier or self-host an open-weight model.
Qwen Pricing: API Tiers From $0.15 to $2.50 Per Million Tokens
The paid Qwen API runs through Alibaba Cloud Model Studio (primary endpoint: Singapore region). Pricing follows a per-token model: one million tokens is roughly 750,000 words, or about 3,000 typical API requests at average message length. Three tiers cover most production use cases.
| Model | Input ($/M tokens) | Output ($/M tokens) | Context | Open-Weight |
|---|---|---|---|---|
| Qwen3.7-Max | $2.50 | $7.50 | 1M tokens | No (API only) |
| Qwen3.6-Max-Preview | $1.30 | $7.80 | 262K tokens | No (API only) |
| Qwen3.6-35B-A3B | $0.15 | $1.00 | 262K native (1M via YaRN) | Yes (Apache 2.0) |
Qwen3.7-Max is the flagship API model: approximately 1 trillion total parameters with roughly 24 billion active per forward pass (Mixture of Experts architecture), a 1 million token context window, and $2.50 per million input tokens. On SWE-Bench Pro (the autonomous software engineering benchmark), it scores 60.6%, which puts it in the top tier of current frontier models at one-sixth the per-token cost of comparable alternatives.
Qwen3.6-35B-A3B is the open-weight API model. At $0.15 per million input tokens, it delivers the cheapest tier in Qwen pricing for hosted inference; the model is the same one available for free local download, so the API version simply provides Alibaba's managed inference at scale. The model is multimodal, supporting image input alongside text. Cache reads drop to $0.05 per million tokens.
AI Governance Charter
Establish your organization's AI principles in one document
Download Free →Prompt Caching: The 90% Discount
Qwen3.7-Max prompt caching cuts your effective input cost by 90% on repeated context prefixes: $0.25/M on cache reads versus $2.50/M on fresh input. You pay once to write the cache at $3.125/M, then recoup that cost across subsequent calls that reuse the same prefix. The crossover breaks even after roughly 2-3 reuses, and the discount compounds as call volume grows.
| Cache Action | Price per Million Tokens | Notes |
|---|---|---|
| Standard input (no cache) | $2.50 | Qwen3.7-Max baseline |
| Cache creation | $3.125 | 25% above standard input (you pay to store) |
| Cache read | $0.25 | 90% below standard input (you save on retrieval) |
| Qwen3.6-35B-A3B cache read | $0.05 | Separate pricing for the open-weight tier |
Caching makes economic sense for: RAG pipelines where system prompts and retrieved context repeat across user turns; customer support agents with long instruction sets that stay constant across sessions; and document analysis workflows that process the same background material repeatedly.
The Qwen3.7-Max cache has a 5-minute time-to-live. If more than 5 minutes pass between requests sharing the same prefix, the cache expires and you pay the cache creation fee again on the next request. Interactive chatbots with frequent turns are well-suited. Batch jobs with long gaps between calls are not.
Third-Party Providers
DeepInfra and OpenRouter are the two most practical alternatives to Alibaba Cloud for Qwen API access. Both are OpenAI-compatible, so the same SDK code works with a base URL swap. Rates differ by model and access type.
| Provider | Models Available | Input / Output (per M) | API Compatibility |
|---|---|---|---|
| DeepInfra | Qwen3.5-397B-A17B | $0.54 / $3.40 | OpenAI-compatible |
| DeepInfra | Qwen3-235B-A22B (thinking) | $0.45 / $3.49 | OpenAI-compatible |
| OpenRouter | Qwen3.7-Max, Qwen3.6-35B-A3B + others | Varies by model | OpenAI-compatible |
| Together AI | Qwen open-weight models | Varies (serverless / dedicated) | OpenAI-compatible |
DeepInfra is the standout third-party option for large open-weight models. Its rate for Qwen3.5-397B-A17B ($0.54/$3.40 per million input/output tokens) undercuts Alibaba Cloud's listed rates for comparable capability tiers. Qwen3-235B-A22B with thinking enabled runs at $0.45/$3.49, useful for math and reasoning tasks where chain-of-thought is required and you want to avoid the Alibaba Cloud billing relationship.
OpenRouter provides a single OpenAI-compatible endpoint that routes to multiple backend providers. You can access qwen/qwen3.7-max or qwen/qwen3.6-35b-a3b without creating an Alibaba Cloud account. OpenRouter bills in credits, which some teams find simpler than managing regional cloud credentials.
Together AI hosts Qwen open-weight models. Rates vary by model and tier (serverless vs. dedicated instances). Dedicated instances carry a fixed hourly cost and make sense only when workload volume is high enough that per-token billing becomes expensive.
One practical note: third-party providers source their models from the open-weight releases. You will not find Qwen3.7-Max or Qwen3.6-Max-Preview on DeepInfra or Together AI: those are proprietary API-only models, available only through Alibaba Cloud. If you need the frontier-tier capability, Alibaba Cloud is the only option.
Enterprise Plans
Standard pay-per-token billing works for most teams. Two situations break that model: development teams that run Qwen agents continuously and need predictable costs, and organizations with data residency requirements that prohibit cloud API calls entirely. Alibaba Cloud has a separate track for each.
Alibaba Cloud Coding Plan
The Coding Plan is a fixed monthly subscription designed for development teams running Qwen continuously through coding agents, IDE integrations, or CI pipelines. Rather than accumulating per-token charges across many small requests, teams pay a predictable monthly fee that covers higher rate limits and priority routing. Alibaba has not published the subscription amount publicly; pricing is disclosed during the enterprise sales process and varies by seat count and commitment term.
The Coding Plan is worth evaluating if your team already uses Qwen Code (the terminal agent) or has integrated Qwen into an IDE extension like VS Code Continue or JetBrains. For intermittent use, pay-per-token billing remains cheaper. The crossover point depends on your average monthly token consumption.
Enterprise Deployment Kit
The Enterprise Deployment Kit provides Docker and Kubernetes deployment configurations for running Qwen models on private infrastructure. This targets industries with strict data residency requirements: finance, defense, healthcare, and high-security enterprise environments where cloud API calls are prohibited by policy or regulation.
With the kit, Qwen runs entirely within your network. No data leaves your perimeter, there are no per-token charges, and you control the hardware. The tradeoff is that you take on the operational cost of running the models yourself: GPU or accelerator provisioning, serving software, and ongoing maintenance. The kit includes configurations for vLLM and other production serving frameworks. It is distributed through Alibaba Cloud's enterprise sales channel, not as a public download.
The Alibaba Cloud Coding Plan subscription fee is not published on the pricing page. Expect a sales conversation before you see a number. If you need a budget estimate, request a quote through the Alibaba Cloud enterprise portal and ask for a consumption-based projection at your expected token volume.
Full Model Lineup (2026)
The Qwen3 model family spans five capability tiers from API-only frontier models to sub-1B edge models. Qwen pricing across these tiers ranges from $0 (open-weight, self-hosted) to $2.50/M input tokens for the frontier API. Understanding active vs. total parameter counts is critical here: Qwen's MoE (Mixture of Experts) models have large total parameter counts but activate only a fraction per request, which is why a "35B" model can run on a 24GB GPU.
| Model | Params | Context | Access | License | Input / Output (per M) |
|---|---|---|---|---|---|
| Qwen3.7-Max | ~1T | 1M tokens | API only | Proprietary | $2.50 / $7.50 |
| Qwen3.6-Max-Preview | ~1T | 262K tokens | API only | Proprietary | $1.30 / $7.80 |
| Qwen3.6-35B-A3B | 35B (3B active) | 262K tokens | Open-weight + API | Apache 2.0 | $0.15 / $1.00 |
| Qwen3.5-397B-A17B | 397B (17B active) | 262K tokens | Open-weight | Apache 2.0 | $0.54 / $3.40 (DeepInfra) |
| Qwen3-235B-A22B | 235B (22B active) | 262K tokens | Open-weight | Tongyi Qianwen | $0.45 / $3.49 (DeepInfra) |
| Qwen3-32B | 32B dense | 262K tokens | Open-weight | Apache 2.0 | Free (Groq) / self-host |
| Qwen3.6-27B | 27B dense | 262K tokens | Open-weight | Apache 2.0 | Self-host |
| Qwen3.5 small (0.8B–9B) | 0.8B / 2B / 4B / 9B | 262K tokens | Open-weight | Apache 2.0 | Self-host (edge/consumer) |
Three patterns to carry forward. The 1M token context window is exclusive to Qwen3.7-Max on the proprietary API; open-weight models top out at 262K tokens natively. The Tongyi Qianwen License applies to Qwen3-235B-A22B and some other larger models, so read the licensing section below before building a commercial product on any Tongyi-licensed model. The small series (0.8B to 9B) is genuinely competitive on edge hardware and benchmarks above similar-size models from other vendors.
Licensing: What You Can Build
Qwen's licensing varies by model tier, and the distinction has real consequences for commercial projects. There are three regimes you need to understand.
Apache 2.0: Full Commercial Freedom
Most Qwen3-generation models at 35B parameters and below ship under Apache 2.0. This includes Qwen3.6-35B-A3B, Qwen3-32B, Qwen3.6-27B, and the much larger Qwen3.5-397B-A17B, which Alibaba released under Apache 2.0 as an unusually permissive exception for a model of that capability. Apache 2.0 permits commercial use, fine-tuning, and redistribution without requiring you to publish your modifications or pay royalties. If you are building a product on an open-weight Qwen model, Apache 2.0 means you can ship it.
Tongyi Qianwen License: Restricted Commercial Use
Some larger models in the Qwen family (including Qwen3-235B-A22B) use the Tongyi Qianwen License. This license permits non-commercial use freely. Commercial use requires a separate agreement with Alibaba Cloud if the product reaches more than 100 million monthly active users. Below that threshold, commercial use is permitted without a separate agreement, but the license terms are more restrictive than Apache 2.0: you cannot relicense the model weights, and certain redistribution conditions apply. Read the full license text before building a commercial product on a Tongyi-licensed model.
Proprietary API: No Weight Access
Qwen3.7-Max and Qwen3.6-Max-Preview are API-only models. Alibaba does not release the weights. You interact through the API under Alibaba Cloud's standard terms of service. There is no licensing decision to make: you are renting compute, not owning a model. This is identical to how comparable proprietary frontier models operate.
The licensing pattern changed between Qwen2.5 and Qwen3.x. Do not assume all Qwen models share the same license. Check the Hugging Face model card for the specific model variant you are deploying. Qwen3-235B-A22B is Tongyi-licensed, not Apache 2.0, despite being an open-weight release.
Tongyi Qianwen License requires a separate Alibaba Cloud agreement for products exceeding 100 million monthly active users. This is not a concern at early product stage, but plan for it before you scale. The agreement process is not instantaneous.
Frequently Asked Questions
Yes, in two ways. Open-weight Qwen models run locally at zero cost under Apache 2.0, no API key, no usage limits. Groq hosts Qwen3-32B on its free tier with no credit card required, though rate limits apply. The Alibaba Cloud API starts at $0.15 per million input tokens for Qwen3.6-35B-A3B. The former OAuth free tier was discontinued April 15, 2026.
Qwen3.7-Max costs $2.50 per million input tokens, roughly 6x cheaper than Claude Opus 4.8 at $15/M. A workflow costing $300 in Opus 4.8 tokens costs approximately $50 in Qwen3.7-Max tokens, with comparable or better performance on coding benchmarks.
Yes. Open-weight Qwen models under Apache 2.0 permit commercial use, fine-tuning, and redistribution. Qwen3.6-35B-A3B runs at 20-25 tokens per second on an RTX 4090 with 24GB VRAM. Qwen3.5-397B-A17B fits in 4-bit quantization on a Mac Studio with 256GB of RAM.
Alibaba discontinued the Qwen OAuth free tier on April 15, 2026. The tier originally allowed 1,000 requests per day, later reduced to 100 requests per day before shutdown. The Groq free tier (Qwen3-32B) remains active and does not require a credit card.
ollama pull qwen3:32b to production vLLM, the complete local deployment guide for self-hosting Qwen on your own hardware.Go Deeper
Resources from across Tech Jacks Solutions
FREEAI Governance Charter
Establish your organization's AI principles in one document
AI Career Paths
Explore roles that work with these tools daily
EU AI Act Guide
Check your compliance obligations under the EU AI Act
FREEAI Risk Management Template
Identify, assess, and mitigate AI deployment risks
FREEAI Bias Assessment
Evaluate bias risks before deploying any AI system