QWEN

Is Qwen Free? Pricing, Models & API Tiers Explained

Qwen pricing spans three tiers: free on your own hardware, a free hosted tier on Groq, and a paid API starting at $0.15 per million input tokens for the open-weight 35B model. This guide covers every Qwen pricing tier, every access path, and every licensing condition for the 2026 model lineup, with no hand-waving about "competitive pricing."

The Short Answer: Yes, Three Ways

Qwen's access model splits into three tiers: genuinely free (local open-weight models), platform-free (Groq's hosted free tier with rate limits), and paid API access starting at $0.15 per million input tokens for the open-weight 35B model up to $2.50 per million for the frontier-tier Qwen3.7-Max.

Local hosting cost: Apache 2.0 open-weight models

Qwen HuggingFace

$0.15

Per million input tokens: cheapest hosted API (Qwen3.6-35B-A3B)

Alibaba Cloud

90%

Discount on cached vs standard input tokens (Qwen3.7-Max)

Alibaba Cloud

Cheaper than Claude Opus 4.8 at frontier tier ($2.50 vs $15/M)

Anthropic Pricing

Which Qwen pricing tier applies depends on your use case. A developer running Qwen3.6-35B-A3B on an RTX 4090 for local coding assistance pays nothing. A team using Qwen3.7-Max for an enterprise agent pipeline pays $2.50 per million input tokens, still well below comparable frontier models. Both paths use the same underlying model architecture; the cost difference comes from who is running the inference.

Free Access: Local, Groq, and What Happened to OAuth

Local Self-Hosting

Every Qwen model under 35 billion parameters carries an Apache 2.0 license. That means commercial use, fine-tuning, redistribution, and integration into proprietary products, with no royalty or usage fee. The Qwen3.5-397B-A17B model is the notable exception: at 397 billion parameters, it is also Apache 2.0, an unusually permissive choice for a model of this capability.

Practical hardware benchmarks from the 2026 model lineup: Qwen3.6-35B-A3B runs at 20-25 tokens per second on an RTX 4090 with 24GB of VRAM (INT4 quantization). The model activates only 3 billion parameters per token, so inference cost tracks the 3B active count, not the 35B total. For 16GB VRAM, Qwen3.5-9B is the mainstream local choice for coding and RAG workflows.

Groq Free Tier

Groq hosts Qwen3-32B on its free tier. No credit card required. The model covers strong coding and math tasks and supports Qwen's switchable thinking/non-thinking modes. Groq does not publish specific request-per-minute limits for free tier accounts, so treat this path as suitable for prototyping and personal projects rather than production pipelines.

The Qwen OAuth Era Is Over

From Qwen's public launch through early 2026, Alibaba ran a free OAuth tier through the Qwen API. At its peak, it offered 1,000 requests per day. Before the shutdown, Alibaba cut the limit to 100 requests per day. The tier closed entirely on April 15, 2026.

The Qwen OAuth free tier was discontinued on April 15, 2026. Any integration built on that authentication path stopped working on that date. There is no announced replacement. If you need a free hosted option, use the Groq free tier or self-host an open-weight model.

Qwen Pricing: API Tiers From $0.15 to $2.50 Per Million Tokens

The paid Qwen API runs through Alibaba Cloud Model Studio (primary endpoint: Singapore region). Pricing follows a per-token model: one million tokens is roughly 750,000 words, or about 3,000 typical API requests at average message length. Three tiers cover most production use cases.

Model	Input ($/M tokens)	Output ($/M tokens)	Context	Open-Weight
Qwen3.7-Max	$2.50	$7.50	1M tokens	No (API only)
Qwen3.6-Max-Preview	$1.30	$7.80	262K tokens	No (API only)
Qwen3.6-35B-A3B	$0.15	$1.00	262K native (1M via YaRN)	Yes (Apache 2.0)

6x cheaper

Qwen3.7-Max ($2.50/M input) vs Claude Opus 4.8 ($15/M input). A workflow costing $300 in Opus 4.8 tokens costs approximately $50 in Qwen3.7-Max tokens.

Qwen3.7-Max is the flagship API model: approximately 1 trillion total parameters with roughly 24 billion active per forward pass (Mixture of Experts architecture), a 1 million token context window, and $2.50 per million input tokens. On SWE-Bench Pro (the autonomous software engineering benchmark), it scores 60.6%, which puts it in the top tier of current frontier models at one-sixth the per-token cost of comparable alternatives.

Qwen3.6-35B-A3B is the open-weight API model. At $0.15 per million input tokens, it delivers the cheapest tier in Qwen pricing for hosted inference; the model is the same one available for free local download, so the API version simply provides Alibaba's managed inference at scale. The model is multimodal, supporting image input alongside text. Cache reads drop to $0.05 per million tokens.

FREE TEMPLATE

AI Governance Charter

Establish your organization's AI principles in one document

Download Free →

Prompt Caching: The 90% Discount

Qwen3.7-Max prompt caching cuts your effective input cost by 90% on repeated context prefixes: $0.25/M on cache reads versus $2.50/M on fresh input. You pay once to write the cache at $3.125/M, then recoup that cost across subsequent calls that reuse the same prefix. The crossover breaks even after roughly 2-3 reuses, and the discount compounds as call volume grows.

Cache Action	Price per Million Tokens	Notes
Standard input (no cache)	$2.50	Qwen3.7-Max baseline
Cache creation	$3.125	25% above standard input (you pay to store)
Cache read	$0.25	90% below standard input (you save on retrieval)
Qwen3.6-35B-A3B cache read	$0.05	Separate pricing for the open-weight tier

Caching makes economic sense for: RAG pipelines where system prompts and retrieved context repeat across user turns; customer support agents with long instruction sets that stay constant across sessions; and document analysis workflows that process the same background material repeatedly.

The Qwen3.7-Max cache has a 5-minute time-to-live. If more than 5 minutes pass between requests sharing the same prefix, the cache expires and you pay the cache creation fee again on the next request. Interactive chatbots with frequent turns are well-suited. Batch jobs with long gaps between calls are not.

Third-Party Providers

DeepInfra and OpenRouter are the two most practical alternatives to Alibaba Cloud for Qwen API access. Both are OpenAI-compatible, so the same SDK code works with a base URL swap. Rates differ by model and access type.

Provider	Models Available	Input / Output (per M)	API Compatibility
DeepInfra	Qwen3.5-397B-A17B	$0.54 / $3.40	OpenAI-compatible
DeepInfra	Qwen3-235B-A22B (thinking)	$0.45 / $3.49	OpenAI-compatible
OpenRouter	Qwen3.7-Max, Qwen3.6-35B-A3B + others	Varies by model	OpenAI-compatible
Together AI	Qwen open-weight models	Varies (serverless / dedicated)	OpenAI-compatible

DeepInfra is the standout third-party option for large open-weight models. Its rate for Qwen3.5-397B-A17B ($0.54/$3.40 per million input/output tokens) undercuts Alibaba Cloud's listed rates for comparable capability tiers. Qwen3-235B-A22B with thinking enabled runs at $0.45/$3.49, useful for math and reasoning tasks where chain-of-thought is required and you want to avoid the Alibaba Cloud billing relationship.

OpenRouter provides a single OpenAI-compatible endpoint that routes to multiple backend providers. You can access qwen/qwen3.7-max or qwen/qwen3.6-35b-a3b without creating an Alibaba Cloud account. OpenRouter bills in credits, which some teams find simpler than managing regional cloud credentials.

Together AI hosts Qwen open-weight models. Rates vary by model and tier (serverless vs. dedicated instances). Dedicated instances carry a fixed hourly cost and make sense only when workload volume is high enough that per-token billing becomes expensive.

One practical note: third-party providers source their models from the open-weight releases. You will not find Qwen3.7-Max or Qwen3.6-Max-Preview on DeepInfra or Together AI: those are proprietary API-only models, available only through Alibaba Cloud. If you need the frontier-tier capability, Alibaba Cloud is the only option.

Enterprise Plans

Standard pay-per-token billing works for most teams. Two situations break that model: development teams that run Qwen agents continuously and need predictable costs, and organizations with data residency requirements that prohibit cloud API calls entirely. Alibaba Cloud has a separate track for each.

Alibaba Cloud Coding Plan

The Coding Plan is a fixed monthly subscription designed for development teams running Qwen continuously through coding agents, IDE integrations, or CI pipelines. Rather than accumulating per-token charges across many small requests, teams pay a predictable monthly fee that covers higher rate limits and priority routing. Alibaba has not published the subscription amount publicly; pricing is disclosed during the enterprise sales process and varies by seat count and commitment term.

The Coding Plan is worth evaluating if your team already uses Qwen Code (the terminal agent) or has integrated Qwen into an IDE extension like VS Code Continue or JetBrains. For intermittent use, pay-per-token billing remains cheaper. The crossover point depends on your average monthly token consumption.

Enterprise Deployment Kit

The Enterprise Deployment Kit provides Docker and Kubernetes deployment configurations for running Qwen models on private infrastructure. This targets industries with strict data residency requirements: finance, defense, healthcare, and high-security enterprise environments where cloud API calls are prohibited by policy or regulation.

With the kit, Qwen runs entirely within your network. No data leaves your perimeter, there are no per-token charges, and you control the hardware. The tradeoff is that you take on the operational cost of running the models yourself: GPU or accelerator provisioning, serving software, and ongoing maintenance. The kit includes configurations for vLLM and other production serving frameworks. It is distributed through Alibaba Cloud's enterprise sales channel, not as a public download.

The Alibaba Cloud Coding Plan subscription fee is not published on the pricing page. Expect a sales conversation before you see a number. If you need a budget estimate, request a quote through the Alibaba Cloud enterprise portal and ask for a consumption-based projection at your expected token volume.

Full Model Lineup (2026)

The Qwen3 model family spans five capability tiers from API-only frontier models to sub-1B edge models. Qwen pricing across these tiers ranges from $0 (open-weight, self-hosted) to $2.50/M input tokens for the frontier API. Understanding active vs. total parameter counts is critical here: Qwen's MoE (Mixture of Experts) models have large total parameter counts but activate only a fraction per request, which is why a "35B" model can run on a 24GB GPU.

Model	Params	Context	Access	License	Input / Output (per M)
Qwen3.7-Max	~1T	1M tokens	API only	Proprietary	$2.50 / $7.50
Qwen3.6-Max-Preview	~1T	262K tokens	API only	Proprietary	$1.30 / $7.80
Qwen3.6-35B-A3B	35B (3B active)	262K tokens	Open-weight + API	Apache 2.0	$0.15 / $1.00
Qwen3.5-397B-A17B	397B (17B active)	262K tokens	Open-weight	Apache 2.0	$0.54 / $3.40 (DeepInfra)
Qwen3-235B-A22B	235B (22B active)	262K tokens	Open-weight	Tongyi Qianwen	$0.45 / $3.49 (DeepInfra)
Qwen3-32B	32B dense	262K tokens	Open-weight	Apache 2.0	Free (Groq) / self-host
Qwen3.6-27B	27B dense	262K tokens	Open-weight	Apache 2.0	Self-host
Qwen3.5 small (0.8B–9B)	0.8B / 2B / 4B / 9B	262K tokens	Open-weight	Apache 2.0	Self-host (edge/consumer)

Three patterns to carry forward. The 1M token context window is exclusive to Qwen3.7-Max on the proprietary API; open-weight models top out at 262K tokens natively. The Tongyi Qianwen License applies to Qwen3-235B-A22B and some other larger models, so read the licensing section below before building a commercial product on any Tongyi-licensed model. The small series (0.8B to 9B) is genuinely competitive on edge hardware and benchmarks above similar-size models from other vendors.

Licensing: What You Can Build

Qwen's licensing varies by model tier, and the distinction has real consequences for commercial projects. There are three regimes you need to understand.

Apache 2.0: Full Commercial Freedom

Most Qwen3-generation models at 35B parameters and below ship under Apache 2.0. This includes Qwen3.6-35B-A3B, Qwen3-32B, Qwen3.6-27B, and the much larger Qwen3.5-397B-A17B, which Alibaba released under Apache 2.0 as an unusually permissive exception for a model of that capability. Apache 2.0 permits commercial use, fine-tuning, and redistribution without requiring you to publish your modifications or pay royalties. If you are building a product on an open-weight Qwen model, Apache 2.0 means you can ship it.

Tongyi Qianwen License: Restricted Commercial Use

Some larger models in the Qwen family (including Qwen3-235B-A22B) use the Tongyi Qianwen License. This license permits non-commercial use freely. Commercial use requires a separate agreement with Alibaba Cloud if the product reaches more than 100 million monthly active users. Below that threshold, commercial use is permitted without a separate agreement, but the license terms are more restrictive than Apache 2.0: you cannot relicense the model weights, and certain redistribution conditions apply. Read the full license text before building a commercial product on a Tongyi-licensed model.

Proprietary API: No Weight Access

Qwen3.7-Max and Qwen3.6-Max-Preview are API-only models. Alibaba does not release the weights. You interact through the API under Alibaba Cloud's standard terms of service. There is no licensing decision to make: you are renting compute, not owning a model. This is identical to how comparable proprietary frontier models operate.

The licensing pattern changed between Qwen2.5 and Qwen3.x. Do not assume all Qwen models share the same license. Check the Hugging Face model card for the specific model variant you are deploying. Qwen3-235B-A22B is Tongyi-licensed, not Apache 2.0, despite being an open-weight release.

Tongyi Qianwen License requires a separate Alibaba Cloud agreement for products exceeding 100 million monthly active users. This is not a concern at early product stage, but plan for it before you scale. The agreement process is not instantaneous.

Frequently Asked Questions

Is Qwen free to use?

Yes, in two ways. Open-weight Qwen models run locally at zero cost under Apache 2.0, no API key, no usage limits. Groq hosts Qwen3-32B on its free tier with no credit card required, though rate limits apply. The Alibaba Cloud API starts at $0.15 per million input tokens for Qwen3.6-35B-A3B. The former OAuth free tier was discontinued April 15, 2026.

How does Qwen pricing compare to ChatGPT and Claude?

Qwen3.7-Max costs $2.50 per million input tokens, roughly 6x cheaper than Claude Opus 4.8 at $15/M. A workflow costing $300 in Opus 4.8 tokens costs approximately $50 in Qwen3.7-Max tokens, with comparable or better performance on coding benchmarks.

Can I run Qwen on my own hardware for free?

Yes. Open-weight Qwen models under Apache 2.0 permit commercial use, fine-tuning, and redistribution. Qwen3.6-35B-A3B runs at 20-25 tokens per second on an RTX 4090 with 24GB VRAM. Qwen3.5-397B-A17B fits in 4-bit quantization on a Mac Studio with 256GB of RAM.

What happened to the Qwen free API tier?

Alibaba discontinued the Qwen OAuth free tier on April 15, 2026. The tier originally allowed 1,000 requests per day, later reduced to 100 requests per day before shutdown. The Groq free tier (Qwen3-32B) remains active and does not require a credit card.

Qwen 3 API Pricing & Setup Guide

Search YouTube for the latest tutorials on Qwen API pricing and Alibaba Cloud Model Studio setup

Run Qwen Locally for Free with Ollama

Search YouTube for step-by-step Ollama + Qwen local setup walkthroughs

Qwen vs Claude vs GPT: Cost Comparison

Search YouTube for cost benchmark comparisons between Qwen, Claude, and ChatGPT

Foundation

What Is Qwen AI?

Alibaba's open-weight model family explained: architecture, capabilities, and why it rivals frontier-tier models at a fraction of the cost.

Read Article →

Hands-On

How to Run Qwen Locally

From ollama pull qwen3:32b to production vLLM, the complete local deployment guide for self-hosting Qwen on your own hardware.

Read Article →

Cross-Vendor

DeepSeek Pricing Explained

The other low-cost Chinese frontier model: how DeepSeek's pricing compares to Qwen across API tiers, open-weight options, and enterprise plans.

Read Article →

Go Deeper

Resources from across Tech Jacks Solutions

FREEAI Governance Charter

Establish your organization's AI principles in one document

AI Career Paths

Explore roles that work with these tools daily

EU AI Act Guide

Check your compliance obligations under the EU AI Act

FREEAI Risk Management Template

Identify, assess, and mitigate AI deployment risks

FREEAI Bias Assessment

Evaluate bias risks before deploying any AI system

Facts verified against Alibaba Cloud Model Studio documentation (May 2026). Pricing reflects published rates as of the date shown above. Verify current pricing at dashscope.aliyun.com before making purchasing decisions.

Qwen and Alibaba Cloud are trademarks of Alibaba Group Holding Limited. Tech Jacks Solutions is an independent publisher and is not affiliated with, endorsed by, or sponsored by Alibaba Group. All product names, logos, and brands are property of their respective owners.

Gallery

Contacts

Is Qwen Free? Pricing, Models & API Tiers Explained

The Short Answer: Yes, Three Ways

Free Access: Local, Groq, and What Happened to OAuth

Local Self-Hosting

Groq Free Tier

The Qwen OAuth Era Is Over

Qwen Pricing: API Tiers From $0.15 to $2.50 Per Million Tokens

Prompt Caching: The 90% Discount

Third-Party Providers

Enterprise Plans

Alibaba Cloud Coding Plan

Enterprise Deployment Kit

Full Model Lineup (2026)

Licensing: What You Can Build

Apache 2.0: Full Commercial Freedom

Tongyi Qianwen License: Restricted Commercial Use

Proprietary API: No Weight Access

Frequently Asked Questions

Go Deeper

Services

Learn

Company