What Is Qwen AI? Alibaba's Open-Weight Model Explained

When Alibaba published its first open-weight language model in August 2023, the AI community took note. When the Qwen AI family crossed 90,000 derivative models on ModelScope, it became impossible to ignore. Qwen AI is Alibaba Cloud's family of open-weight and proprietary language models, engineered to cover the full spectrum from edge inference on consumer hardware to frontier-tier agentic coding in the cloud. The flagship Qwen3.7-Max places in the global top 5 on the Artificial Analysis Intelligence Index. Models at or under 35 billion parameters ship under Apache 2.0, which means you can fine-tune and redistribute them commercially without royalties.

What is Qwen AI in one sentence? Qwen is Alibaba Cloud's AI model family, ranging from free open-weight models you can run on a laptop to a frontier-grade API that outscores Claude Opus on coding benchmarks at a fraction of the cost.

Who Built Qwen? (Alibaba Cloud)

Qwen is an Alibaba Cloud product, built by the Tongyi Large Model Business Unit. The division's chief AI architect is Zhou Jingren, who oversees model research and deployment strategy. The name "Qwen" is the commercial brand; the internal designation is "Tongyi Qianwen," which translates roughly to "understanding all questions."

In March 2026, Alibaba consolidated its AI research under a new entity called Alibaba Token Hub, which brought together model training, inference infrastructure, and developer tooling under one organizational umbrella. Alibaba CEO Eddie Wu has been the executive sponsor of the AI push since the company restructured its operations.

The Qwen project started quietly. A beta version shipped in April 2023, limited to enterprise partners on the Alibaba Cloud platform. By August 2023, Alibaba released Qwen-7B as an open-weight model. It was one of the first billion-plus-parameter models from a Chinese lab to ship downloadable weights under a permissive license, and that decision set the tone for everything that followed.

Understanding the naming convention helps: the "Qwen" prefix is the brand, the number is the generation (Qwen3.7 is the seventh sub-release of the third generation), and for MoE models the suffix format is [total params]B-A[active params]B. So Qwen3.6-35B-A3B has 35 billion total parameters but only 3 billion active per token. Building models at this scale is a team effort: Alibaba reports hundreds of researchers across its Singapore and mainland China research hubs.
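The naming convention described above is regular enough to parse mechanically. The following is a minimal sketch, assuming names follow the Qwen[generation]-[total]B-A[active]B pattern seen in the examples in this article; the function and type names are illustrative, and names such as Qwen3.7-Max or Qwen-7B fall outside this pattern and are rejected.

```python
import re
from typing import NamedTuple, Optional


class QwenName(NamedTuple):
    generation: str            # e.g. "3.6"
    total_b: float             # total parameters, in billions
    active_b: Optional[float]  # active parameters per token, in billions (MoE only)


# Pattern inferred from names quoted in the article, e.g. "Qwen3.6-35B-A3B".
_NAME = re.compile(
    r"^Qwen(?P<gen>\d+(?:\.\d+)?)"
    r"-(?P<total>\d+(?:\.\d+)?)B"
    r"(?:-A(?P<active>\d+(?:\.\d+)?)B)?$"
)


def parse_qwen_name(name: str) -> QwenName:
    """Split a model name into generation, total and active parameter counts."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"name does not follow the expected pattern: {name!r}")
    active = match.group("active")
    return QwenName(
        generation=match.group("gen"),
        total_b=float(match.group("total")),
        active_b=float(active) if active is not None else None,
    )


if __name__ == "__main__":
    print(parse_qwen_name("Qwen3.6-35B-A3B"))
    print(parse_qwen_name("Qwen3.5-397B-A17B"))
```

A dense model name without the -A suffix parses with active_b set to None, which keeps the dense and MoE cases distinguishable downstream.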

Qwen AI is ultimately owned by Alibaba Group, the Chinese technology company listed on the New York and Hong Kong stock exchanges. The AI model division operates as part of Alibaba Cloud, the group's cloud computing arm.

The Qwen AI Model Family Explained

The Qwen model family spans eight generations as of May 2026, ranging from 0.5-billion-parameter edge models to the approximately 1-trillion-parameter Qwen3.7-Max. Each generation introduced architectural advances that moved Qwen from a derivative of Western model research toward a distinct technical approach.

The current flagship is Qwen3.7-Max, a proprietary model available through the Alibaba Cloud API. It uses a Hybrid Gated DeltaNet architecture, a design that mixes linear attention layers and standard full attention in a 3:1 ratio. This is not standard Transformer architecture. Linear attention scales more efficiently for long sequences, which is part of why Qwen3.7-Max handles a 1 million token context window (roughly 750,000 words, or a large codebase) without the prohibitive memory costs that affect dense transformer models at the same scale.

Open-weight models currently top out at Qwen3.5-397B-A17B, which has 397 billion total parameters but only 17 billion active per token (this is Mixture of Experts: the model activates a different subset of parameters for each token, keeping runtime cost proportional to the 17B active count, not the 397B total). Independent evaluations place it in the GPT-5.2 tier. It runs in 4-bit quantization on a Mac Studio with 256GB of RAM.
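The 256GB figure can be sanity-checked with simple arithmetic. This is a rough sketch, assuming memory for weights is simply parameters times bits per weight; it ignores quantization metadata, the KV cache, and runtime overhead, so real usage will be higher. The function names are illustrative.

```python
def weight_memory_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the stored weights in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = total_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def active_fraction(total_params_billion: float, active_params_billion: float) -> float:
    """Share of parameters used for any single token in an MoE model."""
    return active_params_billion / total_params_billion


if __name__ == "__main__":
    # Qwen3.5-397B-A17B: 397B total parameters, 17B active per token.
    print(weight_memory_gb(397, 4))    # 4-bit weights
    print(weight_memory_gb(397, 16))   # 16-bit weights, for comparison
    print(active_fraction(397, 17))    # fraction of the model active per token
```

The point of separating the two functions is that they scale with different numbers: memory follows the 397B total, because every expert must be resident, while per-token compute follows the 17B active count.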

For developers who need something smaller, Qwen3.6-35B-A3B is the current local-deployment sweet spot. At 35 billion total / 3 billion active parameters, it reaches 20-25 tokens per second on a single RTX 4090. It ships under Apache 2.0 and handles text, images, and video natively.

Key figures:

~1T parameters (Qwen3.7-Max) · Source: Qwen Blog
1M token context window · Source: Alibaba Cloud Docs
90K+ derivative models on ModelScope · Source: ModelScope Hub
201 languages supported · Source: Qwen Blog

The Qwen3 generation introduced the Hybrid Gated DeltaNet attention mechanism across the entire family. Qwen models use a 3:1 ratio of linear attention layers to full attention layers. Linear attention computes in linear time relative to sequence length (versus quadratic time for standard attention), which enables the long context windows that make Qwen models practical for large codebases and document analysis. The tradeoff is that linear attention can lose recall on certain retrieval tasks, which is why the full attention layers are retained at the 1-in-4 position.
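To make the 3:1 pattern and its scaling concrete, here is a small sketch. It is not Qwen's implementation: the layer count is hypothetical, and the cost proxy simply counts n per linear-attention layer and n squared per full-attention layer, ignoring constants and everything outside attention.

```python
from typing import List


def layer_schedule(num_layers: int, linear_per_full: int = 3) -> List[str]:
    """Repeat `linear_per_full` linear-attention layers followed by one full-attention layer."""
    block = ["linear"] * linear_per_full + ["full"]
    return [block[i % len(block)] for i in range(num_layers)]


def relative_attention_cost(schedule: List[str], seq_len: int) -> int:
    """Unitless cost proxy: seq_len per linear layer, seq_len**2 per full layer."""
    return sum(seq_len if kind == "linear" else seq_len ** 2 for kind in schedule)


if __name__ == "__main__":
    num_layers = 48  # hypothetical depth, chosen only for illustration
    hybrid = layer_schedule(num_layers)
    all_full = ["full"] * num_layers
    for n in (8_000, 128_000, 1_000_000):
        ratio = relative_attention_cost(hybrid, n) / relative_attention_cost(all_full, n)
        print(n, round(ratio, 4))
```

Under these assumptions the hybrid stack still carries a quadratic term from its full-attention layers, but only a quarter of the layers pay it, so the attention cost approaches one quarter of an all-full-attention stack as sequences grow.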

How Qwen AI Benchmarks Against GPT and Claude

Qwen models lead two public benchmarks as of May 2026. These are not self-reported scores: they come from third-party leaderboards that accept model submissions and verify results against held-out test sets.

On SWE-Bench Pro (the harder version of the standard SWE-Bench that evaluates models against 2,294 real GitHub issues from production open-source projects), Qwen3-235B-A22B scores 60.6%. A score of 60.6% means the model autonomously resolves roughly 6 in 10 real bug reports without human help. Claude Opus 4.8 scores 57.3% on the same benchmark. That 3.3-point gap on a benchmark designed to simulate real software engineering work is meaningful.
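One way to gauge whether a 3.3-point gap is more than noise is a two-proportion z statistic. The sketch below is a rough check, assuming the two scores behave like independent samples of 2,294 issues each. Both models are actually evaluated on the same issues, so a paired test on per-issue outcomes would be more appropriate, but that data is not given here.

```python
import math


def two_proportion_z(p1: float, p2: float, n1: int, n2: int) -> float:
    """z statistic for the difference of two independent proportions (unpooled standard error)."""
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / standard_error


if __name__ == "__main__":
    # Scores and issue count as quoted in the text above.
    z = two_proportion_z(0.606, 0.573, 2294, 2294)
    print(round(z, 2))
```

With these inputs the statistic comes out a little above 2, which clears the conventional 1.96 threshold for a 5% two-sided test, though not by a wide margin.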

On Terminal-Bench 2.0, which simulates a software engineer working in a sandboxed terminal environment with a 5-hour timeout and access to shell commands, file I/O, and internet access, Qwen3-235B-A22B scores 69.7%, the top result on the public leaderboard at time of writing.

On math reasoning, Qwen3-235B-A22B scores 90.2% on MATH-500 and holds a CodeForces ELO of 2,056. On the Artificial Analysis Intelligence Index v4.0, Qwen3.7-Max scores 56.6, placing it in the global top 5. See our Qwen vs DeepSeek comparison for a full head-to-head on benchmarks and pricing.

SWE-Bench Pro · Coding: 2,294 real GitHub issues · Source: swebench.com

Qwen3-235B-A22B: 60.6%
Claude Opus 4.8: 57.3%
DeepSeek-R1: ~49%

Qwen leads on production coding tasks. The 3.3-point gap over Claude Opus 4.8 is statistically meaningful on a 2,294-issue benchmark.

Terminal-Bench 2.0 · Autonomous terminal agent · 5-hour sandboxed sessions · Source: terminal-bench.com

Qwen3-235B-A22B: 69.7%
Kimi K2.6: 66.7%
Claude Opus 4.8: 65.4%

Qwen3-235B-A22B holds the #1 public leaderboard position on Terminal-Bench 2.0 as of May 2026.

Scores as of May 2026. Source: swebench.com · Artificial Analysis. Rankings may shift as new models submit results.

Open Source or Proprietary? Qwen Licensing Explained

Qwen's licensing is tiered by model size. Models at or below 35 billion parameters ship under Apache 2.0, one of the most permissive widely used open-source licenses. That means you can download the weights, fine-tune on proprietary data, build products, and distribute commercially without paying Alibaba anything or negotiating a separate agreement.

Models at 72 billion parameters and above use the Tongyi Qianwen License, which places additional restrictions on commercial deployment and redistribution. If you are building a SaaS product on a 72B+ model, or distributing a fine-tune to downstream customers, the license text governs. Alibaba publishes the full terms on each model's Hugging Face card.

Apache 2.0 applies to all Qwen models at or under 35B parameters, including the high-capability Qwen3.6-35B-A3B, and also to the open-weight flagship Qwen3.5-397B-A17B.

The practical split: Qwen3.6-35B-A3B (35B total, 3B active MoE) and Qwen3.5-397B-A17B are both Apache 2.0, unusually permissive for models operating at near-frontier capability. The 397B model is an exception to the size-based rule above, so confirm the license on its model card rather than inferring it from parameter count. The API-only Qwen3.7-Max is a proprietary hosted model; its weights are not available for download.
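The size-based rule can be written down as a first-pass check before reading the actual license. This is a sketch of the rule as this article states it, not an authoritative mapping: the function name and the exceptions table are illustrative, the article gives no rule for sizes between 35B and 72B, and the model card always governs.

```python
from typing import Dict, Optional

APACHE = "Apache-2.0"
TONGYI = "Tongyi Qianwen License"

# Exception named in the article: listed as Apache 2.0 despite exceeding 72B total parameters.
KNOWN_EXCEPTIONS: Dict[str, str] = {"Qwen3.5-397B-A17B": APACHE}


def expected_license(model_name: str, total_params_billion: float) -> Optional[str]:
    """Return the license the article's rule predicts, or None when the rule is silent."""
    if model_name in KNOWN_EXCEPTIONS:
        return KNOWN_EXCEPTIONS[model_name]
    if total_params_billion <= 35:
        return APACHE
    if total_params_billion >= 72:
        return TONGYI
    return None  # between 35B and 72B the article states no rule; read the model card


if __name__ == "__main__":
    print(expected_license("Qwen3.6-35B-A3B", 35))
    print(expected_license("Qwen3.5-397B-A17B", 397))
```

Returning None for the unspecified range, rather than guessing, forces the caller to look at the license text in exactly the cases where the heuristic has nothing to say.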

One nuance: "open-weight" is not the same as "open source." Qwen AI releases the weights, but training pipelines, full architecture specifications, and training data are not always disclosed. Apache 2.0 gives you freedom over the weights themselves; it does not provide a reproducible training pipeline.

The Qwen Ecosystem

One indicator that a model family has reached mainstream adoption: the derivative model count on ModelScope. Qwen has crossed 90,000, including fine-tunes, quantizations, domain-adapted variants, and multilingual models built by the community on top of the base weights. That puts it alongside Llama in open-weight ecosystem depth.

Language support is another dimension where Qwen sets itself apart from most Western-developed models. Official support extends to 201 languages, reflecting Alibaba's emphasis on markets where English-first models leave significant gaps. Chinese, Arabic, Korean, and Southeast Asian languages receive dedicated tuning alongside major European languages.

Qwen Generation Timeline

April 2023 · Qwen Beta: Internal beta launch. First public glimpse of Alibaba's large model capabilities under the Tongyi Qianwen brand.

August 2023 · Qwen-7B Open Weight: First public open-weight release under Apache 2.0. Triggered the derivative model ecosystem that now exceeds 90,000 variants.

June 2024 · Qwen2: Major architectural revision. Expanded language coverage, improved instruction following. Entered global top-tier benchmarks for the first time.

September 2024 · Qwen2.5: Coding, math, and long-context improvements. Qwen2.5-Coder emerged as a leading open-weight coding model in its class.

May 2025 · Qwen3: Hybrid Gated DeltaNet architecture introduced. Switchable thinking/non-thinking modes. Qwen3-235B-A22B claimed global top-5 placement across multiple benchmarks.

May 2026 · Qwen3.7-Max Stable: Approximately 1 trillion parameters. 1M token context window. $2.50/M input tokens via API. MCP native. Anthropic API protocol compatible.

On the protocol side, Qwen supports MCP (Model Context Protocol) natively: tool-calling integrations built for Claude or other MCP-compatible systems work without modification. Qwen also implements the Anthropic API protocol, so swapping Claude for Qwen3.7-Max in a compatible coding agent requires changing three environment variables and nothing else.
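The article does not name the three environment variables. As an illustration only, the sketch below assumes they are the ones Claude Code reads for its endpoint, credential, and model (ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, ANTHROPIC_MODEL). The URL, key, and model id shown are placeholders, not real Alibaba Cloud values; take the actual ones from the Alibaba Cloud documentation.

```python
import os
from typing import Dict, Mapping


def qwen_agent_env(
    base_env: Mapping[str, str], base_url: str, api_key: str, model: str
) -> Dict[str, str]:
    """Return a copy of `base_env` with the three endpoint settings overridden."""
    env = dict(base_env)
    env["ANTHROPIC_BASE_URL"] = base_url    # assumed variable: where the agent sends requests
    env["ANTHROPIC_AUTH_TOKEN"] = api_key   # assumed variable: credential for that endpoint
    env["ANTHROPIC_MODEL"] = model          # assumed variable: model id to request
    return env


if __name__ == "__main__":
    env = qwen_agent_env(
        os.environ,
        base_url="https://example.invalid/anthropic-compatible",  # placeholder URL
        api_key="sk-placeholder",                                 # placeholder key
        model="qwen3.7-max",                                      # hypothetical model id
    )
    # Pass `env` to subprocess.run([...], env=env) when launching the agent.
    print(sorted(k for k in env if k.startswith("ANTHROPIC_")))
```

Building a modified copy instead of mutating os.environ keeps the swap scoped to one launched process, which makes it easy to run the same agent against both providers side by side.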

Qwen Code: The Terminal Coding Agent

Qwen Code is Alibaba's terminal-native coding agent, positioned as a direct alternative to Claude Code. It operates as a CLI tool, accepts natural language task descriptions, and runs multi-step coding sessions autonomously, reading files, writing tests, running shell commands, and iterating on failures without human input between steps.

35 hours: autonomous coding session with 1,158 tool calls executed without human intervention (vendor-reported record as of May 2026).

The benchmarks reflect the capability. On Terminal-Bench 2.0, Qwen3-235B-A22B scores 69.7%, the top result globally as of this writing. On SWE-Bench Pro (which evaluates agents against real GitHub issues), the same model reaches 60.6%, compared to Claude Opus 4.8 at 57.3%.

Because Qwen implements the Anthropic API protocol natively, MCP server configurations and tool-calling patterns that work in Claude Code work with Qwen Code. Developers running Claude-based pipelines can evaluate Qwen as a drop-in at roughly one-sixth the API cost of Claude Opus.

Who Should Use Qwen?

Qwen's range, from edge-deployable 0.8B models to a frontier trillion-parameter API, means different users get genuinely different things from it. The right entry point depends on your infrastructure, budget, and latency requirements.

Enterprise Agent Builders

You need top-tier SWE-Bench performance and are currently paying frontier-model prices. Qwen3.7-Max at $2.50/M input tokens delivers Claude Opus-class benchmark results at roughly one-sixth the cost. Workflows that cost $300 in Claude Opus tokens cost approximately $50 in Qwen3.7-Max tokens.

Best fit: Qwen3.7-Max API
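The $300 versus $50 comparison can be reproduced from the quoted price. This sketch assumes input tokens only and derives the Claude Opus price from the "roughly one-sixth" ratio rather than from a published price list, so treat the $15/M figure as an inferred assumption.

```python
QWEN_MAX_INPUT_PRICE = 2.50  # USD per million input tokens, as quoted in the article


def input_cost_usd(input_tokens: float, price_per_million_usd: float) -> float:
    """Cost of a given number of input tokens at a per-million-token price."""
    return input_tokens / 1_000_000 * price_per_million_usd


def tokens_for_budget(budget_usd: float, price_per_million_usd: float) -> float:
    """How many input tokens a budget buys at a per-million-token price."""
    return budget_usd / price_per_million_usd * 1_000_000


if __name__ == "__main__":
    # Inferred from the "one-sixth the cost" claim, not a quoted price.
    opus_price_assumed = QWEN_MAX_INPUT_PRICE * 6
    tokens = tokens_for_budget(300, opus_price_assumed)
    print(tokens)
    print(input_cost_usd(tokens, QWEN_MAX_INPUT_PRICE))
```

Output tokens are usually priced higher than input tokens, so a real workload's ratio depends on its input/output mix; this check only confirms that the article's two dollar figures are consistent with each other.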

Self-Hosting Teams

You want frontier-class capability without routing data to a cloud API. Qwen3.5-397B-A17B (Apache 2.0) runs in 4-bit quantization on a 256GB Mac Studio or multi-GPU server. No per-token cost after hardware. Evaluated at GPT-5.2 class on independent benchmarks.

Best fit: Qwen3.5-397B-A17B (local)

Budget-Conscious Developers

You want capable API access at the lowest possible per-token cost. Qwen3.6-35B-A3B at $0.15/M input is among the cheapest capable models available from any provider, with multimodal support and a 262K native context window. Apache 2.0 for all commercial uses.

Best fit: Qwen3.6-35B-A3B API

Edge & Mobile Builders

You are targeting hardware with 1-8GB of RAM: smartphones, embedded devices, or IoT endpoints. The Qwen3.5 small series (0.8B through 4B) runs in 4-bit quantization at 1-4.5GB and delivers capable code completion and summarization at these hardware constraints.

Best fit: Qwen3.5 0.8B to 4B series

Limitations to Know Before You Commit

Qwen's pricing and open-weight availability make a strong case. Three structural limitations apply regardless of which model or tier you choose.

The primary Alibaba Cloud Model Studio endpoint routes through Singapore. Organizations with EU data residency requirements (GDPR) or US government data handling rules should evaluate this before choosing the hosted API. The Enterprise Deployment Kit (Docker and Kubernetes configurations) is the on-premise alternative for high-compliance environments.

The Tongyi Qianwen License applies to models at 72B parameters and above. It is more restrictive than Apache 2.0 on commercial redistribution and may limit certain derivative applications. Read the full license text on Hugging Face before building a product on any 72B+ model. Models at or under 35B are Apache 2.0, with no additional restrictions beyond that license's own terms.

The 35-hour autonomous coding session record is vendor-reported and has not been independently replicated. SWE-Bench Pro scores are submitted to and verified by the leaderboard maintainers, but real-world performance in your specific domain may differ from benchmark conditions. Treat all performance figures as directional, not guaranteed.

Frequently Asked Questions

Is Qwen free to use?

It depends on how you access it. Open-weight models (Qwen3.6-35B-A3B, Qwen3.5-397B-A17B, and others) can be downloaded and run locally for free with no usage limits under Apache 2.0. The hosted API at Alibaba Cloud charges per token: Qwen3.7-Max starts at $2.50 per million input tokens. Qwen3-32B is also available on Groq's free tier, rate-limited, with no credit card required.

What is the difference between Qwen and ChatGPT?

Qwen is built by Alibaba Cloud; ChatGPT is built by OpenAI. The most significant practical difference is that many Qwen models, up to the 397B-parameter Qwen3.5-397B-A17B, are available as open-weight downloads under Apache 2.0, while ChatGPT's models are API-only with no downloadable weights. Qwen's API pricing is also substantially lower than that of comparable frontier models: Qwen3.7-Max costs $2.50/M input tokens. Both support tool calling and multi-turn conversations, but Qwen also provides native MCP support and the Anthropic API protocol.

Can I use Qwen commercially?

Yes, with conditions that depend on model size. Models at or under 35B parameters are Apache 2.0: commercial use, fine-tuning, and redistribution are all permitted. Models at 72B parameters and above use the Tongyi Qianwen License, which places additional restrictions on commercial deployment and redistribution. Read the license on the specific model's Hugging Face card before building a product on it.

What is Hybrid Gated DeltaNet?

Hybrid Gated DeltaNet is the attention architecture powering Qwen3.7-Max. It combines linear attention (efficient for long sequences) and full attention (more reliable for precise recall across the context) in a 3:1 ratio: three linear layers for every one full attention layer. This allows the model to process 1 million token context windows without paying the quadratic cost in every layer, as a standard transformer would at that scale.

Video Resources

Qwen3 Complete Overview and Benchmark Deep Dive (YouTube)

Running Qwen Locally with Ollama: Quickstart Guide (YouTube)

Qwen vs Claude vs GPT: Side-by-Side Coding Test (YouTube)

Keep Learning

Breakdown · Is Qwen Free? Pricing, Models & API Tiers Explained: A full breakdown of every pricing tier, free access options, and the enterprise deployment kit.

Coming Soon · How to Run Qwen Locally: Complete Guide: From ollama pull to a production vLLM server: the full local deployment path for every hardware tier.

Comparison · Qwen vs DeepSeek: Chinese AI Head-to-Head: Same MoE DNA, different strategies. A benchmark-by-benchmark comparison of the two leading Chinese AI labs.

Grounded against Alibaba Cloud API documentation, ModelScope, Hugging Face model cards, and SWE-Bench Pro leaderboard. Pricing and specifications verified May 2026.

Qwen, Tongyi Qianwen, Qwen Code, and Alibaba Cloud are trademarks of Alibaba Group Holding Limited. All product names, logos, and brand identifiers are the property of their respective owners. Tech Jacks Solutions has no commercial relationship with Alibaba Cloud. This article is editorially independent.
