Qwen3-Coder: An Agentic MoE Coding Model With 480B Parameters

There has been a wave of Qwen launches lately. One of the most notable is Qwen3-Coder, an agentic Mixture of Experts (MoE) model featuring 480B total parameters and 35B active parameters, built for high-end coding assistance and multi-turn tool use. The short gap (less than two weeks) between the release of Kimi K2 and the arrival of Qwen3-Coder highlights just how aggressively teams are delivering specialized open-weight, agentic coding models directly to developers. What helps this model stand out is its smaller overall size (compared to Kimi K2’s 1 trillion parameters) alongside strong benchmark results.

Qwen3 launched in May of this year, and in the closing section of its technical report, the Qwen team states: “we will work on improving model architecture and training methods for the purposes of effective compression, scaling to extremely long contexts, etc. In addition, we plan to increase computational resources for reinforcement learning, with a particular emphasis on agent-based RL systems that learn from environmental feedback.”

In July, the refreshed Qwen3 models introduced updated pretraining and reinforcement learning (RL) stages using a revised form of Group Relative Policy Optimization (GRPO) called Group Sequence Policy Optimization (GSPO), along with a scalable setup capable of running 20,000 independent environments in parallel. We’re very excited to learn more about the specifics when an updated technical report is released.

Key Takeaways

  • 480B-parameter Mixture of Experts model with 35B active parameters
  • 160 experts with 8 active per token
  • 256K token context length extendable to 1M with YaRN
  • High SWE-bench Verified score on long-horizon tasks (69.6% with 500 turns vs. Claude Sonnet 4 at 70.4% with 500 turns)
  • Trained with Group Sequence Policy Optimization
  • Smaller 30B A3B Instruct variant runs on a single H100 GPU
  • Qwen Code CLI open-sourced as a fork of Gemini CLI

Here’s a high level overview to get you up to speed with Qwen3-Coder’s internals.

Model Overview

Mixture of Experts (MoE): The MoE design enables higher model scale and quality while cutting compute requirements. It relies on sparse feedforward neural network (FFN) layers called experts, plus a gating mechanism that routes each token to the top-k experts, so only part of the model’s parameters are used per token (a routing sketch follows this overview).

480B total parameters, 35B active parameters: Because Qwen3-Coder uses MoE, it has both total and active parameter counts. “Total parameters” refers to the full sum of parameters across the entire model, including every expert, the router or gating network, and shared components, regardless of which experts are actually used during inference. “Active parameters” describes the subset engaged for a given input, typically the chosen experts plus shared components.

160 experts, 8 activated per token: Each token is routed to only 8 of the 160 experts, so a small fraction of the expert parameters does the work for any given token. This sparse activation is what keeps the active parameter count at 35B despite the 480B total.

Context length of 256K tokens natively, 1M with YaRN: YaRN (Yet another RoPE extensioN method) is a compute-efficient technique for extending the context window of transformer-based language models. In Qwen3-Coder, it pushes the context length up to one million tokens.

GSPO (Group Sequence Policy Optimization): In Qwen’s recent paper, they present GSPO with results suggesting better training efficiency and performance than GRPO (Group Relative Policy Optimization). GSPO stabilizes MoE RL training and may simplify RL infrastructure design.
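To make the routing idea above concrete, here is a minimal sketch of top-k expert routing in PyTorch. The layer sizes, the TinyExpert module, and the softmax-over-selected-experts detail are illustrative assumptions for building intuition, not Qwen3-Coder’s actual implementation; only the 160-expert / 8-active numbers are borrowed from the overview above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyExpert(nn.Module):
    """A small feed-forward 'expert' (illustrative sizes, not Qwen3-Coder's)."""
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))

    def forward(self, x):
        return self.net(x)

class TopKMoE(nn.Module):
    """Sparse MoE layer: a gate scores every expert, but only the top-k run per token."""
    def __init__(self, d_model=64, d_ff=256, n_experts=160, k=8):
        super().__init__()
        self.experts = nn.ModuleList(TinyExpert(d_model, d_ff) for _ in range(n_experts))
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.k = k

    def forward(self, x):                              # x: (n_tokens, d_model)
        logits = self.gate(x)                          # (n_tokens, n_experts)
        topk_logits, topk_idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(topk_logits, dim=-1)       # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.k):                     # combine the k selected experts per token
            for expert_id in topk_idx[:, slot].unique():
                mask = topk_idx[:, slot] == expert_id
                out[mask] += weights[mask, slot, None] * self.experts[int(expert_id)](x[mask])
        return out

tokens = torch.randn(4, 64)        # 4 tokens with d_model=64
print(TopKMoE()(tokens).shape)     # torch.Size([4, 64]); only 8 of 160 experts ran per token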

On benchmarks, Qwen3-Coder’s performance is impressive, with a score of 67.0% on SWE-bench Verified that rises to 69.6% with 500 turns. The 500-turn result simulates a more realistic coding workflow, where the model can read feedback (like test failures), modify code, rerun tests, and repeat until the solution works.
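As a mental model for that loop, here is a schematic sketch in Python. The propose_patch and apply_patch callables and the pytest invocation are hypothetical stand-ins for the model call and the execution harness, not Qwen’s actual evaluation setup.

import subprocess

def run_tests() -> tuple[bool, str]:
    """Run the project's test suite and return (passed, combined output)."""
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def agentic_fix(issue: str, propose_patch, apply_patch, max_turns: int = 500) -> bool:
    """Schematic multi-turn loop: propose a patch, test it, feed failures back.
    propose_patch(issue, feedback) and apply_patch(patch) are hypothetical
    stand-ins for the model call and the file-editing harness."""
    feedback = ""
    for turn in range(max_turns):
        patch = propose_patch(issue, feedback)   # model reads the issue plus prior test output
        apply_patch(patch)                       # write the proposed changes into the repo
        passed, feedback = run_tests()           # run tests; failures become next turn's context
        if passed:
            print(f"solved in {turn + 1} turn(s)")
            return True
    return False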

Implementation

This article will include implementation details for a smaller variant, Qwen3-Coder-30B-A3B-Instruct. For those curious about the name: there are roughly 30 billion total parameters and 3 billion active parameters, and “Instruct” indicates it’s an instruction-tuned variant of the base model.

Implementation Specs

  • Number of Parameters: 30.5B total, 3.3B activated
  • Number of Layers: 48
  • Number of Attention Heads (GQA): 32 for Q and 4 for KV
  • Number of Experts and Activated Experts: 128 experts, 8 activated experts
  • Context Length: 262,144 tokens of native context (without YaRN)

As we can see, this variant has slightly different specs, but it can still run on a single H100 GPU.
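Before moving on to the setup steps, one note on context extension: if you want to go beyond the 262,144-token native window, the usual route in Transformers is to attach a YaRN rope_scaling entry to the model config before loading the weights. The sketch below only illustrates the mechanism; the factor of 4.0 (262,144 tokens times four, approaching 1M) and the max_position_embeddings value are assumptions on our part, so confirm the exact values against the model card before relying on them. Longer contexts also require far more memory than a single GPU provides.

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-Coder-30B-A3B-Instruct"

# Attach a YaRN scaling block to the config before loading weights.
# factor=4.0 stretches the 262,144-token native window toward ~1M tokens;
# treat these exact values as illustrative and check them against the model card.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144,
}
config.max_position_embeddings = 1048576  # advertise the longer window (assumption)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)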

Step 1: Set up a GPU Virtual Machine

Step 2: Web Console

After your GPU Virtual Machine is created, you can open the Web Console.

Step 3: Install Dependencies

apt install python3-pip
# accelerate is required for device_map="auto" in the next step; torch is the backend
pip3 install "transformers>=4.51.0" torch accelerate

Step 4: Run the Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-Coder-30B-A3B-Instruct"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)

Qwen Code: Open-Source CLI

Qwen Code is an open-source command-line interface that enables developers to work with the Qwen3-Coder model on agentic coding tasks. It is a fork of the Gemini CLI, adapted to integrate smoothly with Qwen3’s capabilities.

We’ve included the steps to install the CLI, set it up, and run it with the Qwen3-Coder model.

Step 1: Install Node.js (Version 20 or Later)

Before you begin, make sure you have Node.js 20+ installed on your device. In your terminal:
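node -v   # should print v20.x or later

If it doesn’t, one option for installing Node.js 20 on Ubuntu/Debian (an illustrative choice on our part; any method that provides Node.js 20+ works) is the NodeSource setup script:

curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
apt install -y nodejs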

Step 2: Install Qwen Code CLI

Once Node.js is ready, install Qwen Code globally:
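The package name below is taken from the Qwen Code repository; double-check it against the project README in case it has changed:

npm install -g @qwen-code/qwen-code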

This makes the qwen command available from anywhere on your system.

Step 3: Get an API Key

Get an API key from Alibaba Cloud Model Studio (DashScope). The Qwen endpoint is OpenAI-compatible, which is why the environment variables below use the OPENAI_ prefix. Then export your credentials:

export OPENAI_API_KEY="your_api_key_here"
export OPENAI_BASE_URL="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
export OPENAI_MODEL="qwen3-coder-plus"

Step 4: Vibe Code

Type qwen in your terminal and you’ll be able to vibe code.

For alternate ways to use Qwen3-Coder, check out the Qwen Coder blog post.

Qwen3 From Scratch

Here’s a notebook that may be of interest to those who want to improve their intuition around Qwen3’s underlying architecture.

Implement Qwen3 Mixture-of-Experts From Scratch by Sebastian Raschka: “this notebook runs Qwen3-Coder-30B-A3B-Instruct (aka Qwen3 Coder Flash) and requires 80 GB of VRAM (e.g., a single A100 or H100).”

Final Thoughts

We’re excited to see the community experiment with open-weight agentic coding models such as Qwen3-Coder, Kimi K2, and Devstral, and integrate them into their workflows. What impresses us most about Qwen3-Coder is its context window: at 256K tokens, extendable to a million, we’re eager to see how effective this model is in real-world software engineering use cases compared with alternative open-weight models. With its impressive context window, accessible smaller variants such as Qwen3-Coder-30B-A3B-Instruct, and the introduction of the Qwen Code CLI, this model is poised to give developers powerful, agentic coding assistance.

Source: digitalocean.com
