Guide de déploiement

Exécutez Gemma 4 localement sur votre propre matériel. Plusieurs options de déploiement, des installateurs en un clic aux frameworks de service de niveau production.

By Ethan Lin·Founder & Open-Source LLM Engineer·Updated 2026-07-08

Ollama

Le moyen le plus simple d'exécuter Gemma 4 localement. Une commande pour télécharger et servir n'importe quelle variante avec optimisation matérielle automatique.

Installer Ollama

curl -fsSL https://ollama.com/install.sh | sh

Exécuter le modèle

# Gemma 4 31B (Dense) - 最强性能
ollama run gemma4:31b

# Gemma 4 26B (MoE) - 效率优先
ollama run gemma4:26b

# Gemma 4 E4B - 移动/轻量
ollama run gemma4:e4b

# Gemma 4 E2B - 边缘设备
ollama run gemma4:e2b

LM Studio

Application de bureau avec une interface visuelle pour télécharger, configurer et discuter avec les modèles Gemma 4. Idéal pour les débutants.

Download LM Studio from lmstudio.ai
Search for "Gemma 4" in the model browser
Select a quantized version matching your VRAM
Click Download and wait for completion
Start chatting in the built-in interface

vLLM

Moteur de service de production à haut débit avec PagedAttention, batching continu et endpoints API compatibles OpenAI.

pip install vllm
vllm serve google/gemma-4-31b --max-model-len 32768

llama.cpp

Moteur d'inférence C++ optimisé prenant en charge les modèles quantifiés GGUF. Exécutez Gemma 4 sur CPU ou en configurations CPU/GPU mixtes.

# Build llama.cpp
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp && cmake -B build && cmake --build build

# Run with GGUF model
./build/bin/llama-cli -m gemma-4-31b-Q4_K_M.gguf -p "Hello"

MLX

Framework natif Apple Silicon par Apple. Optimisé pour les puces série M avec mémoire unifiée, offrant d'excellentes performances sur le matériel Mac.

pip install mlx-lm
mlx_lm.generate --model google/gemma-4-31b --prompt "Hello"

Besoins en VRAM

Utilisation estimée de VRAM pour chaque variante de modèle à différents niveaux de quantification.

Model	BF16	INT8	INT4
E2B	4 GB	2.5 GB	1.5 GB
E4B	8 GB	5 GB	3 GB
26B MoE	52 GB	28 GB	16 GB
31B Dense	62 GB	33 GB	18 GB

Télécharger les modèles

Obtenez les poids des modèles Gemma 4 à partir de sources officielles.

Guide de déploiement

Ollama

Installer Ollama

Exécuter le modèle

LM Studio

vLLM

llama.cpp

MLX

Besoins en VRAM

Télécharger les modèles

Hugging Face

Kaggle

Ollama

ModelScope