Download Gemma 4 Models
Gemma 4 model weights are available for free from multiple official sources. Whether you need full-precision weights for research, quantized GGUF files for local inference, or pre-packaged models for Ollama, this guide covers every download option.
All Gemma 4 models are released under the Apache 2.0 license, which means you can download, use, modify, and redistribute them freely for any purpose — including commercial applications.
Official Download Sources
Hugging Face
The primary platform for Gemma 4 model weights. Offers all variants in multiple formats including SafeTensors, GGUF, and GPTQ quantized versions. Supports git-based downloads, the Hugging Face CLI, and direct browser downloads.
- All model variants and sizes
- Multiple quantization formats
- Git LFS and CLI downloads
- Community-contributed quantizations
- Model cards with documentation
Kaggle
Google's data science platform hosts official Gemma 4 model weights. Convenient for users already in the Kaggle ecosystem, with notebook integration for quick experimentation.
- Official Google distribution
- Notebook integration
- Version tracking
- Direct download
Ollama Library
Pre-packaged Gemma 4 models optimized for local inference with Ollama. One-command download and run. Models are automatically quantized and optimized for your hardware.
- One-command install
- Auto-optimized for your hardware
- All variants available
- Automatic updates
ModelScope (魔搭社区)
China-based model hosting platform with fast download speeds for users in Asia. Mirrors the official Gemma 4 models with full documentation in Chinese.
- Fast downloads in China/Asia
- Chinese documentation
- Git-based downloads
- Community models
Model Format Guide
Understanding the different model file formats available for Gemma 4:
SafeTensors (.safetensors)
The default format on Hugging Face. Safe, fast-loading tensors designed to prevent code execution vulnerabilities. Used with Hugging Face Transformers, vLLM, and other Python-based frameworks.
Best for: research, fine-tuning, Python frameworks, vLLM serving
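The safety claim above follows from the container layout itself: a SafeTensors file is just an 8-byte little-endian header length, a JSON header, and raw tensor bytes, with nothing executable to deserialize. A minimal stdlib sketch of the header parse (the tensor name and shape here are made up for illustration):

```python
import json
import struct

def read_safetensors_header(data: bytes) -> dict:
    """Parse the JSON header of a .safetensors byte stream.

    Layout: 8-byte little-endian u64 header length, then that many
    bytes of UTF-8 JSON, then the raw tensor buffer. No code is ever
    executed during loading, which is what makes the format safe.
    """
    (header_len,) = struct.unpack("<Q", data[:8])
    return json.loads(data[8:8 + header_len])

# Build a minimal file in memory: one fp16 tensor of shape [2, 2].
header = json.dumps({
    "weight": {"dtype": "F16", "shape": [2, 2], "data_offsets": [0, 8]}
}).encode("utf-8")
blob = struct.pack("<Q", len(header)) + header + b"\x00" * 8

meta = read_safetensors_header(blob)
print(meta["weight"]["shape"])  # [2, 2]
```

In practice you would load these files through `safetensors` or Transformers rather than by hand; the sketch just shows why the format avoids pickle-style code-execution risks.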
GGUF (.gguf)
The standard format for llama.cpp and Ollama. Supports various quantization levels (Q4, Q5, Q8, etc.) to reduce model size and memory requirements. Optimized for CPU and mixed CPU/GPU inference.
Best for: local inference, Ollama, llama.cpp, KoboldCpp, LM Studio
GPTQ
GPU-optimized quantization format that maintains high accuracy while significantly reducing VRAM requirements. Available through community contributions on Hugging Face.
Best for: GPU inference with reduced VRAM, production serving
MLX Format
Apple's native ML format optimized for Apple Silicon (M1/M2/M3/M4). Leverages unified memory architecture for efficient inference on Mac hardware.
Best for: Mac with Apple Silicon, MLX framework
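The guide above boils down to a lookup on the runtime you plan to use. A small illustrative sketch (the runtime names and the fallback default are assumptions for this example, not an official mapping):

```python
# Recommended weight format per runtime, mirroring the guide above.
FORMAT_FOR_RUNTIME = {
    "transformers": "safetensors",
    "vllm": "safetensors",
    "llama.cpp": "gguf",
    "ollama": "gguf",
    "lm studio": "gguf",
    "koboldcpp": "gguf",
    "mlx": "mlx",
}

def pick_format(runtime: str) -> str:
    """Return the recommended file format, defaulting to SafeTensors."""
    return FORMAT_FOR_RUNTIME.get(runtime.lower(), "safetensors")

print(pick_format("Ollama"))  # gguf
```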
Quantization Guide
Quantization reduces model size and memory usage at the cost of some accuracy. Here's how different levels compare for Gemma 4:
| Format | Bits | Quality | Notes |
|---|---|---|---|
| BF16 / FP16 (Full Precision) | 16-bit | 100% | Full model quality with no accuracy loss. Requires the most VRAM and disk space. |
| INT8 / Q8 | 8-bit | ~98-99% | Minimal quality loss. Halves VRAM requirements compared to FP16. Recommended for most GPU deployments. |
| Q5_K_M | 5-bit | ~95-97% | Good balance of quality and size. Popular choice for local inference with GGUF format. |
| INT4 / Q4_K_M | 4-bit | ~93-95% | Significant size reduction with acceptable quality for most use cases. Enables running larger models on consumer hardware. |
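The sizes behind the table follow from simple arithmetic: on-disk bytes are roughly parameters × bits per weight ÷ 8. A rough rule-of-thumb estimator (it ignores container metadata and GGUF's mixed-precision K-quant layouts, so treat results as ballpark figures):

```python
def estimated_size_gb(params: float, bits_per_weight: float) -> float:
    """Rough on-disk size: parameters * bits / 8, in decimal gigabytes.

    Ignores file metadata and the mixed-precision layouts used by
    GGUF K-quants, so real files will differ slightly.
    """
    return params * bits_per_weight / 8 / 1e9

for bits in (16, 8, 5, 4):
    print(f"31B at {bits}-bit: ~{estimated_size_gb(31e9, bits):.1f} GB")
```

At 16-bit this gives ~62 GB for a 31B model, matching the full-precision figure quoted in the FAQ below; at 4-bit it drops to ~15.5 GB, which is the roughly 4x reduction quantization delivers.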
Download via Command Line
Hugging Face CLI
Install the Hugging Face CLI and download models directly:
pip install huggingface_hub
# Download a specific model
huggingface-cli download google/gemma-4-31b
# Download GGUF quantized version
huggingface-cli download google/gemma-4-31b-GGUF \
  --include "gemma-4-31b-Q4_K_M.gguf"

Git LFS
Clone model repositories with Git Large File Storage:
git lfs install
git clone https://huggingface.co/google/gemma-4-31b

Ollama CLI
Pull models directly into Ollama:
# Pull any variant
ollama pull gemma4:e4b
ollama pull gemma4:31b
ollama pull gemma4:26b

Download FAQ
Where is the best place to download Gemma 4?
Hugging Face is the most comprehensive source with all formats and variants. For one-command local setup, use Ollama. For users in China, ModelScope offers faster download speeds.
What format should I download?
For Ollama or llama.cpp: download GGUF files. For Python/vLLM: use SafeTensors format. For Mac with Apple Silicon: use MLX format. If unsure, start with Ollama which handles format selection automatically.
How large are Gemma 4 model files?
Full precision sizes: E2B (~4GB), E4B (~8GB), 26B MoE (~52GB), 31B Dense (~62GB). Q4 quantized versions are roughly 4x smaller. Ollama's default downloads use optimized quantization.
Do I need a Hugging Face account to download?
No. Gemma 4 models are publicly accessible under the Apache 2.0 license, so you can download them without an account. The Hugging Face CLI also works anonymously; logging in is only needed for gated repositories and lifts the stricter rate limits applied to anonymous downloads.
What is a GGUF file?
GGUF (GPT-Generated Unified Format) is a binary format designed for efficient local inference with llama.cpp and Ollama. It supports various quantization levels, allowing you to trade accuracy for smaller file sizes and lower memory usage.
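Per the GGUF specification, every file begins with the four ASCII bytes `GGUF` followed by a little-endian uint32 version number, so a quick magic-byte check can confirm a download really is GGUF before you try to load it. A minimal stdlib sketch (the synthetic header below is for illustration only):

```python
import struct

def gguf_version(data: bytes) -> int:
    """Return the GGUF format version, or raise if the magic is wrong.

    A GGUF file starts with the 4 ASCII bytes 'GGUF' followed by a
    little-endian uint32 version number.
    """
    if data[:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    (version,) = struct.unpack("<I", data[4:8])
    return version

# Check a synthetic header rather than a multi-gigabyte download.
print(gguf_version(b"GGUF" + struct.pack("<I", 3)))  # 3
```

On a real file you would read only the first 8 bytes (`open(path, "rb").read(8)`) instead of the whole model.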
Can I download Gemma 4 in China?
Yes. ModelScope (魔搭社区) mirrors Gemma 4 models with fast download speeds within China. Alternatively, use a mirror or proxy for Hugging Face downloads.
Download and Deploy
Get Gemma 4 model weights and start deploying. Check our deployment guide for step-by-step setup instructions.