Download Gemma 4 E2B GGUF

Gemma 4 E2B-it (5B params) is the smallest and fastest Gemma 4 variant, well suited to laptops, 8 GB GPUs and edge devices. This page lists every GGUF quantization with its exact filename, file size and approximate VRAM, plus copy-paste download commands.

GGUF is the format used by Ollama, llama.cpp, LM Studio and KoboldCpp. All files are published under the Apache 2.0 license — free for personal and commercial use, no Hugging Face account required.

Gemma 4 E2B GGUF File Sizes & Filenames

Exact filenames and sizes from unsloth's official GGUF repo — copy the filename you need straight into your download command.

Quant    Filename                     Size      Min VRAM
Q4_K_M   gemma-4-E2B-it-Q4_K_M.gguf   3.11 GB   ~4 GB
Q5_K_M   gemma-4-E2B-it-Q5_K_M.gguf   3.36 GB   ~5 GB
Q8_0     gemma-4-E2B-it-Q8_0.gguf     5.05 GB   ~6 GB
BF16     gemma-4-E2B-it-BF16.gguf     9.31 GB   ~11 GB

Sizes verified against unsloth's Hugging Face repo. The full file list is at unsloth/gemma-4-E2B-it-GGUF.

How to Download Gemma 4 E2B GGUF

Hugging Face CLI

Download a single quant file directly (recommended — avoids pulling the whole repo):

pip install huggingface_hub

# Recommended Q4_K_M build (3.11 GB)
huggingface-cli download unsloth/gemma-4-E2B-it-GGUF \
  --include "gemma-4-E2B-it-Q4_K_M.gguf"
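
If Python is not available, the same file can usually be fetched over plain HTTPS. The sketch below assumes Hugging Face's standard `resolve/main` URL layout and that the file lives on the repo's `main` branch; `gguf_url` and `download_gguf` are hypothetical helper names, not part of any tool.

```shell
# Build the direct-download URL for one file in a Hugging Face repo.
# Assumes the file is on the "main" branch (not confirmed by this page).
gguf_url() {
  printf 'https://huggingface.co/%s/resolve/main/%s\n' "$1" "$2"
}

# Download one file with curl: -L follows redirects, -C - resumes a partial
# download, --fail makes HTTP errors return a non-zero exit code.
download_gguf() {
  curl -L --fail -C - -o "$2" "$(gguf_url "$1" "$2")"
}

# Show the URL that would be fetched (no network access needed).
gguf_url unsloth/gemma-4-E2B-it-GGUF gemma-4-E2B-it-Q4_K_M.gguf
```

Calling `download_gguf unsloth/gemma-4-E2B-it-GGUF gemma-4-E2B-it-Q4_K_M.gguf` would then save the file in the current directory.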

Ollama

Two commands pull and run the model; Ollama auto-selects an optimized quant:

ollama pull gemma4:e2b
ollama run gemma4:e2b
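
Once the model is pulled, it can also be queried over Ollama's local HTTP API instead of the interactive prompt. A minimal sketch, assuming the Ollama server is running on its default port 11434; `ollama_payload` and `ask_gemma` are hypothetical helper names, and the payload builder does no JSON escaping, so keep the prompt free of quotes and backslashes.

```shell
# Build a JSON request body for Ollama's /api/generate endpoint.
# No escaping is done: the prompt must not contain quotes or backslashes.
ollama_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$1" "$2"
}

# Send a prompt to a local Ollama server (default address assumed).
ask_gemma() {
  curl -s http://localhost:11434/api/generate \
    -d "$(ollama_payload gemma4:e2b "$1")"
}

# Print the request body without contacting the server.
ollama_payload gemma4:e2b "Say hello in five words"
```

With the server running, `ask_gemma "Say hello in five words"` should return a single JSON object, since `stream` is set to false.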

llama.cpp

Run straight from the Hugging Face repo with llama.cpp's built-in downloader:

./llama-cli -hf unsloth/gemma-4-E2B-it-GGUF:Q4_K_M

FAQ

Which Gemma 4 E2B GGUF quant should I download?

Q4_K_M (3.11 GB) is the best default — it keeps ~93–95% of full quality while staying small enough for 8 GB GPUs and most laptops. Choose Q5_K_M or Q8_0 if you have spare VRAM and want higher fidelity, or BF16 for full precision.
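
The choice can be sketched as a lookup against the approximate Min VRAM column in the table above. `pick_quant` is a hypothetical helper that returns the largest quant whose minimum fits a whole-GB budget; it leaves no extra headroom for long contexts, so treat the result as an upper bound rather than a recommendation.

```shell
# Pick the largest quant whose approximate minimum VRAM (from the table on
# this page) fits the given budget. Argument: available VRAM in whole GB.
pick_quant() {
  if   [ "$1" -ge 11 ]; then echo "BF16"
  elif [ "$1" -ge 6 ];  then echo "Q8_0"
  elif [ "$1" -ge 5 ];  then echo "Q5_K_M"
  elif [ "$1" -ge 4 ];  then echo "Q4_K_M"
  else
    # Below ~4 GB none of the listed quants meets its minimum.
    echo "none"
    return 1
  fi
}

pick_quant 8   # prints Q8_0
```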

What is the file size of Gemma 4 E2B GGUF?

The Q4_K_M build is 3.11 GB. Larger quants range from 3.36 GB (Q5_K_M) to 9.31 GB (BF16). See the file-size table above for every quant.

How do I run Gemma 4 E2B with Ollama?

Run `ollama pull gemma4:e2b` then `ollama run gemma4:e2b`. Ollama downloads an optimized GGUF automatically and sets up the chat template for you.

Can the Gemma 4 E2B GGUF run on CPU only?

Yes. GGUF with llama.cpp or Ollama runs on CPU, GPU, or a hybrid split. CPU-only works but is slower; offloading layers to even a modest GPU speeds it up significantly.
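
In llama.cpp the CPU/GPU split is controlled by the `-ngl` (`--n-gpu-layers`) option. The sketch below only prints the commands so the difference is easy to see; `llama_cmd` is a hypothetical helper, and the model path assumes the Q4_K_M file sits in the current directory.

```shell
# Print a llama-cli invocation for a model file and a GPU layer count.
# -ngl 0 keeps all layers on the CPU; a value larger than the model's
# layer count (99 here) offloads every layer to the GPU.
llama_cmd() {
  printf './llama-cli -m %s -ngl %s\n' "$1" "$2"
}

llama_cmd gemma-4-E2B-it-Q4_K_M.gguf 0    # CPU only
llama_cmd gemma-4-E2B-it-Q4_K_M.gguf 20   # hybrid: 20 layers on the GPU
llama_cmd gemma-4-E2B-it-Q4_K_M.gguf 99   # full GPU offload
```

The layer count of 20 is an arbitrary illustration; a workable value depends on how much VRAM is free.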

Do I need a Hugging Face account or license acceptance?

No. Gemma 4 is Apache 2.0 and the GGUF files are publicly downloadable without an account or gated access.

Looking for other Gemma 4 sizes?

Compare every variant (E2B, E4B, 26B MoE, 31B Dense) in all formats, or check hardware requirements before you download.
