The Gemma 4 E4B-it (8B params) is the balanced mid-size Gemma 4 variant — a strong quality-per-GB pick that runs comfortably on 8–16 GB GPUs. This page lists every GGUF quantization with its exact filename, file size and approximate VRAM, plus copy-paste download commands.
GGUF is the format used by Ollama, llama.cpp, LM Studio and KoboldCpp. All files are published under the Apache 2.0 license — free for personal and commercial use, no Hugging Face account required.
Exact filenames and sizes from unsloth's official GGUF repo — copy the filename you need straight into your download command.
| Quant | Filename | Size | Min VRAM |
|---|---|---|---|
| Q4_K_M | gemma-4-E4B-it-Q4_K_M.gguf | 4.98 GB | ~6 GB |
| Q5_K_M | gemma-4-E4B-it-Q5_K_M.gguf | 5.48 GB | ~7 GB |
| Q8_0 | gemma-4-E4B-it-Q8_0.gguf | 8.19 GB | ~10 GB |
| BF16 | gemma-4-E4B-it-BF16.gguf | 15.1 GB | ~17 GB |
Sizes verified from unsloth's Hugging Face repo. Click to open the full file list: unsloth/gemma-4-E4B-it-GGUF
Download a single quant file directly (recommended — avoids pulling the whole repo):
pip install huggingface_hub
# Recommended Q4_K_M build (4.98 GB)
huggingface-cli download unsloth/gemma-4-E4B-it-GGUF \
--include "gemma-4-E4B-it-Q4_K_M.gguf"One command to pull and run — Ollama auto-selects an optimized quant:
ollama pull gemma4:e4b
ollama run gemma4:e4bRun straight from the Hugging Face repo with llama.cpp's built-in downloader:
./llama-cli -hf unsloth/gemma-4-E4B-it-GGUF:Q4_K_MQ4_K_M (4.98 GB) is the best default — it keeps ~93–95% of full quality while staying small enough for mainstream 8–16 GB GPUs. Choose Q5_K_M or Q8_0 if you have spare VRAM and want higher fidelity, or BF16 for full precision.
The Q4_K_M build is 4.98 GB. Larger quants range up the table to BF16. See the file-size table above for every quant.
Run `ollama pull gemma4:e4b` then `ollama run gemma4:e4b`. Ollama downloads an optimized GGUF automatically and sets up the chat template for you.
Yes. GGUF with llama.cpp or Ollama runs on CPU, GPU, or a hybrid split. CPU-only works but is slower; offloading layers to even a modest GPU speeds it up significantly.
No. Gemma 4 is Apache 2.0 and the GGUF files are publicly downloadable without an account or gated access.
Compare every variant (E2B, E4B, 26B MoE, 31B Dense) in all formats, or check hardware requirements before you download.