Run Gemma 4 with Ollama
Ollama is the fastest and simplest way to run Gemma 4 on your own hardware. With a single command, you can download and start chatting with any Gemma 4 model variant — no Python environment, no complex setup, no GPU configuration required.
Ollama automatically detects your hardware (CPU, GPU, memory) and optimizes the model configuration for best performance. It supports macOS, Linux, and Windows, and provides an OpenAI-compatible API for easy integration into your applications.
Step 1: Install Ollama
macOS
Download from ollama.com or install via Homebrew:
# Homebrew
brew install ollama
# Or download from https://ollama.com/download/mac
Linux
One-line install script:
curl -fsSL https://ollama.com/install.sh | sh
Windows
Download the installer from ollama.com or use winget:
# winget
winget install Ollama.Ollama
# Or download from https://ollama.com/download/windows
Verify installation:
ollama --version
Step 2: Choose Your Gemma 4 Model
All Gemma 4 variants are available in the Ollama library. Choose based on your hardware and needs:
gemma4:e2b - ultra-lightweight for edge devices and basic tasks
gemma4:e4b - best balance of quality and resource usage
gemma4:26b - MoE architecture: large-model quality at small-model cost
gemma4:31b - maximum quality, the flagship dense model
Step 3: Run Gemma 4
Start an interactive chat session:
# Start interactive chat with Gemma 4 E4B
ollama run gemma4:e4b
# Or the flagship 31B model
ollama run gemma4:31b
Run a single prompt:
ollama run gemma4:e4b "Explain quantum computing in simple terms"
Use with images (multimodal):
# In the interactive chat, include an image file path in your prompt
ollama run gemma4:e4b
>>> What do you see in ./photo.jpg?
Using the Ollama API
Ollama provides an OpenAI-compatible REST API at localhost:11434, making it easy to integrate Gemma 4 into your applications:
Chat completion:
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gemma4:e4b",
"messages": [
{"role": "user", "content": "Hello, Gemma 4!"}
]
}'
Text generation:
curl http://localhost:11434/api/generate \
-d '{
"model": "gemma4:e4b",
"prompt": "Write a Python function to sort a list"
}'
Advanced Configuration
Custom Modelfile
Create a custom Modelfile to adjust model parameters like temperature, context length, and system prompt, then build a named model from it with 'ollama create my-gemma -f Modelfile' (any name works) and start it with 'ollama run my-gemma':
FROM gemma4:e4b
PARAMETER temperature 0.7
PARAMETER num_ctx 32768
SYSTEM """
You are a helpful coding assistant. Always provide code examples.
"""
GPU Configuration
Ollama auto-detects GPUs, but you can control how many layers are offloaded with the num_gpu parameter (set it in a Modelfile, via the API options, or interactively with /set):
# Offload 35 layers to the GPU
ollama run gemma4:31b
>>> /set parameter num_gpu 35
# CPU-only mode: offload no layers
>>> /set parameter num_gpu 0
Context Length
Increase the default context window for longer conversations with the num_ctx parameter (also settable in a Modelfile or via the API options):
ollama run gemma4:e4b
>>> /set parameter num_ctx 65536
Troubleshooting
Model download is slow
Ollama downloads from ollama.com CDN. If slow, check your internet connection or try a VPN. Large models (26B, 31B) may take 10-30 minutes depending on bandwidth.
Out of memory error
Try a smaller model variant or a quantized version. Use 'ollama run gemma4:e4b' instead of the 31B model. On systems with limited RAM, close other applications before running.
Slow inference speed
Ensure Ollama is using your GPU: check with 'ollama ps'. On Mac, Ollama uses Metal GPU acceleration automatically. On Linux/Windows, ensure NVIDIA or AMD GPU drivers are properly installed.
API connection refused
Make sure the Ollama service is running: 'ollama serve'. The default API endpoint is http://localhost:11434. Check firewall settings if accessing from another machine.
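If you want to script that reachability check, a plain TCP connection test is enough. This is a generic Python sketch; the host and port are Ollama's documented defaults, but the helper itself is an assumption of this guide, not part of Ollama:

```python
import socket

def ollama_reachable(host: str = "localhost", port: int = 11434,
                     timeout: float = 2.0) -> bool:
    """Return True if something is accepting TCP connections on the Ollama port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if not ollama_reachable():
    print("Ollama is not reachable; start it with: ollama serve")
```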
Ollama + Gemma 4 FAQ
What is the best Gemma 4 model to run with Ollama?
For most users, gemma4:e4b offers the best balance of quality and performance. If you have a GPU with 16GB+ VRAM, gemma4:26b provides near-flagship quality with efficient MoE inference. The gemma4:31b model requires 24GB+ VRAM but delivers maximum performance.
Can I run Gemma 4 on Ollama without a GPU?
Yes. Ollama supports CPU-only inference for all Gemma 4 variants. The E2B and E4B models run reasonably fast on CPU. Larger models will be significantly slower without GPU acceleration but still functional.
How do I update Gemma 4 in Ollama?
Run 'ollama pull gemma4:e4b' (or your preferred variant) to download the latest version. Ollama will only download the differences if you already have a previous version installed.
Can I use Ollama Gemma 4 with other tools?
Yes. Ollama's OpenAI-compatible API works with most AI tools and frameworks including LangChain, LlamaIndex, Open WebUI, Continue.dev, and many others. Just point them to http://localhost:11434.
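As a concrete illustration of that integration, here is a minimal Python sketch against the OpenAI-compatible endpoint using only the standard library. The model tag and prompt are placeholders, and the request is only built here; sending it (the commented-out part) requires a running Ollama server:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a local Ollama server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("gemma4:e4b", "Hello, Gemma 4!")

# Uncomment to actually send the request (requires a running Ollama server):
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Higher-level clients (the openai Python package, LangChain, etc.) work the same way: set the base URL to http://localhost:11434/v1 and use any placeholder API key.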
Does Ollama support Gemma 4 multimodal features?
Yes. Ollama supports Gemma 4's multimodal capabilities. You can pass images to the model by including the image file path in your prompt in the interactive chat, or by sending base64-encoded images in the API's images parameter.
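For the API route, Ollama's /api/generate endpoint accepts an images array of base64-encoded strings. The Python sketch below only builds the payload (the image bytes are a stand-in; in real use read them from a file):

```python
import base64
import json

def build_vision_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a /api/generate payload with one base64-encoded image attached."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
    }

# In real use, read the bytes from a file: open("photo.jpg", "rb").read()
fake_image = b"\x89PNG...not a real image..."
payload = build_vision_payload("gemma4:e4b",
                               "What do you see in this image?",
                               fake_image)
print(json.dumps(payload)[:80])
```

POST the resulting JSON to http://localhost:11434/api/generate as in the earlier curl examples.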
How much disk space does Gemma 4 require in Ollama?
Disk space depends on the variant: E2B (~1.5GB), E4B (~3GB), 26B MoE (~15GB), 31B Dense (~18GB). These are for the default quantization. Models are stored in ~/.ollama/models on macOS/Linux.
Ready to Run Gemma 4?
Install Ollama and start chatting with Gemma 4 in minutes. Or explore other deployment options.