Gemma 4 Presets & Configuration Guide
Getting the best output from Gemma 4 requires the right parameter configuration. Temperature, top-p, repetition penalty, and context length all significantly impact quality. This guide provides tested presets for common use cases so you can get optimal results immediately.
These presets work across all Gemma 4 inference tools — Ollama, LM Studio, vLLM, llama.cpp, and MLX. Adjust the values to match your specific needs.
Key Parameters Explained
Temperature
Controls randomness in output. Lower values (0.1-0.3) produce more deterministic, focused responses. Higher values (0.8-1.2) increase creativity and variety. Values above 1.5 may produce incoherent output.
Top-P (Nucleus Sampling)
Limits token selection to the smallest set of tokens whose cumulative probability exceeds P. Lower values (0.5-0.7) focus output; higher values (0.9-1.0) allow more diversity. Works in conjunction with temperature.
Top-K
Limits consideration to the top K most likely tokens. Lower values increase focus and consistency. Set to 1 for fully deterministic (greedy) output.
Repetition Penalty
Penalizes token repetition to prevent loops and redundant output. Values around 1.05-1.15 work well for most use cases. Higher values may cause the model to avoid necessary repetitions.
Context Length (num_ctx)
The maximum number of tokens the model can attend to at once (prompt plus response). Larger contexts let you process longer documents but require more memory. Gemma 4 supports up to 128K tokens (E2B/E4B) or 256K tokens (26B/31B).
Max Tokens
Maximum number of tokens to generate in the response. Set higher for long-form content generation, lower for concise answers.
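These parameters map directly onto option names in Ollama's REST API (num_predict is Ollama's name for max tokens; other tools use different names for the same knobs). The values below are illustrative midpoints, not one of the presets:

```shell
# Write a request body that sets each of the parameters explained above.
# Option names follow Ollama's API; values here are illustrative.
cat > options.json <<'EOF'
{
  "model": "gemma4:e4b",
  "prompt": "Explain nucleus sampling in one paragraph.",
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "repeat_penalty": 1.1,
    "num_ctx": 8192,
    "num_predict": 512
  }
}
EOF
# With a local Ollama server running:
# curl http://localhost:11434/api/generate -d @options.json
```

Passing options per request like this overrides whatever defaults are baked into the model, which is handy for testing a preset before committing it to a Modelfile.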
Recommended Presets
Coding & Technical
Optimized for code generation, debugging, and technical tasks. Low temperature for accuracy, high context for codebase understanding.
You are an expert software engineer. Write clean, well-documented, production-quality code. Always include error handling and follow best practices for the language being used.
Creative Writing
Higher temperature for creative variety, with enough top-p to maintain coherence. Good for stories, marketing copy, and brainstorming.
You are a talented creative writer. Write vivid, engaging content with strong narrative voice. Vary sentence structure and use evocative language.
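As with the coding preset in the Ollama section below, this preset can be packaged as a Modelfile. The parameter values here are illustrative starting points for creative work, not official numbers; adjust to taste:

```shell
# Creative-writing Modelfile (illustrative values: higher temperature
# and top-p for variety, mild repetition penalty for coherence)
cat > Modelfile.creative <<'EOF'
FROM gemma4:e4b
PARAMETER temperature 0.9
PARAMETER top_p 0.95
PARAMETER top_k 64
PARAMETER repeat_penalty 1.08
PARAMETER num_ctx 8192
SYSTEM """
You are a talented creative writer. Write vivid, engaging content with strong narrative voice. Vary sentence structure and use evocative language.
"""
EOF
# Then: ollama create gemma4-creative -f Modelfile.creative
```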
Analysis & Research
Balanced settings for analytical tasks — document analysis, summarization, and research. Moderate temperature with long context for thorough analysis.
You are a thorough analyst. Provide well-structured, evidence-based analysis. Cite specific details from the source material. Be objective and comprehensive.
General Chat & Assistant
Versatile preset for everyday interactions. Natural conversational tone with good balance between consistency and variety.
You are a helpful, friendly AI assistant. Provide clear, accurate answers. Ask clarifying questions when needed. Be concise but thorough.
Roleplay & Character
High creativity with strong repetition penalty to maintain character consistency. Suitable for interactive fiction and character-based conversations.
Stay in character at all times. Respond with vivid descriptions, emotional depth, and consistent personality. Never break the fourth wall.
Factual & Precise
Minimal randomness for tasks requiring accuracy — data extraction, classification, structured output, and factual Q&A.
You are a precise, factual assistant. Provide accurate information only. If unsure, say so. Use structured formats (lists, tables) when appropriate.
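For llama.cpp users, the same idea can be expressed as llama-cli sampling flags. The flag names are llama.cpp's standard CLI options; the values are an illustrative low-randomness configuration, not official numbers:

```shell
# Factual preset as llama.cpp CLI flags: near-greedy sampling for accuracy.
# -c sets context length, -n sets max tokens to generate.
FACTUAL_FLAGS="--temp 0.1 --top-p 0.7 --top-k 20 --repeat-penalty 1.05 -c 8192 -n 256"
# With a local GGUF build of the model:
# llama-cli -m gemma4-e4b.gguf $FACTUAL_FLAGS -p "List the planets of the solar system."
echo "$FACTUAL_FLAGS"
```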
Using Presets with Ollama
Create a custom Modelfile to apply a preset in Ollama:
# Create a Modelfile
cat > Modelfile.coding <<'EOF'
FROM gemma4:e4b
PARAMETER temperature 0.2
PARAMETER top_p 0.85
PARAMETER top_k 20
PARAMETER repeat_penalty 1.05
PARAMETER num_ctx 32768
SYSTEM """
You are an expert software engineer. Write clean, well-documented, production-quality code. Always include error handling and follow best practices for the language being used.
"""
EOF
# Build and run
ollama create gemma4-coding -f Modelfile.coding
ollama run gemma4-coding
Presets FAQ
What is the best temperature for Gemma 4?
It depends on the task: 0.1-0.3 for coding and factual tasks, 0.6-0.8 for general chat, 0.8-1.0 for creative writing. Start with the recommended preset for your use case and adjust based on output quality.
Should I use top-p or top-k?
Most users should use top-p (nucleus sampling) as it adapts better to different probability distributions. Top-k is simpler but can be too restrictive or too loose depending on the context. Using both together provides fine-grained control.
What context length should I set?
Use the smallest context that fits your needs — longer context uses more memory. 8K is fine for simple chats. 32K for code files. 64K+ for long documents. Only use 128K/256K when processing very large inputs.
How do I fix repetitive output?
Increase the repetition penalty (try 1.15-1.25). Also try increasing temperature slightly (add 0.1-0.2) and reducing top-k. If the model loops on specific phrases, try a presence penalty or a banned-phrase/logit-bias setting if your tool supports one.
Do presets differ between model sizes?
The same presets generally work across all Gemma 4 variants. Smaller models (E2B, E4B) may benefit from slightly lower temperatures (subtract 0.1) to compensate for reduced model capacity. The 31B model handles higher temperatures well.
Can I use these presets with other models?
These presets are optimized for Gemma 4 but work as reasonable starting points for most LLMs. Different model families may respond differently to the same settings — always test and adjust.
Apply These Presets
Download Gemma 4 and start using these optimized configurations. Or try Gemma 4 online first.