Gemma 4 Presets & Configuration Guide

Getting the best output from Gemma 4 requires the right parameter configuration. Temperature, top-p, repetition penalty, and context length all significantly impact quality. This guide provides tested presets for common use cases so you can get optimal results immediately.

These presets work across all Gemma 4 inference tools — Ollama, LM Studio, vLLM, llama.cpp, and MLX. Adjust the values to match your specific needs.

Key Parameters Explained

Temperature

Range: 0.0 – 2.0 · Default: 0.7

Controls randomness in output. Lower values (0.1-0.3) produce more deterministic, focused responses. Higher values (0.8-1.2) increase creativity and variety. Values above 1.5 may produce incoherent output.
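The mechanism is easy to see in a small sketch (illustrative only — real inference engines do this over the full vocabulary): logits are divided by the temperature before the softmax, so low values sharpen the distribution toward the top token and high values flatten it.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then softmax.

    Lower temperature concentrates probability on the top
    token; higher temperature spreads it out.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy logits for a three-token vocabulary
cold = softmax_with_temperature(logits, 0.2)  # near-deterministic
hot = softmax_with_temperature(logits, 1.2)   # noticeably flatter
```

At temperature 0.2 almost all probability mass lands on the first token; at 1.2 the same logits yield a much more even spread, which is exactly why creative presets run hotter.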

Top-P (Nucleus Sampling)

Range: 0.0 – 1.0 · Default: 0.9

Limits token selection to the smallest set of tokens whose cumulative probability exceeds P. Lower values (0.5-0.7) focus output; higher values (0.9-1.0) allow more diversity. Works in conjunction with temperature.
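As a sketch of the mechanism over a toy distribution (not a real vocabulary): tokens are taken in descending probability order until the cumulative mass reaches P, and the survivors are renormalized.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches p, then renormalize."""
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {t: pr / total for t, pr in kept.items()}

probs = {"the": 0.5, "a": 0.3, "an": 0.15, "to": 0.05}
nucleus = top_p_filter(probs, 0.7)  # keeps "the" and "a", renormalized
```

Because the cutoff adapts to the shape of the distribution, a peaked distribution keeps few tokens while a flat one keeps many — the property that makes top-p more forgiving than a fixed top-k.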

Top-K

Range: 1 – 100+ · Default: 40

Limits consideration to the top K most likely tokens. Lower values increase focus and consistency. Set to 1 for fully deterministic (greedy) output.
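The same toy distribution shows the difference from top-p: top-k keeps a fixed number of candidates regardless of how probability is distributed.

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens and renormalize.
    k=1 reduces to greedy decoding."""
    top = sorted(probs.items(), key=lambda kv: -kv[1])[:k]
    total = sum(p for _, p in top)
    return {t: p / total for t, p in top}

probs = {"the": 0.5, "a": 0.3, "an": 0.15, "to": 0.05}
narrowed = top_k_filter(probs, 2)  # {"the": 0.625, "a": 0.375}
greedy = top_k_filter(probs, 1)    # {"the": 1.0}
```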

Repetition Penalty

Range: 1.0 – 2.0 · Default: 1.1

Penalizes token repetition to prevent loops and redundant output. Values around 1.05-1.15 work well for most use cases. Higher values may cause the model to avoid necessary repetitions.
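One common formulation (used, for example, by llama.cpp) divides the positive logits of already-generated tokens by the penalty and multiplies the negative ones, so repeats lose probability either way. A minimal sketch:

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Penalize tokens that already appeared in the output.

    Positive logits are divided by the penalty; negative logits
    are multiplied by it. Both moves push the token's
    probability down.
    """
    out = list(logits)
    for tid in set(generated_ids):
        if out[tid] > 0:
            out[tid] /= penalty
        else:
            out[tid] *= penalty
    return out

logits = [2.0, -1.0, 0.5]  # toy logits for three tokens
penalized = apply_repetition_penalty(logits, [0, 1], 1.1)
# token 2 was never generated, so its logit is left untouched
```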

Context Length (num_ctx)

Range: 2048 – 256,000 · Default: 8192

Maximum number of tokens the model considers. Larger contexts enable processing longer documents but require more memory. Gemma 4 supports up to 128K (E2B/E4B) or 256K (26B/31B).

Max Tokens

Range: 1 – context limit · Default: 2048

Maximum number of tokens to generate in the response. Set higher for long-form content generation, lower for concise answers.

Recommended Presets

Coding & Technical

Optimized for code generation, debugging, and technical tasks. Low temperature for accuracy, high context for codebase understanding.

temperature: 0.2
topP: 0.85
topK: 20
repeatPenalty: 1.05
numCtx: 32768
maxTokens: 4096
System Prompt

You are an expert software engineer. Write clean, well-documented, production-quality code. Always include error handling and follow best practices for the language being used.

Creative Writing

Higher temperature for creative variety, with enough top-p to maintain coherence. Good for stories, marketing copy, and brainstorming.

temperature: 0.9
topP: 0.95
topK: 60
repeatPenalty: 1.15
numCtx: 16384
maxTokens: 8192
System Prompt

You are a talented creative writer. Write vivid, engaging content with strong narrative voice. Vary sentence structure and use evocative language.

Analysis & Research

Balanced settings for analytical tasks — document analysis, summarization, and research. Moderate temperature with long context for thorough analysis.

temperature: 0.3
topP: 0.9
topK: 30
repeatPenalty: 1.1
numCtx: 65536
maxTokens: 4096
System Prompt

You are a thorough analyst. Provide well-structured, evidence-based analysis. Cite specific details from the source material. Be objective and comprehensive.

General Chat & Assistant

Versatile preset for everyday interactions. Natural conversational tone with good balance between consistency and variety.

temperature: 0.7
topP: 0.9
topK: 40
repeatPenalty: 1.1
numCtx: 8192
maxTokens: 2048
System Prompt

You are a helpful, friendly AI assistant. Provide clear, accurate answers. Ask clarifying questions when needed. Be concise but thorough.

Roleplay & Character

High creativity with strong repetition penalty to maintain character consistency. Suitable for interactive fiction and character-based conversations.

temperature: 0.85
topP: 0.92
topK: 50
repeatPenalty: 1.18
numCtx: 16384
maxTokens: 4096
System Prompt

Stay in character at all times. Respond with vivid descriptions, emotional depth, and consistent personality. Never break the fourth wall.

Factual & Precise

Minimal randomness for tasks requiring accuracy — data extraction, classification, structured output, and factual Q&A.

temperature: 0.1
topP: 0.8
topK: 10
repeatPenalty: 1.05
numCtx: 8192
maxTokens: 2048
System Prompt

You are a precise, factual assistant. Provide accurate information only. If unsure, say so. Use structured formats (lists, tables) when appropriate.
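For scripting, the six presets above collapse into a plain mapping. Values are copied from the tables; the snake_case key names follow Ollama's parameter spelling, which other tools may vary from.

```python
# Values copied from the preset tables above. Note that Ollama
# calls the generation cap "num_predict"; it is listed here under
# the more common name "max_tokens".
PRESETS = {
    "coding":   {"temperature": 0.2,  "top_p": 0.85, "top_k": 20,
                 "repeat_penalty": 1.05, "num_ctx": 32768, "max_tokens": 4096},
    "creative": {"temperature": 0.9,  "top_p": 0.95, "top_k": 60,
                 "repeat_penalty": 1.15, "num_ctx": 16384, "max_tokens": 8192},
    "analysis": {"temperature": 0.3,  "top_p": 0.9,  "top_k": 30,
                 "repeat_penalty": 1.1,  "num_ctx": 65536, "max_tokens": 4096},
    "chat":     {"temperature": 0.7,  "top_p": 0.9,  "top_k": 40,
                 "repeat_penalty": 1.1,  "num_ctx": 8192,  "max_tokens": 2048},
    "roleplay": {"temperature": 0.85, "top_p": 0.92, "top_k": 50,
                 "repeat_penalty": 1.18, "num_ctx": 16384, "max_tokens": 4096},
    "factual":  {"temperature": 0.1,  "top_p": 0.8,  "top_k": 10,
                 "repeat_penalty": 1.05, "num_ctx": 8192,  "max_tokens": 2048},
}
```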

Using Presets with Ollama

Create a custom Modelfile to apply a preset in Ollama:

# Create a Modelfile
cat > Modelfile.coding <<'EOF'
FROM gemma4:e4b

PARAMETER temperature 0.2
PARAMETER top_p 0.85
PARAMETER top_k 20
PARAMETER repeat_penalty 1.05
PARAMETER num_ctx 32768

SYSTEM """
You are an expert software engineer. Write clean, well-documented, production-quality code. Always include error handling and follow best practices for the language being used.
"""
EOF

# Build and run
ollama create gemma4-coding -f Modelfile.coding
ollama run gemma4-coding
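vLLM and LM Studio also expose an OpenAI-compatible /v1/chat/completions endpoint, so the same preset can travel as a request body. A standard-library sketch follows; the base URL and model name are placeholders to adjust, and top_k and repetition_penalty are non-standard extensions that vLLM accepts but other servers may ignore or reject.

```python
import json
from urllib import request

def coding_preset_request(base_url, model):
    """Build a chat-completions request carrying the coding preset.

    top_k and repetition_penalty are extensions beyond the core
    OpenAI schema; servers that don't support them may ignore or
    reject those fields.
    """
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": (
                "You are an expert software engineer. Write clean, "
                "well-documented, production-quality code."
            )},
            {"role": "user", "content": "Write a function that reverses a string."},
        ],
        "temperature": 0.2,
        "top_p": 0.85,
        "top_k": 20,
        "repetition_penalty": 1.05,
        "max_tokens": 4096,
    }
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Placeholder endpoint/model; pass the Request to urllib.request.urlopen
# (or switch to your HTTP client of choice) to actually send it.
req = coding_preset_request("http://localhost:8000", "gemma4-e4b")
```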

Presets FAQ

What is the best temperature for Gemma 4?

It depends on the task: 0.1-0.3 for coding and factual tasks, 0.6-0.8 for general chat, 0.8-1.0 for creative writing. Start with the recommended preset for your use case and adjust based on output quality.

Should I use top-p or top-k?

Most users should use top-p (nucleus sampling) as it adapts better to different probability distributions. Top-k is simpler but can be too restrictive or too loose depending on the context. Using both together provides fine-grained control.

What context length should I set?

Use the smallest context that fits your needs — longer context uses more memory. 8K is fine for simple chats. 32K for code files. 64K+ for long documents. Only use 128K/256K when processing very large inputs.

How do I fix repetitive output?

Increase the repetition penalty (try 1.15-1.25). Also try raising temperature slightly (add 0.1-0.2) and reducing top-k. If the model loops on specific phrases, apply a presence penalty or ban the offending tokens via logit bias, where your tool supports those options.

Do presets differ between model sizes?

The same presets generally work across all Gemma 4 variants. Smaller models (E2B, E4B) may benefit from slightly lower temperatures (subtract 0.1) to compensate for reduced model capacity. The 31B model handles higher temperatures well.

Can I use these presets with other models?

These presets are optimized for Gemma 4 but work as reasonable starting points for most LLMs. Different model families may respond differently to the same settings — always test and adjust.


Apply These Presets

Download Gemma 4 and start using these optimized configurations. Or try Gemma 4 online first.