Fine-Tune Gemma 4 with Unsloth

Unsloth is an open-source library that makes fine-tuning LLMs up to 2x faster while using 60% less memory. It achieves this through custom CUDA kernels and optimized training loops — with zero accuracy loss compared to standard training.

Gemma 4 is fully supported in Unsloth, including all four variants (E2B, E4B, 26B MoE, 31B). This guide covers installation, dataset preparation, training configuration, and exporting your fine-tuned model.

Why Fine-Tune with Unsloth?

2x Faster Training

Custom Triton kernels optimize attention, MLP, and embedding layers. Fine-tuning that takes 10 hours with standard methods takes ~5 hours with Unsloth.

60% Less Memory

Intelligent gradient checkpointing and memory management let you fine-tune larger models on smaller GPUs. The E4B model can be fine-tuned on a single RTX 3090.

Zero Accuracy Loss

Unsloth's optimizations are mathematically equivalent to standard training. You get the same model quality with less compute — no approximations or trade-offs.

Easy Export

Export fine-tuned models to GGUF (for Ollama/llama.cpp), SafeTensors (for vLLM), or push directly to Hugging Face — all with one command.

Installation

Install Unsloth with pip. Requires Python 3.10+ and PyTorch 2.0+:

pip install unsloth

Quick Start: Fine-Tune E4B

A minimal example to fine-tune Gemma 4 E4B with LoRA on your own dataset:

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="google/gemma-4-e4b-it",
    max_seq_length=4096,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Train with your dataset
from trl import SFTTrainer
trainer = SFTTrainer(
    model=model, tokenizer=tokenizer,
    train_dataset=dataset,
    max_seq_length=4096,
)
trainer.train()

Preparing Your Dataset

Unsloth supports multiple dataset formats for fine-tuning Gemma 4:

Conversations with user/assistant turns. Best for chatbot and assistant fine-tuning.

Raw text for continued pre-training or domain adaptation.

Chosen/rejected pairs for preference-based training.

Hardware Requirements for Fine-Tuning


E2B LoRA	RTX 3060 (12 GB)	~15 min / 1K steps
E4B LoRA	RTX 4060 Ti (16 GB)	~25 min / 1K steps
E4B QLoRA	RTX 3060 (12 GB)	~30 min / 1K steps
27B MoE LoRA	RTX 4090 (24 GB)	~60 min / 1K steps
27B MoE QLoRA	RTX 4070 Ti (16 GB)	~90 min / 1K steps

Exporting Your Model

After fine-tuning, export to your preferred format:

# Save to GGUF for Ollama
model.save_pretrained_gguf("gemma4-custom", tokenizer, quantization_method="q4_k_m")

# Save to SafeTensors for vLLM
model.save_pretrained_merged("gemma4-custom-merged", tokenizer)

# Push to Hugging Face
model.push_to_hub_merged("your-username/gemma4-custom", tokenizer)

Unsloth + Gemma 4 FAQ

What is Unsloth?

Unsloth is an open-source fine-tuning library that makes LLM training 2x faster and uses 60% less memory through custom CUDA kernels. It supports Gemma 4, Llama, Mistral, and other popular model families.

Can I fine-tune Gemma 4 E4B on a consumer GPU?

Yes. With Unsloth's QLoRA 4-bit, you can fine-tune E4B on an RTX 4060 (8GB). LoRA requires an RTX 3090 (24GB). Larger models need professional GPUs (A100/H100) or cloud instances.

What is LoRA vs QLoRA?

LoRA (Low-Rank Adaptation) adds small trainable matrices to the model while keeping base weights frozen. QLoRA additionally quantizes the base model to 4-bit, dramatically reducing memory. Both produce similar quality results.

How much data do I need for fine-tuning?

For domain adaptation, 1K-10K high-quality examples are often sufficient. For instruction tuning, 5K-50K conversation pairs work well. Quality matters more than quantity — 1K excellent examples beats 100K noisy ones.

Can I merge LoRA weights into the base model?

Yes. Unsloth supports merging LoRA weights into the base model for deployment without the adapter overhead. Export as a single merged model in GGUF or SafeTensors format.

Does Unsloth support the MoE model?

Yes, Unsloth supports fine-tuning the Gemma 4 26B A4B MoE model. Due to the MoE architecture, LoRA is typically applied to the shared layers and expert routing, requiring more VRAM than dense models of similar active parameter count.

Start Fine-Tuning Gemma 4

Install Unsloth, prepare your dataset, and create a custom Gemma 4 model in hours.

Download Base Models Choose a Variant Hardware Requirements