Unsloth is an open-source library that makes fine-tuning LLMs up to 2x faster while using 60% less memory. It achieves this through custom CUDA kernels and optimized training loops — with zero accuracy loss compared to standard training.
Gemma 4 is fully supported in Unsloth, including all four variants (E2B, E4B, 26B MoE, 31B). This guide covers installation, dataset preparation, training configuration, and exporting your fine-tuned model.
Custom Triton kernels optimize attention, MLP, and embedding layers. Fine-tuning that takes 10 hours with standard methods takes ~5 hours with Unsloth.
Intelligent gradient checkpointing and memory management let you fine-tune larger models on smaller GPUs. The E4B model can be fine-tuned on a single RTX 3090.
Unsloth's optimizations are mathematically equivalent to standard training. You get the same model quality with less compute — no approximations or trade-offs.
Export fine-tuned models to GGUF (for Ollama/llama.cpp), SafeTensors (for vLLM), or push directly to Hugging Face — all with one command.
Install Unsloth with pip. Requires Python 3.10+ and PyTorch 2.0+:
pip install unslothA minimal example to fine-tune Gemma 4 E4B with LoRA on your own dataset:
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="google/gemma-4-e4b-it",
max_seq_length=4096,
load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
model, r=16, lora_alpha=16,
target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# Train with your dataset
from trl import SFTTrainer
trainer = SFTTrainer(
model=model, tokenizer=tokenizer,
train_dataset=dataset,
max_seq_length=4096,
)
trainer.train()Unsloth supports multiple dataset formats for fine-tuning Gemma 4:
Conversations with user/assistant turns. Best for chatbot and assistant fine-tuning.
Raw text for continued pre-training or domain adaptation.
Chosen/rejected pairs for preference-based training.
pages.unsloth.unslothPage.hardware.desc
| pages.unsloth.unslothPage.hardware.headers.model | pages.unsloth.unslothPage.hardware.headers.gpu | pages.unsloth.unslothPage.hardware.headers.time |
|---|---|---|
| E2B LoRA | RTX 3060 (12 GB) | ~15 min / 1K steps |
| E4B LoRA | RTX 4060 Ti (16 GB) | ~25 min / 1K steps |
| E4B QLoRA | RTX 3060 (12 GB) | ~30 min / 1K steps |
| 27B MoE LoRA | RTX 4090 (24 GB) | ~60 min / 1K steps |
| 27B MoE QLoRA | RTX 4070 Ti (16 GB) | ~90 min / 1K steps |
After fine-tuning, export to your preferred format:
# Save to GGUF for Ollama
model.save_pretrained_gguf("gemma4-custom", tokenizer, quantization_method="q4_k_m")
# Save to SafeTensors for vLLM
model.save_pretrained_merged("gemma4-custom-merged", tokenizer)
# Push to Hugging Face
model.push_to_hub_merged("your-username/gemma4-custom", tokenizer)Unsloth is an open-source fine-tuning library that makes LLM training 2x faster and uses 60% less memory through custom CUDA kernels. It supports Gemma 4, Llama, Mistral, and other popular model families.
Yes. With Unsloth's QLoRA 4-bit, you can fine-tune E4B on an RTX 4060 (8GB). LoRA requires an RTX 3090 (24GB). Larger models need professional GPUs (A100/H100) or cloud instances.
LoRA (Low-Rank Adaptation) adds small trainable matrices to the model while keeping base weights frozen. QLoRA additionally quantizes the base model to 4-bit, dramatically reducing memory. Both produce similar quality results.
For domain adaptation, 1K-10K high-quality examples are often sufficient. For instruction tuning, 5K-50K conversation pairs work well. Quality matters more than quantity — 1K excellent examples beats 100K noisy ones.
Yes. Unsloth supports merging LoRA weights into the base model for deployment without the adapter overhead. Export as a single merged model in GGUF or SafeTensors format.
Yes, Unsloth supports fine-tuning the Gemma 4 26B A4B MoE model. Due to the MoE architecture, LoRA is typically applied to the shared layers and expert routing, requiring more VRAM than dense models of similar active parameter count.
pages.unsloth.unslothPage.faq.items.6.a
pages.unsloth.unslothPage.faq.items.7.a
pages.unsloth.unslothPage.faq.items.8.a
pages.unsloth.unslothPage.faq.items.9.a
Install Unsloth, prepare your dataset, and create a custom Gemma 4 model in hours.