1. Prepare your dataset
Process data into conversation or preference format using the dataset preprocessing guide.
Verify a small sample looks correct before training.
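For a concrete picture, here is a minimal sketch of the two formats and a spot-check. The field names (`messages`, `prompt`, `chosen`, `rejected`) follow common TRL conventions and `train.jsonl` is a hypothetical path; defer to the preprocessing guide for the exact schema.

```python
import json

# Hypothetical records; exact field names come from the dataset preprocessing guide.
conversation_record = {  # conversation format, used for SFT
    "messages": [
        {"role": "user", "content": "What does QLoRA do?"},
        {"role": "assistant", "content": "It finetunes low-rank adapters on a 4-bit quantized base model."},
    ]
}
preference_record = {  # preference format, used for DPO/ORPO
    "prompt": "Explain LoRA in one sentence.",
    "chosen": "LoRA trains small low-rank adapter matrices instead of all the model weights.",
    "rejected": "LoRA is a long-range radio protocol.",
}

# Write one record per line (JSONL) and re-read a sample to verify it parses.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(conversation_record) + "\n")

with open("train.jsonl") as f:
    print(json.loads(next(f)))
```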
2. Select base model and method
Pick a Gemma size that fits your hardware and compute budget, then choose SFT (supervised finetuning), DPO/ORPO (preference optimization), or GRPO (reward-driven reasoning).
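As a sketch of wiring a method to a model, assuming Hugging Face TRL (where SFT, DPO, ORPO, and GRPO map to `SFTTrainer`, `DPOTrainer`, `ORPOTrainer`, and `GRPOTrainer`); the checkpoint name and output directory here are illustrative choices, not requirements.

```python
# A sketch assuming Hugging Face TRL; swap the trainer class to match your method.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="train.jsonl", split="train")

# A smaller checkpoint such as google/gemma-2-2b suits modest budgets; larger
# Gemma sizes trade memory and time for quality. For preference data use
# DPOTrainer/ORPOTrainer instead; for reward-based reasoning use GRPOTrainer.
trainer = SFTTrainer(
    model="google/gemma-2-2b",
    train_dataset=dataset,
    args=SFTConfig(output_dir="gemma-sft"),
)
```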
3. Enable PEFT and quantization if needed
Start with QLoRA for strong results on modest hardware; use full finetuning only when you need maximal capacity.
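A minimal QLoRA sketch, assuming transformers, peft, and bitsandbytes; the rank, alpha, and target modules below are illustrative starting values, not tuned settings.

```python
# QLoRA: 4-bit quantized frozen base model + trainable LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the frozen base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, the QLoRA default
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b", quantization_config=bnb_config
)

lora_config = LoraConfig(
    r=16,                    # adapter rank: higher means more capacity, more memory
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters train; the base stays frozen
```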
4. Launch and monitor
Start the job and watch training and validation loss to catch loss spikes, divergence, or overfitting early.
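A sketch of launch and logging settings, again assuming TRL's `SFTConfig` (a `TrainingArguments` subclass); the step counts, split ratio, and paths are illustrative.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hold out a slice of the data so validation loss is available during training.
splits = load_dataset("json", data_files="train.jsonl", split="train").train_test_split(test_size=0.1)

args = SFTConfig(
    output_dir="gemma-sft",
    logging_steps=10,         # frequent training-loss logs surface spikes early
    eval_strategy="steps",    # spelled evaluation_strategy in older transformers releases
    eval_steps=100,           # periodic validation loss exposes overfitting
    report_to="tensorboard",  # or "wandb"
)
trainer = SFTTrainer(
    model="google/gemma-2-2b",
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    args=args,
)
trainer.train()
```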
5. Evaluate and iterate
Test the tuned model, compare it to the base model, and iterate on data or settings.
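A quick before/after spot-check, assuming the LoRA adapter from the earlier steps was saved to the hypothetical `gemma-sft` directory; the prompt is just an example, and real evaluation should use a held-out set.

```python
# Generate from the same prompt with and without the trained adapter.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")
base = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b", device_map="auto")

prompt = "Summarize QLoRA in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)

# Baseline output first, before the adapter is attached.
print("baseline: ", tokenizer.decode(base.generate(**inputs, max_new_tokens=64)[0]))

# Attach the trained adapter and generate again for comparison.
tuned = PeftModel.from_pretrained(base, "gemma-sft")
print("finetuned:", tokenizer.decode(tuned.generate(**inputs, max_new_tokens=64)[0]))
```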