Selecting Your Base Model
Model Size Considerations
- Small Models (270M-1B)
- Medium Models (4B-12B)
- Large Models (27B)

Small Models (270M-1B)
Best for: Simple tasks, quick prototyping, resource-constrained environments
- Gemma 3 270M: Fastest, good for basic text generation
- Gemma 3 1B: Balanced performance, good for most tasks

Start with 1B for most applications. It provides good performance while being resource-efficient.
Note: These models do not support vision fine-tuning.
Choosing Your Training Method
Supervised Fine-Tuning (SFT)
When to use: General conversation, instruction following, domain adaptation
Data format: Language modeling
Best Practices
- Use high-quality conversation data
 - Include diverse examples
 - Ensure consistent formatting
 - Balance different types of interactions
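To make the expected data concrete, here is a minimal sketch of the conversational JSONL format commonly used for SFT; the field names (`messages`, `role`, `content`) follow the usual chat-template convention and are an assumption, not a format mandated by this guide:

```python
import json

# One training example in the common conversational ("messages") format.
# Field names ("messages", "role", "content") are the convention used by
# most chat templates; adapt them to whatever your pipeline expects.
example = {
    "messages": [
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Open Settings > Account > Reset password and follow the emailed link."},
    ]
}

# SFT datasets are typically stored as JSONL: one conversation per line.
with open("sft_data.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```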
 
Example Use Cases
- Customer service chatbots
 - Code assistants
 - Educational tutors
 - Domain-specific Q&A systems
 
Direct Preference Optimization (DPO)
When to use: Aligning models with human preferences, safety, helpfulness
Data format: Preference tuning (chosen vs rejected responses)
Data Requirements
- Paired examples (chosen vs rejected)
 - Clear preference signals
 - Diverse preference types
 - High-quality annotations
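For reference, a preference example pairs one chosen and one rejected response to the same prompt. The sketch below uses the common `prompt`/`chosen`/`rejected` field names, which are an assumption about your pipeline's expected schema:

```python
import json

# One DPO preference pair: the same prompt with a preferred ("chosen")
# and a dispreferred ("rejected") response.
example = {
    "prompt": "Explain photosynthesis to a 10-year-old.",
    "chosen": "Plants use sunlight, water, and air to make their own food, a bit like cooking with light.",
    "rejected": "Photosynthesis is the conversion of photons into ATP via the Calvin cycle and electron transport chains.",
}

with open("dpo_data.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```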
 
Example Use Cases
- Safety alignment
 - Helpfulness optimization
 - Style preference learning
 - Response quality improvement
 
Odds Ratio Preference Optimization (ORPO)
When to use: Alternative to DPO, preference learning
Data format: Same as DPO (chosen vs rejected responses)
Unlike DPO, you do not need to perform SFT prior to ORPO training.
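If you were reproducing this outside the platform with the open-source TRL library (an assumption about tooling; the managed workflow handles this for you), ORPO would consume the same preference file as DPO and start directly from the base checkpoint:

```python
# Rough sketch only: model id, beta, and argument names are illustrative
# and may differ across TRL versions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "google/gemma-3-1b-pt"   # example base checkpoint; no SFT stage required
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

dataset = load_dataset("json", data_files="dpo_data.jsonl", split="train")

trainer = ORPOTrainer(
    model=model,
    args=ORPOConfig(output_dir="orpo-out", beta=0.1),
    train_dataset=dataset,
    processing_class=tokenizer,     # `tokenizer=` in older TRL releases
)
trainer.train()
```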
Group Relative Policy Optimization (GRPO)
When to use: Reasoning tasks, math problems, structured thinking
Data format: Prompt-only (no assistant responses)
Reward Functions
- Built-in: Expression accuracy, numerical accuracy, format checking
 - Reference-based: String comparison, text similarity
 - Model-based: LLM scoring, classification, relative ranking
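Custom rewards are typically plain functions that score each sampled completion. The sketch below follows TRL's GRPOTrainer convention of `(completions, **kwargs) -> list[float]`; the exact interface this service expects for user-defined rewards may differ.

```python
# Sketch of a format-checking reward: 1.0 when the completion wraps its
# final answer in <answer> tags, 0.0 otherwise.
import re

def format_reward(completions, **kwargs):
    """Score each sampled completion for the expected answer format."""
    pattern = re.compile(r"<answer>.+?</answer>", re.DOTALL)
    return [1.0 if pattern.search(c) else 0.0 for c in completions]

# Example: two sampled completions for one prompt
print(format_reward(["Let me think... <answer>42</answer>", "The answer is 42"]))
# -> [1.0, 0.0]
```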
 
Example Use Cases
- Math problem solving
 - Logical reasoning
 - Code generation
 - Scientific problem solving
 
PEFT Configuration
Full Fine-tuning
When to use: Large datasets, major domain changes, maximum performance
Pros:
- Updates all parameters
- Maximum learning capacity
- Best for major domain shifts

Cons:
- Requires more memory
- Longer training time
- Higher compute costs
 
LoRA (Low-Rank Adaptation)
When to use: Most fine-tuning tasks, good balance of performance and efficiency
Pros:
- Memory efficient
- Faster training
- Good performance
- Easy to merge

Cons:
- Slightly lower capacity than full fine-tuning
- May need higher rank for complex tasks
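For orientation, this is roughly what a LoRA setup corresponds to in the open-source peft library (an assumption about the underlying stack; rank, alpha, and target modules below are illustrative defaults):

```python
# Minimal LoRA sketch using the open-source peft library.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-pt")  # example checkpoint

lora_config = LoraConfig(
    r=16,                      # rank: raise for complex tasks
    lora_alpha=32,             # scaling factor, commonly 2x the rank
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```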
 
Quantization Configuration
QLoRA (Quantized LoRA)
When to use: Resource-constrained environments, most use cases
Pros:
- Most memory efficient
- Good performance
- Fast training
- Recommended default

Cons:
- Slight quantization overhead
- May need extra hyperparameter tuning for optimal results
 
Start with QLoRA for most tasks. It provides excellent results while using
minimal resources.
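Under the hood, QLoRA means loading the base model in 4-bit and training LoRA adapters on top. A minimal sketch, assuming the transformers + bitsandbytes stack:

```python
# QLoRA = 4-bit quantized base model + LoRA adapters on top.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-pt",                # example checkpoint
    quantization_config=bnb_config,
)
# ...then attach a LoraConfig exactly as in the LoRA example above.
```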
Hyperparameter Configuration
Essential Parameters
- Learning Rate
- Batch Size
- Epochs

Learning Rate
Purpose: Controls how much the model updates during training
Recommended values:
- Full fine-tuning: 0.00005 (5e-5)
- LoRA: 0.0001 (1e-4)
- QLoRA: 0.0002 (2e-4)
 
Lower learning rates are more stable but may require more epochs.
Advanced Parameters
Gradient Accumulation
Purpose: Simulate larger batch sizes without using more memory
Formula: effective batch size = batch_size × gradient_accumulation_steps
Example: batch_size=2, gradient_accumulation_steps=4 → effective batch size = 8
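To make the formula concrete, here is how those two knobs map onto a standard trainer configuration (sketch using transformers' TrainingArguments; the platform exposes the same fields in its UI):

```python
from transformers import TrainingArguments

# Effective batch size = per_device_train_batch_size x gradient_accumulation_steps
# Here: 2 x 4 = 8, without the memory cost of holding 8 sequences at once.
args = TrainingArguments(
    output_dir="gemma-finetune",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,                 # QLoRA recommendation from above
    lr_scheduler_type="linear",         # recommended scheduler
)
```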
Learning Rate Scheduler
Purpose: Adjusts the learning rate during training
Options:
- linear: Linear decay from the initial value to 0
- cosine: Cosine annealing
- constant: Fixed learning rate

Recommendation: Use linear for most cases.
Sequence Length
Purpose: Maximum length of input sequences
Considerations:
- Longer sequences use more memory
 - Should match your data’s typical length
 - Common values: 1024, 2048, 4096
 
Evaluation Configuration
Note that this evaluation is different from the inference/evaluation service used post-training. It calculates token-level metrics and raw eval loss. For most use cases, you can disable it to speed up training and use the inference service for evaluation after training.
Evaluation Strategies
- no: No evaluation (fastest training)
- steps: Evaluate every N steps
- epoch: Evaluate at the end of each epoch
Use epoch evaluation for most cases. It provides regular feedback without slowing training significantly.
Evaluation Metrics
The system automatically computes:
- Accuracy: Token-level accuracy
- Perplexity: Model’s confidence in predictions
 
For task-specific metrics, use the inference service’s evaluation endpoint
after training.
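If you do enable in-training evaluation, it corresponds roughly to settings like these (sketch with transformers' TrainingArguments; the argument was named evaluation_strategy in older releases):

```python
from transformers import TrainingArguments

# Evaluate at the end of every epoch; "no" and "steps" are the other options.
args = TrainingArguments(
    output_dir="gemma-finetune",
    eval_strategy="epoch",          # "evaluation_strategy" in older transformers versions
    per_device_eval_batch_size=2,
)
```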
Monitoring Training Progress
Weights & Biases Integration
Track your training with W&B for detailed monitoring by providing your API key.
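Outside the UI, this typically amounts to setting your API key and pointing the trainer at W&B (sketch; the project and run names are illustrative):

```python
# Sketch: log training runs to Weights & Biases from a script.
import os
from transformers import TrainingArguments

os.environ["WANDB_API_KEY"] = "<your-api-key>"    # or run `wandb login`
os.environ["WANDB_PROJECT"] = "gemma-finetuning"  # illustrative project name

args = TrainingArguments(
    output_dir="gemma-finetune",
    report_to="wandb",              # stream loss and metrics to W&B
    run_name="qlora-gemma-3-1b",    # illustrative run name
)
```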
Trackio
We are working on this :)
Export Configuration
- Adapter Format
 - Merged Format
 - GGUF Format
 
Adapter Format
Best for: LoRA/QLoRA models, easy to merge later
- Smaller file size
 - Requires base model to run
 - Easy to combine with other adapters
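As a sketch of what "easy to merge later" looks like in practice with the open-source peft library (the adapter path and model id are illustrative):

```python
# Sketch: turn a downloaded LoRA/QLoRA adapter into a standalone merged model.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-pt")  # example base model
model = PeftModel.from_pretrained(base, "./my-adapter")   # illustrative adapter path
merged = model.merge_and_unload()                         # folds the LoRA weights into the base
merged.save_pretrained("./my-merged-model")
```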
 
Troubleshooting
Common Issues
Out of Memory (OOM)
Solutions:
- Reduce batch size
 - Increase gradient accumulation steps
 - Use QLoRA instead of LoRA
 - Reduce sequence length
 - Use a smaller model
 
Training Loss Not Decreasing
Possible causes:
- Learning rate too high or too low
- Poor data quality
- Incorrect data format
- Model too small for the task

Solutions:
- Adjust the learning rate
- Check data quality
- Verify the data format
- Try a larger model
Overfitting
Signs:
- Training loss much lower than validation loss
 - Validation loss increasing
 
Solutions:
- Reduce epochs
 - Increase dropout
 - Use more diverse data
 - Early stopping
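For the early-stopping option, a typical setup looks like the sketch below (assumes in-training evaluation is enabled, as described under Evaluation Configuration):

```python
# Sketch: stop training when validation loss stops improving.
from transformers import EarlyStoppingCallback, TrainingArguments

args = TrainingArguments(
    output_dir="gemma-finetune",
    eval_strategy="epoch",            # "evaluation_strategy" in older transformers versions
    save_strategy="epoch",            # must match the evaluation strategy
    load_best_model_at_end=True,      # required for early stopping
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

# Pass the callback to your trainer, e.g. callbacks=[early_stop]
early_stop = EarlyStoppingCallback(early_stopping_patience=2)
```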
 
What’s Next?
After your training completes:
- Evaluate Performance: Use the evaluation tools to assess your model
- Export Your Model: Download in your preferred format
- Deploy: Set up your model for production use
- Iterate: Use feedback to improve with additional training