Selecting Your Base Model
Model Size Considerations
- Small Models (270M-1B): Best for simple tasks, quick prototyping, resource-constrained environments
  - Gemma 3 270M: Fastest, good for basic text generation
  - Gemma 3 1B: Balanced performance, good for most tasks
- Medium Models (4B-12B)
- Large Models (27B)
Choosing Your Training Method
Supervised Fine-Tuning (SFT)
When to use: General conversation, instruction following, domain adaptation
Data format: Language modeling
Best Practices
- Use high-quality conversation data
- Include diverse examples
- Ensure consistent formatting
- Balance different types of interactions
Example Use Cases
- Customer service chatbots
- Code assistants
- Educational tutors
- Domain-specific Q&A systems
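For reference, a single SFT training example in the widely used conversational "messages" schema looks like the sketch below; the exact field names are an assumption and may differ in your setup.

```python
# One SFT example in the common conversational ("messages") schema.
# Field names are illustrative, not a guaranteed contract of the service.
sft_example = {
    "messages": [
        {"role": "user", "content": "How do I reset my account password?"},
        {
            "role": "assistant",
            "content": "Go to Settings > Security, choose 'Reset password', "
                       "and follow the email verification steps.",
        },
    ]
}
```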
Direct Preference Optimization (DPO)
When to use: Aligning models with human preferences, safety, helpfulness
Data format: Preference tuning (chosen vs rejected responses)
Data Requirements
- Paired examples (chosen vs rejected)
- Clear preference signals
- Diverse preference types
- High-quality annotations
Example Use Cases
- Safety alignment
- Helpfulness optimization
- Style preference learning
- Response quality improvement
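A preference record typically pairs one prompt with a chosen and a rejected response. The prompt/chosen/rejected field names below follow a common convention and are an assumption about your setup.

```python
# One DPO preference pair: the same prompt with a preferred ("chosen")
# and a dispreferred ("rejected") response. Field names are illustrative.
dpo_example = {
    "prompt": "Explain in one sentence why the sky is blue.",
    "chosen": "Sunlight scatters off air molecules, and shorter blue "
              "wavelengths scatter the most, so the sky appears blue.",
    "rejected": "Because the sky reflects the color of the ocean.",
}
```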
Odds Ratio Preference Optimization (ORPO)
When to use: Alternative to DPO, preference learning
Data format: Same as DPO (chosen vs rejected responses)
Unlike DPO, you do not need to perform SFT prior to ORPO training.
Group Relative Policy Optimization (GRPO)
When to use: Reasoning tasks, math problems, structured thinking
Data format: Prompt-only (no assistant responses)
Reward Functions
- Built-in: Expression accuracy, numerical accuracy, format checking
- Reference-based: String comparison, text similarity
- Model-based: LLM scoring, classification, relative ranking
Example Use Cases
- Math problem solving
- Logical reasoning
- Code generation
- Scientific problem solving
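As a minimal sketch of a reference-based reward, the function below gives each completion a score of 1.0 when its final number matches the reference answer and 0.0 otherwise. The function name and signature are hypothetical; the exact interface your training setup expects for custom rewards may differ.

```python
import re

def numerical_accuracy_reward(completions, answers):
    """Toy reference-based reward: 1.0 if the last number in the completion
    matches the reference answer, else 0.0. Signature is hypothetical."""
    rewards = []
    for completion, answer in zip(completions, answers):
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        predicted = numbers[-1] if numbers else None
        rewards.append(1.0 if predicted == str(answer) else 0.0)
    return rewards

# Two sampled completions for the prompt "What is 12 * 7?"
print(numerical_accuracy_reward(
    ["12 * 7 = 84, so the answer is 84", "The answer is 74"],
    [84, 84],
))  # -> [1.0, 0.0]
```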
PEFT Configuration
Full Fine-tuning
When to use: Large datasets, major domain changes, maximum performance
Pros:
- Updates all parameters
- Maximum learning capacity
- Best for major domain shifts
Cons:
- Requires more memory
- Longer training time
- Higher compute costs
LoRA (Low-Rank Adaptation)
When to use: Most fine-tuning tasks, good balance of performance and efficiency
Pros:
- Memory efficient
- Faster training
- Good performance
- Easy to merge
Cons:
- Slightly lower capacity than full fine-tuning
- May need higher rank for complex tasks
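If your stack builds on the PEFT library, a LoRA setup looks roughly like the sketch below. The rank, target modules, and model ID are illustrative assumptions, not values prescribed by this guide.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative LoRA settings; raise r (the adapter rank) if a complex task underfits.
lora_config = LoraConfig(
    r=16,                     # rank of the low-rank update matrices
    lora_alpha=32,            # scaling factor applied to the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")  # example model ID
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```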
Quantization Configuration
QLoRA (Quantized LoRA)
When to use: Resource-constrained environments, most use cases
Pros:
- Most memory efficient
- Good performance
- Fast training
- Recommended default
Cons:
- Slight quantization overhead
- May need fine-tuning for optimal results
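In a transformers/PEFT-based stack, QLoRA usually means loading the base model in 4-bit and attaching a LoRA adapter on top, roughly as sketched below. The quantization settings and model ID are common defaults used for illustration.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization: the frozen base weights are stored in 4 bits while
# the LoRA adapter is trained in higher precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it",            # example model ID
    quantization_config=bnb_config,
    device_map="auto",
)
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"),
)
```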
Hyperparameter Configuration
Essential Parameters
- Learning Rate
  Purpose: Controls how much the model updates during training
  Recommended values:
  - Full fine-tuning: 0.00005
  - LoRA: 0.0001
  - QLoRA: 0.0002
  Lower learning rates are more stable but may require more epochs (see the sketch after this list).
- Batch Size
- Epochs
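Assuming a transformers-style backend, the essential parameters map onto a training-arguments object roughly as follows; the values use the QLoRA recommendation above and the parameter names are the standard transformers ones, which may differ in your setup.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gemma-finetune",     # illustrative output path
    learning_rate=0.0002,            # recommended starting point for QLoRA
    per_device_train_batch_size=4,   # batch size per device
    num_train_epochs=3,              # full passes over the training data
)
```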
Advanced Parameters
Gradient Accumulation
Purpose: Simulate larger batch sizes without using more memory
Formula: effective batch size = batch_size × gradient_accumulation_steps
Example: batch_size=2, gradient_accumulation_steps=4 → effective batch size of 8
Learning Rate Scheduler
Purpose: Adjusts the learning rate during training
Options:
- linear: Linear decay from the initial value to 0
- cosine: Cosine annealing
- constant: Fixed learning rate
Recommendation: Use linear for most cases
Sequence Length
Purpose: Maximum length of input sequences
Considerations:
- Longer sequences use more memory
- Should match your data’s typical length
- Common values: 1024, 2048, 4096
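Continuing the same assumption of a transformers-style backend, the advanced parameters correspond to the knobs below; sequence length is typically enforced when the data is tokenized.

```python
from transformers import AutoTokenizer, TrainingArguments

args = TrainingArguments(
    output_dir="gemma-finetune",        # illustrative output path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,      # effective batch size = 2 x 4 = 8
    lr_scheduler_type="linear",         # "cosine" and "constant" are also accepted
)

# Sequence length is enforced at tokenization time in this sketch.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")  # example model ID
batch = tokenizer(["example input text"], truncation=True, max_length=2048)
```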
Evaluation Configuration
Note that this evaluation is different from the inference/evaluation service used post-training. It calculates token-level metrics and raw eval loss. For most use cases, you can disable it to speed up training and use the inference service for evaluation after training.
Evaluation Strategies
- no: No evaluation (fastest training)
- steps: Evaluate every N steps
- epoch: Evaluate at the end of each epoch
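These strategies mirror the standard eval_strategy options in transformers; if your backend exposes them the same way, step-based evaluation would be configured roughly like this.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gemma-finetune",
    eval_strategy="steps",   # "no", "steps", or "epoch"
                             # (named evaluation_strategy in older transformers releases)
    eval_steps=100,          # evaluate every 100 training steps
)
```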
Evaluation Metrics
The system automatically computes:
- Accuracy: Token-level accuracy
- Perplexity: Model’s confidence in predictions
For task-specific metrics, use the inference service’s evaluation endpoint
after training.
Monitoring Training Progress
Weights & Biases Integration
Track your training with W&B for detailed monitoring by providing your API key.
Trackio
We are working on this :)
Export Configuration
- Adapter Format: Best for LoRA/QLoRA models, easy to merge later (see the sketch after this list)
  - Smaller file size
  - Requires base model to run
  - Easy to combine with other adapters
- Merged Format
- GGUF Format
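If you export the adapter format and your stack uses PEFT, the adapter can be loaded on top of the base model and, optionally, merged into a standalone checkpoint. The paths and model ID below are placeholders.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")  # example base model
model = PeftModel.from_pretrained(base, "./my-exported-adapter")     # placeholder adapter path

# Keep the adapter attached for inference, or merge it into the base
# weights to produce a standalone checkpoint that no longer needs PEFT.
merged = model.merge_and_unload()
merged.save_pretrained("./my-merged-model")
```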
Troubleshooting
Common Issues
Out of Memory (OOM)
Solutions:
- Reduce batch size
- Increase gradient accumulation steps
- Use QLoRA instead of LoRA
- Reduce sequence length
- Use a smaller model
Training Loss Not Decreasing
Possible causes:
- Learning rate too high or too low
- Poor data quality
- Incorrect data format
- Model too small for task
Solutions:
- Adjust learning rate
- Check data quality
- Verify data format
- Try a larger model
Overfitting
Signs:
- Training loss much lower than validation loss
- Validation loss increasing
Solutions:
- Reduce epochs
- Increase dropout
- Use more diverse data
- Early stopping (see the sketch below)
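Assuming a transformers-based trainer, early stopping is usually wired up with an evaluation set, load_best_model_at_end, and the built-in callback, as sketched below; the model and dataset variables are placeholders.

```python
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="gemma-finetune",
    eval_strategy="steps",
    eval_steps=100,
    save_strategy="steps",
    save_steps=100,                    # keep saves aligned with evals for best-model tracking
    load_best_model_at_end=True,       # restore the best checkpoint when training stops
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Trainer(
    model=model,                       # placeholder: your (PEFT-wrapped) model
    args=args,
    train_dataset=train_dataset,       # placeholder datasets
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # stop after 3 evals without improvement
)
```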
What’s Next?
After your training completes:
- Evaluate Performance: Use the evaluation tools to assess your model
- Export Your Model: Download in your preferred format
- Deploy: Set up your model for production use
- Iterate: Use feedback to improve with additional training