Training Configuration Guide
This comprehensive guide walks you through every aspect of training configuration, from selecting the right model to monitoring your training progress.
Prerequisites
Before starting your training job, ensure you have:
Selecting Your Base Model
Model Size Considerations
Best for: Simple tasks, quick prototyping, resource-constrained environments
- Gemma 3 270M: Fastest, good for basic text generation
- Gemma 3 1B: Balanced performance, good for most tasks
Start with 1B for most applications. It provides good performance while being resource-efficient.
Vision Models
For multimodal tasks involving images:
- Gemma 3n E2B: 2B parameters, good for basic vision tasks
- Gemma 3n E4B: 4B parameters, advanced vision understanding
Vision models automatically detect when your dataset contains images and
enable multimodal processing.
Choosing Your Training Method
Supervised Fine-Tuning (SFT)
When to use: General conversation, instruction following, domain adaptation
Data format: Conversations with system, user, and assistant messages
Configuration:
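The original configuration snippet is not reproduced here. As a hedged illustration only, a minimal SFT run using the Hugging Face TRL library (an assumed stack; the model id, file name, and hyperparameters are placeholders, not this guide's recommendations) might look like this:

```python
# Minimal SFT sketch with Hugging Face TRL -- an assumed stack, not the
# platform's exact configuration format. Names and values are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Rows contain a "messages" list of {"role": ..., "content": ...} chat turns.
train_dataset = load_dataset("json", data_files="sft_conversations.jsonl", split="train")

config = SFTConfig(
    output_dir="gemma3-1b-sft",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size 8
    learning_rate=2e-4,              # QLoRA-range learning rate (see below)
    num_train_epochs=3,
    logging_steps=10,
)

trainer = SFTTrainer(
    model="google/gemma-3-1b-it",    # placeholder base model
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```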
Best Practices
- Use high-quality conversation data
- Include diverse examples
- Ensure consistent formatting
- Balance different types of interactions
Example Use Cases
- Customer service chatbots
- Code assistants
- Educational tutors
- Domain-specific Q&A systems
Direct Preference Optimization (DPO)
When to use: Aligning models with human preferences, safety, helpfulness
Data format: Paired examples with chosen and rejected responses
Configuration:
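As with SFT, the exact configuration block is omitted here. A hedged sketch under the same assumed TRL stack (model id, file name, and values are placeholders) could look like this:

```python
# Illustrative DPO setup with TRL; names and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "gemma3-1b-sft"  # typically a model that has already been SFT-trained
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Each row needs "prompt", "chosen", and "rejected" fields.
train_dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

config = DPOConfig(
    output_dir="gemma3-1b-dpo",
    beta=0.1,                        # strength of the preference constraint
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,              # DPO usually uses a small learning rate
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,      # `tokenizer=` in older TRL versions
)
trainer.train()
```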
Data Requirements
- Paired examples (chosen vs rejected)
- Clear preference signals
- Diverse preference types
- High-quality annotations
Example Use Cases
- Safety alignment
- Helpfulness optimization
- Style preference learning
- Response quality improvement
Odds Ratio Preference Optimization (ORPO)
When to use: Alternative to DPO, preference learning
Data format: Same as DPO (chosen vs rejected responses)
ORPO can be more stable than DPO in some cases and may converge faster.
Configuration:
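Again, the configuration itself is not shown in this copy. Under the assumed TRL stack, ORPO is nearly a drop-in swap for DPO (no separate reference model is needed); all names and values below are placeholders:

```python
# Illustrative ORPO setup with TRL; values are assumptions, not recommendations
# from this guide. ORPO uses the same paired data as DPO.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")

# Same paired format as DPO: "prompt", "chosen", "rejected".
train_dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

config = ORPOConfig(
    output_dir="gemma3-1b-orpo",
    beta=0.1,                        # weight of the odds-ratio preference term
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=8e-6,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```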
Group Relative Policy Optimization (GRPO)
When to use: Reasoning tasks, math problems, structured thinking
Data format: Prompts without responses (model generates and gets scored)
Configuration:
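The original configuration block is missing here as well. As a hedged sketch under the assumed TRL stack: GRPO datasets contain only prompts, the trainer samples several completions per prompt, and reward functions score them (the built-in reward options offered by the platform are listed below; the sketch uses a toy custom reward callable, and all names and values are placeholders):

```python
# Illustrative GRPO setup with TRL (assumed stack). The trainer generates
# completions for each prompt and scores them with the reward function(s).
import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Rows need a "prompt" field; no reference answers are required.
train_dataset = load_dataset("json", data_files="math_prompts.jsonl", split="train")

def format_reward(completions, **kwargs):
    """Toy reward: 1.0 if the completion ends with a boxed answer, else 0.0."""
    return [1.0 if re.search(r"\\boxed\{.+\}\s*$", c) else 0.0 for c in completions]

config = GRPOConfig(
    output_dir="gemma3-1b-grpo",
    per_device_train_batch_size=4,
    num_generations=4,               # completions sampled per prompt
    max_completion_length=512,
    learning_rate=1e-6,
)

trainer = GRPOTrainer(
    model="google/gemma-3-1b-it",
    reward_funcs=format_reward,      # a single callable or a list of them
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```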
Reward Functions
- Built-in: Expression accuracy, numerical accuracy, format checking
- Reference-based: String comparison, text similarity
- Model-based: LLM scoring, classification, relative ranking
Example Use Cases
- Math problem solving
- Logical reasoning
- Code generation
- Scientific problem solving
PEFT Configuration
Full Fine-tuning
When to use: Large datasets, major domain changes, maximum performance
Pros:
- Updates all parameters
- Maximum learning capacity
- Best for major domain shifts
Cons:
- Requires more memory
- Longer training time
- Higher compute costs
LoRA (Low-Rank Adaptation)
When to use: Most fine-tuning tasks, good balance of performance and efficiency
Pros:
- Memory efficient
- Faster training
- Good performance
- Easy to merge
Cons:
- Slightly lower capacity than full fine-tuning
- May need higher rank for complex tasks
QLoRA (Quantized LoRA)
When to use: Resource-constrained environments, most use cases
Pros:
- Most memory efficient
- Good performance
- Fast training
- Recommended default
Cons:
- Slight quantization overhead
- May need hyperparameter tuning for optimal results
Start with QLoRA for most tasks. It provides excellent results while using
minimal resources.
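The PEFT method is normally just selected in the training configuration, but as a hedged illustration of what LoRA and QLoRA amount to under the hood (assuming a Hugging Face transformers/peft/bitsandbytes stack; the model id and adapter settings are placeholders):

```python
# QLoRA in a nutshell: load the base model with 4-bit quantized weights and
# attach small LoRA adapter matrices, which are the only parameters trained.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit quantized base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it",                  # placeholder base model
    quantization_config=bnb_config,
)

lora_config = LoraConfig(
    r=16,                                    # adapter rank; raise for complex tasks
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()           # only the adapter weights train
```

For plain LoRA, the same sketch applies without the `quantization_config` argument, so the base model stays in full or half precision.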
Hyperparameter Configuration
Essential Parameters
Learning Rate
Purpose: Controls how much the model updates during training
Recommended values:
- Full fine-tuning: 0.00005
- LoRA: 0.0001
- QLoRA: 0.0002
Lower learning rates are more stable but may require more epochs.
Advanced Parameters
Gradient Accumulation
Purpose: Simulate larger batch sizes without using more memory
Formula: Effective batch size = batch_size × gradient_accumulation_steps
Example: batch_size=2, gradient_accumulation_steps=4 → effective batch size = 8
Learning Rate Scheduler
Purpose: Adjusts learning rate during training
Options:
- linear: Linear decay from the initial value to 0
- cosine: Cosine annealing
- constant: Fixed learning rate
Recommendation: Use linear for most cases
Sequence Length
Purpose: Maximum length of input sequences
Considerations:
- Longer sequences use more memory
- Should match your data’s typical length
- Common values: 1024, 2048, 4096
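Pulling the parameters above together, a hypothetical configuration block might look like the following. The field names are illustrative (loosely modeled on transformers-style training arguments), not a specific platform's schema; the values follow the recommendations above.

```python
# Hypothetical hyperparameter block; field names are illustrative only.
training_config = {
    "learning_rate": 2e-4,               # QLoRA recommendation (0.0002)
    "lr_scheduler_type": "linear",       # "cosine" and "constant" also available
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,    # effective batch size = 2 * 4 = 8
    "num_train_epochs": 3,
    "max_seq_length": 2048,              # match your data's typical length
}
```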
Evaluation Configuration
Setting Up Evaluation
Enable evaluation during training to monitor performance:
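The original snippet is not shown; as a hedged sketch with a transformers/TRL-style trainer (an assumption about the stack; dataset, model id, and split size are placeholders), enabling per-epoch evaluation against a held-out split looks roughly like this:

```python
# Hedged sketch: per-epoch evaluation on a held-out split.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="conversations.jsonl", split="train")
splits = dataset.train_test_split(test_size=0.1)   # hold out 10% for evaluation

config = SFTConfig(
    output_dir="gemma3-1b-sft",
    eval_strategy="epoch",           # "no", "steps", or "epoch"
                                     # (`evaluation_strategy` in older transformers)
    per_device_eval_batch_size=2,
)

trainer = SFTTrainer(
    model="google/gemma-3-1b-it",
    args=config,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
)
trainer.train()
```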
Evaluation Strategies
- no: No evaluation (fastest training)
- steps: Evaluate every N steps
- epoch: Evaluate at the end of each epoch
Use epoch evaluation for most cases. It provides regular feedback without slowing training significantly.
Evaluation Metrics
The system automatically computes:
- Accuracy: Token-level accuracy
- Perplexity: How uncertain the model is about its predictions (lower is better)
For task-specific metrics, use the inference service’s evaluation endpoint
after training.
Monitoring Training Progress
Weights & Biases Integration
Track your training with W&B for detailed monitoring:
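The original snippet is omitted; a hedged sketch for a transformers/TRL-style trainer (project and run names are placeholders, and a `wandb login` or WANDB_API_KEY is assumed to be in place):

```python
# Hedged sketch: streaming training metrics to Weights & Biases.
import os
from trl import SFTConfig

os.environ["WANDB_PROJECT"] = "gemma-finetuning"   # W&B project to log into

config = SFTConfig(
    output_dir="gemma3-1b-sft",
    report_to="wandb",               # send loss and eval metrics to W&B
    run_name="gemma3-1b-qlora-v1",   # how the run appears in the dashboard
    logging_steps=10,                # log every 10 optimizer steps
)
```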
Training Metrics to Watch
At a minimum, watch training loss, validation loss, and the learning rate over the course of the run.
Early Stopping
Monitor validation loss and stop training if:
- Validation loss stops decreasing
- Validation loss starts increasing (overfitting)
- Training loss becomes much lower than validation loss
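These checks can be automated. A hedged sketch using the transformers EarlyStoppingCallback (an assumed stack; `train_split` and `eval_split` are placeholders for your own dataset splits, and the patience value is only an example):

```python
# Hedged sketch: stop automatically when validation loss stops improving.
from transformers import EarlyStoppingCallback
from trl import SFTConfig, SFTTrainer

config = SFTConfig(
    output_dir="gemma3-1b-sft",
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,         # required for early stopping
    metric_for_best_model="eval_loss",
    greater_is_better=False,             # lower validation loss is better
)

trainer = SFTTrainer(
    model="google/gemma-3-1b-it",
    args=config,
    train_dataset=train_split,           # placeholder splits
    eval_dataset=eval_split,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
```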
Export Configuration
Export Formats
Configure how your model will be exported:
Export Options
Best for: LoRA/QLoRA models, easy to merge later
- Smaller file size
- Requires base model to run
- Easy to combine with other adapters
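Since adapter exports need the base model at load time, a common follow-up step is merging the adapter into the base weights to produce a standalone checkpoint. A hedged sketch with the peft library (an assumed stack; the model id and paths are placeholders):

```python
# Hedged sketch: fold a trained LoRA/QLoRA adapter into the base model weights.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
model = PeftModel.from_pretrained(base, "gemma3-1b-sft/adapter")  # adapter dir

merged = model.merge_and_unload()        # merge adapter weights into the base
merged.save_pretrained("gemma3-1b-merged")

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
tokenizer.save_pretrained("gemma3-1b-merged")
```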
Complete Training Configuration Example
Here’s a complete example for training a code assistant:
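The original end-to-end example is not reproduced in this copy. The sketch below is an illustrative reconstruction under the assumed TRL/peft stack (QLoRA SFT of Gemma 3 1B on code-assistant conversations); every name and value is a placeholder rather than this guide's exact example.

```python
# Illustrative end-to-end QLoRA SFT run for a code assistant (assumed stack).
import torch
from datasets import load_dataset
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="code_conversations.jsonl", split="train")
splits = dataset.train_test_split(test_size=0.05)   # small held-out eval split

config = SFTConfig(
    output_dir="gemma3-1b-code-assistant",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,        # effective batch size 8
    learning_rate=2e-4,                   # QLoRA recommendation
    lr_scheduler_type="linear",
    num_train_epochs=3,
    eval_strategy="epoch",
    logging_steps=10,
    report_to="wandb",
    model_init_kwargs={                   # load the base model in 4-bit
        "quantization_config": BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16,
        ),
    },
)

peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="google/gemma-3-1b-it",
    args=config,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    peft_config=peft_config,
)
trainer.train()
trainer.save_model()                      # writes the trained adapter
```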
Troubleshooting
Common Issues
Out of Memory (OOM)
Solutions:
- Reduce batch size
- Increase gradient accumulation steps
- Use QLoRA instead of LoRA
- Reduce sequence length
- Use a smaller model
Training Loss Not Decreasing
Possible causes:
- Learning rate too high or too low
- Poor data quality
- Incorrect data format
- Model too small for task
Solutions:
- Adjust learning rate
- Check data quality
- Verify data format
- Try a larger model
Overfitting
Signs:
- Training loss much lower than validation loss
- Validation loss increasing
Solutions:
- Reduce epochs
- Increase dropout
- Use more diverse data
- Early stopping
What’s Next?
After your training completes:
- Evaluate Performance: Use the evaluation tools to assess your model
- Export Your Model: Download in your preferred format
- Deploy: Set up your model for production use
- Iterate: Use feedback to improve with additional training