This comprehensive guide walks you through every aspect of training configuration, from selecting the right model to monitoring your training progress.

Selecting Your Base Model

Model Size Considerations

  • Small Models (270M-1B)
  • Medium Models (4B-12B)
  • Large Models (27B)

Small models are best for simple tasks, quick prototyping, and resource-constrained environments:
  • Gemma 3 270M: Fastest, good for basic text generation
  • Gemma 3 1B: Balanced performance, good for most tasks
Start with 1B for most applications. It provides good performance while being resource-efficient. Note that these small models do not support vision fine-tuning.

Choosing Your Training Method

Supervised Fine-Tuning (SFT)

When to use: General conversation, instruction following, domain adaptation
Data format: Language modeling (conversation-style records; a sample is sketched after the lists below)
Data guidelines:
  • Use high-quality conversation data
  • Include diverse examples
  • Ensure consistent formatting
  • Balance different types of interactions
Example use cases:
  • Customer service chatbots
  • Code assistants
  • Educational tutors
  • Domain-specific Q&A systems
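For illustration, a single SFT record in the widely used conversational "messages" form might look like the sketch below. The field names follow the common chat-template convention and are an assumption; check the schema your training data is expected to use.

```python
# Hypothetical SFT training record in conversational "messages" form.
# Role/content field names follow the common chat-template convention; adjust
# them to whatever schema your dataset format requires.
sft_example = {
    "messages": [
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Open Settings, choose Security, then select Reset password and follow the emailed link."},
    ]
}
```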

Direct Preference Optimization (DPO)

When to use: Aligning models with human preferences, safety, helpfulness
Data format: Preference tuning (chosen vs. rejected responses)
Data requirements:
  • Paired examples (chosen vs. rejected)
  • Clear preference signals
  • Diverse preference types
  • High-quality annotations
Example use cases:
  • Safety alignment
  • Helpfulness optimization
  • Style preference learning
  • Response quality improvement
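As a sketch, one preference record pairs a single prompt with a chosen and a rejected response. The field names below follow the common TRL-style convention and are an assumption; your dataset schema may differ.

```python
# Hypothetical DPO preference record: one prompt, one preferred answer, one rejected answer.
dpo_example = {
    "prompt": "Explain what a learning rate is.",
    "chosen": "The learning rate controls how large each parameter update is during training. Smaller values learn more slowly but more stably.",
    "rejected": "It's just a number, don't worry about it.",
}
```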

Odds Ratio Preference Optimization (ORPO)

When to use: Alternative to DPO for preference learning
Data format: Same as DPO (chosen vs. rejected responses)
Unlike DPO, you do not need to perform SFT prior to ORPO training.

Group Relative Policy Optimization (GRPO)

When to use: Reasoning tasks, math problems, structured thinking
Data format: Prompt-only (no assistant responses)
Reward functions:
  • Built-in: Expression accuracy, numerical accuracy, format checking
  • Reference-based: String comparison, text similarity
  • Model-based: LLM scoring, classification, relative ranking
Example use cases:
  • Math problem solving
  • Logical reasoning
  • Code generation
  • Scientific problem solving
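To make the reward idea concrete, here is a minimal sketch of what a format-checking reward might compute, along with a couple of prompt-only records. The callable signature and record fields are assumptions for illustration; the interface your training setup exposes may differ.

```python
import re

# Hypothetical format-checking reward: returns 1.0 when a completion wraps its
# final answer in \boxed{...}, else 0.0. The list-in/list-out signature is an
# assumption; adapt it to the reward interface your training setup expects.
def format_reward(completions: list[str]) -> list[float]:
    return [1.0 if re.search(r"\\boxed\{.+\}", c) else 0.0 for c in completions]

# Prompt-only records: problem statements only, no assistant responses.
grpo_examples = [
    {"prompt": "What is 17 * 24? Put the final answer in \\boxed{}."},
    {"prompt": "Simplify (x^2 - 9) / (x - 3). Put the final answer in \\boxed{}."},
]
```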

PEFT Configuration

Full Fine-tuning

When to use: Large datasets, major domain changes, maximum performance
Pros:
  • Updates all parameters
  • Maximum learning capacity
  • Best for major domain shifts
Cons:
  • Requires more memory
  • Longer training time
  • Higher compute costs

LoRA (Low-Rank Adaptation)

When to use: Most fine-tuning tasks, good balance of performance and efficiency
Pros:
  • Memory efficient
  • Faster training
  • Good performance
  • Easy to merge
Cons:
  • Slightly lower capacity than full fine-tuning
  • May need higher rank for complex tasks
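For reference, a representative LoRA setup with the Hugging Face peft library might look like the sketch below. The rank, alpha, and target module names are common starting points for Gemma-style attention layers, not platform defaults; treat them as assumptions and check your model's layer names.

```python
from peft import LoraConfig

# Sketch of a LoRA configuration; values are typical starting points.
lora_config = LoraConfig(
    r=16,                # low-rank dimension; raise it for complex tasks
    lora_alpha=32,       # scaling factor applied to the adapter updates
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projection names
    task_type="CAUSAL_LM",
)
```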

Quantization Configuration

QLoRA (Quantized LoRA)

When to use: Resource-constrained environments, most use cases
Pros:
  • Most memory efficient
  • Good performance
  • Fast training
  • Recommended default
Cons:
  • Slight quantization overhead
  • May need fine-tuning for optimal results
Start with QLoRA for most tasks. It provides excellent results while using minimal resources.
Note that we currently only support 4-bit quantization for QLoRA. We will be moving away from bitsandbytes toward more robust methods in the future; stay tuned.
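For reference, 4-bit loading with bitsandbytes is usually expressed in transformers as shown below. The specific flags are a common NF4 recipe offered as a sketch, not necessarily what the service configures internally.

```python
import torch
from transformers import BitsAndBytesConfig

# Sketch of a 4-bit (QLoRA-style) quantization config; NF4 with double
# quantization is a common recipe, assumed here for illustration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```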

Hyperparameter Configuration

Essential Parameters

  • Learning Rate
  • Batch Size
  • Epochs
Purpose: Controls how much the model updates during training
Recommended values:
  • Full fine-tuning: 0.00005
  • LoRA: 0.0001
  • QLoRA: 0.0002
Lower learning rates are more stable but may require more epochs.

Advanced Parameters

Purpose: Simulate larger batch sizes without using more memory
Formula: Effective batch size = batch_size × gradient_accumulation_steps
Example: batch_size=2, gradient_accumulation_steps=4 → effective batch size = 8
Purpose: Adjusts the learning rate during training
Options:
  • linear: Linear decay from the initial value to 0
  • cosine: Cosine annealing
  • constant: Fixed learning rate
Recommendation: Use linear for most cases
Purpose: Maximum length of input sequences
Considerations:
  • Longer sequences use more memory
  • Should match your data’s typical length
  • Common values: 1024, 2048, 4096
Recommendation: Start with 2048, adjust based on your data
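Pulling the recommendations above together, a training configuration might look roughly like the sketch below. TrainingArguments from transformers is used as an assumed stand-in for however your platform expresses these settings, and the maximum sequence length is typically passed to the trainer or tokenizer rather than to TrainingArguments.

```python
from transformers import TrainingArguments

# Sketch combining the recommended hyperparameters; values are starting points,
# not service defaults.
training_args = TrainingArguments(
    output_dir="out",
    learning_rate=2e-4,              # QLoRA; use 1e-4 for LoRA, 5e-5 for full fine-tuning
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size = 2 * 4 = 8
    num_train_epochs=3,
    lr_scheduler_type="linear",      # linear decay, as recommended above
)

max_seq_length = 2048  # usually passed separately to the trainer/tokenizer
```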

Evaluation Configuration

Note that this evaluation is different from the inference/evaluation service used post-training. This calculates token-level metrics and raw eval loss. For most use cases, you can disable this to speed up training and use the inference service for evaluation after training.

Evaluation Strategies

  • no: No evaluation (fastest training)
  • steps: Evaluate every N steps
  • epoch: Evaluate at the end of each epoch
Use epoch evaluation for most cases. It provides regular feedback without slowing training significantly.
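In Hugging Face-style configurations these strategies map onto the eval_strategy argument (called evaluation_strategy in older transformers versions); a minimal sketch, assuming that API:

```python
from transformers import TrainingArguments

# Sketch: evaluate at the end of every epoch. Use "steps" plus eval_steps for
# finer-grained feedback, or "no" to skip evaluation entirely.
eval_args = TrainingArguments(
    output_dir="out",
    eval_strategy="epoch",
)
```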

Evaluation Metrics

The system automatically computes:
  • Accuracy: Token-level accuracy
  • Perplexity: Model’s confidence in predictions
For task-specific metrics, use the inference service’s evaluation endpoint after training.
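Perplexity is typically just the exponential of the average cross-entropy evaluation loss, so you can recover it from the reported eval loss yourself:

```python
import math

# Perplexity = exp(cross-entropy loss); e.g. an eval loss of 1.2 gives ~3.32.
eval_loss = 1.2                    # example value from a training run
perplexity = math.exp(eval_loss)   # ≈ 3.32
```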

Monitoring Training Progress

Weights & Biases Integration

Track your training with W&B for detailed monitoring by providing your API key.
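A typical integration is sketched below using the standard wandb environment variables and the Hugging Face report_to flag; your platform may only need the API key pasted into its settings UI.

```python
import os
from transformers import TrainingArguments

# Sketch: send training logs to Weights & Biases. The API key can also be set
# with `wandb login` instead of an environment variable.
os.environ["WANDB_API_KEY"] = "<your-api-key>"
os.environ["WANDB_PROJECT"] = "my-finetune"  # optional project name (placeholder)

wandb_args = TrainingArguments(output_dir="out", report_to="wandb")
```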

Trackio

We are working on this :)

Export Configuration

  • Adapter Format
  • Merged Format
  • GGUF Format

The adapter format is best for LoRA/QLoRA models and is easy to merge later:
  • Smaller file size
  • Requires base model to run
  • Easy to combine with other adapters
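For reference, merging an exported adapter back into its base model is usually a few lines with peft. A minimal sketch, assuming a Gemma base checkpoint and a local adapter directory; the model ID and paths are placeholders.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Sketch: load the base model, apply the exported adapter, merge the weights so
# the result runs without peft, then save. IDs and paths are placeholders.
base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
model = PeftModel.from_pretrained(base, "path/to/exported-adapter")
merged = model.merge_and_unload()
merged.save_pretrained("path/to/merged-model")
```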
Learn more about export options on the Inference, Evaluation, and Export page.

Troubleshooting

Common Issues

Out-of-memory errors
Solutions:
  • Reduce batch size
  • Increase gradient accumulation steps
  • Use QLoRA instead of LoRA
  • Reduce sequence length
  • Use a smaller model
Poor training results
Possible causes:
  • Learning rate too high or too low
  • Poor data quality
  • Incorrect data format
  • Model too small for the task
Solutions:
  • Adjust the learning rate
  • Check data quality
  • Verify the data format
  • Try a larger model
Overfitting
Signs:
  • Training loss much lower than validation loss
  • Validation loss increasing
Solutions:
  • Reduce epochs
  • Increase dropout
  • Use more diverse data
  • Early stopping

What’s Next?

After your training completes:
  1. Evaluate Performance: Use the evaluation tools to assess your model
  2. Export Your Model: Download in your preferred format
  3. Deploy: Set up your model for production use
  4. Iterate: Use feedback to improve with additional training
Ready to evaluate your model? Head to the Evaluation guide to learn how to test your fine-tuned model.