Fine-tuning Guide

This comprehensive guide walks you through every aspect of training configuration, from selecting the right model to monitoring your training progress.

Selecting Your Base Model

Model Size Considerations

Small Models (270M-1B)
Medium Models (4B-12B)
Large Models (27B)

Best for: Simple tasks, quick prototyping, resource-constrained environments

Gemma 3 270M: Fastest, good for basic text generation
Gemma 3 1B: Balanced performance, good for most tasks

Start with 1B for most applications. It provides good performance while being resource-efficient. These models do not support vision fine-tuning!

Choosing Your Training Method

Supervised Fine-Tuning (SFT)

When to use: General conversation, instruction following, domain adaptation Data format: Language modeling

Best Practices

Use high-quality conversation data
Include diverse examples
Ensure consistent formatting
Balance different types of interactions

Example Use Cases

Customer service chatbots
Code assistants
Educational tutors
Domain-specific Q&A systems

Direct Preference Optimization (DPO)

When to use: Aligning models with human preferences, safety, helpfulness Data format: Preference tuning (chosen vs rejected responses)

Data Requirements

Paired examples (chosen vs rejected)
Clear preference signals
Diverse preference types
High-quality annotations

Example Use Cases

Safety alignment
Helpfulness optimization
Style preference learning
Response quality improvement

Odds Ratio Preference Optimization (ORPO)

When to use: Alternative to DPO, preference learning Data format: Same as DPO (chosen vs rejected responses)

Unlike DPO, you do not need to perform SFT prior to ORPO training.

When to use: Reasoning tasks, math problems, structured thinking Data format: Prompt-only (no assistant responses)

Reward Functions

Built-in: Expression accuracy, numerical accuracy, format checking
Reference-based: String comparison, text similarity
Model-based: LLM scoring, classification, relative ranking

Example Use Cases

Math problem solving
Logical reasoning
Code generation
Scientific problem solving

PEFT Configuration

Full Fine-tuning

When to use: Large datasets, major domain changes, maximum performance Pros:

Updates all parameters
Maximum learning capacity
Best for major domain shifts

Cons:

Requires more memory
Longer training time
Higher compute costs

LoRA (Low-Rank Adaptation)

When to use: Most fine-tuning tasks, good balance of performance and efficiency Pros:

Memory efficient
Faster training
Good performance
Easy to merge

Cons:

Slightly lower capacity than full fine-tuning
May need higher rank for complex tasks

Quantization Configuration

QLoRA (Quantized LoRA)

When to use: Resource-constrained environments, most use cases Pros:

Most memory efficient
Good performance
Fast training
Recommended default

Cons:

Slight quantization overhead
May need fine-tuning for optimal results

Start with QLoRA for most tasks. It provides excellent results while using minimal resources.

Note that we currently only support 4 bit quantization for QLoRA. We will be moving away from bitsandbytes in the future to more robust methods, stay tuned.

Hyperparameter Configuration

Essential Parameters

Learning Rate
Batch Size
Epochs

Purpose: Controls how much the model updates during trainingRecommended values:

Full fine-tuning: 0.00005
LoRA: 0.0001
QLoRA: 0.0002

Lower learning rates are more stable but may require more epochs.

Advanced Parameters

Gradient Accumulation

Purpose: Simulate larger batch sizes without using more memoryFormula: Effective batch size = batch_size × gradient_accumulation_stepsExample: batch_size=2, gradient_accumulation_steps=4 → effective batch_size=8

Learning Rate Scheduler

Purpose: Adjusts learning rate during training Options: - linear: Linear decay from initial to 0 - cosine: Cosine annealing - constant: Fixed learning rate Recommendation: Use linear for most cases

Sequence Length

Purpose: Maximum length of input sequencesConsiderations:

Longer sequences use more memory
Should match your data’s typical length
Common values: 1024, 2048, 4096

Recommendation: Start with 2048, adjust based on your data

Evaluation Configuration

Note that this evaluation is different from the inference/evaluation service used post-training. This calculates token-level metrics and raw eval loss. For most use cases, you can disable this to speed up training and use the inference service for evaluation after training.

Evaluation Strategies

no: No evaluation (fastest training)
steps: Evaluate every N steps
epoch: Evaluate at the end of each epoch

Use epoch evaluation for most cases. It provides regular feedback without slowing training significantly.

Evaluation Metrics

The system automatically computes:

Accuracy: Token-level accuracy
Perplexity: Model’s confidence in predictions

For task-specific metrics, use the inference service’s evaluation endpoint after training.

Monitoring Training Progress

Weights & Biases Integration

Track your training with W&B for detailed monitoring by providing your API key.

Trackio

We are working on this :)

Export Configuration

Adapter Format
Merged Format
GGUF Format

Best for: LoRA/QLoRA models, easy to merge later

Smaller file size
Requires base model to run
Easy to combine with other adapters

Learn more about export options on the Inference, Evaluation, and Export page.

Troubleshooting

Common Issues

Out of Memory (OOM)

Solutions:

Reduce batch size
Increase gradient accumulation steps
Use QLoRA instead of LoRA
Reduce sequence length
Use a smaller model

Training Loss Not Decreasing

Possible causes: - Learning rate too high or too low - Poor data quality - Incorrect data format - Model too small for task Solutions: - Adjust learning rate - Check data quality - Verify data format - Try larger model

Overfitting

Signs:

Training loss much lower than validation loss
Validation loss increasing

Solutions:

Reduce epochs
Increase dropout
Use more diverse data
Early stopping

What’s Next?

After your training completes:

Evaluate Performance: Use the evaluation tools to assess your model
Export Your Model: Download in your preferred format
Deploy: Set up your model for production use
Iterate: Use feedback to improve with additional training

Ready to evaluate your model? Head to the Evaluation guide to learn how to test your fine-tuned model.

Getting started

Dataset preprocessing

Fine-tuning

Evaluation & Export

Model deployment

Fine-tuning Guide

Selecting Your Base Model

Model Size Considerations

Choosing Your Training Method

Supervised Fine-Tuning (SFT)

Direct Preference Optimization (DPO)

Odds Ratio Preference Optimization (ORPO)

PEFT Configuration

Full Fine-tuning

LoRA (Low-Rank Adaptation)

Quantization Configuration

QLoRA (Quantized LoRA)

Hyperparameter Configuration

Essential Parameters

Advanced Parameters

Evaluation Configuration

Evaluation Strategies

Evaluation Metrics

Monitoring Training Progress

Weights & Biases Integration

Trackio

Export Configuration

Troubleshooting

Common Issues

What’s Next?

Getting started

Dataset preprocessing

Fine-tuning

Evaluation & Export

Model deployment

​Selecting Your Base Model

​Model Size Considerations

​Choosing Your Training Method

​Supervised Fine-Tuning (SFT)

​Direct Preference Optimization (DPO)

​Odds Ratio Preference Optimization (ORPO)

​Group-Related Policy Optimization (GRPO)

​PEFT Configuration

​Full Fine-tuning

​LoRA (Low-Rank Adaptation)

​Quantization Configuration

​QLoRA (Quantized LoRA)

​Hyperparameter Configuration

​Essential Parameters

​Advanced Parameters

​Evaluation Configuration

​Evaluation Strategies

​Evaluation Metrics

​Monitoring Training Progress

​Weights & Biases Integration

​Trackio

​Export Configuration

​Troubleshooting

​Common Issues

​What’s Next?

Selecting Your Base Model

Model Size Considerations

Choosing Your Training Method

Supervised Fine-Tuning (SFT)

Direct Preference Optimization (DPO)

Odds Ratio Preference Optimization (ORPO)

Group-Related Policy Optimization (GRPO)

PEFT Configuration

Full Fine-tuning

LoRA (Low-Rank Adaptation)

Quantization Configuration

QLoRA (Quantized LoRA)

Hyperparameter Configuration

Essential Parameters

Advanced Parameters

Evaluation Configuration

Evaluation Strategies

Evaluation Metrics

Monitoring Training Progress

Weights & Biases Integration

Trackio

Export Configuration

Troubleshooting

Common Issues

What’s Next?