Training Configuration Guide

This comprehensive guide walks you through every aspect of training configuration, from selecting the right model to monitoring your training progress.

Prerequisites

Before starting a training job, make sure you have a prepared dataset and have decided on a base model and training method; both are covered below.

Selecting Your Base Model

Model Size Considerations

Text Models

Best for: Simple tasks, quick prototyping, resource-constrained environments
  • Gemma 3 270M: Fastest, good for basic text generation
  • Gemma 3 1B: Balanced performance, good for most tasks

Start with 1B for most applications. It provides good performance while being resource-efficient.

Vision Models

For multimodal tasks involving images:
  • Gemma 3n E2B: 2B parameters, good for basic vision tasks
  • Gemma 3n E4B: 4B parameters, advanced vision understanding
Vision models automatically detect when your dataset contains images and enable multimodal processing.

Choosing Your Training Method

Supervised Fine-Tuning (SFT)

When to use: General conversation, instruction following, domain adaptation
Data format: Conversations with system, user, and assistant messages
Configuration:
{
  "trainer_type": "sft",
  "processing_mode": "language_modeling"
}
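SFT expects conversational records with system, user, and assistant messages. A minimal example record (the `messages` field name is illustrative; match your dataset's actual schema):

```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."}
  ]
}
```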

Direct Preference Optimization (DPO)

When to use: Aligning models with human preferences, safety, helpfulness
Data format: Paired examples with chosen and rejected responses
Configuration:
{
  "trainer_type": "dpo",
  "processing_mode": "preference"
}
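DPO training pairs a chosen response with a rejected one for the same input. A minimal example record (the `prompt` field name is illustrative; `chosen` and `rejected` follow the data format described above):

```json
{
  "prompt": "Explain recursion in one sentence.",
  "chosen": "Recursion is when a function solves a problem by calling itself on smaller subproblems.",
  "rejected": "Recursion is a loop."
}
```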

Odds Ratio Preference Optimization (ORPO)

When to use: Alternative to DPO, preference learning
Data format: Same as DPO (chosen vs rejected responses)
Configuration:
{
  "trainer_type": "orpo",
  "processing_mode": "preference"
}
ORPO can be more stable than DPO in some cases and may converge faster.
Group Relative Policy Optimization (GRPO)

When to use: Reasoning tasks, math problems, structured thinking
Data format: Prompts without responses (the model generates responses, which are then scored)
Configuration:
{
  "trainer_type": "grpo",
  "processing_mode": "prompt_only",
  "reward_config": [
    {
      "name": "math_accuracy",
      "type": "numerical_accuracy",
      "reference_field": "answer"
    }
  ]
}
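GRPO datasets contain only prompts; the `math_accuracy` reward above compares the model's generated answer against the dataset column named by `reference_field`. A minimal example record (the `prompt` field name is illustrative; `answer` matches the `reference_field` in the config):

```json
{
  "prompt": "What is 17 * 24? Show your reasoning.",
  "answer": "408"
}
```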

PEFT Configuration

Full Fine-tuning

When to use: Large datasets, major domain changes, maximum performance
Pros:
  • Updates all parameters
  • Maximum learning capacity
  • Best for major domain shifts
Cons:
  • Requires more memory
  • Longer training time
  • Higher compute costs
Configuration:
{
  "method": "Full",
  "hyperparameters": {
    "learning_rate": 0.00005,
    "batch_size": 1,
    "gradient_accumulation_steps": 8
  }
}
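The `batch_size` and `gradient_accumulation_steps` settings combine into the effective batch size the optimizer sees: gradients from several small micro-batches are accumulated before each weight update, trading wall-clock time for memory. A quick sanity check (a sketch, not part of the platform):

```python
# Effective batch size = per-device batch size x gradient accumulation steps.
# The optimizer sees the same number of examples per step, but only one
# micro-batch is held in memory at a time.
def effective_batch_size(batch_size: int, grad_accum_steps: int) -> int:
    return batch_size * grad_accum_steps

print(effective_batch_size(1, 8))  # the full fine-tuning config above -> 8
```

If memory allows a larger `batch_size`, you can reduce `gradient_accumulation_steps` proportionally and keep the same effective batch size.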

LoRA (Low-Rank Adaptation)

When to use: Most fine-tuning tasks, good balance of performance and efficiency
Pros:
  • Memory efficient
  • Faster training
  • Good performance
  • Easy to merge
Cons:
  • Slightly lower capacity than full fine-tuning
  • May need higher rank for complex tasks
Configuration:
{
  "method": "LoRA",
  "hyperparameters": {
    "lora_rank": 16,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "learning_rate": 0.0001
  }
}
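LoRA freezes the base weights and learns two small low-rank factors per adapted matrix, scaled by `lora_alpha / lora_rank` (1.0 with the values above). A rough count of trainable parameters per adapted layer shows why it is memory efficient (a sketch; actual totals depend on which layers are adapted):

```python
def lora_params_per_layer(d_in: int, d_out: int, rank: int) -> int:
    # One adapted matrix trains A (rank x d_in) plus B (d_out x rank)
    # instead of the full d_out x d_in weight.
    return rank * d_in + d_out * rank

# Example: a 2048x2048 projection with lora_rank=16 from the config above.
full = 2048 * 2048
lora = lora_params_per_layer(2048, 2048, 16)
print(lora, f"{lora / full:.2%}")  # 65536 trainable params, ~1.56% of the full matrix
```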

QLoRA (Quantized LoRA)

When to use: Resource-constrained environments, most use cases
Pros:
  • Most memory efficient
  • Good performance
  • Fast training
  • Recommended default
Cons:
  • Slight quantization overhead
  • May need hyperparameter tuning for optimal results
Configuration:
{
  "method": "QLoRA",
  "hyperparameters": {
    "lora_rank": 16,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "learning_rate": 0.0002
  }
}
Start with QLoRA for most tasks. It provides excellent results while using minimal resources.
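The memory saving comes from storing the frozen base weights in 4-bit precision. A back-of-the-envelope comparison of base-weight memory (weights only; activations, optimizer state, and the adapter itself are extra, so treat this as a lower bound):

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    # Bytes = params * bits / 8; using 1 GB = 1e9 bytes for a round estimate.
    return n_params * bits_per_param / 8 / 1e9

n = 1e9  # a 1B-parameter model, as recommended above
print(f"fp16 base weights:  {weight_memory_gb(n, 16):.1f} GB")  # 2.0 GB
print(f"4-bit (QLoRA):      {weight_memory_gb(n, 4):.1f} GB")   # 0.5 GB
```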

Hyperparameter Configuration

Essential Parameters

Learning Rate

Purpose: Controls how much the model's weights are updated during training
Recommended values:
  • Full fine-tuning: 0.00005
  • LoRA: 0.0001
  • QLoRA: 0.0002
Lower learning rates are more stable but may require more epochs.

Advanced Parameters

Evaluation Configuration

Setting Up Evaluation

Enable evaluation during training to monitor performance:
{
  "eval_config": {
    "eval_strategy": "epoch",
    "eval_steps": 50,
    "compute_eval_metrics": true,
    "batch_eval_metrics": false
  }
}

Evaluation Strategies

  • no: No evaluation (fastest training)
  • steps: Evaluate every N steps
  • epoch: Evaluate at the end of each epoch
Use epoch evaluation for most cases. It provides regular feedback without slowing training significantly.

Evaluation Metrics

The system automatically computes:
  • Accuracy: Token-level accuracy
  • Perplexity: How well the model predicts the data (lower is better)
For task-specific metrics, use the inference service’s evaluation endpoint after training.
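Perplexity is derived directly from the average per-token cross-entropy loss: perplexity = exp(loss). A perplexity of 1 would mean the model predicts every token with certainty, so watching it fall during training is equivalent to watching the loss fall:

```python
import math

def perplexity(avg_cross_entropy_loss: float) -> float:
    # Perplexity is the exponential of the mean per-token cross-entropy.
    return math.exp(avg_cross_entropy_loss)

print(round(perplexity(2.0), 2))  # 7.39: roughly as uncertain as choosing
                                  # among ~7.4 equally likely tokens
```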

Monitoring Training Progress

Weights & Biases Integration

Track your training with W&B for detailed monitoring:
{
  "wandb_config": {
    "api_key": "your_wandb_key",
    "project_name": "gemma-fine-tuning",
    "run_name": "my-experiment"
  }
}

Training Metrics to Watch

Early Stopping

Monitor validation loss and stop training if:
  • Validation loss stops decreasing
  • Validation loss starts increasing (overfitting)
  • Training loss becomes much lower than validation loss
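The first two checks above can be sketched as a simple patience rule over the validation losses W&B records (a sketch, not a built-in API of the platform):

```python
def should_stop(val_losses: list[float], patience: int = 3) -> bool:
    # Stop when validation loss has not improved on its best earlier value
    # for `patience` consecutive evaluations.
    if len(val_losses) <= patience:
        return False
    best = min(val_losses[:-patience])
    return all(loss >= best for loss in val_losses[-patience:])

print(should_stop([2.0, 1.8, 1.7, 1.71, 1.72, 1.73]))  # True: 3 evals without improvement
print(should_stop([2.0, 1.8, 1.7, 1.65]))              # False: still improving
```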

Export Configuration

Export Formats

Configure how your model will be exported:
{
  "export_config": {
    "format": "adapter",
    "destination": "gcs",
    "include_gguf": false,
    "gguf_quantization": "q8_0"
  }
}

Export Options

Adapter Format

Best for: LoRA/QLoRA models, easy to merge later
  • Smaller file size
  • Requires the base model to run
  • Easy to combine with other adapters
Learn more about export options on the Inference, Evaluation, and Export page.

Complete Training Configuration Example

Here’s a complete example for training a code assistant:
{
  "base_model_id": "google/gemma-2b",
  "provider": "unsloth",
  "method": "QLoRA",
  "trainer_type": "sft",
  "modality": "text",
  "hyperparameters": {
    "learning_rate": 0.0002,
    "batch_size": 4,
    "gradient_accumulation_steps": 4,
    "epochs": 3,
    "max_length": 2048,
    "lora_rank": 16,
    "lora_alpha": 16,
    "lora_dropout": 0.05
  },
  "export_config": {
    "format": "adapter",
    "destination": "gcs",
    "include_gguf": true,
    "gguf_quantization": "q8_0"
  },
  "eval_config": {
    "eval_strategy": "epoch",
    "compute_eval_metrics": true
  },
  "wandb_config": {
    "project_name": "code-assistant-training"
  }
}
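Given the hyperparameters above, you can estimate the total number of optimizer steps before launching (a sketch; the dataset size of 10,000 examples is a hypothetical assumption):

```python
import math

def total_optimizer_steps(n_examples: int, batch_size: int,
                          grad_accum: int, epochs: int) -> int:
    # Each optimizer step consumes batch_size * grad_accum examples.
    steps_per_epoch = math.ceil(n_examples / (batch_size * grad_accum))
    return steps_per_epoch * epochs

# Config above: batch_size=4, gradient_accumulation_steps=4, epochs=3.
print(total_optimizer_steps(10_000, 4, 4, 3))  # 1875
```

This is useful for setting `eval_steps` sensibly when you switch `eval_strategy` to "steps".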

Troubleshooting

Common Issues

What’s Next?

After your training completes:
  1. Evaluate Performance: Use the evaluation tools to assess your model
  2. Export Your Model: Download in your preferred format
  3. Deploy: Set up your model for production use
  4. Iterate: Use feedback to improve with additional training
Ready to evaluate your model? Head to the Evaluation guide to learn how to test your fine-tuned model.