Run inference and evaluation
1. Select model(s)
   Choose a trained or base model. Select two models to enable side‑by‑side comparisons.
2. Choose a mode
   On the next screen, pick either Batch Inference (quick, lightweight checks) or Evaluation (full metrics on labeled data).
3. Load data
   For Batch Inference: load a dataset and optionally limit the number of samples. For Evaluation: use a labeled eval split from preprocessing.
4. Run and monitor
   Start the job and track progress. Large datasets take longer.
5. Review results
   Inspect predictions vs. references (when available), compare models, and review summary metrics.
Running batch inference
- Load a dataset.
- Select how many samples to run.
- Review predictions vs. references (if available) for quick “vibe checks.”
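If you prefer to sanity-check outputs outside the UI, the snippet below is a minimal sketch of the same kind of batch run using a Hugging Face text-generation pipeline; the model name and prompts are placeholders, not part of Facet AI.

```python
# Minimal sketch: a local batch "vibe check" over a few samples.
# "your-org/your-finetuned-model" and the prompts are placeholders.
from transformers import pipeline

generator = pipeline("text-generation", model="your-org/your-finetuned-model")

samples = [
    "Summarize: The quarterly report shows revenue grew 12%...",
    "Question: What is the capital of France?",
]

for prompt in samples:
    result = generator(prompt, max_new_tokens=64, do_sample=False)
    print(prompt, "->", result[0]["generated_text"])
```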
Evaluation
Available Task Types:
- conversation → BERTScore, ROUGE
- qa → Exact match, BERTScore
- summarization → ROUGE, BERTScore
- translation → BLEU, METEOR
- classification → Accuracy, Precision, Recall, F1
- general → BERTScore, ROUGE
Using Specific Metrics
For more control, specify exact metrics to compute.
Available Metrics:
- bertscore: Semantic similarity (⭐ Recommended for LLMs)
- rouge: Text overlap and summarization quality
- exact_match: Perfect string matching
- accuracy: Token-level accuracy
- precision, recall, f1: Classification metrics
- bleu, meteor: Translation metrics
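Most of these metrics can also be reproduced locally with Hugging Face's evaluate library. The snippet below is a minimal sketch that scores a couple of predictions with ROUGE and BERTScore; the sample texts are placeholders and this is not Facet AI's internal evaluation code.

```python
# Minimal sketch: computing ROUGE and BERTScore with the `evaluate` library.
# The predictions/references are placeholder strings for illustration.
import evaluate

predictions = ["The cat sat on the mat."]
references = ["A cat was sitting on the mat."]

rouge = evaluate.load("rouge")
print(rouge.compute(predictions=predictions, references=references))

# BERTScore needs a language (or an explicit model) to pick its scoring backbone.
bertscore = evaluate.load("bertscore")
print(bertscore.compute(predictions=predictions, references=references, lang="en"))
```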
Model export
Export your fine-tuned models in various formats for different deployment scenarios.
Export Formats
- Adapter Format
- Merged Format
- GGUF Format
Adapter Format
Best for: LoRA/QLoRA models, experimentation, combining adapters
Characteristics:
- Small file size (few MB)
- Requires base model to run
- Easy to merge with other adapters
- Good for A/B testing different fine-tunings
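For context on why the adapter export stays small: it contains only the LoRA/QLoRA weights, which you attach to the base model at load time or fold into a standalone checkpoint. The sketch below uses the peft library; the model IDs and paths are placeholders.

```python
# Minimal sketch: loading a LoRA adapter onto its base model with peft,
# then optionally merging it into a standalone ("merged format") checkpoint.
# Model IDs and paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-org/base-model")
tokenizer = AutoTokenizer.from_pretrained("base-org/base-model")

# Adapter export: small, but needs the base model above to run.
model = PeftModel.from_pretrained(base, "path/to/exported-adapter")

# Merged export: folds the adapter into the base weights so no peft
# dependency is needed at inference time.
merged = model.merge_and_unload()
merged.save_pretrained("path/to/merged-model")
tokenizer.save_pretrained("path/to/merged-model")
```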
Export Destinations
Google Cloud Storage
Best for: Downloading models, GCP deployments
- Download as zip files
- Direct integration with GCP services
- Good for private model storage
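As one illustration of the GCS destination, the snippet below uploads an exported archive to a bucket with the google-cloud-storage client; the bucket name, object path, and local file are placeholders, and it assumes GCP credentials are already configured in your environment.

```python
# Minimal sketch: uploading an exported model archive to Google Cloud Storage.
# Bucket, object path, and local file are placeholders; assumes credentials
# are available (e.g. via Application Default Credentials).
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-model-exports")          # placeholder bucket
blob = bucket.blob("finetunes/my-model-v1.zip")     # placeholder object path
blob.upload_from_filename("exports/my-model-v1.zip")
print(f"Uploaded to gs://{bucket.name}/{blob.name}")
```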
Hugging Face Hub
Best for: Sharing models, public deployment
- Publish to HF Hub for sharing
- Easy integration with HF ecosystem
- Good for open-source projects
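If you publish manually instead, the equivalent step is a push_to_hub call on the exported model and tokenizer; the local path and repo ID below are placeholders, and it assumes you are already authenticated (for example via `huggingface-cli login`).

```python
# Minimal sketch: publishing a merged export to the Hugging Face Hub.
# The local path and repo ID are placeholders; assumes you are logged in.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("path/to/merged-model")
tokenizer = AutoTokenizer.from_pretrained("path/to/merged-model")

# Add private=True to either call if the repo should not be public.
model.push_to_hub("your-username/your-finetuned-model")
tokenizer.push_to_hub("your-username/your-finetuned-model")
```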
GGUF Quantization Options
Choose the right quantization level for your needs:
- q8_0 (Recommended)
- q4_k_m
- f16
q8_0 (Recommended)
Best for: Most use cases, good balance of quality and efficiency
- 8-bit quantization
- Good quality retention
- Reasonable file size
- Fast inference
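If you ever need to reproduce a GGUF export by hand, the usual route is llama.cpp's conversion script followed by its quantize tool. The sketch below drives them from Python via subprocess; the script and binary names, flags, and quantization labels reflect recent llama.cpp versions and may differ in yours, and all paths are placeholders.

```python
# Minimal sketch: converting a merged HF checkpoint to GGUF with llama.cpp,
# then quantizing it. Paths are placeholders, and the script/flag names
# (convert_hf_to_gguf.py, llama-quantize, f16 / Q8_0) are assumptions that
# depend on your llama.cpp checkout.
import subprocess

# 1) Convert the merged HF checkpoint to an f16 GGUF file.
subprocess.run(
    [
        "python", "llama.cpp/convert_hf_to_gguf.py",
        "path/to/merged-model",
        "--outfile", "model-f16.gguf",
        "--outtype", "f16",
    ],
    check=True,
)

# 2) Quantize the f16 file, e.g. to q8_0 (recommended) or q4_k_m.
subprocess.run(
    ["llama.cpp/build/bin/llama-quantize", "model-f16.gguf", "model-q8_0.gguf", "Q8_0"],
    check=True,
)
```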
Inference providers
When running inference and evaluation on Facet AI, you can select from multiple providers optimized for different use cases.
- Use the HF provider for non‑Unsloth models.
- Use the Unsloth provider for models trained with Unsloth.
- If you plan to deploy with vLLM, choose the vLLM provider to keep behavior consistent across environments.
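To see what consistent behavior across environments looks like in practice, the sketch below runs an exported model with vLLM's offline LLM API, the same engine a vLLM-based deployment would use; the model path and prompt are placeholders.

```python
# Minimal sketch: running the exported model locally with vLLM's offline API.
# The model path and prompt are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/merged-model")
params = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(["Question: What is the capital of France?"], params)
for out in outputs:
    print(out.outputs[0].text)
```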
Best practices
Inference Testing
Evaluation Strategy
Export Strategy
Troubleshooting
Common Issues
Inference Fails
- Model not found or invalid model format
- Insufficient resources available
Next Steps
After testing and exporting your model:
- Deploy Your Model: Set up your model for production use
- Monitor Performance: Track model performance in production
- Collect Feedback: Gather user feedback to improve your model
- Iterate: Use insights to refine your model with additional training