Running Single Inference
Test your model with individual prompts to see how it responds in real time.

Basic Text Inference

For simple text prompts, use the single inference endpoint:
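As a minimal sketch, a single-text request body might look like the following. The field names (`model_source`, `prompt`, `max_new_tokens`, `temperature`) and the path value are illustrative assumptions, not Facet's documented schema:

```python
import json

# Hypothetical request body for the single inference endpoint; field
# names here are illustrative assumptions, not a documented schema.
request = {
    "model_source": "path/to/finetuned-model",  # hypothetical path
    "prompt": "Summarize the following article in two sentences: ...",
    "max_new_tokens": 256,
    "temperature": 0.7,
}

# Serialize to JSON for the HTTP request body.
payload = json.dumps(request)
```

Check the actual API reference for the real endpoint path and parameter names before using this shape.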
Conversation Inference

For multi-turn conversations, use the batch inference endpoint with a single conversation:
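A multi-turn conversation is typically sent as a list of role/content messages; wrapping one conversation in a batch of size one might look like this sketch (the `conversations` field name and schema are assumptions):

```python
import json

# A single multi-turn conversation as role/content messages.
conversation = [
    {"role": "user", "content": "What formats can I export to?"},
    {"role": "assistant", "content": "Adapters, merged weights, and GGUF."},
    {"role": "user", "content": "Which one is smallest?"},
]

# Hypothetical batch-inference body carrying a batch of one conversation;
# the field names are illustrative assumptions.
request = {
    "model_source": "path/to/finetuned-model",  # hypothetical path
    "conversations": [conversation],
}
print(json.dumps(request, indent=2))
```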
Vision Inference

For multimodal tasks with images, include base64-encoded images:
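Base64-encoding the image bytes uses the standard library directly; only the surrounding request shape (the `images` field name) is an assumption here:

```python
import base64

# In practice, read real image bytes from disk, e.g.:
#   with open("photo.png", "rb") as f: image_bytes = f.read()
image_bytes = b"\x89PNG\r\n\x1a\n"  # placeholder bytes standing in for a real PNG

# Encode to an ASCII-safe base64 string for the JSON body.
image_b64 = base64.b64encode(image_bytes).decode("ascii")

request = {
    "model_source": "path/to/finetuned-model",  # hypothetical path
    "prompt": "Describe this image.",
    "images": [image_b64],  # field name is an assumption
}
```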
Batch Evaluation

Run comprehensive evaluations on your test datasets to get detailed performance metrics.
Using Task Types (Recommended)

Let the system automatically select appropriate metrics for your task:

- conversation → BERTScore, ROUGE
- qa → Exact match, BERTScore
- summarization → ROUGE, BERTScore
- translation → BLEU, METEOR
- classification → Accuracy, Precision, Recall, F1
- general → BERTScore, ROUGE
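The task-type-to-metrics mapping above can be sketched as a simple lookup table; the lowercase metric identifiers match the "Using Specific Metrics" list below, and the fallback-to-general behavior is an assumption:

```python
# Default metric selection per task type, as listed above.
DEFAULT_METRICS = {
    "conversation": ["bertscore", "rouge"],
    "qa": ["exact_match", "bertscore"],
    "summarization": ["rouge", "bertscore"],
    "translation": ["bleu", "meteor"],
    "classification": ["accuracy", "precision", "recall", "f1"],
    "general": ["bertscore", "rouge"],
}

def metrics_for(task_type: str) -> list:
    # Assumed behavior: fall back to general-purpose metrics
    # for unrecognized task types.
    return DEFAULT_METRICS.get(task_type, DEFAULT_METRICS["general"])
```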
Using Specific Metrics
For more control, specify exact metrics to compute:

- bertscore: Semantic similarity (⭐ Recommended for LLMs)
- rouge: Text overlap and summarization quality
- exact_match: Perfect string matching
- accuracy: Token-level accuracy
- precision, recall, f1: Classification metrics
- bleu, meteor: Translation metrics
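When building an evaluation request by hand, it can help to validate the metric names client-side against the identifiers listed above before sending; this is a sketch, not part of Facet's client:

```python
# Metric identifiers from the list above.
KNOWN_METRICS = {
    "bertscore", "rouge", "exact_match", "accuracy",
    "precision", "recall", "f1", "bleu", "meteor",
}

def validate_metrics(requested):
    # Reject any metric name the evaluator does not recognize.
    unknown = set(requested) - KNOWN_METRICS
    if unknown:
        raise ValueError(f"unsupported metrics: {sorted(unknown)}")
    return list(requested)
```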
Evaluation Response
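The exact response schema is not reproduced here. As a purely hypothetical illustration of the kind of structure to expect (per-metric aggregate scores plus run metadata), a client might read it like this; every field name and the placeholder values are assumptions:

```python
# Purely hypothetical response shape with placeholder values;
# consult the actual API response for the real field names.
response = {
    "task_type": "summarization",
    "num_examples": 2,
    "metrics": {
        "rouge": {"rouge1": 0.0, "rouge2": 0.0, "rougeL": 0.0},
        "bertscore": {"precision": 0.0, "recall": 0.0, "f1": 0.0},
    },
}

# Iterate over the aggregate scores per metric.
for metric, scores in response["metrics"].items():
    print(metric, scores)
```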
Model Export
Export your fine-tuned models in various formats for different deployment scenarios.

Export Formats

Best for: LoRA/QLoRA models, experimentation, combining adapters

Characteristics:
- Small file size (few MB)
- Requires base model to run
- Easy to merge with other adapters
- Good for A/B testing different fine-tunings
Export Destinations
Google Cloud Storage
Best for: Downloading models, GCP deployments
- Download as zip files
- Direct integration with GCP services
- Good for private model storage
Hugging Face Hub
Best for: Sharing models, public deployment
- Publish to HF Hub for sharing
- Easy integration with HF ecosystem
- Good for open-source projects
GGUF Quantization Options
Choose the right quantization level for your needs:

Best for: Most use cases, good balance of quality and efficiency
- 8-bit quantization
- Good quality retention
- Reasonable file size
- Fast inference
Inference Providers
Facet supports multiple inference backends for different use cases:

HuggingFace Transformers
- Use case: Standard inference, most compatible
- Supports: All model types and modalities
- Best for: General use, testing, development
Unsloth
- Use case: Optimized inference for Unsloth-trained models
- Supports: Unsloth-optimized models
- Best for: Models trained with Unsloth framework
vLLM
- Use case: High-performance production inference
- Supports: Merged models and adapters (via LoRA)
- Best for: High-throughput production deployment
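The trade-offs above can be condensed into a small selection helper; the lowercase provider identifiers are assumptions, and the decision order simply mirrors the guidance in this section:

```python
# Sketch of backend selection based on the guidance above;
# the provider identifier strings are assumptions.
def pick_provider(trained_with_unsloth: bool, production: bool) -> str:
    if production:
        return "vllm"          # high-throughput production serving
    if trained_with_unsloth:
        return "unsloth"       # optimized for Unsloth-trained models
    return "transformers"      # most compatible default
```

Defaulting to HuggingFace Transformers matches its role here as the most compatible backend for general use and development.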
Best Practices
Inference Testing
Evaluation Strategy
Export Strategy
Troubleshooting
Common Issues
Inference Fails
Possible causes:
- Invalid model source path
- Incorrect model type
- Missing base model ID
- Authentication issues
Solutions:
- Verify model source path and type
- Check base model ID matches training
- Ensure valid HuggingFace token
- Check model accessibility
Evaluation Errors
Possible causes:
- Dataset format mismatch
- Missing reference data
- Invalid metric configuration
- Memory issues

Solutions:
- Verify dataset format and content
- Check metric compatibility
- Reduce batch size if OOM
- Use appropriate task type
Export Failures
Possible causes:
- Model not found
- Insufficient permissions
- Export format not supported
- Network issues
Solutions:
- Verify training job completed
- Check export permissions
- Use supported format for model type
- Retry with stable connection
Next Steps
After testing and exporting your model:

- Deploy Your Model: Set up your model for production use
- Monitor Performance: Track model performance in production
- Collect Feedback: Gather user feedback to improve your model
- Iterate: Use insights to refine your model with additional training