Use this guide to run quick checks and batch evaluations, compare models side‑by‑side, and export the final model for deployment.

Run inference and evaluation

1. Select model(s): choose a trained or base model. Select two models to enable side‑by‑side comparisons.
2. Choose a mode: on the next screen, pick either Batch Inference (quick, lightweight checks) or Evaluation (full metrics on labeled data).
3. Load data: for Batch Inference, load a dataset and optionally limit the number of samples; for Evaluation, use a labeled eval split from preprocessing.
4. Run and monitor: start the job and track progress. Large datasets take longer.
5. Review results: inspect predictions vs. references (when available), compare models, and review summary metrics.

Running batch inference

  1. Load a dataset.
  2. Select how many samples to run.
  3. Review predictions vs. references (if available) for quick “vibe checks.”
Batch inference is ideal for sanity checks before committing to a full evaluation.
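
If you want to reproduce a quick check outside the platform, a minimal sketch with the Hugging Face transformers pipeline looks like the following. The model ID, prompt, and reference are placeholders, not values from this guide:

from transformers import pipeline

# Assumption: any causal LM checkpoint you have access to; replace with your model.
generator = pipeline("text-generation", model="username/my-finetuned-model")

samples = [
    {"prompt": "Summarize: The quick brown fox jumps over the lazy dog.",
     "reference": "A fox jumps over a dog."},
]

for sample in samples:
    prediction = generator(sample["prompt"], max_new_tokens=64)[0]["generated_text"]
    print("PROMPT:    ", sample["prompt"])
    print("PREDICTION:", prediction)
    print("REFERENCE: ", sample["reference"])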

Evaluation

Available Task Types:
  • conversation → BERTScore, ROUGE
  • qa → Exact match, BERTScore
  • summarization → ROUGE, BERTScore
  • translation → BLEU, METEOR
  • classification → Accuracy, Precision, Recall, F1
  • general → BERTScore, ROUGE
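
As a rough illustration of how the generation-oriented metric pairs above can be computed offline, here is a sketch using the Hugging Face evaluate library; the prediction and reference strings are made up for the example:

import evaluate

predictions = ["The meeting was moved to Friday."]
references  = ["They rescheduled the meeting to Friday."]

# ROUGE measures n-gram overlap; BERTScore measures semantic similarity.
rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

print(rouge.compute(predictions=predictions, references=references))
print(bertscore.compute(predictions=predictions, references=references, lang="en"))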

Using Specific Metrics

For more control, specify the exact metrics to compute.

Available Metrics:
  • bertscore: Semantic similarity (⭐ Recommended for LLMs)
  • rouge: Text overlap and summarization quality
  • exact_match: Perfect string matching
  • accuracy: Token-level accuracy
  • precision, recall, f1: Classification metrics
  • bleu, meteor: Translation metrics
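
For the string-matching and classification-style metrics, a small sketch with the same evaluate library (toy labels and strings, purely for illustration):

import evaluate

# Exact string matching for QA-style answers.
exact_match = evaluate.load("exact_match")
print(exact_match.compute(predictions=["Paris"], references=["Paris"]))

# Combine accuracy, precision, recall, and F1 into one scorer for classification.
clf_metrics = evaluate.combine(["accuracy", "precision", "recall", "f1"])

references  = [0, 1, 1, 0, 1]   # toy ground-truth labels
predictions = [0, 1, 0, 0, 1]   # toy model outputs

print(clf_metrics.compute(predictions=predictions, references=references))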

Model export

Export your fine-tuned models in various formats for different deployment scenarios.

Export Formats

Adapter

Best for: LoRA/QLoRA models, experimentation, combining adapters

Characteristics:
  • Small file size (few MB)
  • Requires base model to run
  • Easy to merge with other adapters
  • Good for A/B testing different fine-tunings
Configuration:
{
  "export_id": "exp_123",
  "job_id": "job_456",
  "type": "adapter",
  "destination": ["gcs", "hf_hub"],
  "hf_repo_id": "username/my-adapter"
}
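
Once an adapter export like the one above is on the Hub, it can be attached to its base model with the peft library. A minimal sketch, assuming the base model is the one the adapter was trained from (the base model ID below is a placeholder):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "your-org/base-model"      # placeholder: the base model the adapter was trained on
ADAPTER_REPO = "username/my-adapter"    # matches hf_repo_id in the config above

base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(base, ADAPTER_REPO)   # attaches the adapter weights
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# Optionally merge the adapter into the base weights for standalone deployment.
merged = model.merge_and_unload()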

Export Destinations

Google Cloud Storage

Best for: Downloading models, GCP deployments
  • Download as zip files
  • Direct integration with GCP services
  • Good for private model storage
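
To pull a GCS export programmatically rather than through the console, a sketch with the official google-cloud-storage client; the bucket and object names are placeholders:

from google.cloud import storage

client = storage.Client()                      # uses your default GCP credentials
bucket = client.bucket("my-export-bucket")     # placeholder bucket name
blob = bucket.blob("exports/exp_123.zip")      # placeholder object path
blob.download_to_filename("exp_123.zip")       # saves the export zip locally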

Hugging Face Hub

Best for: Sharing models, public deployment
  • Publish to HF Hub for sharing
  • Easy integration with HF ecosystem
  • Good for open-source projects
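
Anything published to the Hub can also be pulled down locally with huggingface_hub, for example (repo ID reused from the config above):

from huggingface_hub import snapshot_download

# Downloads the full repo (weights, config, tokenizer files) to the local cache.
local_dir = snapshot_download(repo_id="username/my-adapter")
print(local_dir)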

GGUF Quantization Options

Choose the right quantization level for your needs: smaller quantizations reduce file size and memory use at some cost in output quality.
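
As a rough sketch of what a GGUF export is for: once quantized, the file can run on CPU-friendly runtimes such as llama.cpp. The snippet below uses the llama-cpp-python bindings with a placeholder filename and quantization level:

from llama_cpp import Llama

# Placeholder path; the quantization level (e.g. Q4_K_M) is encoded in the filename.
llm = Llama(model_path="my-model.Q4_K_M.gguf")

result = llm("Explain LoRA in one sentence.", max_tokens=64)
print(result["choices"][0]["text"])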

Inference providers

When running inference and evaluation on Facet AI, you can select from multiple providers optimized for different use cases.
  • Use the HF provider for non‑Unsloth models.
  • Use the Unsloth provider for models trained with Unsloth.
  • If you plan to deploy with vLLM, choose the vLLM provider to keep behavior consistent across environments.

Best practices

Inference Testing

Evaluation Strategy

Export Strategy

Troubleshooting

Common Issues

  • Model not found or invalid model format
  • Insufficient resources available: retry later, or deploy on your cloud if you need guaranteed capacity.

Next Steps

After testing and exporting your model:
  1. Deploy Your Model: Set up your model for production use
  2. Monitor Performance: Track model performance in production
  3. Collect Feedback: Gather user feedback to improve your model
  4. Iterate: Use insights to refine your model with additional training
Ready to deploy? Head to the Deployment guide to learn how to set up your model for production use.