This guide covers everything you need to know about testing your fine-tuned models and exporting them in various formats for deployment.

Running Single Inference

Test your model with individual prompts to see how it responds in real time.

Basic Text Inference

For simple text prompts, use the single inference endpoint:
{
  "hf_token": "your_huggingface_token",
  "model_source": "gs://bucket/trained_adapters/job_123/adapter",
  "model_type": "adapter",
  "base_model_id": "google/gemma-3-2b-pt",
  "prompt": "What is the capital of France?"
}
Response:
{
  "result": "The capital of France is Paris."
}
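
Any HTTP client works for these requests. Below is a minimal Python sketch using requests; the host and /inference path are placeholders for your deployment, not confirmed endpoint names:

import requests

API_URL = "https://your-facet-host/inference"  # placeholder; substitute your deployment's endpoint

payload = {
    "hf_token": "your_huggingface_token",
    "model_source": "gs://bucket/trained_adapters/job_123/adapter",
    "model_type": "adapter",
    "base_model_id": "google/gemma-3-2b-pt",
    "prompt": "What is the capital of France?",
}

# POST the request body shown above and print the model's reply.
response = requests.post(API_URL, json=payload, timeout=300)
response.raise_for_status()
print(response.json()["result"])  # -> "The capital of France is Paris."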

Conversation Inference

For multi-turn conversations, use the batch inference endpoint with a single conversation:
{
  "hf_token": "your_huggingface_token",
  "model_source": "username/model-name",
  "model_type": "merged",
  "base_model_id": "google/gemma-3-2b-pt",
  "messages": [
    [
      { "role": "system", "content": "You are a helpful assistant." },
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms."
      }
    ]
  ]
}
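
When scripting multi-turn tests, remember that messages is a list of conversations, each itself a list of role/content turns. A minimal sketch (the batch endpoint path is an assumption, not a confirmed route):

import requests

BATCH_URL = "https://your-facet-host/batch-inference"  # placeholder path

def run_conversations(conversations: list) -> list:
    """POST one or more conversations to the batch inference endpoint."""
    payload = {
        "hf_token": "your_huggingface_token",
        "model_source": "username/model-name",
        "model_type": "merged",
        "base_model_id": "google/gemma-3-2b-pt",
        "messages": conversations,  # note: a list of conversations
    }
    resp = requests.post(BATCH_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()

# A single conversation still goes in as a one-element list.
print(run_conversations([[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."},
]]))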

Vision Inference

For multimodal tasks, include images as base64-encoded data URIs:
{
  "hf_token": "your_huggingface_token",
  "model_source": "username/model-name",
  "model_type": "adapter",
  "base_model_id": "google/gemma-3-2b-pt",
  "messages": [
    [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What do you see in this image?" },
          {
            "type": "image",
            "image": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."
          }
        ]
      }
    ]
  ]
}
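
The image field expects a data URI. A small helper for building one from a local file (illustrative, not part of the API):

import base64
import mimetypes

def to_data_uri(path: str) -> str:
    """Read an image file and return a data URI suitable for the "image" field."""
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime or 'image/png'};base64,{encoded}"

# Example: embed a local screenshot in the message content.
image_part = {"type": "image", "image": to_data_uri("screenshot.png")}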

Batch Evaluation

Run comprehensive evaluations on your test datasets to get detailed performance metrics. Let the system automatically select appropriate metrics for your task:
{
  "hf_token": "your_huggingface_token",
  "model_source": "username/model-name",
  "model_type": "merged",
  "base_model_id": "google/gemma-3-2b-pt",
  "dataset_id": "processed_dataset_123",
  "task_type": "conversation",
  "num_sample_results": 5
}
Available Task Types:
  • conversation → BERTScore, ROUGE
  • qa → Exact match, BERTScore
  • summarization → ROUGE, BERTScore
  • translation → BLEU, METEOR
  • classification → Accuracy, Precision, Recall, F1
  • general → BERTScore, ROUGE
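
The mapping above can be mirrored in client code, for example to show users which metrics to expect before a run. This dict simply restates the table; the actual selection logic lives server-side:

# Default metrics per task type, as documented above (an illustrative mirror,
# not the service's actual source code).
DEFAULT_METRICS = {
    "conversation": ["bertscore", "rouge"],
    "qa": ["exact_match", "bertscore"],
    "summarization": ["rouge", "bertscore"],
    "translation": ["bleu", "meteor"],
    "classification": ["accuracy", "precision", "recall", "f1"],
    "general": ["bertscore", "rouge"],
}

print(DEFAULT_METRICS["qa"])  # ['exact_match', 'bertscore']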

Using Specific Metrics

For more control, specify exact metrics to compute:
{
  "hf_token": "your_huggingface_token",
  "model_source": "username/model-name",
  "model_type": "adapter",
  "base_model_id": "google/gemma-3-2b-pt",
  "dataset_id": "processed_dataset_123",
  "metrics": ["bertscore", "rouge", "exact_match"],
  "num_sample_results": 10
}
Available Metrics:
  • bertscore: Semantic similarity (⭐ Recommended for LLMs)
  • rouge: Text overlap and summarization quality
  • exact_match: Perfect string matching
  • accuracy: Token-level accuracy
  • precision, recall, f1: Classification metrics
  • bleu, meteor: Translation metrics
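
To sanity-check scores offline, the same metrics are available in the Hugging Face evaluate library (a separate tool, independent of the evaluation service):

import evaluate  # pip install evaluate bert_score rouge_score

predictions = ["The capital of France is Paris."]
references = ["Paris is the capital of France."]

# BERTScore: semantic similarity between prediction and reference.
bertscore = evaluate.load("bertscore")
bs = bertscore.compute(predictions=predictions, references=references, lang="en")
print("bertscore_f1:", sum(bs["f1"]) / len(bs["f1"]))

# ROUGE: n-gram and longest-common-subsequence overlap.
rouge = evaluate.load("rouge")
print(rouge.compute(predictions=predictions, references=references))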

Evaluation Response

{
  "metrics": {
    "bertscore_f1": 0.85,
    "rouge_l": 0.82,
    "rouge_1": 0.85,
    "rouge_2": 0.78,
    "exact_match": 0.65
  },
  "samples": [
    {
      "prediction": "The capital of France is Paris.",
      "reference": "Paris is the capital of France.",
      "sample_index": 42
    }
  ],
  "num_samples": 1000,
  "dataset_id": "processed_dataset_123",
  "task_type": "conversation"
}
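
A small post-processing sketch that reads a response like the one above and surfaces samples for manual review (the 0.8 threshold is an arbitrary example, not a recommended cutoff):

def summarize_eval(result: dict, bertscore_floor: float = 0.8) -> None:
    """Print headline metrics and list sample predictions for manual review."""
    for name, value in result["metrics"].items():
        flag = "  <-- below floor" if name == "bertscore_f1" and value < bertscore_floor else ""
        print(f"{name}: {value:.2f}{flag}")
    for sample in result["samples"]:
        print(f"[{sample['sample_index']}] pred: {sample['prediction']}")
        print(f"[{sample['sample_index']}]  ref: {sample['reference']}")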

Model Export

Export your fine-tuned models in various formats for different deployment scenarios.

Export Formats

Adapter

Best for: LoRA/QLoRA models, experimentation, combining adapters
Characteristics:
  • Small file size (few MB)
  • Requires base model to run
  • Easy to merge with other adapters
  • Good for A/B testing different fine-tunings
Configuration:
{
  "export_id": "exp_123",
  "job_id": "job_456",
  "type": "adapter",
  "destination": ["gcs", "hf_hub"],
  "hf_repo_id": "username/my-adapter"
}
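
Submitting the export config follows the same request pattern as inference; the /export path below is a placeholder, not a confirmed route:

import requests

EXPORT_URL = "https://your-facet-host/export"  # placeholder endpoint

export_config = {
    "export_id": "exp_123",
    "job_id": "job_456",
    "type": "adapter",
    "destination": ["gcs", "hf_hub"],
    "hf_repo_id": "username/my-adapter",
}

resp = requests.post(EXPORT_URL, json=export_config, timeout=600)
resp.raise_for_status()
print(resp.json())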

Export Destinations

Google Cloud Storage

Best for: Downloading models, GCP deployments
  • Download as zip files
  • Direct integration with GCP services
  • Good for private model storage

Hugging Face Hub

Best for: Sharing models, public deployment
  • Publish to HF Hub for sharing
  • Easy integration with HF ecosystem
  • Good for open-source projects
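
Once an adapter is published to the Hub, consumers attach it to the base model with the peft library. This is standard peft usage, independent of Facet:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-3-2b-pt"    # base model the adapter was trained on
adapter_id = "username/my-adapter"  # repo created by the export

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # applies the LoRA weights

inputs = tokenizer("What is the capital of France?", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))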

GGUF Quantization Options

Choose the right quantization level for your needs:
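
The exact presets offered depend on your Facet version; as general background, the most common GGUF levels from the llama.cpp ecosystem are:
  • q4_k_m: ~4-bit; the usual size/quality sweet spot for local inference
  • q5_k_m: ~5-bit; larger, with better quality retention
  • q8_0: 8-bit; near-lossless at roughly half the size of f16
  • f16: half precision; largest files, no quantization loss
A GGUF export can then be loaded locally with llama-cpp-python (standard llama.cpp usage; the file name is illustrative):

from llama_cpp import Llama  # pip install llama-cpp-python

# Load a quantized GGUF export and run a quick smoke test.
llm = Llama(model_path="my-model-q4_k_m.gguf", n_ctx=4096)
out = llm("What is the capital of France?", max_tokens=32)
print(out["choices"][0]["text"])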

Inference Providers

Facet supports multiple inference backends for different use cases:

HuggingFace Transformers

  • Use case: Standard inference, most compatible
  • Supports: All model types and modalities
  • Best for: General use, testing, development

Unsloth

  • Use case: Optimized inference for Unsloth-trained models
  • Supports: Unsloth-optimized models
  • Best for: Models trained with Unsloth framework

vLLM

  • Use case: High-performance production inference
  • Supports: Merged models and adapters (via LoRA)
  • Best for: High-throughput production deployment
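
For a merged model, vLLM serving looks like the sketch below (standard vLLM usage; serving adapters additionally requires enabling vLLM's LoRA support):

from vllm import LLM, SamplingParams  # pip install vllm

llm = LLM(model="username/model-name")  # merged model exported to the Hub
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["What is the capital of France?"], params)
print(outputs[0].outputs[0].text)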

Next Steps

After testing and exporting your model:
  1. Deploy Your Model: Set up your model for production use
  2. Monitor Performance: Track model performance in production
  3. Collect Feedback: Gather user feedback to improve your model
  4. Iterate: Use insights to refine your model with additional training
Ready to deploy? Head to the Deployment guide to learn how to set up your model for production use.