Evaluate a trained model with quick, interactive inference or run a batch job against a dataset to get objective metrics. Review results, compare to a baseline, and export the best checkpoint for deployment.
1. Select a model

Pick a completed training job or a previously exported model.
Have the model source path and base model ID handy.
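
The later steps all refer back to these two identifiers, so it can help to pin them down once up front. A minimal sketch; both values are hypothetical placeholders for your own job output path and base model:

```python
# Both values are placeholders; substitute your own training-job output
# path and the base model the checkpoint was fine-tuned from.
MODEL_SOURCE_PATH = "outputs/my-finetune-job/checkpoint-final"  # hypothetical path
BASE_MODEL_ID = "org/base-model-7b"                             # hypothetical model ID
```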
2. Choose evaluation mode

Start with single‑prompt inference to sanity‑check behavior, then move to batch evaluation on a labeled dataset.
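
A minimal sketch of the single-prompt sanity check, assuming the checkpoint is a Hugging Face Transformers causal LM; the path and prompt are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned checkpoint from its source path (hypothetical path).
source = "outputs/my-finetune-job/checkpoint-final"
tokenizer = AutoTokenizer.from_pretrained(source)
model = AutoModelForCausalLM.from_pretrained(source)

# Single-prompt sanity check: one generation, eyeballed rather than scored.
inputs = tokenizer("Summarize: The quick brown fox...", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```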
3. Pick metrics

Select task‑appropriate metrics (e.g., BERTScore/ROUGE for generation, EM/F1 for QA, accuracy/F1 for classification).
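
One way to compute these outside the UI is the Hugging Face `evaluate` library; the predictions and references below are toy data for illustration:

```python
import evaluate

# Toy generation outputs and gold references.
predictions = ["the cat sat on the mat"]
references = ["a cat was sitting on the mat"]

# Generation metrics: ROUGE and BERTScore.
rouge = evaluate.load("rouge")
print(rouge.compute(predictions=predictions, references=references))

bertscore = evaluate.load("bertscore")
print(bertscore.compute(predictions=predictions, references=references, lang="en"))

# Classification metrics: accuracy and F1 on toy label lists.
accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")
print(accuracy.compute(predictions=[1, 0, 1], references=[1, 1, 1]))
print(f1.compute(predictions=[1, 0, 1], references=[1, 1, 1]))
```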
4. Run and monitor

Launch the job and watch progress; large datasets take longer.
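
If you run the batch yourself rather than through the UI, a progress bar makes long runs easier to watch. A sketch assuming `tqdm` is installed; the dataset and `generate()` helper are stand-ins for your own:

```python
from tqdm import tqdm

# Toy stand-ins for a real labeled dataset and the model call from step 2.
eval_dataset = [
    {"prompt": "Summarize: The quick brown fox...", "reference": "A fox jumps."},
]

def generate(prompt: str) -> str:
    # Placeholder: swap in the tokenizer/model call from the previous step.
    return "A quick fox jumps over a dog."

results = []
for example in tqdm(eval_dataset, desc="Evaluating"):
    results.append({
        "prediction": generate(example["prompt"]),
        "reference": example["reference"],
    })
```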
5. Review and compare

Inspect metrics and samples, compare against a baseline, and note failure patterns to guide the next training round.
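
A small sketch of a side-by-side comparison; the metric names and values are invented for illustration:

```python
# Hypothetical metric dictionaries for the new checkpoint and a baseline run.
candidate = {"rougeL": 0.42, "bertscore_f1": 0.89}
baseline = {"rougeL": 0.38, "bertscore_f1": 0.90}

# Print each metric with its delta; a negative delta flags a regression
# worth inspecting in the sample outputs.
for name in candidate:
    delta = candidate[name] - baseline[name]
    print(f"{name}: {candidate[name]:.3f} ({delta:+.3f} vs. baseline)")
```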
6. Export for serving

When satisfied, export the model in your preferred format for deployment.
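
For a Hugging Face-format checkpoint, a minimal export sketch; both directory paths are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

source = "outputs/my-finetune-job/checkpoint-final"  # hypothetical checkpoint path
export_dir = "exports/my-finetune-v1"                # hypothetical export target

model = AutoModelForCausalLM.from_pretrained(source)
tokenizer = AutoTokenizer.from_pretrained(source)

# save_pretrained writes the weights plus config and tokenizer files, so the
# directory can be loaded directly by a serving stack that reads this format.
model.save_pretrained(export_dir)
tokenizer.save_pretrained(export_dir)
```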
Next: see the Inference & Export guide for detailed setup and options.