Select a model
Pick a completed training job or a previously exported model.
Have the model source path and base model ID handy.
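The two identifiers above can be captured in a small record so later steps reference them consistently. This is a minimal sketch; the field names and example values are hypothetical, not part of any specific platform's API.

```python
from dataclasses import dataclass

@dataclass
class EvalTarget:
    # Hypothetical field names -- adapt to your platform's identifiers.
    model_source_path: str  # where the trained/exported model lives
    base_model_id: str      # the base model the job started from

# Example values are placeholders only.
target = EvalTarget(
    model_source_path="s3://my-bucket/jobs/job-123/export",
    base_model_id="base-model-v1",
)
```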
Choose evaluation mode
Start with single‑prompt inference to sanity‑check behavior, then move to batch evaluation on a labeled dataset.
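The two modes above can be sketched as a pair of helpers: one runs a single prompt for manual inspection, the other loops over a labeled dataset and averages a score. The `generate` and `score` callables are assumptions standing in for whatever inference client and metric you actually use.

```python
from typing import Callable, Iterable, Tuple

def sanity_check(generate: Callable[[str], str], prompt: str) -> str:
    """Run one prompt through the model and return the raw output for inspection."""
    return generate(prompt)

def batch_evaluate(
    generate: Callable[[str], str],
    dataset: Iterable[Tuple[str, str]],  # (prompt, reference) pairs
    score: Callable[[str, str], float],
) -> float:
    """Score every (prompt, reference) pair and return the mean score."""
    scores = [score(generate(prompt), ref) for prompt, ref in dataset]
    return sum(scores) / len(scores)
```

In practice, inspect a handful of `sanity_check` outputs by eye before spending compute on `batch_evaluate`.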
Pick metrics
Select task‑appropriate metrics (e.g., BERTScore/ROUGE for generation, EM/F1 for QA, accuracy/F1 for classification).
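For the QA metrics mentioned above, EM and token-level F1 are simple enough to compute directly. The sketch below uses a basic lowercase/strip-punctuation normalization (an assumption; benchmark-specific scripts often normalize differently, e.g. SQuAD also drops articles).

```python
import re
from collections import Counter

def normalize(text: str) -> list:
    """Lowercase, strip punctuation, and split into tokens (simplified)."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).split()

def exact_match(pred: str, ref: str) -> float:
    """1.0 if the normalized prediction equals the normalized reference."""
    return float(normalize(pred) == normalize(ref))

def token_f1(pred: str, ref: str) -> float:
    """Harmonic mean of token precision and recall against the reference."""
    p, r = normalize(pred), normalize(ref)
    overlap = sum((Counter(p) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)
```

Generation metrics like ROUGE and BERTScore need reference implementations; use an established metrics library rather than hand-rolling them.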
Review and compare
Inspect metrics and samples, compare against a baseline, and note failure patterns to guide the next training round.
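The baseline comparison above can be mechanized: given two metric dictionaries, flag every metric where the candidate fell below the baseline. The function name and `min_delta` tolerance are illustrative assumptions, not a standard API.

```python
from typing import Dict, Tuple

def find_regressions(
    current: Dict[str, float],
    baseline: Dict[str, float],
    min_delta: float = 0.0,  # ignore drops smaller than this tolerance
) -> Dict[str, Tuple[float, float]]:
    """Return {metric: (baseline, current)} for metrics that regressed."""
    return {
        name: (baseline[name], current[name])
        for name in baseline
        if name in current and current[name] < baseline[name] - min_delta
    }
```

A non-empty result is a signal to pull the failing samples for those metrics and look for patterns before the next training round.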