Preprocessing Workflow

Turn raw text or multimodal data into training‑ready conversations in a few steps. You can upload files (CSV, JSON/JSONL, Parquet, Excel) or import from Hugging Face, and optionally include images for vision‑language tasks. Choose a processing mode that matches your objective (language modeling, prompt‑only, or preference tuning) and produce a clean, consistent dataset for training.

Pick your data source

Upload a file or import a Hugging Face dataset by repository name.

Supports text and image data for multimodal training.

Map fields to roles

Point columns/keys to conversation roles (system, user, assistant) or to chosen/rejected pairs for preference data.

Choose processing mode

Select Language Modeling, Prompt‑only, or Preference Tuning to shape the output format.

Preview and validate

Inspect a sample of the processed dataset and fix any mapping issues.

Confirm examples look correct before running the full job.

Process and version

Run the job to generate a training‑ready dataset you can reuse across experiments.

Next: open the Datasets page to create your first processed dataset.

Getting started

Dataset preprocessing

Fine-tuning

Evaluation & Export

Model deployment

Preprocessing Workflow