Turn raw text or multimodal data into training‑ready conversations in a few steps. You can upload files (CSV, JSON/JSONL, Parquet, Excel) or import from Hugging Face, and optionally include images for vision‑language tasks. Choose a processing mode that matches your objective (language modeling, prompt‑only, or preference tuning) and produce a clean, consistent dataset for training.
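For orientation, one processed conversation is typically stored as a list of role-tagged messages. The sketch below shows that shape in Python; the field names (`messages`, `role`, `content`) follow the common chat format and may differ slightly from the platform's exact schema.

```python
# A single training-ready conversation as role-tagged messages.
# Field names follow the common chat format and are illustrative.
example_record = {
    "messages": [
        {"role": "system", "content": "You are a helpful support agent."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Open Settings > Account and choose Reset password."},
    ]
}
```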
1. Pick your data source

Upload a file or import a Hugging Face dataset by repository name.
Supports text and image data for multimodal training.
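The import happens in the UI, but the equivalent step in code, using the Hugging Face `datasets` library, looks roughly like this; the local path and repository name are placeholders.

```python
# Sketch: loading a source dataset with the Hugging Face `datasets` library.
# The local path and repository name are placeholders.
from datasets import load_dataset

# From a local file: "csv", "json" (also JSONL), and "parquet" loaders are built in.
local_ds = load_dataset("csv", data_files="support_tickets.csv", split="train")

# From a Hugging Face repository, referenced by name.
hub_ds = load_dataset("username/my-conversations", split="train")

print(local_ds.column_names)
```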
2. Map fields to roles

Point columns/keys to conversation roles (system, user, assistant) or to chosen/rejected pairs for preference data.
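A minimal sketch of what this mapping does under the hood, assuming hypothetical column names such as `instruction`, `response`, `question`, `good_answer`, and `bad_answer`:

```python
# Sketch: turning flat columns into role-tagged messages.
# Column names ("instruction", "response", ...) are hypothetical.
def to_messages(row):
    return {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": row["instruction"]},
            {"role": "assistant", "content": row["response"]},
        ]
    }

# For preference data: a shared prompt plus chosen/rejected completions.
def to_preference(row):
    return {
        "prompt": [{"role": "user", "content": row["question"]}],
        "chosen": [{"role": "assistant", "content": row["good_answer"]}],
        "rejected": [{"role": "assistant", "content": row["bad_answer"]}],
    }
```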
3. Choose processing mode

Select Language Modeling, Prompt‑only, or Preference Tuning to shape the output format.
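The three modes produce differently shaped records. Below is a sketch of one example under each mode; the schemas are illustrative rather than the platform's exact output.

```python
# Sketch: the record shape produced by each mode (schemas are illustrative).

# Language Modeling: the full conversation is kept for supervised training.
language_modeling_example = {
    "messages": [
        {"role": "user", "content": "Summarize this ticket."},
        {"role": "assistant", "content": "Customer cannot log in after the update."},
    ]
}

# Prompt-only: only the prompt side is kept; completions are produced later.
prompt_only_example = {
    "prompt": [{"role": "user", "content": "Summarize this ticket."}]
}

# Preference Tuning: a shared prompt with a chosen and a rejected completion.
preference_example = {
    "prompt": [{"role": "user", "content": "Summarize this ticket."}],
    "chosen": [{"role": "assistant", "content": "Customer cannot log in after the update."}],
    "rejected": [{"role": "assistant", "content": "Login stuff happened."}],
}
```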
4. Preview and validate

Inspect a sample of the processed dataset and fix any mapping issues.
Confirm examples look correct before running the full job.
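A minimal sketch of the kind of sanity check the preview supports, assuming records use the `messages` schema shown earlier:

```python
# Sketch: a quick sanity check on a small sample before running the full job.
ALLOWED_ROLES = {"system", "user", "assistant"}

def check_sample(records, n=20):
    for i, record in enumerate(records[:n]):
        messages = record.get("messages", [])
        assert messages, f"record {i}: no messages were mapped"
        for message in messages:
            assert message.get("role") in ALLOWED_ROLES, f"record {i}: unexpected role {message.get('role')!r}"
            assert message.get("content"), f"record {i}: empty content"
    print(f"Checked {min(n, len(records))} record(s): mapping looks consistent.")

check_sample([{"messages": [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]}])
```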
5. Process and version

Run the job to generate a training‑ready dataset you can reuse across experiments.
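In code, the equivalent of processing and versioning could look roughly like the sketch below, again using the `datasets` library; the file names, mapping function, and version suffix in the output path are placeholders.

```python
# Sketch: applying the mapping to the whole dataset and saving a versioned copy
# that can be reused across experiments. Names and paths are placeholders.
from datasets import load_dataset

def to_messages(row):
    # The same kind of column-to-role mapping sketched in step 2.
    return {"messages": [
        {"role": "user", "content": row["instruction"]},
        {"role": "assistant", "content": row["response"]},
    ]}

raw = load_dataset("json", data_files="raw_conversations.jsonl", split="train")
processed = raw.map(to_messages, remove_columns=raw.column_names)
processed.save_to_disk("processed/conversations-v1")  # version tag in the output path
```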
Next: open the Datasets page to create your first processed dataset.