CLI Reference

Automate, script, and integrate. Built for professionals who prefer the terminal.

curator

Start the server with the Web UI.

Terminal
 curator

Opens the Web UI at http://localhost:3333.

curator import

Import datasets from local files. Supports JSON, JSONL, and CSV formats. Auto-detects format from file extension and content structure.

Terminal
 # Import a JSON file into dataset 1
 curator import data.json --dataset 1

 # Import with parallel workers for large files
 curator import massive-dataset.jsonl --dataset 1 --workers 8

 # Import into a specific dataset with initial status
 curator import data.json --dataset 2 --status approved
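
To try the import commands without a real dataset, a small JSONL file can be generated first. This is a minimal sketch: the filename sample.jsonl and the field names (instruction, output, category) are assumptions for illustration, not a schema confirmed by this reference. The import only runs if the curator binary is on the PATH.

```shell
set -eu

# Hypothetical sample file; the field names below are illustrative
# assumptions, not a documented curator schema.
sample_file="sample.jsonl"

cat > "$sample_file" <<'EOF'
{"instruction": "Reverse a string in Python", "output": "s[::-1]", "category": "coding"}
{"instruction": "Summarize the refund policy", "output": "Refunds are issued within 30 days.", "category": "support"}
EOF

# JSONL stores one JSON object per line, so the line count is the record count.
record_count=$(wc -l < "$sample_file" | tr -d ' ')
echo "wrote $record_count records to $sample_file"

# Run the import only when the CLI is actually installed.
if command -v curator >/dev/null 2>&1; then
  curator import "$sample_file" --dataset 1
fi
```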

curator export

Export curated data for fine-tuning or RAG pipelines. Supports filtering, splitting, and stratification.

Terminal
 # Export dataset 2 in MLX format (Apple Silicon training)
 curator export --dataset 2 --format mlx --output train.jsonl

 # Export with quality filter
 curator export --dataset 3 --format unsloth --output train.jsonl \
   --filter "status=approved AND quality>=4"

 # Export with train/test/validation split
 curator export --split "0.8,0.1,0.1" --seed 42 --format jsonl \
   --output dataset

 # Export for Unsloth (faster, memory-efficient)
 curator export --dataset 2 --format unsloth --output train.jsonl
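
| Format   | Flag              | Fine-Tuning                 | RAG                      |
|----------|-------------------|-----------------------------|--------------------------|
| Alpaca   | --format alpaca   | Standard instruction format |                          |
| ShareGPT | --format sharegpt | Multi-turn conversations    |                          |
| JSONL    | --format jsonl    | Pipeline-ready streaming    | Embedding pipeline input |
| CSV      | --format csv      | Analysis / spreadsheets     | Document metadata        |
| MLX      | --format mlx      | Apple Silicon (MLX-LM)      |                          |
| Unsloth  | --format unsloth  | Fast, memory-efficient      |                          |
| TRL      | --format trl      | HuggingFace ecosystem       |                          |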
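
When planning a --split such as "0.8,0.1,0.1", it can help to estimate how many records each partition will receive. The sketch below does that arithmetic for a given record total. The helper name split_counts is hypothetical, and the rounding rule (floor each partition, give the leftover to the first) is an assumption; how curator itself rounds is not stated in this reference.

```shell
set -eu

# split_counts TOTAL RATIOS
# Prints the record count per partition for comma-separated ratios.
# Assumption: each partition is floored and any leftover records go to the
# first partition. curator's real rounding behavior may differ.
split_counts() {
  awk -v total="$1" -v ratios="$2" 'BEGIN {
    n = split(ratios, r, ",")
    assigned = 0
    for (i = 1; i <= n; i++) {
      # The small epsilon guards against floating-point results like 99.9999999.
      count[i] = int(total * r[i] + 1e-9)
      assigned += count[i]
    }
    count[1] += total - assigned
    for (i = 1; i <= n; i++) printf "%s%d", (i > 1 ? " " : ""), count[i]
    printf "\n"
  }'
}

split_counts 1000 "0.8,0.1,0.1"   # prints: 800 100 100
```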

curator download

Download and auto-import datasets from external sources.

Terminal
 # Download from Hugging Face
 curator download hf:openai/summarize_from_feedback --dataset 3

 # Full pipeline: search → download → curate → export
 curator search "medical qa" --limit 5
 curator download hf:medalpaca/medical_meadow_small --dataset 1
 curator export --dataset 1 --format mlx --output medical-training.jsonl \
   --filter "quality>=4"
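
For unattended use, the download and export steps of the pipeline above can be wrapped in a script that stops at the first failure. This is a sketch under stated assumptions: the run and pipeline helpers and the DRY_RUN variable are conventions of this example, not curator features, and the interactive search step is left out. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
set -eu

# DRY_RUN=1 prints each command instead of executing it.
# This switch belongs to the sketch; it is not a curator option.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# pipeline SOURCE DATASET OUTPUT
# Uses the same flags as the documented example. Because of `set -e`,
# a failed download stops the script before the export step runs.
pipeline() {
  src="$1"
  dataset="$2"
  output="$3"
  run curator download "$src" --dataset "$dataset"
  run curator export --dataset "$dataset" --format mlx \
    --output "$output" --filter "quality>=4"
}

pipeline hf:medalpaca/medical_meadow_small 1 medical-training.jsonl
```

Setting DRY_RUN=0 in the environment executes the commands for real, which requires curator to be installed.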

curator clear

Clear a dataset after a confirmation prompt.

Terminal
 curator clear --dataset 1

curator reset

Reset the database to a fresh state. Removes all datasets and samples.

Terminal
 curator reset

Query Language

The --filter flag supports a SQL-like query syntax for precise dataset slicing.

Terminal
 # Approved samples only with quality >= 4
 curator export --dataset 1 --filter "status=approved AND quality>=4"

 # Approved samples in the "coding" category
 curator export --dataset 1 --filter "status=approved AND category=coding"

 # Full-text search within instruction field
 curator export --dataset 1 --filter "instruction:python"

 # Combine conditions
 curator export --dataset 1 \
   --filter "status=approved AND quality>=3 AND category=support"
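
| Filter        | Description                                             |
|---------------|---------------------------------------------------------|
| status=X      | Filter by status: draft, in_review, approved, rejected  |
| quality>=N    | Minimum star rating (1–5)                               |
| category=X    | Filter by category name                                 |
| instruction:X | Full-text search in instruction field                   |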
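
In scripts, filter strings are often assembled from separate conditions. A minimal sketch of a helper that joins conditions with AND, the only combinator shown in this reference; the function name build_filter is hypothetical and not part of curator.

```shell
set -eu

# build_filter CONDITION...
# Joins its arguments with " AND " and prints the resulting filter string.
build_filter() {
  filter=""
  for cond in "$@"; do
    if [ -z "$filter" ]; then
      filter="$cond"
    else
      filter="$filter AND $cond"
    fi
  done
  printf '%s\n' "$filter"
}

build_filter "status=approved" "quality>=4" "category=coding"
# prints: status=approved AND quality>=4 AND category=coding
```

The result can be passed straight to the flag, for example: curator export --dataset 1 --filter "$(build_filter "status=approved" "quality>=4")".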