CLI Reference
Automate, script, and integrate. Built for professionals who prefer the terminal.
curator
Start the server with the Web UI.
Terminal
    curator
Opens the Web UI at http://localhost:3333.
curator import
Import datasets from local files. Supports JSON, JSONL, and CSV formats. Auto-detects format from file extension and content structure.
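When the data is spread over many files, the import command can be wrapped in a small shell function. The sketch below is an illustration rather than part of the CLI: `import_dir` is a hypothetical helper name, and it relies only on the `curator import <file> --dataset <id>` form documented in this section.

```shell
# Hypothetical helper: import every JSON, JSONL, and CSV file found in a
# directory into a single dataset, stopping at the first failed import.
import_dir() (
  dir=$1
  dataset=$2
  for f in "$dir"/*.json "$dir"/*.jsonl "$dir"/*.csv; do
    # An unmatched glob stays literal, so skip paths that do not exist.
    [ -e "$f" ] || continue
    curator import "$f" --dataset "$dataset" || exit 1
  done
)
```

Calling `import_dir ./raw 1` would then import each supported file under `./raw` (an example path) into dataset 1.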
Terminal
    # Import a JSON file into dataset 1
    curator import data.json --dataset 1

    # Import with parallel workers for large files
    curator import massive-dataset.jsonl --dataset 1 --workers 8

    # Import into a specific dataset with initial status
    curator import data.json --dataset 2 --status approved
curator export
Export curated data for fine-tuning or RAG pipelines. Supports filtering, splitting, and stratification.
Terminal
    # Export dataset 2 in MLX format (Apple Silicon training)
    curator export --dataset 2 --format mlx --output train.jsonl

    # Export with quality filter
    curator export --dataset 3 --format unsloth --output train.jsonl \
      --filter "status=approved AND quality>=4"

    # Export with train/test/validation split
    curator export --split "0.8,0.1,0.1" --seed 42 --format jsonl \
      --output dataset

    # Export for Unsloth (faster, memory-efficient)
    curator export --dataset 2 --format unsloth --output train.jsonl

| Format | Flag | Fine-Tuning | RAG |
|---|---|---|---|
| Alpaca | --format alpaca | Standard instruction format | — |
| ShareGPT | --format sharegpt | Multi-turn conversations | — |
| JSONL | --format jsonl | Pipeline-ready streaming | Embedding pipeline input |
| CSV | --format csv | Analysis / spreadsheets | Document metadata |
| MLX | --format mlx | Apple Silicon (MLX-LM) | — |
| Unsloth | --format unsloth | Fast, memory-efficient | — |
| TRL | --format trl | HuggingFace ecosystem | — |
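To compare training frameworks, it can help to export the same dataset in several of the formats listed above in one pass. The following is a minimal sketch under stated assumptions: `export_formats` is a hypothetical helper, and the `train-<format>` file names and extensions are arbitrary choices for this example, not something the CLI prescribes.

```shell
# Hypothetical helper: export one dataset once per requested format.
# Usage: export_formats <dataset-id> <output-dir> <format>...
export_formats() (
  dataset=$1
  outdir=$2
  shift 2
  mkdir -p "$outdir" || exit 1
  for fmt in "$@"; do
    # File naming is an assumption of this sketch: CSV gets .csv,
    # every other format gets .jsonl.
    case $fmt in
      csv) ext=csv ;;
      *)   ext=jsonl ;;
    esac
    curator export --dataset "$dataset" --format "$fmt" \
      --output "$outdir/train-$fmt.$ext" || exit 1
  done
)
```

For example, `export_formats 2 ./exports mlx unsloth trl` would run three exports of dataset 2 into `./exports`.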
curator search
Search Hugging Face and Kaggle for datasets matching your query.
Terminal
    curator search "python programming"
    curator search "medical qa" --limit 5
curator download
Download and auto-import datasets from external sources.
Terminal
    # Download from Hugging Face
    curator download hf:openai/summarize_from_feedback --dataset 3

    # Full pipeline: search → download → curate → export
    curator search "medical qa" --limit 5
    curator download hf:medalpaca/medical_meadow_small --dataset 1
    curator export --dataset 1 --format mlx --output medical-training.jsonl \
      --filter "quality>=4"
curator clear
Clear a dataset after a confirmation prompt.
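Because clearing is destructive, one cautious pattern is to export the dataset first and clear it only if the export succeeded. This is a sketch, not a built-in feature: `backup_then_clear` is a hypothetical helper, and it assumes that an export without `--filter` writes every sample in the dataset.

```shell
# Hypothetical helper: back up a dataset to a JSONL file, then clear it.
# Usage: backup_then_clear <dataset-id> <backup-file>
backup_then_clear() (
  dataset=$1
  backup=$2
  # Assumption: exporting without --filter includes all samples.
  curator export --dataset "$dataset" --format jsonl --output "$backup" || {
    echo "export failed; dataset $dataset left untouched" >&2
    exit 1
  }
  # curator clear still shows its own confirmation prompt.
  curator clear --dataset "$dataset"
)
```

With this in place, `backup_then_clear 1 dataset-1-backup.jsonl` never reaches the clear step when the export fails.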
Terminal
    curator clear --dataset 1
curator reset
Reset the database to a fresh state. Removes all datasets and samples.
Terminal
    curator reset
Query Language
The --filter flag supports a SQL-like query syntax for precise dataset slicing.
Terminal
    # Approved samples only with quality >= 4
    curator export --dataset 1 --filter "status=approved AND quality>=4"

    # Approved samples in the "coding" category
    curator export --dataset 1 --filter "status=approved AND category=coding"

    # Full-text search within instruction field
    curator export --dataset 1 --filter "instruction:python"

    # Combine conditions
    curator export --dataset 1 \
      --filter "status=approved AND quality>=3 AND category=support"

| Filter | Description |
|---|---|
| status=X | Filter by draft, in_review, approved, rejected |
| quality>=N | Minimum star rating (1–5) |
| category=X | Filter by category name |
| instruction:X | Full-text search in instruction field |
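In scripts, filter expressions are often assembled from separate conditions. The sketch below joins conditions with `AND`, the only combinator shown in this reference. `build_filter` is a hypothetical helper, and it does no validation of the individual conditions.

```shell
# Hypothetical helper: join filter conditions with " AND ".
# Usage: build_filter <condition>...
build_filter() (
  filter=""
  for cond in "$@"; do
    if [ -z "$filter" ]; then
      filter=$cond
    else
      filter="$filter AND $cond"
    fi
  done
  printf '%s\n' "$filter"
)

# Example use with the documented --filter flag:
#   curator export --dataset 1 \
#     --filter "$(build_filter "status=approved" "quality>=3" "category=support")"
```

Quoting each condition keeps characters such as `>` from being interpreted by the shell as a redirection.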