Your data. Your infrastructure.
Own your intelligence.
Open source dataset management for fine-tuning and RAG. Web UI, CLI, REST API, Live Capture API — four ways to work. Local-first, deploy anywhere. Your data, your machine, your rules.
Capture. Curate. Export. Better data in. Better models out. Better answers retrieved.
Local-first. Deploy anywhere.
Runs on your Mac by default. Deploy to Docker, your server, or your cloud — same open source app, your infrastructure. No telemetry. No phone-home. Your data, your rules.
Own your security
Open source, as-is, early beta. Read the code. Audit it yourself. Automated scans on every commit. You deploy it, you secure it, you own it.
CLI, Web UI, REST API, or Live Capture — you choose
Visual curation, terminal automation, programmatic review, or real-time streaming. Four interfaces, one tool.
If it can POST, it can feed AI Curator
Real-time capture from any tool via HTTP API. Slack, logs, IDEs, custom scripts — any source.
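As a minimal sketch, any script can feed the Live Capture API with a few lines of Python. The endpoint URL and payload fields below mirror the curl example shown on this page; the helper function names are ours:

```python
import json
import urllib.request

CAPTURE_URL = "http://localhost:3333/api/capture"  # default local endpoint

def build_payload(source, instruction, output, category="general", rating=3):
    """Build a capture payload in the shape AI Curator's API expects."""
    return {
        "source": source,
        "records": [{
            "instruction": instruction,
            "output": output,
            "category": category,
            "qualityRating": rating,
        }],
    }

def capture(payload):
    """POST one batch of records to a running AI Curator instance."""
    req = urllib.request.Request(
        CAPTURE_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)
```

Call `capture(build_payload("my-script", "Explain this error", "The error occurs because..."))` from any tool that can run Python, or reproduce the same POST in any other language.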
No sample ships without your approval
Review, rate, approve or reject. Star ratings, categories, tags, duplicate detection.
7 formats. Fine-tuning or RAG.
Alpaca, ShareGPT, JSONL, CSV, MLX, Unsloth, TRL. Export for training or retrieval. Smart splitting included.
Same curation. Different destinations.
Whether you're fine-tuning a model or building a RAG retrieval system, the data preparation is identical. Capture, curate, export — you decide the destination at export time.
Curate → Train
Collect instruction-response pairs. Review quality. Approve the best samples. Export as Alpaca, MLX, Unsloth, ShareGPT, or TRL for training.
- Instruction-output pairs
- Quality ratings & categories
- Stratified train/test splits
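The stratified split above can be sketched in a few lines: group samples by category, then carve the same test fraction out of each group so the test set mirrors the category mix of the full dataset. This is a minimal illustration, not AI Curator's internal implementation:

```python
import random
from collections import defaultdict

def stratified_split(samples, test_fraction=0.1, seed=42):
    """Split samples into train/test, preserving per-category proportions."""
    by_category = defaultdict(list)
    for sample in samples:
        by_category[sample["category"]].append(sample)

    rng = random.Random(seed)
    train, test = [], []
    for group in by_category.values():
        rng.shuffle(group)
        cut = max(1, round(len(group) * test_fraction))
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test

# Alpaca-style instruction-output pairs, tagged by category
samples = [
    {"instruction": f"q{i}", "output": f"a{i}", "category": c}
    for c in ("coding", "support")
    for i in range(10)
]
train, test = stratified_split(samples, test_fraction=0.2)
```

With a 20% test fraction, each category contributes 20% of its own samples to the test set, so rare categories are never squeezed out of evaluation.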
Curate → Retrieve
Import documents. Deduplicate, review, approve. Export clean, structured content to your embedding pipeline. No stale or duplicated chunks in your vector store.
- Document deduplication
- Staleness & quality review
- JSONL / CSV / Markdown export
Your best data is happening right now
Every support ticket, code review, and AI conversation contains data gold — for training or for retrieval. AI Curator captures it in real-time, before it's lost. If it can POST, it can feed the pipeline.
Slack
Conversations become training data or knowledge base entries.
Application Logs
Error-resolution pairs and user interactions, captured automatically.
OpenWebUI
Official plugin for self-hosted AI chat conversations.
IDEs & VS Code
Code explanations and debug sessions, streaming as they happen.
Internal Docs
Confluence, Notion, wikis — import and curate for RAG retrieval.
Custom Scripts
Any tool that can send JSON via HTTP POST can feed AI Curator.
curl -X POST http://localhost:3333/api/capture \
  -H "Content-Type: application/json" \
  -d '{
    "source": "my-ide",
    "records": [{
      "instruction": "Explain this error",
      "output": "The error occurs because...",
      "category": "coding",
      "qualityRating": 5
    }]
  }'

No sample ships without your approval
Every sample goes through your review — whether it's training data or knowledge base content. Because you know what "good" looks like for your model and your users.
Four ways to work
Click through the Web UI. Script from the terminal. Automate with the REST API. Stream data in real-time. Use one or use all four.
Visual Curation
For when you need to see what you're working with. Drag, click, review, export.
- Drag & drop import
- Card-based sample review
- One-click export
- Visual dashboards
Power Automation
For when you have 10,000 samples and a deadline. Automate, script, integrate.
- Bulk import/export
- HuggingFace search & download
- Advanced filtering & splitting
- Scriptable workflows
Programmatic Curation
Automate review, rate, approve, reject. Connect pipelines, scripts, or AI agents to the curation loop.
- Auto-approve quality thresholds
- Filter by status, category, rating
- CI/CD quality gates
- AI-assisted pre-review
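A curation bot against the REST API might look like the sketch below. The endpoint paths (`/api/samples`, `/api/samples/{id}/approve`) are illustrative assumptions, not documented routes; the auto-approve threshold logic is the point:

```python
import json
import urllib.request

BASE = "http://localhost:3333"  # local AI Curator instance

def should_auto_approve(sample, min_rating=4):
    """Quality gate: approve only samples at or above the rating threshold."""
    return sample.get("qualityRating", 0) >= min_rating

def get_json(path):
    with urllib.request.urlopen(BASE + path) as resp:
        return json.load(resp)

def approve(sample_id):
    # NOTE: endpoint path is an assumption for illustration
    req = urllib.request.Request(
        BASE + f"/api/samples/{sample_id}/approve",
        data=b"{}",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

def review_pending(min_rating=4):
    """Auto-approve pending samples that clear the quality threshold."""
    for sample in get_json("/api/samples?status=pending"):
        if should_auto_approve(sample, min_rating):
            approve(sample["id"])
```

The same loop works as a CI/CD quality gate: run it after a bulk import and fail the pipeline if too many samples fall below the threshold.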
Real-Time Streaming
For when your best data is happening right now. Stream from any source via HTTP.
- Real-time HTTP ingestion
- Webhook integrations
- Log processors
- Custom script support
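A minimal log processor for the error-resolution idea: scan a log for ERROR/RESOLVED pairs and turn each pair into a capture record. The "LEVEL: message" log format and the pairing rule are our assumptions; the record shape follows the capture API example on this page:

```python
def logs_to_records(lines):
    """Pair each ERROR line with the next RESOLVED line into a training record.

    Assumes a simple "LEVEL: message" log format for illustration.
    """
    records, pending_error = [], None
    for line in lines:
        if line.startswith("ERROR:"):
            pending_error = line[len("ERROR:"):].strip()
        elif line.startswith("RESOLVED:") and pending_error:
            records.append({
                "instruction": f"How do I fix: {pending_error}",
                "output": line[len("RESOLVED:"):].strip(),
                "category": "support",
            })
            pending_error = None
    return records

log = [
    "INFO: service started",
    "ERROR: connection refused on port 5432",
    "RESOLVED: started postgres and reopened the pool",
]
records = logs_to_records(log)
```

POST the resulting records to the capture endpoint and they land in the review queue like any other source.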
EdukaAI Starter Pack
75 engine-generated, ElGap-validated samples. Download free from ai-curator.cloud/starter-pack — no account needed. Import into EdukaAI Studio for fine-tuning, or use with AI Curator for the full curation workflow.
- Player roleplays — Chen Wei, Diego Rodriguez, Marco Esposito
- Tactical analysis — Match breakdowns and formation analysis
- Fan perspectives — Emotional reactions from both sides
- Commentary transcripts — Professional match narration
- Alternate history — "What if" scenarios exploring different outcomes
- Engine-generated, human-validated by ElGap
- Download standalone — no AI Curator installation required
# Install and start
brew tap elgap/tap
brew install ai-curator
curator

# Free Starter Pack: ai-curator.cloud/starter-pack

# Export for fine-tuning
curator export --dataset 1 \
  --format mlx --output train.jsonl

# Export for RAG indexing
curator export --dataset 1 \
  --format jsonl --output knowledge.jsonl

Fine-tune on your Mac in 5 minutes
Download the free Starter Pack from ai-curator.cloud, import into EdukaAI Studio, click Train. No GPU, no cloud, no code.
Starter Pack
75 free samples, download from ai-curator.cloud. No account needed. Import directly into Studio.
Get the Starter Pack

Train
EdukaAI Studio handles fine-tuning. Import the pack, pick a model, click Train. Runs on any M-series Mac — no GPU needed.
EdukaAI Studio

Test
Dual Chat compares your fine-tuned model against the original. Same prompt, both models, side by side. See the difference your data made.
Build with it
Fine-tune a model on your data. Build a RAG system over your documents. Run locally or deploy to your infrastructure.
Developers
Turn IDE interactions and code reviews into training data for coding assistants.
Support Teams
Resolved tickets become Q&A training pairs.
Enterprise Knowledge
Curate clean, deduplicated documents for your RAG retrieval system.
Customer Support
Capture and review resolved tickets for real-time answer retrieval.
Researchers
Clean, stratified datasets with documented methodology for publication.
Internal Docs
Import wikis and docs, remove duplicates and stale content, export to embedding pipelines.
Your data. Your infrastructure.
Own your intelligence.
Install in seconds. Run locally or deploy to your infrastructure. Capture, curate, export — for fine-tuning or RAG. Open source, as-is, early beta.