Your data. Your infrastructure.
Own your intelligence.
Open source dataset management for fine-tuning and RAG. Web UI, CLI, REST API, Live Capture API — four ways to work. Local-first, deploy anywhere. Your data, your machine, your rules.
Capture. Curate. Export. Better data in. Better models out. Better answers retrieved.
Local-first. Deploy anywhere.
Runs on your Mac by default. Deploy to Docker, your server, or your cloud — same open source app, your infrastructure. No telemetry. No phone-home. Your data, your rules.
Own your security
Open source, as-is, early beta. Read the code. Audit it yourself. Automated scans on every commit. You deploy it, you secure it, you own it.
CLI, Web UI, REST API, or Live Capture — you choose
Visual curation, terminal automation, programmatic review, or real-time streaming. Four interfaces, one tool.
If it can POST, it can feed AI Curator
Real-time capture from any tool via HTTP API. Slack, logs, IDEs, custom scripts — any source.
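As a minimal sketch, any script can feed the Live Capture API with a few lines of Python. The endpoint URL and payload fields below mirror the curl example shown on this page; the helper function names are ours:

```python
import json
import urllib.request

CAPTURE_URL = "http://localhost:3333/api/capture"  # default local endpoint

def build_payload(source, instruction, output, category="general", rating=3):
    """Build a capture payload in the shape AI Curator's API expects."""
    return {
        "source": source,
        "records": [{
            "instruction": instruction,
            "output": output,
            "category": category,
            "qualityRating": rating,
        }],
    }

def capture(payload):
    """POST one batch of records to a running AI Curator instance."""
    req = urllib.request.Request(
        CAPTURE_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)
```

Call `capture(build_payload("my-script", "Explain this error", "The error occurs because..."))` from any tool that can run Python, or reproduce the same POST in any other language.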
No sample ships without your approval
Review, rate, approve or reject. Star ratings, categories, tags, duplicate detection.
7 formats. Fine-tuning or RAG.
Alpaca, ShareGPT, JSONL, CSV, MLX, Unsloth, TRL. Export for training or retrieval. Smart splitting included.
Same curation. Different destinations.
Whether you're fine-tuning a model or building a RAG retrieval system, the data preparation is identical. Capture, curate, export — you decide the destination at export time.
Curate → Train
Collect instruction-response pairs. Review quality. Approve the best samples. Export as Alpaca, MLX, Unsloth, ShareGPT, or TRL for training.
- Instruction-output pairs
- Quality ratings & categories
- Stratified train/test splits
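The stratified split above can be sketched in a few lines: group samples by category, then carve the same test fraction out of each group so the test set mirrors the category mix of the full dataset. This is a minimal illustration, not AI Curator's internal implementation:

```python
import random
from collections import defaultdict

def stratified_split(samples, test_fraction=0.1, seed=42):
    """Split samples into train/test, preserving per-category proportions."""
    by_category = defaultdict(list)
    for sample in samples:
        by_category[sample["category"]].append(sample)

    rng = random.Random(seed)
    train, test = [], []
    for group in by_category.values():
        rng.shuffle(group)
        cut = max(1, round(len(group) * test_fraction))
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test

# Alpaca-style instruction-output pairs, tagged by category
samples = [
    {"instruction": f"q{i}", "output": f"a{i}", "category": c}
    for c in ("coding", "support")
    for i in range(10)
]
train, test = stratified_split(samples, test_fraction=0.2)
```

With a 20% test fraction, each category contributes 20% of its own samples to the test set, so rare categories are never squeezed out of evaluation.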
Curate → Retrieve
Import documents. Deduplicate, review, approve. Export clean, structured content to your embedding pipeline. No stale or duplicated chunks in your vector store.
- Document deduplication
- Staleness & quality review
- JSONL / CSV / Markdown export
Your best data is happening right now
Every support ticket, code review, and AI conversation contains data gold — for training or for retrieval. AI Curator captures it in real-time, before it's lost. If it can POST, it can feed the pipeline.
Slack
Conversations become training data or knowledge base entries.
Application Logs
Error-resolution pairs and user interactions, captured automatically.
OpenWebUI
Official plugin for self-hosted AI chat conversations.
IDEs & VS Code
Code explanations and debug sessions, streaming as they happen.
Internal Docs
Confluence, Notion, wikis — import and curate for RAG retrieval.
Custom Scripts
Any tool that can send JSON via HTTP POST can feed AI Curator.
curl -X POST http://localhost:3333/api/capture \
  -H "Content-Type: application/json" \
  -d '{
    "source": "my-ide",
    "records": [{
      "instruction": "Explain this error",
      "output": "The error occurs because...",
      "category": "coding",
      "qualityRating": 5
    }]
  }'

No sample ships without your approval
Every sample goes through your review — whether it's training data or knowledge base content. Because you know what "good" looks like for your model and your users.
Four ways to work
Click through the Web UI. Script from the terminal. Automate with the REST API. Stream data in real-time. Use one or use all four.
Visual Curation
For when you need to see what you're working with. Drag, click, review, export.
- Drag & drop import
- Card-based sample review
- One-click export
- Visual dashboards
Power Automation
For when you have 10,000 samples and a deadline. Automate, script, integrate.
- Bulk import/export
- HuggingFace search & download
- Advanced filtering & splitting
- Scriptable workflows
Programmatic Curation
Automate review, rate, approve, reject. Connect pipelines, scripts, or AI agents to the curation loop.
- Auto-approve quality thresholds
- Filter by status, category, rating
- CI/CD quality gates
- AI-assisted pre-review
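A curation bot against the REST API might look like the sketch below. The endpoint paths (`/api/samples`, `/api/samples/{id}/approve`) are illustrative assumptions, not documented routes; the auto-approve threshold logic is the point:

```python
import json
import urllib.request

BASE = "http://localhost:3333"  # local AI Curator instance

def should_auto_approve(sample, min_rating=4):
    """Quality gate: approve only samples at or above the rating threshold."""
    return sample.get("qualityRating", 0) >= min_rating

def get_json(path):
    with urllib.request.urlopen(BASE + path) as resp:
        return json.load(resp)

def approve(sample_id):
    # NOTE: endpoint path is an assumption for illustration
    req = urllib.request.Request(
        BASE + f"/api/samples/{sample_id}/approve",
        data=b"{}",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

def review_pending(min_rating=4):
    """Auto-approve pending samples that clear the quality threshold."""
    for sample in get_json("/api/samples?status=pending"):
        if should_auto_approve(sample, min_rating):
            approve(sample["id"])
```

The same loop works as a CI/CD quality gate: run it after a bulk import and fail the pipeline if too many samples fall below the threshold.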
Real-Time Streaming
For when your best data is happening right now. Stream from any source via HTTP.
- Real-time HTTP ingestion
- Webhook integrations
- Log processors
- Custom script support
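A minimal log processor for the error-resolution idea: scan a log for ERROR/RESOLVED pairs and turn each pair into a capture record. The "LEVEL: message" log format and the pairing rule are our assumptions; the record shape follows the capture API example on this page:

```python
def logs_to_records(lines):
    """Pair each ERROR line with the next RESOLVED line into a training record.

    Assumes a simple "LEVEL: message" log format for illustration.
    """
    records, pending_error = [], None
    for line in lines:
        if line.startswith("ERROR:"):
            pending_error = line[len("ERROR:"):].strip()
        elif line.startswith("RESOLVED:") and pending_error:
            records.append({
                "instruction": f"How do I fix: {pending_error}",
                "output": line[len("RESOLVED:"):].strip(),
                "category": "support",
            })
            pending_error = None
    return records

log = [
    "INFO: service started",
    "ERROR: connection refused on port 5432",
    "RESOLVED: started postgres and reopened the pool",
]
records = logs_to_records(log)
```

POST the resulting records to the capture endpoint and they land in the review queue like any other source.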
EdukaAI Starter Pack
75 engine-generated, ElGap-validated samples. Download free from ai-curator.cloud/starter-pack — no account needed. Import into EdukaAI Studio for fine-tuning, or use with AI Curator for the full curation workflow.
- Player roleplays — Chen Wei, Diego Rodriguez, Marco Esposito
- Tactical analysis — Match breakdowns and formation analysis
- Fan perspectives — Emotional reactions from both sides
- Commentary transcripts — Professional match narration
- Alternate history — "What if" scenarios exploring different outcomes
- Engine-generated, human-validated by ElGap
- Download standalone — no AI Curator installation required
# Install and start
brew tap elgap/tap
brew install ai-curator
curator

# Free Starter Pack: ai-curator.cloud/starter-pack

# Export for fine-tuning
curator export --dataset 1 \
  --format mlx --output train.jsonl

# Export for RAG indexing
curator export --dataset 1 \
  --format jsonl --output knowledge.jsonl

Fine-tune on your Mac in 5 minutes
Download the free Starter Pack from ai-curator.cloud, import into EdukaAI Studio, click Train. No GPU, no cloud, no code.
Starter Pack
75 free samples, download from ai-curator.cloud. No account needed. Import directly into Studio.
Get the Starter Pack

Train
EdukaAI Studio handles fine-tuning. Import the pack, pick a model, click Train. Runs on any M-series Mac — no GPU needed.
EdukaAI Studio

Test
Dual Chat compares your fine-tuned model against the original. Same prompt, both models, side by side. See the difference your data made.
Build with it
Fine-tune a model on your data. Build a RAG system over your documents. Run locally or deploy to your infrastructure.
Developers
Turn IDE interactions and code reviews into training data for coding assistants.
Support Teams
Resolved tickets become Q&A training pairs.
Enterprise Knowledge
Curate clean, deduplicated documents for your RAG retrieval system.
Customer Support
Capture and review resolved tickets for real-time answer retrieval.
Researchers
Clean, stratified datasets with documented methodology for publication.
Internal Docs
Import wikis and docs, remove duplicates and stale content, export to embedding pipelines.
Your data. Your infrastructure.
Own your intelligence.
Install in seconds. Run locally or deploy to your infrastructure. Capture, curate, export — for fine-tuning or RAG. Open source, as-is, early beta.