Open Source MIT License Deploy Anywhere Beta

Your data. Your infrastructure.
Own your intelligence.

Open source dataset management for fine-tuning and RAG. Web UI, CLI, REST API, Live Capture API — four ways to work. Local-first, deploy anywhere. Your data, your machine, your rules.

Capture. Curate. Export. Better data in. Better models out. Better answers retrieved.

Local-first. Deploy anywhere.

Runs on your Mac by default. Deploy to Docker, your server, or your cloud — same open source app, your infrastructure. No telemetry. No phone-home. Your data, your rules.

Your infrastructure. Your data.

Run on your Mac. Deploy to Docker, your server, or your cloud. Same open source app. No telemetry. No phone-home.

Own your security

Open source, as-is, early beta. Read the code. Audit it yourself. Automated scans on every commit. You deploy it, you secure it, you own it.

CLI, Web UI, REST API, or Live Capture — you choose

Visual curation, terminal automation, programmatic review, or real-time streaming. Four interfaces, one tool.

If it can POST, it can feed AI Curator

Real-time capture from any tool via HTTP API. Slack, logs, IDEs, custom scripts — any source.

No sample ships without your approval

Review, rate, approve or reject. Star ratings, categories, tags, duplicate detection.

7 formats. Fine-tuning or RAG.

Alpaca, ShareGPT, JSONL, CSV, MLX, Unsloth, TRL. Export for training or retrieval. Smart splitting included.

Same curation. Different destinations.

Whether you're fine-tuning a model or building a RAG retrieval system, the data preparation is identical. Capture, curate, export — you decide the destination at export time.

Fine-Tuning

Curate → Train

Collect instruction-response pairs. Review quality. Approve the best samples. Export as Alpaca, MLX, Unsloth, ShareGPT, or TRL for training.

  • Instruction-output pairs
  • Quality ratings & categories
  • Stratified train/test splits
RAG

Curate → Retrieve

Import documents. Deduplicate, review, approve. Export clean, structured content to your embedding pipeline. No stale or duplicated chunks in your vector store.

  • Document deduplication
  • Staleness & quality review
  • JSONL / CSV / Markdown export

Your best data is happening right now

Every support ticket, code review, and AI conversation contains data gold — for training or for retrieval. AI Curator captures it in real-time, before it's lost. If it can POST, it can feed the pipeline.

💬

Slack

Conversations become training data or knowledge base entries.

📋

Application Logs

Error-resolution pairs and user interactions, captured automatically.

🔌

OpenWebUI

Official plugin for self-hosted AI chat conversations.

💻

IDEs & VS Code

Code explanations and debug sessions, streaming as they happen.

📄

Internal Docs

Confluence, Notion, wikis — import and curate for RAG retrieval.

Custom Scripts

Any tool that can send JSON via HTTP POST can feed AI Curator.

Live Capture API
 curl -X POST http://localhost:3333/api/capture \ -H "Content-Type: application/json" \ -d '{ "source": "my-ide", "records": [{ "instruction": "Explain this error", "output": "The error occurs because...", "category": "coding", "qualityRating": 5 }] }'

No sample ships without your approval

Every sample goes through your review — whether it's training data or knowledge base content. Because you know what "good" looks like for your model and your users.

Draft
Awaiting review
In Review
Being evaluated
Approved
Ready to ship
5
Star ratings
Rank quality from 1 to 5
7
Export formats
For fine-tuning and RAG pipelines
0
Data leaks
Everything stays on your machine

Four ways to work

Click through the Web UI. Script from the terminal. Automate with the REST API. Stream data in real-time. Use one or use all four.

01 — Web UI

Visual Curation

For when you need to see what you're working with. Drag, click, review, export.

  • Drag & drop import
  • Card-based sample review
  • One-click export
  • Visual dashboards
02 — CLI

Power Automation

For when you have 10,000 samples and a deadline. Automate, script, integrate.

  • Bulk import/export
  • HuggingFace search & download
  • Advanced filtering & splitting
  • Scriptable workflows
03 — REST API

Programmatic Curation

Automate review, rate, approve, reject. Connect pipelines, scripts, or AI agents to the curation loop.

  • Auto-approve quality thresholds
  • Filter by status, category, rating
  • CI/CD quality gates
  • AI-assisted pre-review
04 — Live Capture

Real-Time Streaming

For when your best data is happening right now. Stream from any source via HTTP.

  • Real-time HTTP ingestion
  • Webhook integrations
  • Log processors
  • Custom script support
75 samples free No account needed

EdukaAI Starter Pack

75 engine-generated, ElGap-validated samples. Download free from ai-curator.cloud/starter-pack — no account needed. Import into EdukaAI Studio for fine-tuning, or use with AI Curator for the full curation workflow.

  • Player roleplays — Chen Wei, Diego Rodriguez, Marco Esposito
  • Tactical analysis — Match breakdowns and formation analysis
  • Fan perspectives — Emotional reactions from both sides
  • Commentary transcripts — Professional match narration
  • Alternate history — "What if" scenarios exploring different outcomes
  • Engine-generated, human-validated by ElGap
  • Download standalone — no AI Curator installation required
Learn more about the Starter Pack
Terminal
 # Install and start brew tap elgap/tap brew install ai-curator curator # Free Starter Pack: ai-curator.cloud/starter-pack # Export for fine-tuning curator export --dataset 1 \ --format mlx --output train.jsonl # Export for RAG indexing curator export --dataset 1 \ --format jsonl --output knowledge.jsonl
Apple Silicon

Fine-tune on your Mac in 5 minutes

Download the free Starter Pack from ai-curator.cloud, import into EdukaAI Studio, click Train. No GPU, no cloud, no code.

Step 1

Starter Pack

75 free samples, download from ai-curator.cloud. No account needed. Import directly into Studio.

Get the Starter Pack
Step 2

Train

EdukaAI Studio handles fine-tuning. Import the pack, pick a model, click Train. Runs on any M-series Mac — no GPU needed.

EdukaAI Studio
Step 3

Test

Dual Chat compares your fine-tuned model against the original. Same prompt, both models, side by side. See the difference your data made.

Build with it

Fine-tune a model on your data. Build a RAG system over your documents. Run locally or deploy to your infrastructure.

Fine-Tuning

Developers

Turn IDE interactions and code reviews into coding assistants.

Fine-Tuning

Support Teams

Resolved tickets become Q&A training pairs.

RAG

Enterprise Knowledge

Curate clean, deduplicated documents for your RAG retrieval system.

RAG

Customer Support

Capture and review resolved tickets for real-time answer retrieval.

Fine-Tuning

Researchers

Clean, stratified datasets with documented methodology for publication.

RAG

Internal Docs

Import wikis and docs, remove duplicates and stale content, export to embedding pipelines.

Your data. Your infrastructure.
Own your intelligence.

Install in seconds. Run locally or deploy to your infrastructure. Capture, curate, export — for fine-tuning or RAG. Open source, as-is, early beta.