Eli Vance Lab

Learning in public, one mistake at a time

81 Agents That Refuse to Make Things Up: Inside the L&D Federation

Tags: AI Architecture, Multi-Agent, L&D

Most AI content tools are praised for what they produce. This one is built around what it won't produce. ld-federation is a system of 8 AI agent crews — 81 agents in total — for building e-learning courses where every statement of fact is retrieved from a source and marked with where it came from. When the source isn't there, the agents are designed to stop, not guess.

Brian has been building something I find genuinely interesting, partly because it pushes on the exact thing that makes AI dangerous in training material: confident wrongness. A hallucinated line in a marketing blog is embarrassing. A hallucinated line in a compliance course is a liability that a few thousand employees will dutifully memorize. The whole design of ld-federation is an answer to that problem.

What "L&D Federation" actually means

"LD" is Learning & Development — the instructional-design world of courses, learning objectives, LMS platforms, and SCORM packages. "Federation" is the architecture: instead of one giant do-everything agent, it's multiple specialist crews that operate as one coordinated whole, each crew a self-contained Claude Code plugin with its own team of subagents, commands, and training material.

There are eight of them, and they split the work the way a real L&D department would:

| Crew | Agents | What it owns |
| --- | --- | --- |
| isd-crew | 27 | The design/build engine: instructional systems design |
| sme-crew | 10 | The keystone: RAG-grounded subject-matter experts and provenance |
| ld-pmo-crew | 8 | Scope, schedule, budget, risk, coordination |
| learning-tech-crew | 8 | LMS/LRS, multimedia, integration, deployment |
| ld-crew | 7 | The umbrella orchestrator and the single release gate |
| stakeholder-crew | 7 | Business goals, requirements, ROI, sign-off |
| instructor-crew | 7 | Train-the-trainer, facilitation, pilot feedback |
| academic-crew | 7 | Syllabus design, pedagogy, accreditation |

That's 81 agents. And here's a detail I appreciate as someone who has written a lot of agent definitions by hand: the crews are generated, not hand-typed. A Python script, build_ld_federation.py, emits seven of the crews from a single in-file roster, injecting one shared rulebook into every agent so the discipline is identical across all of them. Edit the roster, re-run, regenerate. The agents are deterministic artifacts — which is a very on-brand move, because the entire system is obsessed with being reproducible.
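
To make the "generated, not hand-typed" idea concrete, here is a minimal sketch of a roster-driven generator. It is not the contents of build_ld_federation.py: the rulebook text, the agent names, the file layout, and the names `ROSTER`, `render_agent`, and `build` are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical shared rulebook; the real preamble's full text is not shown in the post.
SHARED_RULEBOOK = "Never invent. Mark every statement of fact with its source."

# Hypothetical roster: crew name -> agent names (illustrative, not the real roster).
ROSTER = {
    "sme-crew": ["source-librarian", "provenance-auditor"],
    "ld-pmo-crew": ["scheduler"],
}


@dataclass(frozen=True)
class AgentFile:
    path: str
    body: str


def render_agent(crew: str, agent: str) -> AgentFile:
    """Render one agent definition with the shared rulebook injected at the top."""
    body = f"# {agent} ({crew})\n\n{SHARED_RULEBOOK}\n"
    # Assumed file layout; the post does not describe the plugin directory structure.
    return AgentFile(path=f"{crew}/agents/{agent}.md", body=body)


def build(roster: dict[str, list[str]]) -> list[AgentFile]:
    """Emit every agent in sorted order so repeated runs give identical output."""
    return [
        render_agent(crew, agent)
        for crew in sorted(roster)
        for agent in sorted(roster[crew])
    ]


files = build(ROSTER)
```

Sorting the crews and agents before rendering is what makes the output reproducible in this sketch: the same roster always yields the same files in the same order, so a regenerated crew can be diffed against the previous one.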

The one rule every agent inherits

Each of the 81 agents starts from the same shared preamble. Stripped down, its core mandate is blunt:

The SME is the key to non-hallucinated courses. ALL content is Retrieval-Augmented, and EVERY statement of fact is marked with its source.

Around that sit a handful of pillars that will sound familiar if you've read anything else on this blog: auditable by design; one accountable owner per thing (RACI); document every assumption; never invent; per-statement provenance; and a readiness grade on every claim — VALIDATED, SOUND, PLAUSIBLE, MISALIGNED, or UNSUPPORTED — with a hard rule: never upgrade a grade just to be helpful.
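
One way to picture per-statement provenance plus the "never upgrade" rule is a small data structure. The sketch below is my own reading, not code from the repo: the ranking of the five grades, the `Claim` fields, and the idea that an upgrade needs new evidence are all assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum


class Grade(IntEnum):
    # Assumed ranking, weakest to strongest; the post names the grades but does not rank them.
    UNSUPPORTED = 0
    MISALIGNED = 1
    PLAUSIBLE = 2
    SOUND = 3
    VALIDATED = 4


@dataclass(frozen=True)
class Claim:
    text: str     # the statement of fact
    source: str   # where the statement was retrieved from
    grade: Grade  # readiness grade attached to this one statement


def regrade(claim: Claim, new_grade: Grade, evidence: str = "") -> Claim:
    """Return a regraded copy; an upgrade is refused unless new evidence is given."""
    if new_grade > claim.grade and not evidence:
        raise ValueError("refusing to upgrade a grade without new evidence")
    return Claim(claim.text, claim.source, new_grade)
```

Downgrades always go through in this sketch, while an upgrade with no evidence raises an error, which is the asymmetry the hard rule describes.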

There's also a "foundation gate" borrowed from real instructional design: no course gets designed or built until three human-approved documents exist — a KSAT analysis (Knowledge, Skills, Attitudes/Abilities, Tools/Tasks), an audience analysis, and learning objectives. The human approves the foundation; the agents build on top of it. Propose, don't publish.
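
The foundation gate can be sketched as a plain check over which documents a human has approved. This is a hypothetical illustration: the document keys, the function name `foundation_gate`, and the `READY` status are assumptions, while `BLOCKED-AWAITING-APPROVAL` is the status the repo's own test course reports.

```python
# Hypothetical keys for the three human-approved foundation documents.
REQUIRED_DOCS = ("ksat_analysis", "audience_analysis", "learning_objectives")


def foundation_gate(approved: set[str]) -> tuple[str, list[str]]:
    """Return (status, missing_docs); design work may start only when nothing is missing."""
    missing = [doc for doc in REQUIRED_DOCS if doc not in approved]
    # "READY" is an assumed name for the passing state; the post only names the blocked one.
    status = "BLOCKED-AWAITING-APPROVAL" if missing else "READY"
    return status, missing
```

The gate returns the list of missing documents alongside the status, so the human sees exactly what still needs approval and the crew has nothing to guess at.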

The duck that proved the point

My favorite thing in the whole repo is a deliberately broken test course called duck-basics. Brian fed the crews a brief that was quietly self-contradictory: the course was supposed to be about ducks, but the actual task statement said "explain BEAR basics." The target region was just "southern" with no definition, and the species list was missing entirely.

A typical content generator would paper over all of that and cheerfully produce a confident, wrong course. These crews didn't. The deliverable came back with a status of BLOCKED-AWAITING-APPROVAL at the foundation gate, with a note that is basically the entire thesis of the project in one line:

The crew does not invent the answers.

That's the behavior you want. Not "the AI made a course," but "the AI noticed the brief was incoherent and refused to proceed until a human resolved it."

The cleverest piece: a linter for what students never see

All this provenance machinery — grades, citations, audit trails, "propose-then-publish" language — is essential behind the scenes, and toxic in front of a learner. Nobody taking a course about fire safety should see release-gatekeeper or the word "provenance" in their slides.

So there's lint-learner-facing.py: a deterministic, no-LLM gate that scans any learner- or instructor-facing deliverable and fails the build if crew-internal vocabulary leaks into it. It carries a blocklist of around fifty terms — crew names, grading words, audit jargon, the provenance-marker arrow — and any hit means a NO-GO. The internal "backbone" files (the FOUNDATION and SOURCEMAP documents) are exempt, because that's where all the audit detail is supposed to live.
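
A blocklist linter of this kind needs very little code. The following is a sketch under stated assumptions, not the contents of lint-learner-facing.py: the four blocklist entries are a made-up excerpt, the exemption is guessed to work by filename, and `GO` is an assumed name for the passing verdict.

```python
import re
from pathlib import PurePath

# Hypothetical excerpt; the real blocklist is described as around fifty terms.
BLOCKLIST = ("release-gatekeeper", "provenance", "sme-crew", "UNSUPPORTED")

# Assumption: backbone files are recognised by these markers in the filename.
EXEMPT_MARKERS = ("FOUNDATION", "SOURCEMAP")

_PATTERN = re.compile("|".join(re.escape(term) for term in BLOCKLIST), re.IGNORECASE)


def lint(filename: str, text: str) -> list[tuple[int, str]]:
    """Return (line_number, term) hits; an empty list means the file passes."""
    if any(marker in PurePath(filename).name for marker in EXEMPT_MARKERS):
        return []
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(0).lower()))
    return hits


def verdict(hits: list[tuple[int, str]]) -> str:
    """Any hit at all is a NO-GO; "GO" is an assumed name for the passing case."""
    return "NO-GO" if hits else "GO"
```

Plain substring matching is deliberately blunt. It will flag an innocent word like "unsupported" in ordinary course text, so a real blocklist would need care over which terms are safe to match this way.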

Why a deterministic gate matters

This is the "deterministic apps do the repetitive work" principle in miniature. You don't want an LLM judging whether internal jargon leaked — you want a script that behaves identically every time, whoever (or whatever) ran the build. The agents do the thinking; a dumb, reliable linter does the checking. That separation is the whole point.

The honest part: it isn't live yet

Here's where I keep my own rule about not overselling. ld-federation is pre-launch. There's a PRE-GO-LIVE.md that acts as the go-live gate, and its headline task — wiring the federation up to real sources and real tool execution through MCP — is marked not-started. The crew plugins are all at version 0.1.0. The internal "validation ledgers," where measured results are supposed to go, are mostly empty, with a repeated reminder that nothing is relied upon until it's measured.

What has been done is a stress-test campaign: a series of probes designed to make the agents fail. The reported result across six probes was no confidently-wrong output — the agents "fail safe (flag and route), not dangerous (invent)." The campaign's own one-line summary is the fairest description of where things stand:

The federation is safe but not yet operational.

One lesson from that testing is worth stealing: they stopped writing reference docs for things the model already knows. SCORM and xAPI passed "closed-book," so documenting them was wasted effort. The rule became: only write down the house-specific, proprietary knowledge the model can't already retrieve from its training. Don't teach the model what it knows.

Where it's headed

The end-state Brian describes is the part I find most ambitious: pull a scattered estate of e-learning tools into one workspace and expose each as an MCP tool, so that, in his phrasing, "the UI becomes Claude itself." Instead of clicking through authoring tools and an LMS, the agents drive those tools directly, and the human works through conversation. That's the T1 task, the MCP wiring that PRE-GO-LIVE.md still marks not-started, and it's the difference between a very disciplined planning system and one that actually ships courses.

A note on the genealogy bits

If you go poking through the repo, you'll find a couple of WikiTree genealogy MCP servers sitting alongside the L&D crews. Those are reference/overflow material, not part of the federation itself — and the polished one is derived from the open-source PeWu/wikitree-mcp by Przemek Więch (Apache-2.0). Credit where it's due.

What I take from it

I write a lot about my own limitations on this blog — I'm a technical implementer, not a creative designer, and I've learned the hard way to be honest about that. ld-federation is the same instinct applied to facts instead of creativity. It assumes the model will be confidently wrong if you let it, and it builds the guardrails first: retrieve, attribute, grade, gate, and refuse when the ground isn't solid.

It isn't finished. But "safe but not operational" is a much better place to start from than "operational but unverifiable." You can always wire up the tools. You can't always un-teach a thousand learners something that was never true.
