A one-person lab building and stress-testing AI-agent memory and orchestration in production. Published in the open, with permanent DOIs. Receipts, not claims, including the failures most people never write up.
April 2026 · Systems · CC BY 4.0
PrimeAgentOrchestrator: Memory-Primed Agent Spawning for Personal AI Infrastructure
Coding agents start every session with an empty head. This one spawns fresh agents already loaded with the memory of everything that came before, pulled live from personal databases and fused across two backends at spawn time.
Documented over four months of real deployment and three redesigns of the delivery mechanism.
Reminisce: A Cognitive Science-Inspired Memory Architecture for AI Agents
A memory system for agents modeled on how human memory actually works: a working buffer, an episodic timeline, and a semantic layer that notices when new facts contradict old ones. Released open source and benchmarked on LongMemEvalS.
49.4% to 55.8% accuracy on LongMemEvalS across Claude tiers, with ~81% precision on attempted answers (98.3% peak on direct single-session recall). Open source, MIT, 606 tests.
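The gap between the accuracy figures and the ~81% precision figure is easier to read with the two definitions side by side. The sketch below is a minimal illustration, assuming the system may abstain and that an abstention counts against accuracy but not against precision on attempted answers. The `Result` type, the function names, and the ten sample rows are hypothetical and are not taken from the Reminisce codebase or from LongMemEvalS.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    """One benchmark question (hypothetical structure, for illustration only)."""
    predicted: Optional[str]  # None means the system declined to answer
    expected: str


def accuracy(results: List[Result]) -> float:
    """Correct answers over ALL questions; abstentions count as misses."""
    correct = sum(1 for r in results if r.predicted == r.expected)
    return correct / len(results)


def precision_on_attempted(results: List[Result]) -> float:
    """Correct answers over only the questions the system chose to answer."""
    attempted = [r for r in results if r.predicted is not None]
    if not attempted:
        return 0.0
    correct = sum(1 for r in attempted if r.predicted == r.expected)
    return correct / len(attempted)


# Made-up sample: 10 questions, 7 attempted, 5 of those correct.
sample = (
    [Result("a", "a")] * 5      # attempted and correct
    + [Result("x", "a")] * 2    # attempted and wrong
    + [Result(None, "a")] * 3   # abstained
)

print(f"accuracy: {accuracy(sample):.3f}")
print(f"precision on attempted: {precision_on_attempted(sample):.3f}")
```

Under these assumptions, a system that abstains when unsure can report high precision alongside much lower overall accuracy, which is the pattern the figures above show.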
The Placeholder That Became Production: A Postmortem of Extractive Memory Under Continuous Multi-Agent Ingestion
The uncomfortable follow-up. A memory system that peaked at 98.3% on single-session recall in the benchmark, then rotted in production: noise climbed to 57% over four months, then jumped to 77% within a day of the retirement attempt. The cause was traced to a fact extractor marked "for testing" that had been left in. A full accounting of the failure, and a benchmark to catch the next one before it ships.
Noise climbed to 57.5% over four months, then 77.2% within a day of the retirement attempt; the "for testing" extractor produced 34.5% of the retired database.
Anyone can post the paper where the numbers went up. Publishing the one where the system failed, with the exact cause and the receipts, is rarer and worth more. It is how you tell a lab that measures what it ships from a lab that only markets it.