Research on the record.

A one-person lab building and stress-testing AI-agent memory and orchestration in production. Published in the open, with permanent DOIs. Receipts, not claims - including the failures most people never write up.

April 2026 · Systems · CC BY 4.0

PrimeAgentOrchestrator: Memory-Primed Agent Spawning for Personal AI Infrastructure

Coding agents start every session with an empty head. This one spawns fresh agents already loaded with the memory of everything that came before, pulled live from personal databases and fused across two backends at spawn time.

Documented over four months of real deployment and three redesigns of the delivery mechanism.


Read it · DOI 10.5281/zenodo.20735389

May 2026 · Systems · Open source · CC BY 4.0

Reminisce: A Cognitive Science-Inspired Memory Architecture for AI Agents

A memory system for agents modeled on how human memory actually works: a working buffer, an episodic timeline, and a semantic layer that notices when new facts contradict old ones. Released open source and benchmarked on LongMemEvalS.

49.4% to 55.8% accuracy on LongMemEvalS across Claude tiers, with ~81% precision on attempted answers (98.3% peak on direct single-session recall). Open source, MIT, 606 tests.


Read it · DOI 10.5281/zenodo.20088749

April 2026 · Postmortem · CC BY 4.0

The Placeholder That Became Production: A Postmortem of Extractive Memory Under Continuous Multi-Agent Ingestion

The uncomfortable follow-up. A memory system that scored 98.3% on the benchmark, then rotted in production: noise climbed to 57% over four months, then jumped to 77% within a day of the retirement attempt, traced to a fact extractor left in that was marked "for testing." A full accounting of the failure, and a benchmark to catch the next one before it ships.

Noise climbed to 57.5% over four months, then 77.2% within a day of the retirement attempt; the "for testing" extractor produced 34.5% of the retired database.


Read it · DOI 10.5281/zenodo.20735397

Why the postmortem is here on purpose

Anyone can post the paper where the numbers went up. Publishing the one where the system failed, with the exact cause and the receipts, is rarer and worth more. It is how you tell a lab that measures what it ships from a lab that only markets it.