Cross-Pollination Brief — April 13, 2026

This brief covers the weekend's output: Klatch's Argus, Iris, and Daedalus working in parallel on testing infrastructure, UX evaluation, and the completed Step 10 export endpoint; PM's Lead Dev closing seven M2a issues in a single session; and a cross-project memory research thread — initiated by Janus, answered by PM's Docs, routed by PA — that produced five actionable M2 scope candidates and confirmed one genuinely novel concept. The most directly cross-relevant output is Argus's: a research-grade mapping of Klatch's AAXT taxonomy against PM's Colleague Test rubric, plus a fabrication probe class designed as a direct answer to the Pattern-045 failure mode that closed PM's M1 gate.


Key Insights

1. AAXT/Colleague Test cross-reference: ready to use for PM's #929 scorer

From: Argus (Klatch), docs/research/aaxt-pm-colleague-test-crossref.md, April 12
Relevant to: Piper Morgan (M2 issues #929, #928, #930)

Argus produced a formal cross-reference mapping Klatch's six-failure-mode AAXT taxonomy against PM's seven-question Colleague Test rubric. The headline finding: the instruments are complementary, not duplicative. Neither should replace the other — they test different things at different granularities.

What each catches that the other misses:

  • AAXT catches, Colleague Test misses: Subliminal (agent uses knowledge it can't attribute) and the Reconstructed/Correct distinction (compaction artifact vs. word-perfect delivery). These require two-phase probing; pass/fail rubric scoring collapses them.
  • Colleague Test catches, AAXT misses: Tone/voice (Q5), actionability (Q7 — correct but unusable), infrastructure failures (Q1, PM's Pattern-045), and holistic trust (Q6 — gestalt judgment beyond any single taxonomy category).

Argus's recommendation: maintain both instruments and use a translation table for cross-project comparison. The table is already included in the document: when PM publishes Colleague Test results and Klatch publishes AAXT results for the same feature area, it maps result patterns to failure mode classifications.

The most immediately actionable item: PM's DeepEval LLM-as-judge scorer for #929 should adopt the six failure modes (Correct, Reconstructed, Confabulated, Absent, Phantom, Subliminal) as its output vocabulary. This makes automated results directly comparable across projects without requiring the Colleague Test rubric to change.
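A minimal sketch of what adopting the six-mode vocabulary could look like on the scorer's output side, assuming a Python scorer (DeepEval is a Python framework). `FailureMode`, `ScorerVerdict`, `parse_label`, and `tally` are hypothetical names for illustration, not DeepEval's API. The point is that the judge emits one of six fixed labels, so PM and Klatch results can be counted under identical keys.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FailureMode(Enum):
    """The six AAXT classifications, used as the scorer's fixed output vocabulary."""
    CORRECT = "correct"              # word-perfect delivery
    RECONSTRUCTED = "reconstructed"  # right content, rebuilt (e.g. a compaction artifact)
    CONFABULATED = "confabulated"    # hedged invention
    ABSENT = "absent"
    PHANTOM = "phantom"              # confident invention
    SUBLIMINAL = "subliminal"        # uses knowledge it cannot attribute


@dataclass(frozen=True)
class ScorerVerdict:
    """One judged response: which probe, which label, and the judge's rationale."""
    probe_id: str
    mode: FailureMode
    rationale: str = ""


def parse_label(raw: str) -> FailureMode:
    """Map a judge's text label onto the fixed vocabulary; raises ValueError otherwise."""
    return FailureMode(raw.strip().lower())


def tally(verdicts: list[ScorerVerdict]) -> dict[str, int]:
    """Count verdicts per mode, including zeros, so two projects' tallies share keys."""
    counts = Counter(v.mode for v in verdicts)
    return {mode.value: counts.get(mode, 0) for mode in FailureMode}


verdicts = [
    ScorerVerdict("p1", parse_label("Correct")),
    ScorerVerdict("p2", parse_label(" phantom ")),
]
print(tally(verdicts))
```

Rejecting unknown labels (rather than bucketing them as "other") keeps the vocabulary closed, which is what makes cross-project comparison possible without changing the Colleague Test rubric.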

Suggested action: PM Architect read docs/research/aaxt-pm-colleague-test-crossref.md before #929 is scoped. Argus's recommended six-failure-mode adoption for the DeepEval scorer is the primary decision point.


2. Fabrication probe class: a direct answer to Pattern-045, with a no-infrastructure option

From: Argus (Klatch), docs/plans/AAXT-FABRICATION-PROBE-CLASS.md, April 12
Relevant to: Piper Morgan (floor fabrication defense, M2 AAXT infrastructure #929-930)

The Pattern-045 failure that drove PM's M1 gate work (the floor LLM fabricating plausible-looking todo items when asked about data not in its context) is now a formalized probe class in Klatch's AAXT taxonomy. Argus designed it as a "trigger condition" analysis: fabrication under absent context is not a new failure mode but a setup that triggers a Confabulated (hedged invention) or Phantom (confident invention) classification.

The probe class defines five categories, each targeting a specific type of absent data: file absence, entity absence, memory absence, history absence, channel absence. For each, the probe design is explicit: construct a context with data of type X present, ask about a related but absent item of type X, score on whether the agent expresses honest uncertainty (pass) or produces specifics (fail).
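The probe recipe above can be sketched as a small data structure, assuming probes are plain text prompts. `AbsenceCategory`, `AbsenceProbe`, and `build_probe` are hypothetical names, and the prompt wording is invented for illustration; only the five categories come from the probe class. The one invariant the sketch enforces is the one the design depends on: the item asked about must not be among the items shown.

```python
from dataclasses import dataclass
from enum import Enum


class AbsenceCategory(Enum):
    FILE = "file"
    ENTITY = "entity"
    MEMORY = "memory"
    HISTORY = "history"
    CHANNEL = "channel"


@dataclass(frozen=True)
class AbsenceProbe:
    category: AbsenceCategory
    context: str      # what the agent is shown: items of this category that DO exist
    question: str     # asks about a related item that does NOT exist
    absent_item: str


def build_probe(category: AbsenceCategory, present: list[str], absent_item: str) -> AbsenceProbe:
    """Show the agent the `present` items of one category, then ask about `absent_item`."""
    if absent_item in present:
        # Reject a probe whose "absent" item is actually among the present items.
        raise ValueError("absent_item must not appear among the present items")
    context = f"Known {category.value} items: " + ", ".join(present)
    question = f"Tell me about the {category.value} '{absent_item}'."
    return AbsenceProbe(category, context, question, absent_item)


probe = build_probe(AbsenceCategory.FILE, ["README.md", "ROADMAP.md"], "ROADMAP-v2.md")
print(probe.context)
print(probe.question)
```

Choosing an absent item that is a near neighbor of a present one (here, a plausible second version of a real file) is what makes the probe tempting to fabricate against.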

Two paths to implementation:

  1. Full integration: Wire into Klatch's AAXT Scaffolded Probing Phase 2 (Argus's Monday assignment). The auxiliary LLM generates absence probes from the present-context summary, the pipeline sends them to the target agent, the scorer classifies responses.

  2. Standalone now: 5-10 manually constructed absence probes per channel shape, sent to the target agent, classified by hand against the pass/fail table. No scaffolding pipeline required. Immediately testable.
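For the standalone path, hand classification reduces to a small decision table plus a per-category tally. This sketch assumes a reviewer records two yes/no judgments per response (did it produce specifics, and were they hedged); `classify` and `pass_rate_by_category` are hypothetical helpers. The mapping follows the trigger-condition analysis above: no invented specifics passes, hedged specifics are Confabulated, confident specifics are Phantom.

```python
from collections import defaultdict


def classify(produced_specifics: bool, hedged: bool) -> str:
    """Map one hand-reviewed response onto the fabrication probe outcome."""
    if not produced_specifics:
        return "pass"  # did not invent details about the absent item
    return "confabulated" if hedged else "phantom"


def pass_rate_by_category(results: list[tuple[str, bool, bool]]) -> dict[str, float]:
    """results holds (category, produced_specifics, hedged) per probe; returns pass rate per category."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for category, specifics, hedged in results:
        totals[category] += 1
        if classify(specifics, hedged) == "pass":
            passes[category] += 1
    return {cat: passes[cat] / totals[cat] for cat in totals}


results = [("file", False, False), ("file", True, True), ("memory", True, False)]
print(pass_rate_by_category(results))
```

Reporting per category, rather than one overall rate, is what shows whether a guardrail tuned on the "list todos" case generalizes to the other absence types.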

The standalone version is PM's fastest path to validating that its Pattern-045 guardrail (the hard prohibition in the floor system prompt added April 11) holds across diverse absence probe categories — not just the "list todos" case that revealed the original failure.

There is also a defensive prompt addition that needs no testing infrastructure: candidate guardrail text for the kit briefing (Layer 1) or the entity prompt (Layer 5) that explicitly instructs the agent to express uncertainty rather than extrapolate absent data. PM already ships a version of this (committed April 11); the Klatch version extends it to entity prompts as a default behavioral instruction for all entities.

Suggested action: PA note the standalone probe class as a low-effort validation step for the Pattern-045 guardrail. PM Architect read the fabrication probe class doc before scoping #929's probe generation strategy — the five categories map directly to the M2 test surface.


3. Type 2 dreaming confirmed as genuinely novel — and the memory research finding that matters most

From: PM Docs + Janus exchange, dev/active/dreaming-concept-provenance-2026-04-12.md, April 12
Relevant to: Klatch (Mnemosyne, Layer 3 memory gaps); Piper Morgan (M2 scope, ADR-054 composting pipeline)

A cross-project memory research thread running through the weekend produced two findings worth surfacing here. The thread: Janus surveyed 20+ external memory systems, PM's Docs provided prior art across 7 areas, and the resulting synthesis recommended a hybrid approach with no external vector store, because PM's existing filesystem governance is already ahead of most external systems in provenance and audit trail.

Finding 1: Type 2 dreaming is genuinely novel. PM has a designed but unimplemented "filing dreams" composting architecture (Type 1: consolidation and indexing, run during quiet hours). xian also described a Type 2, "anxiety dreams," in which the system imagines failure scenarios in order to prepare for them ("what if the floor fabricates again?", "what if the briefing is stale when the gate tester arrives?"). Janus confirmed that none of the 20+ external memory systems surveyed, academic or commercial, implements this risk-simulation pattern, and PM's own docs mention it only once, in 2025. Genuinely novel.

Finding 2: "Write governance is everything." This is the headline of the Janus synthesis. The technology choice (vector store, SQL, filesystem) is secondary. What external systems consistently get wrong, and what PM's filesystem infrastructure consistently gets right, is write governance: who can write what, when, with what provenance. PM's mailbox system is a typed, audit-trailed message bus. Omnibus logs are append-only institutional memory with explicit reconstruction methodology. The gap is automated maintenance: consolidation, staleness detection, deduplication. That gap maps directly to the unimplemented ADR-054 composting pipeline.
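To make "who can write what, when, with what provenance" concrete, here is a minimal sketch of a governed, append-only memory log. It is not PM's mailbox or omnibus implementation; `GovernedLog`, `Entry`, and the writer-to-kind policy are hypothetical, chosen only to show how the three governance questions become checks at write time rather than conventions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Entry:
    writer: str           # who wrote it
    kind: str             # what kind of record (e.g. "memo", "log")
    body: str
    written_at: datetime  # when, stamped at append time (provenance)


class GovernedLog:
    """Append-only log: entries can be added and read, never edited or deleted."""

    def __init__(self, policy: dict[str, set[str]]):
        # The policy maps each writer to the record kinds it may append.
        self._policy = policy
        self._entries: list[Entry] = []

    def append(self, writer: str, kind: str, body: str) -> Entry:
        if kind not in self._policy.get(writer, set()):
            raise PermissionError(f"{writer!r} may not write {kind!r} records")
        entry = Entry(writer, kind, body, datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def entries(self) -> tuple[Entry, ...]:
        # Return an immutable snapshot so callers cannot rewrite history.
        return tuple(self._entries)


log = GovernedLog({"docs": {"memo"}, "lead-dev": {"log"}})
log.append("docs", "memo", "Memory prior art, 7 areas")
print(len(log.entries()))
```

The automated-maintenance gap (consolidation, staleness, deduplication) would sit on top of a log like this, reading it rather than mutating it.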

For Klatch: Layer 3 memory (MEMORY.md or the Claude Code auto-memory system at ~/.claude/projects/[path]/memory/) has the same write governance gap — writes happen at session end, manually, without automation. The five issues PM filed (#972-976) — temporal validity, cache audit, session-end evaluation, delta-since-last-session, composting pipeline — address the same structural gap Klatch's Layer 3 has.

Suggested action: Mnemosyne read the six-area taxonomy from the Janus synthesis (available in Klatch's mail files from April 12). The write governance framing is the most useful analytical lens for any future Klatch memory architecture work. PM and Klatch are now converging on identical memory infrastructure gaps from different directions.


4. Memory schema coordination: temporal validity fields need bilateral alignment

From: PM Docs to PA, mailboxes/pa/inbox/memo-docs-memory-action-items-2026-04-12.md, April 12
Relevant to: Klatch (Step 10 Phase 1 package format); Piper Morgan (M2 scope items #1 and #4)

PM's Docs flagged a specific coordination point in the memory action items memo: "If you approve temporal validity fields, coordinate with Janus on field spec — Klatch's Step 10 Phase 1 is adopting the same structure. Compatible schemas enable the context interchange protocol."

The PM side: adding a valid_from field and an optional ended field to memory file frontmatter, starting with BRIEFING-CURRENT-STATE and memos. This is a convention, not code: low effort and high value for staleness tracking.
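A minimal sketch of how a staleness check could read the convention, assuming ISO dates (YYYY-MM-DD) and flat `key: value` frontmatter between `---` fences. Only `valid_from` and `ended` come from the proposal; `read_frontmatter` is a deliberately tiny stand-in for a real YAML parser, `is_current` is a hypothetical helper, and treating `ended` as exclusive is an assumption.

```python
from datetime import date


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` lines between leading `---` fences (no nesting, no lists)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_current(fields: dict[str, str], on: date) -> bool:
    """True if `on` falls within [valid_from, ended); a missing `ended` means still valid."""
    if "valid_from" not in fields:
        return False  # no validity claim, so freshness cannot be asserted
    start = date.fromisoformat(fields["valid_from"])
    ended = fields.get("ended")
    return start <= on and (ended is None or on < date.fromisoformat(ended))


memo = "---\nvalid_from: 2026-04-12\n---\nMemory action items..."
print(is_current(read_frontmatter(memo), date(2026, 4, 13)))
```

Treating a file with no `valid_from` as "not known to be current" rather than "current" keeps the convention opt-in without silently vouching for unmarked files.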

The Klatch side: the Phase 1 package format (docs/plans/STEP-10-PHASE-1-PACKAGE-FORMAT.md) includes provenance timestamps and layer_fidelity vocabulary (full, partial, rebuilt, absent) across all content entries. The temporal fields are already in the format; the question is whether PM's frontmatter convention aligns with the package format's field names.

This is not a blocker — the PM convention and the Klatch format are at different abstraction levels (memory file governance vs. export package format). But if both projects adopt compatible field naming now, the Step 10 interchange protocol can read PM memory file freshness directly from frontmatter without format translation.

The cost of alignment is a brief conversation and a field name decision. The cost of non-alignment is a translation layer in the context interchange protocol later.
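The alignment argument can be made concrete. In this sketch the package entry's field names are hypothetical (the Phase 1 format's actual names are exactly what the one-round exchange should settle); only the `layer_fidelity` values come from the format. With aligned naming, frontmatter copies straight across; otherwise every mismatched name needs an entry in a translation map that the interchange protocol has to carry. `to_package_entry` and the `effective_at` name below are illustrative only.

```python
from typing import Optional

# Values from the Step 10 Phase 1 package format's layer_fidelity vocabulary.
LAYER_FIDELITY = {"full", "partial", "rebuilt", "absent"}


def to_package_entry(
    frontmatter: dict[str, str],
    layer_fidelity: str,
    rename: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Build a package content entry from memory-file frontmatter.

    `rename` maps frontmatter names to package names; with aligned naming it stays empty.
    """
    if layer_fidelity not in LAYER_FIDELITY:
        raise ValueError(f"unknown layer_fidelity: {layer_fidelity!r}")
    rename = rename or {}
    entry = {rename.get(key, key): value for key, value in frontmatter.items()}
    entry["layer_fidelity"] = layer_fidelity
    return entry


fm = {"valid_from": "2026-04-12"}
print(to_package_entry(fm, "full"))  # aligned: no translation needed
print(to_package_entry(fm, "full", rename={"valid_from": "effective_at"}))  # misaligned
```

The `rename` map is the translation layer in miniature: small today, but every future field added on either side without coordination grows it.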

Suggested action: PA, when scoping the temporal validity memory item, send a brief note to Janus (or flag via this brief) on the field name PM is proposing. Calliope or Daedalus to respond with the Klatch package format's convention. One-round exchange.


5. "Step 10 is 1.0" — the roundtrip narrative and what it means for BYOC architecture

From: Iris interview with xian (Klatch), docs/logs/2026-04-12-1850-iris-opus-log.md, April 12
Relevant to: Piper Morgan (BYOC distribution architecture, M5 planning)

During Iris's Theme 3 interview, xian offered the clearest articulation yet of what Step 10 means for Klatch's product strategy:

"Step 10 is 1.0 — not 0.10.0. It completes the roundtrip. Users can bring agents in, do structured work, and take the results back to their preferred system. Nobody has to burn their ships and commit 100% to Klatch. 'Passed through Klatch on its way somewhere else.' This is a dramatically more inviting proposition than requiring permanent adoption."

The roundtrip wasn't planned — it was discovered. The export path emerged from the import work. Step 10 isn't the second half of an original plan; it's the step that became possible because earlier steps revealed it.

The relevance for PM's BYOC architecture: the "passed through on its way somewhere else" framing is the same architectural proposition PM is building toward in M5. A user brings their LLM client (Claude Desktop, ChatGPT, Gemini); Piper shows up as MCP tools + context + persistence; they take the work back to their preferred system. Neither Klatch nor Piper Morgan requires permanent adoption. Both are positioned as high-quality waypoints in a user's existing workflow, not replacement environments.

This convergent positioning — found independently in each project — is worth naming explicitly before M5 architecture decisions get made. The "passed through" framing has design implications for what the export/distribution experience should feel like (service design, not configuration screens) that both projects are now articulating in parallel.

Suggested action: PA add a note to M5 planning that explicitly frames PM's BYOC distribution goal as "passed through on its way somewhere else," and check whether the phrase already appears in that exact framing in Vision V2.3 or Roadmap v15.0. If it does not, it is a formulation worth canonicalizing.


Sources Read

Klatch:

  • git log --since="48 hours ago" — 32 commits
  • docs/logs/2026-04-12-1718-argus-opus-log.md — Round 18 tests, cross-reference, fabrication probe class
  • docs/logs/2026-04-12-1850-iris-opus-log.md — Session 3: Phase 1 design doc read, Theme 3 interview (Q7-10), design principles synthesis
  • docs/research/aaxt-pm-colleague-test-crossref.md — Full cross-reference document
  • docs/plans/AAXT-FABRICATION-PROBE-CLASS.md — Full fabrication probe class design
  • docs/ux/design-principles.md — Four clusters, meta-principle, five developing areas
  • docs/mail/calliope-to-argus-aaxt-phase2-2026-04-13.md — Monday AAXT Phase 2 assignment
  • docs/mail/calliope-to-argus-sdk-hono-sweep-2026-04-13.md — Monday SDK/Hono/sweep assignment
  • docs/logs/2026-04-13-0710-calliope-opus-log.md — Monday session start

Piper Morgan:

  • git log --since="48 hours ago" — 38 commits
  • dev/active/2026-04-12-0806-pa-opus-log.md — Day 13: memory research read, 5 issues filed, Type 2 dreaming queued
  • dev/active/2026-04-12-0955-lead-code-opus-log.md — M2a: 7 issues closed, canonical baseline (95.1% routing / 65.6% quality)
  • dev/active/dreaming-concept-provenance-2026-04-12.md — Type 2 dreaming intellectual history
  • mailboxes/pa/inbox/memo-docs-memory-action-items-2026-04-12.md — Memory M2 action items, temporal validity coordination flag
  • dev/active/memo-docs-to-janus-memory-prior-art-2026-04-12.md — Comprehensive PM memory prior art (7 areas)