Compile

Extract structured entities from papers and merge them into the wiki. Runs via the vcro-compile agent (Opus orchestrator, Sonnet workers).

Usage

CLI

vcro compile PMC10103184

Slash command

/compile PMC10103184 PMC9876543 PMC8765432

Accepts one or more PMC IDs, or a path to a shortlist file.

How it works

Three phases. Extract fans out in parallel. Resolve and merge run sequentially.

Extract   one subagent per paper
Resolve   NEW / MERGE_INTO / AMBIGUOUS
Merge     write to wiki

Extract

One Sonnet subagent per paper. Each reads exactly one paper.md from store/raw/papers/, picks 5-8 dimensions from the canonical 21, and writes a fragments.json. Every fragment carries a verbatim source quote, a source ID, and an implication. Fragments without all three are dropped.
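The completeness rule can be sketched as a simple filter. This is a minimal sketch with hypothetical field names (the real fragments.json schema may differ):

```python
# Hypothetical field names for the three required parts of a fragment.
REQUIRED = ("quote", "source_id", "implication")

def keep_complete(fragments):
    """Keep only fragments carrying a verbatim quote, a source ID, and an implication."""
    return [f for f in fragments if all(f.get(k) for k in REQUIRED)]

fragments = [
    {"quote": "ADNI enrolled...", "source_id": "PMC10103184", "implication": "cohort size"},
    {"quote": "Methylation was measured...", "source_id": "PMC10103184"},  # no implication: dropped
]
print(len(keep_complete(fragments)))  # 1
```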

Papers run in parallel, in waves of up to 10. One paper per subagent context -- this is a hard rule. Mixing papers in one context degrades extraction quality.
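The wave split is plain batching. A sketch (the `waves` helper is hypothetical, not part of the vcro CLI):

```python
def waves(paper_ids, size=10):
    """Split a paper list into waves of at most `size` parallel extract subagents."""
    return [paper_ids[i:i + size] for i in range(0, len(paper_ids), size)]

batch = [f"PMC{1000 + n}" for n in range(23)]
print([len(w) for w in waves(batch)])  # [10, 10, 3]
```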

Resolve

One subagent reads all fragments plus the wiki index. For each entity hint, it decides: NEW (create a new entity), MERGE_INTO (merge into an existing entity), or AMBIGUOUS (stop and ask). The output is a resolution plan.

If any entity is AMBIGUOUS, the workflow stops and asks the user before proceeding to merge.
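The stop-on-ambiguous gate amounts to a check over the resolution plan. A sketch, assuming the plan is a simple entity-to-decision mapping (the actual plan format isn't shown here):

```python
def gate_merge(plan):
    """Refuse to start merge if any entity resolved AMBIGUOUS; otherwise pass the plan through."""
    ambiguous = [name for name, decision in plan.items() if decision == "AMBIGUOUS"]
    if ambiguous:
        raise RuntimeError(f"needs user input before merge: {ambiguous}")
    return plan

plan = {"adni-cohort": "MERGE_INTO", "new-platform": "NEW"}
print(gate_merge(plan) is plan)  # True
```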

Merge

One subagent executes the resolution plan. It writes new entity files and updates existing ones. Every write goes through the pre-write-entity hook, which validates YAML frontmatter against the entity schema. Bad frontmatter blocks the write.
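The shape of the hook's check can be sketched naively. This is an illustrative stand-in, not the pre-write-entity hook itself: the required keys are hypothetical, and the parse is a deliberately simple key scan rather than a real YAML parser:

```python
REQUIRED_KEYS = {"type", "name"}  # hypothetical schema fields

def frontmatter_ok(text):
    """Naive sketch: the file must open with a YAML frontmatter block
    whose top-level keys include the required schema fields."""
    if not text.startswith("---\n"):
        return False
    header, sep, _ = text[4:].partition("\n---")
    if not sep:  # frontmatter never closed
        return False
    keys = {line.split(":", 1)[0].strip() for line in header.splitlines() if ":" in line}
    return REQUIRED_KEYS <= keys

print(frontmatter_ok("---\ntype: cohort\nname: adni\n---\nbody"))  # True
print(frontmatter_ok("no frontmatter at all"))                     # False
```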

After merge, back-references are applied: if paper A mentions institution B, institution B's referenced_by list gets updated.
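The back-reference pass can be pictured as a dictionary update. A sketch with hypothetical in-memory structures (on disk these live in the entities' frontmatter):

```python
def apply_backrefs(mentions, entities):
    """For each (paper, entity) mention, add the paper to the entity's referenced_by list."""
    for paper_id, entity_id in mentions:
        refs = entities.setdefault(entity_id, {}).setdefault("referenced_by", [])
        if paper_id not in refs:  # no duplicates on re-run
            refs.append(paper_id)
    return entities

entities = apply_backrefs(
    [("PMC10103184", "usc-loni-data-coordinating-center")], {})
print(entities["usc-loni-data-coordinating-center"]["referenced_by"])  # ['PMC10103184']
```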

Idempotency

Running the same compile against the same resolution plan produces byte-identical wiki output, so re-running compile on the same papers is safe. The extract cache (store/runs/_cache/) skips already-extracted papers.
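The cache check reduces to a file-existence test. A sketch, assuming one cached JSON per paper ID (the cache's actual layout and naming are not documented here):

```python
from pathlib import Path

def needs_extract(paper_id, cache_dir="store/runs/_cache"):
    """True if no cached extraction exists for this paper (assumed one JSON per ID)."""
    return not (Path(cache_dir) / f"{paper_id}.json").exists()
```

A resumed batch run would then filter its paper list through `needs_extract` before fanning out.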

What gets written to disk

store/runs/2026-04-08_ad-plasma-compile/
  papers.txt                  # Input paper list
  ledger.md                   # Per-phase status and decisions
  run.jsonl                   # Telemetry (tokens, wall time)
  fragments/                  # One JSON per paper
    PMC10103184.json
    PMC9876543.json

store/wiki/
  cohorts/adni-blood-dnam-csf-biomarker.md    # New or updated
  institutions/usc-loni-data-coordinating-center.md
  investigators/michael-weiner-ucsf.md
  platforms/illumina-epic-methylation-beadchip.md

Parallel fan-out for batch compiles

For a 50-paper compile: 5 waves of 10 parallel extracts, then 1 resolve, then 1 merge. Projected cost: ~20M tokens, ~15h wall time. The extract cache means a resumed run skips already-completed papers.

Example

$ vcro compile PMC10103184 PMC9876543

Compile workflow. 2 papers, both already ingested.
Extract: 2/2 succeeded, 14 entity hints total.
Resolve: 8 NEW, 6 MERGE_INTO, 0 AMBIGUOUS.
Merge: 14 entities written, 0 hook rejections.
Top new cohorts: adni-blood-dnam-csf-biomarker,
delcode-blood-epic-methylation.
Run artifacts at store/runs/2026-04-08_ad-compile/.