Compile
Extract structured entities from papers and merge them into the wiki. Runs via the vcro-compile agent (Opus orchestrator, Sonnet workers).
Usage
CLI
vcro compile PMC10103184
Slash command
/compile PMC10103184 PMC9876543 PMC8765432
Accepts one or more PMC IDs, or a path to a shortlist file.
How it works
Three phases. Extract fans out in parallel. Resolve and merge run sequentially.
Extract
One Sonnet subagent per paper. Each reads exactly one paper.md from store/raw/papers/, picks 5-8 dimensions from the canonical 21, and writes a fragments.json. Every fragment carries a verbatim source quote, a source ID, and an implication. Fragments without all three are dropped.
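The completeness rule can be sketched as a small filter. This is a minimal sketch, assuming illustrative field names (`quote`, `source_id`, `implication`), not the tool's real fragment schema:

```python
# Sketch of the fragment-completeness rule: a fragment must carry a
# verbatim quote, a source ID, and an implication, or it is dropped.
# Field names are illustrative, not the real fragments.json schema.
REQUIRED_FIELDS = ("quote", "source_id", "implication")

def keep_complete(fragments: list[dict]) -> list[dict]:
    """Return only fragments that carry all three required fields."""
    return [
        f for f in fragments
        if all(f.get(field) for field in REQUIRED_FIELDS)
    ]

frags = [
    {"quote": "ADNI enrolled 819 subjects...",
     "source_id": "PMC10103184",
     "implication": "cohort has a CSF biomarker arm"},
    {"quote": "see Table 2",
     "source_id": "PMC10103184"},  # no implication: dropped
]
assert len(keep_complete(frags)) == 1
```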
Papers run in parallel in waves of up to 10. One paper per subagent context is a hard rule: mixing papers in one context degrades extraction quality.
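The fan-out shape looks roughly like the following sketch. `run_subagent` is a stand-in for the real dispatch, not an actual API:

```python
# Illustrative sketch of the extract fan-out: each paper gets its own
# subagent call (one paper per context), and calls are grouped into
# waves of up to 10. run_subagent is a stand-in, not the real API.
from concurrent.futures import ThreadPoolExecutor

def run_subagent(paper_id: str) -> str:
    # Stand-in: the real worker reads store/raw/papers/<id> and
    # writes fragments.json. Here we just echo the paper ID.
    return f"fragments for {paper_id}"

def extract_all(paper_ids: list[str], wave_size: int = 10) -> list[str]:
    results = []
    for i in range(0, len(paper_ids), wave_size):
        wave = paper_ids[i:i + wave_size]
        with ThreadPoolExecutor(max_workers=wave_size) as pool:
            # One paper per worker context; the next wave starts only
            # after the current wave drains.
            results.extend(pool.map(run_subagent, wave))
    return results

out = extract_all([f"PMC{n}" for n in range(12)], wave_size=10)
assert len(out) == 12  # 2 waves: 10 + 2
```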
Resolve
One subagent reads all fragments plus the wiki index. For each entity hint, it decides: NEW (create a new entity), MERGE_INTO (merge into an existing entity), or AMBIGUOUS (stop and ask). The output is a resolution plan.
If any entity is AMBIGUOUS, the workflow stops and asks the user before proceeding to merge.
Merge
One subagent executes the resolution plan. It writes new entity files and updates existing ones. Every write goes through the pre-write-entity hook, which validates YAML frontmatter against the entity schema. Bad frontmatter blocks the write.
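A pre-write hook of this kind can be sketched as a frontmatter gate. The required keys and the simplified line-based parsing below are assumptions, not the real schema or validator:

```python
# Minimal sketch of a pre-write validation hook: it inspects the YAML
# frontmatter block and blocks the write if required keys are missing.
# Key names and the naive parsing are illustrative assumptions.
REQUIRED_KEYS = {"type", "name", "sources"}

def frontmatter_ok(entity_md: str) -> bool:
    lines = entity_md.splitlines()
    if not lines or lines[0] != "---":
        return False  # no frontmatter at all blocks the write
    try:
        end = lines.index("---", 1)
    except ValueError:
        return False  # unterminated frontmatter blocks the write
    keys = {l.split(":", 1)[0].strip() for l in lines[1:end] if ":" in l}
    return REQUIRED_KEYS <= keys

good = "---\ntype: cohort\nname: ADNI\nsources: [PMC10103184]\n---\nBody."
assert frontmatter_ok(good)
assert not frontmatter_ok("No frontmatter at all.")
```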
After merge, back-references are applied: if paper A mentions institution B, institution B's referenced_by list gets updated.
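The back-reference pass amounts to inverting the mention map. A minimal sketch, using an illustrative dict-of-lists in place of the real wiki files:

```python
# Sketch of the back-reference pass: if paper A mentions entity B,
# B's referenced_by list gains A. The in-memory store is illustrative;
# the real pass updates referenced_by in each entity's frontmatter.
def apply_backrefs(mentions: dict[str, list[str]]) -> dict[str, list[str]]:
    """mentions maps paper ID -> entity slugs it mentions."""
    referenced_by: dict[str, list[str]] = {}
    for paper, entities in mentions.items():
        for entity in entities:
            refs = referenced_by.setdefault(entity, [])
            if paper not in refs:  # idempotent: re-running adds no duplicates
                refs.append(paper)
    return referenced_by

out = apply_backrefs({"PMC10103184": ["usc-loni-data-coordinating-center"]})
assert out["usc-loni-data-coordinating-center"] == ["PMC10103184"]
```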
Idempotency
Same compile against the same plan produces byte-identical wiki output. Re-running compile on the same papers is safe. The extract cache (store/runs/_cache/) skips already-extracted papers.
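The cache skip can be sketched as a simple existence check. The cache key (one JSON file per plain PMC ID) is an assumption for illustration:

```python
# Hedged sketch of the extract-cache skip: a paper whose fragments are
# already in store/runs/_cache/ is not re-extracted. The cache key
# (<PMC id>.json) is an assumption, not the tool's documented layout.
import tempfile
from pathlib import Path

def papers_to_extract(paper_ids: list[str], cache_dir: Path) -> list[str]:
    """Return only the papers that have no cached fragments yet."""
    return [p for p in paper_ids if not (cache_dir / f"{p}.json").exists()]

with tempfile.TemporaryDirectory() as d:
    cache = Path(d)
    (cache / "PMC10103184.json").write_text("{}")  # already extracted
    todo = papers_to_extract(["PMC10103184", "PMC9876543"], cache)
    assert todo == ["PMC9876543"]  # cached paper is skipped
```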
What gets written to disk
store/runs/2026-04-08_ad-plasma-compile/
  papers.txt   # Input paper list
  ledger.md    # Per-phase status and decisions
  run.jsonl    # Telemetry (tokens, wall time)
  fragments/   # One JSON per paper
    PMC10103184.json
    PMC9876543.json
store/wiki/
  cohorts/adni-blood-dnam-csf-biomarker.md   # New or updated
  institutions/usc-loni-data-coordinating-center.md
  investigators/michael-weiner-ucsf.md
  platforms/illumina-epic-methylation-beadchip.md
Parallel fan-out for batch compiles
For a 50-paper compile: 5 waves of 10 parallel extracts, then 1 resolve, then 1 merge. Projected cost: ~20M tokens, ~15h wall time. The extract cache means a resumed run skips already-completed papers.
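The projection above reduces to quick back-of-envelope arithmetic. The per-paper average is derived from the totals in the text, not measured:

```python
# Back-of-envelope for the 50-paper projection: wave count and the
# implied per-paper token average (derived, not measured).
papers = 50
wave_size = 10
extract_waves = -(-papers // wave_size)    # ceiling division: 5 waves
tokens_per_paper = 20_000_000 // papers    # ~400k tokens per paper
assert extract_waves == 5
assert tokens_per_paper == 400_000
```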
Example
$ vcro compile PMC10103184 PMC9876543
Compile workflow. 2 papers, both already ingested.
Extract: 2/2 succeeded, 14 entity hints total.
Resolve: 8 NEW, 6 MERGE_INTO, 0 AMBIGUOUS.
Merge: 14 entities written, 0 hook rejections.
Top new cohorts: adni-blood-dnam-csf-biomarker,
delcode-blood-epic-methylation.
Run artifacts at store/runs/2026-04-08_ad-compile/.