Compile skills
Three skills turn raw papers into wiki entities: extract, resolve, and merge.
Extract
Reads one paper (store/raw/papers/PMC.../paper.md). Picks 5-8 intelligence dimensions from the canonical 21. Emits structured JSON fragments with verbatim source quotes, source IDs, and implications.
- One Sonnet subagent per paper. Never multiple papers in one context.
- Parallel fan-out. 10 papers run as 10 concurrent subagents.
- ~55K tokens per paper. ~3 minutes wall time per paper at 10x parallelism.
- Outputs land in the run directory as fragment JSON files.
The extract skill is the bottleneck phase. It reads the full paper, identifies entities (cohorts, institutions, investigators, platforms), and produces fragments that the resolve skill will match against the wiki.
Source: .claude/skills/compile/extract/SKILL.md
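A fragment from the extract step can be sketched as follows. This is a minimal illustration: the field names and helper are assumptions for readability, not the skill's actual schema.

```python
import json

# Hypothetical fragment shape -- field names are illustrative assumptions,
# not the extract skill's actual schema.
def make_fragment(dimension, quote, source_id, implication):
    """Build one structured fragment as extract might emit it."""
    return {
        "dimension": dimension,      # one of the canonical 21 dimensions
        "quote": quote,              # verbatim text from the paper
        "source_id": source_id,      # identifier of the source paper
        "implication": implication,  # the extractor's reading of the quote
    }

fragment = make_fragment(
    dimension="cohort_design",
    quote="We enrolled 1,204 participants across three sites.",
    source_id="PMC0000000",
    implication="Multi-site cohort with roughly 1,200 participants.",
)
print(json.dumps(fragment, indent=2))
```

The per-paper isolation rule above means each such fragment file traces back to exactly one source paper.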
Resolve
Reads fragments from extract plus the wiki index. Decides for each proposed entity: `NEW`, `MERGE_INTO`, or `AMBIGUOUS`.
- Single Sonnet subagent per batch (typically 10 papers).
- Reads `store/wiki/index/master.md` and slug lists to find matches.
- Outputs a resolution plan: which fragments create new entities, which merge into existing ones.
- `AMBIGUOUS` cases require human judgment. The orchestrator surfaces them.
- ~140K tokens per 10-paper batch. ~10 minutes.
Source: .claude/skills/compile/resolve/SKILL.md
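The three decision outcomes can be sketched with a toy matcher. The real resolve step is an LLM subagent; the substring heuristic below is purely illustrative, as are the slug names.

```python
# Toy resolver sketch: the real resolve skill is an LLM subagent; this
# substring heuristic only illustrates the three possible outcomes.
def resolve(proposed_name, wiki_slugs):
    """Return a NEW, MERGE_INTO, or AMBIGUOUS decision for one entity."""
    slug = proposed_name.lower().replace(" ", "-")
    matches = [s for s in wiki_slugs if slug in s]
    if not matches:
        return {"decision": "NEW", "slug": slug}
    if len(matches) == 1:
        return {"decision": "MERGE_INTO", "slug": matches[0]}
    return {"decision": "AMBIGUOUS", "candidates": sorted(matches)}

slugs = ["framingham-heart-study", "uk-biobank", "japan-biobank"]
print(resolve("UK Biobank", slugs))   # one match: merge into uk-biobank
print(resolve("New Cohort", slugs))   # no match: create a new entity
print(resolve("Biobank", slugs))      # two matches: needs human judgment
```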
Merge
Reads the resolution plan. Writes new entity articles and updates existing ones. Hook-gated -- every write passes through pre-write-entity.sh for schema validation.
- Single Sonnet subagent per batch.
- Idempotent. Re-running merge on the same plan produces byte-identical output.
- Applies back-references: when a new cohort entity links to an institution, the institution's `referenced_by` list is updated.
- ~85K tokens per 10-paper batch. ~14 minutes.
Source: .claude/skills/compile/merge/SKILL.md
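Idempotency here can be read as write-if-changed. A minimal sketch, assuming plain-text entity files; the hook gating described above is omitted:

```python
import tempfile
from pathlib import Path

# Minimal idempotent-write sketch; in the real pipeline every write also
# passes through a validation hook, which is omitted here.
def write_entity(path: Path, content: str) -> bool:
    """Write only when bytes differ, so re-running a plan is a no-op."""
    if path.exists() and path.read_text() == content:
        return False  # same plan, same bytes: nothing to do
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return True

root = Path(tempfile.mkdtemp())
target = root / "cohort" / "uk-biobank.md"
assert write_entity(target, "# UK Biobank\n") is True   # first write
assert write_entity(target, "# UK Biobank\n") is False  # re-run: unchanged
```

Comparing content before writing is what makes a second run on the same plan a pure no-op rather than a rewrite of identical files.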
Pipeline flow
```
paper.md --> extract --> fragments.json
                              |
                              v
wiki/index/ + fragments --> resolve --> resolution_plan.json
                                             |
                                             v
resolution_plan --> merge --> store/wiki/{type}/{slug}.md
                              (hook-validated)
```
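The handoffs in the diagram can be mimicked end to end with trivial stand-in functions. The real stages are Claude subagents, so this sketch only shows the data flow between stages, not their logic; all names here are hypothetical.

```python
# Stand-in stages that mirror the handoffs only: fragments -> plan -> wiki.
def extract(paper_text):
    """Toy extract: treat title-cased words as proposed entities."""
    return [{"entity": w} for w in paper_text.split() if w.istitle()]

def resolve_batch(fragment_batches, wiki_slugs):
    """Toy resolve: one plan entry per fragment across the batch."""
    return [
        {"entity": f["entity"],
         "decision": "MERGE_INTO" if f["entity"].lower() in wiki_slugs else "NEW"}
        for batch in fragment_batches for f in batch
    ]

def merge(plan):
    """Toy merge: materialize the plan into a slug -> decisions map."""
    wiki = {}
    for step in plan:
        wiki.setdefault(step["entity"].lower(), []).append(step["decision"])
    return wiki

fragments = [extract("Results from Biobank cohort")]
plan = resolve_batch(fragments, {"biobank"})
wiki = merge(plan)
print(wiki)
```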