How the wiki works
Most tools return results and forget. viCRO builds a knowledge graph that gets richer every time you use it.
The cycle
viCRO follows the Karpathy four-phase knowledge base cycle. Each phase feeds the next. The loop closes when lint findings feed back into compile.
Ingest
Turns raw documents (PMC XML, ClinicalTrials.gov JSON) into faithful markdown. No classification. No filtering. The raw store is immutable -- once ingested, a document never changes.
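Write-once semantics can be enforced at ingest time. A minimal sketch (the paths and the helper name are illustrative, not viCRO's actual API) that refuses to overwrite an already-ingested document:

```python
from pathlib import Path

def ingest_raw(store: Path, doc_id: str, source_bytes: bytes, markdown: str) -> Path:
    """Write one document into the raw store exactly once.

    Raises FileExistsError if the document was already ingested --
    raw/ is append-only, never updated in place.
    """
    doc_dir = store / "raw" / "papers" / doc_id
    if doc_dir.exists():
        raise FileExistsError(f"{doc_id} already ingested; raw/ is immutable")
    doc_dir.mkdir(parents=True)
    (doc_dir / "source.xml").write_bytes(source_bytes)   # faithful original
    (doc_dir / "paper.md").write_text(markdown, encoding="utf-8")
    return doc_dir
```

Failing loudly on a second write is the simplest way to keep the invariant honest: any re-ingest attempt is a bug upstream, not a silent update.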
Compile
Reads raw docs and builds structured entity articles. One markdown file per entity. Cross-linked via [[slug]] references. Idempotent -- re-running compile on the same input produces the same output, so compile is safe to re-run incrementally.
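Byte-identical output from the same input is what makes re-runs safe to diff. A sketch of a deterministic article renderer (the function name and frontmatter fields are hypothetical, not viCRO's real schema): fields and links are emitted in sorted order, so no incidental dict ordering leaks into the file.

```python
def render_entity(slug: str, fields: dict[str, str], links: list[str]) -> str:
    """Render one entity article: YAML frontmatter + markdown body.

    Output depends only on the inputs -- fields and [[slug]] links are
    sorted before emission, so the same input always yields the same bytes.
    """
    frontmatter = "\n".join(f"{k}: {fields[k]}" for k in sorted(fields))
    related = " ".join(f"[[{s}]]" for s in sorted(links))
    return f"---\n{frontmatter}\n---\n\n# {slug}\n\nRelated: {related}\n"
```

Determinism here is a design choice: if compile output is stable, a changed file in wiki/ always signals changed evidence, never renderer noise.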
Query
Reads the wiki, scores candidates on three axes (Scale, Cost, Quality), produces a recommendation. Wiki-first -- ingest triggers only when the wiki has gaps for the question asked.
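One plausible way to collapse the three axes into a single ranking is a weighted sum. A sketch under stated assumptions -- each axis normalized to [0, 1], cost inverted so cheaper scores higher, equal weights by default; none of this is viCRO's actual scoring formula:

```python
def score_candidate(scale: float, cost: float, quality: float,
                    weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Combine the three axes into one score in [0, 1].

    Inputs are assumed normalized to [0, 1]; cost is inverted so that a
    cheaper candidate scores higher. Weights are illustrative defaults.
    """
    w_scale, w_cost, w_quality = weights
    total = w_scale + w_cost + w_quality
    return (w_scale * scale + w_cost * (1.0 - cost) + w_quality * quality) / total
```

Keeping the weights explicit matters for the audit trail: a recommendation is reproducible only if the scoring parameters are recorded alongside candidates.json.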
Lint
Scans for gaps, contradictions, staleness, broken links. Findings are not user-facing reports. They are the orchestrator's queue -- each finding feeds back into compile.
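The loop-closing mechanic can be sketched concretely for one finding type, broken [[slug]] links. This is a toy model (in-memory wiki, stub compile callback), not viCRO's orchestrator, and real lint also covers gaps, contradictions, and staleness:

```python
import re
from collections import deque

LINK = re.compile(r"\[\[([^\]]+)\]\]")

def lint(wiki: dict[str, str]) -> list[str]:
    """Find broken links: [[slug]] targets with no article yet."""
    targets = {t for text in wiki.values() for t in LINK.findall(text)}
    return sorted(targets - wiki.keys())

def close_the_loop(wiki: dict[str, str], compile_entity) -> None:
    """Feed each finding back into compile until no gaps remain.

    Findings are a work queue, not a report: compiling one entity may
    surface new broken links, which go straight back onto the queue.
    """
    queue = deque(lint(wiki))
    while queue:
        slug = queue.popleft()
        if slug in wiki:
            continue
        wiki[slug] = compile_entity(slug)              # compile fills the gap
        queue.extend(s for s in lint(wiki) if s not in queue)
```

The termination condition is the point: the cycle ends not when a report is printed but when lint comes back empty.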
Why knowledge compounds
When you query "AD plasma metabolomics," the system reads papers, extracts entities, and writes them to the wiki. Next time someone asks about Alzheimer's -- different question, different angle -- those entities are already there. The wiki grew. The second query is faster, richer, and cheaper.
After compiling 13 papers, the wiki holds 328 entities; after 50, it would hold roughly 800. The marginal cost of each answer drops. The marginal value rises.
This is the Karpathy thesis: a knowledge base that is continuously compiled, queried, and self-corrected outperforms any static database or one-shot retrieval system.
What lives on disk
store/
  raw/                 # Immutable. PMC XML, trial JSON, uploads.
    papers/PMC.../     # source.xml, paper.md, meta.json
    trials/NCT.../     # source.json, trial.md, meta.json
    uploads/           # institution_slug/...
  wiki/                # The living graph. LLM-produced markdown + YAML frontmatter.
    cohorts/           # one file per cohort
    institutions/      # one file per institution
    investigators/     # one file per PI
    platforms/         # one file per assay platform
    protocols/         # one file per collection protocol
    bundles/           # one file per procurement bundle
    index/             # auto-generated by wiki_index.py
  queries/             # Audit trail. One directory per query run.
    2026-04-08_slug/   # request.json, candidates.json, recommendation.md
  runs/                # Compile telemetry. Token counts, wall times, entity curves.
  lint/                # Scan results. Gaps, consistency, staleness, connections.
raw/ is immutable. wiki/ is the living graph. queries/ is the audit trail -- every query is reproducible from its artifacts.
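The index/ directory is auto-generated from the wiki layout. A minimal sketch of the kind of scan wiki_index.py might perform (the actual script is not shown here; this assumes one `<slug>.md` file per entity inside a per-type subdirectory, as in the tree above):

```python
from pathlib import Path

def build_index(wiki_dir: Path) -> dict[str, str]:
    """Map every entity slug to its file path, relative to wiki/.

    Sorted traversal keeps the index deterministic across runs.
    """
    index = {}
    for path in sorted(wiki_dir.glob("*/*.md")):
        index[path.stem] = f"{path.parent.name}/{path.name}"
    return index
```

Because the slug is just the filename, resolving a [[slug]] cross-link is a single dictionary lookup against this index.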
The graph is the moat
Brokers have private supplier networks. viCRO builds the transparent equivalent from public evidence and verified institutional input. The accumulated provenance graph -- entities, cross-links, evidence chains -- is what makes each subsequent query more valuable than the last.