How the wiki works
Most tools return results and forget. viCRO builds a knowledge graph that gets richer every time you use it.
The cycle
viCRO follows the Karpathy four-phase knowledge base cycle. Each phase feeds the next. The loop closes when lint findings feed back into compile.
Ingest
Turns raw documents (PMC XML, ClinicalTrials.gov JSON) into faithful markdown. No classification. No filtering. The raw store is immutable -- once ingested, a document never changes.
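Write-once semantics can be enforced at ingest time. A minimal sketch (the paths and the helper name are illustrative, not viCRO's actual API) that refuses to overwrite an already-ingested document:

```python
from pathlib import Path

def ingest_raw(store: Path, doc_id: str, source_bytes: bytes, markdown: str) -> Path:
    """Write one document into the raw store exactly once.

    Raises FileExistsError if the document was already ingested --
    raw/ is append-only, never updated in place.
    """
    doc_dir = store / "raw" / "papers" / doc_id
    if doc_dir.exists():
        raise FileExistsError(f"{doc_id} already ingested; raw/ is immutable")
    doc_dir.mkdir(parents=True)
    (doc_dir / "source.xml").write_bytes(source_bytes)   # faithful original
    (doc_dir / "paper.md").write_text(markdown, encoding="utf-8")
    return doc_dir
```

Failing loudly on a second write is the simplest way to keep the invariant honest: any re-ingest attempt is a bug upstream, not a silent update.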
Compile
Reads raw docs and builds structured entity articles. One markdown file per entity. Cross-linked via [[slug]] references. Idempotent -- re-running compile on the same input produces the same output, so compile is safe to re-run incrementally.
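Byte-identical output from the same input is what makes re-runs safe to diff. A sketch of a deterministic article renderer (the function name and frontmatter fields are hypothetical, not viCRO's real schema): fields and links are emitted in sorted order, so no incidental dict ordering leaks into the file.

```python
def render_entity(slug: str, fields: dict[str, str], links: list[str]) -> str:
    """Render one entity article: YAML frontmatter + markdown body.

    Output depends only on the inputs -- fields and [[slug]] links are
    sorted before emission, so the same input always yields the same bytes.
    """
    frontmatter = "\n".join(f"{k}: {fields[k]}" for k in sorted(fields))
    related = " ".join(f"[[{s}]]" for s in sorted(links))
    return f"---\n{frontmatter}\n---\n\n# {slug}\n\nRelated: {related}\n"
```

Determinism here is a design choice: if compile output is stable, a changed file in wiki/ always signals changed evidence, never renderer noise.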
Query
Reads the wiki, scores candidates on three axes (Scale, Cost, Quality), produces a recommendation. Wiki-first -- ingest triggers only when the wiki has gaps for the question asked.
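One plausible way to collapse the three axes into a single ranking is a weighted sum. A sketch under stated assumptions -- each axis normalized to [0, 1], cost inverted so cheaper scores higher, equal weights by default; none of this is viCRO's actual scoring formula:

```python
def score_candidate(scale: float, cost: float, quality: float,
                    weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Combine the three axes into one score in [0, 1].

    Inputs are assumed normalized to [0, 1]; cost is inverted so that a
    cheaper candidate scores higher. Weights are illustrative defaults.
    """
    w_scale, w_cost, w_quality = weights
    total = w_scale + w_cost + w_quality
    return (w_scale * scale + w_cost * (1.0 - cost) + w_quality * quality) / total
```

Keeping the weights explicit matters for the audit trail: a recommendation is reproducible only if the scoring parameters are recorded alongside candidates.json.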
Lint
Scans for gaps, contradictions, staleness, broken links. Findings are not user-facing reports. They are the orchestrator's queue -- each finding feeds back into compile.
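The loop-closing mechanic can be sketched concretely for one finding type, broken [[slug]] links. This is a toy model (in-memory wiki, stub compile callback), not viCRO's orchestrator, and real lint also covers gaps, contradictions, and staleness:

```python
import re
from collections import deque

LINK = re.compile(r"\[\[([^\]]+)\]\]")

def lint(wiki: dict[str, str]) -> list[str]:
    """Find broken links: [[slug]] targets with no article yet."""
    targets = {t for text in wiki.values() for t in LINK.findall(text)}
    return sorted(targets - wiki.keys())

def close_the_loop(wiki: dict[str, str], compile_entity) -> None:
    """Feed each finding back into compile until no gaps remain.

    Findings are a work queue, not a report: compiling one entity may
    surface new broken links, which go straight back onto the queue.
    """
    queue = deque(lint(wiki))
    while queue:
        slug = queue.popleft()
        if slug in wiki:
            continue
        wiki[slug] = compile_entity(slug)              # compile fills the gap
        queue.extend(s for s in lint(wiki) if s not in queue)
```

The termination condition is the point: the cycle ends not when a report is printed but when lint comes back empty.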
Why knowledge compounds
When you query "AD plasma metabolomics," the system reads papers, extracts entities, and writes them to the wiki. Next time someone asks about Alzheimer's -- different question, different angle -- those entities are already there. The wiki grew. The second query is faster, richer, and cheaper.
After compiling 13 papers, the wiki holds 328 entities; after 50, it would hold roughly 800. The marginal cost of each answer drops. The marginal value rises.
This is the Karpathy thesis: a knowledge base that is continuously compiled, queried, and self-corrected outperforms any static database or one-shot retrieval system.
What lives on disk
store/
  raw/                 # Immutable. PMC XML, trial JSON, uploads.
    papers/PMC.../     # source.xml, paper.md, meta.json
    trials/NCT.../     # source.json, trial.md, meta.json
    uploads/           # institution_slug/...
  wiki/                # The living graph. LLM-produced markdown + YAML frontmatter.
    cohorts/           # one file per cohort
    institutions/      # one file per institution
    investigators/     # one file per PI
    platforms/         # one file per assay platform
    protocols/         # one file per collection protocol
    bundles/           # one file per procurement bundle
    index/             # auto-generated by wiki_index.py
  queries/             # Audit trail. One directory per query run.
    2026-04-08_slug/   # request.json, candidates.json, recommendation.md
  runs/                # Compile telemetry. Token counts, wall times, entity curves.
  lint/                # Scan results. Gaps, consistency, staleness, connections.
raw/ is immutable. wiki/ is the living graph. queries/ is the audit trail -- every query is reproducible from its artifacts.
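The index/ directory is auto-generated from the wiki layout. A minimal sketch of the kind of scan wiki_index.py might perform (the actual script is not shown here; this assumes one `<slug>.md` file per entity inside a per-type subdirectory, as in the tree above):

```python
from pathlib import Path

def build_index(wiki_dir: Path) -> dict[str, str]:
    """Map every entity slug to its file path, relative to wiki/.

    Sorted traversal keeps the index deterministic across runs.
    """
    index = {}
    for path in sorted(wiki_dir.glob("*/*.md")):
        index[path.stem] = f"{path.parent.name}/{path.name}"
    return index
```

Because the slug is just the filename, resolving a [[slug]] cross-link is a single dictionary lookup against this index.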
The graph is the moat
Brokers have private supplier networks. viCRO builds the transparent equivalent from public evidence and verified institutional input. The accumulated provenance graph -- entities, cross-links, evidence chains -- is what makes each subsequent query more valuable than the last.