Entity types
The wiki is a typed graph. Seven entity types, each a markdown file with YAML frontmatter.
cohort Cohort / data_opportunity
A group of subjects with biological samples or data. The primary unit of sourcing intelligence. One file per cohort in store/wiki/cohorts/.
institution Institution
A university, hospital, biobank, or commercial vendor that hosts or manages cohorts. Links to the cohorts it holds and the investigators who work there.
investigator Investigator
A principal investigator (PI) or co-investigator linked to one or more cohorts. The slug uses their home institution, not the cohort or consortium they appear in.
platform Platform
An assay, instrument, or analytical method used to generate data. Links to cohorts that used it and carries validation history.
protocol Protocol
A documented collection or processing protocol tied to a specific cohort. The evidence source for pre-analytical quality scoring.
bundle Bundle
A procurement chain of 2-7 typed links (specimen source to data delivery), each with an evidence tier. Created by the bounty workflow when a buyer needs a complete sourcing plan.
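The 2-7 link constraint on a bundle can be sketched as a small data structure. This is only an illustration: the class names, the field names, and the idea of storing the evidence tier as a plain string are assumptions, since the wiki's actual bundle schema and tier vocabulary are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BundleLink:
    """One typed link in a procurement chain (field names are hypothetical)."""
    link_type: str      # which step of the chain this link covers
    evidence_tier: str  # tier vocabulary is not specified in the source


@dataclass(frozen=True)
class Bundle:
    """A procurement chain from specimen source to data delivery."""
    slug: str
    links: tuple[BundleLink, ...]

    def __post_init__(self) -> None:
        # The only rule taken from the text: a bundle holds 2 to 7 links.
        if not 2 <= len(self.links) <= 7:
            raise ValueError(
                f"bundle {self.slug!r} has {len(self.links)} links; expected 2-7"
            )
```

Rejecting an out-of-range chain at construction time means a malformed bundle never reaches the wiki in the first place.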
What an entity looks like
Every entity is a markdown file with YAML frontmatter followed by prose sections. Here is a real cohort from the wiki -- the ADNI blood DNA methylation substudy.
Frontmatter
A dimension section
After the frontmatter, each entity has dimension sections -- one per evidenced dimension. Each section contains at least one verbatim source quote and ends with the implication drawn from that evidence.
The card contract
The card: block is the API between the wiki and any consumer -- the web app, the CLI, an external agent. Three fields:
- primary_signal -- the single standout fact. 200 characters max.
- action -- what to do next. A verb phrase. 120 characters max.
- risk -- the single biggest unknown or caveat. 200 characters max.
The web app never parses prose. It reads frontmatter only. If the card is wrong, the user sees wrong data. The card is the contract.
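Because consumers trust the card without reading the prose, it is worth checking the three fields before a file is written. The following is a minimal sketch, assuming the card has already been parsed from the frontmatter into a plain dict. The function name `validate_card` is hypothetical; only the field names and length limits come from the list above.

```python
# Length limits taken from the three field descriptions above.
CARD_LIMITS = {"primary_signal": 200, "action": 120, "risk": 200}


def validate_card(card: dict) -> list[str]:
    """Return a list of problems; an empty list means the card passes."""
    problems = []
    for field, limit in CARD_LIMITS.items():
        value = card.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field}: missing or empty")
        elif len(value) > limit:
            problems.append(f"{field}: {len(value)} characters, limit is {limit}")
    return problems


# Illustrative values only, not taken from a real wiki entry.
example_card = {
    "primary_signal": "Example standout fact about the cohort.",
    "action": "Request the data dictionary",
    "risk": "Example caveat about sample availability.",
}
print(validate_card(example_card))
```

The check covers presence and length only. Whether `action` really is a verb phrase, or whether `primary_signal` is the right fact, still needs a human or a separate review step.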
Cross-links
Entities link to each other via [[slug]] references in the body and referenced_by arrays in the frontmatter. The compile/merge phase maintains back-references automatically.
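How back-references could be derived from body links can be sketched as follows. This is not the wiki's actual compile/merge code: the regular expression, the function names, and the assumption that a link is written exactly as `[[slug]]` with no alias syntax are all illustrative.

```python
import re
from collections import defaultdict

# Assumption: a link is the slug wrapped in double square brackets, nothing else.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)\]\]")


def extract_links(body: str) -> list[str]:
    """Return linked slugs in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for match in WIKILINK.finditer(body):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)


def build_referenced_by(bodies: dict[str, str]) -> dict[str, list[str]]:
    """Map each target slug to the sorted slugs of entities that link to it."""
    back: defaultdict[str, set[str]] = defaultdict(set)
    for source, body in bodies.items():
        for target in extract_links(body):
            if target != source:  # ignore self-links
                back[target].add(source)
    return {target: sorted(sources) for target, sources in back.items()}
```

Recomputing the full `referenced_by` map from the bodies on each run, rather than patching arrays in place, keeps the back-references from drifting out of sync with the links they mirror.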
The graph emerges from these links. Investigators connect to institutions. Institutions connect to cohorts. Cohorts connect to platforms. A query like "who else has EPIC methylation data on AD?" traverses these links -- it does not re-read papers.
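A traversal of this kind can be sketched as a bounded breadth-first walk over the link graph. The function name, the two input dicts, and the slugs in the example are hypothetical. The sketch assumes links have already been made bidirectional by combining body links with the `referenced_by` arrays.

```python
from collections import deque


def neighbors_of_type(links, types, start, wanted_type, max_hops=2):
    """Walk outward from start and collect entities of wanted_type within max_hops."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        slug, hops = queue.popleft()
        if slug != start and types.get(slug) == wanted_type:
            found.append(slug)
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in sorted(links.get(slug, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return found


# Hypothetical slugs: one platform used by two cohorts, one investigator on cohort-a.
types = {
    "platform-x": "platform",
    "cohort-a": "cohort",
    "cohort-b": "cohort",
    "investigator-1": "investigator",
}
links = {
    "platform-x": {"cohort-a", "cohort-b"},
    "cohort-a": {"platform-x", "investigator-1"},
    "cohort-b": {"platform-x"},
    "investigator-1": {"cohort-a"},
}
cohorts_on_platform = neighbors_of_type(links, types, "platform-x", "cohort", max_hops=1)
```

Starting from a platform and asking for cohorts one hop away answers the "who else has this data?" question using only frontmatter and links, which is the point of keeping the graph typed.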