Entity types

The wiki is a typed graph. Seven entity types, each a markdown file with YAML frontmatter.

Cohort / data_opportunity

A group of subjects with biological samples or data. The primary unit of sourcing intelligence. One file per cohort in store/wiki/cohorts/.
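Each cohort has its own file under store/wiki/cohorts/, so a consumer can enumerate cohorts with a plain directory listing. The following is a minimal Python sketch. It assumes each file is named <slug>.md so that the filename stem is the slug; this page does not state that convention. The function name cohort_files is hypothetical.

```python
from pathlib import Path
from typing import Dict


def cohort_files(wiki_root: Path) -> Dict[str, Path]:
    """Map slug -> file path for every cohort under <wiki_root>/cohorts/.

    Assumes (hypothetically) that each cohort file is named <slug>.md.
    Non-markdown files in the directory are ignored.
    """
    cohort_dir = wiki_root / "cohorts"
    return {p.stem: p for p in sorted(cohort_dir.glob("*.md"))}
```

For example, cohort_files(Path("store/wiki")) would return one entry per cohort file.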

Institution

A university, hospital, biobank, or commercial vendor that hosts or manages cohorts. Links to the cohorts it holds and the investigators who work there.

Investigator

A PI or co-investigator linked to one or more cohorts. The slug uses their home institution, not the cohort or consortium they appear in.

Platform

An assay, instrument, or analytical method used to generate data. Links to cohorts that used it and carries validation history.

Protocol

A documented collection or processing protocol tied to a specific cohort. The evidence source for pre-analytical quality scoring.

Bundle

A procurement chain of 2-7 typed links (specimen source to data delivery), each with an evidence tier. Created by the bounty workflow when a buyer needs a complete sourcing plan.
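The bundle's shape constraint can be written as a small data structure. This is only a sketch. The class names, the link_type labels, and the EVIDENCE_TIERS set are all hypothetical, because this page does not list the actual tier vocabulary. Only the 2-7 link bound comes from the description above.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical tier labels: the wiki's real evidence-tier vocabulary is not given on this page.
EVIDENCE_TIERS = {"direct", "indirect", "inferred"}


@dataclass
class BundleLink:
    link_type: str       # illustrative labels, e.g. "specimen_source" or "data_delivery"
    target_slug: str     # slug of the entity this link points at
    evidence_tier: str   # must be one of EVIDENCE_TIERS


@dataclass
class Bundle:
    links: List[BundleLink]

    def validate(self) -> None:
        """Raise ValueError unless the chain has 2-7 links with known tiers."""
        if not 2 <= len(self.links) <= 7:
            raise ValueError(f"bundle must have 2-7 links, got {len(self.links)}")
        for link in self.links:
            if link.evidence_tier not in EVIDENCE_TIERS:
                raise ValueError(f"unknown evidence tier: {link.evidence_tier!r}")
```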

What an entity looks like

Every entity is a markdown file with YAML frontmatter followed by prose sections. Here is a real cohort from the wiki -- the ADNI blood DNA methylation substudy.

Frontmatter

---
entity_id: adni-blood-dnam-csf-biomarker
type: data_opportunity
canonical_name: "ADNI Whole-Blood EPIC Methylation + CSF Biomarker Matched Substudy (n=202)"
opportunity_type: published_cohort
evidence_type: direct
disease_area: ["Alzheimer's disease", "cognitively normal"]
modality: ["whole-blood EPIC methylation", "CSF biomarkers"]
provenance:
  sources: [PMC9980279, PMC10088180, PMC11838696]
  last_compiled: "2026-04-08"
scoring:
  scale: {confidence: high}
  cost: {confidence: medium}
  quality: {provenance_depth: 0.48, confidence: medium}
card:
  primary_signal: "202 subjects with same-visit whole-blood EPIC methylation + CSF biomarkers"
  action: "Apply for ADNI data access via adni.loni.usc.edu"
  risk: "n=202 is underpowered for subgroup EWAS; commercial-use terms need verification"
---
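Every entity file consists of YAML frontmatter between two --- lines, followed by prose. Any consumer therefore has to split the two parts first. Below is a minimal standard-library sketch that assumes the file begins with a --- line. The name split_frontmatter is a hypothetical helper, not part of the wiki's tooling.

```python
from typing import Tuple


def split_frontmatter(text: str) -> Tuple[str, str]:
    """Split an entity file into (frontmatter_yaml, markdown_body).

    Assumes the first line is exactly '---' and the frontmatter ends at
    the next line that is exactly '---'.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        raise ValueError("entity file must start with '---'")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            # Everything between the delimiters is YAML; the rest is prose.
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    raise ValueError("unterminated frontmatter block")
```

The YAML string returned here would then be handed to a YAML parser, such as PyYAML's yaml.safe_load, which produces the nested provenance, scoring, and card mappings.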

A dimension section

After the frontmatter, each entity has dimension sections, one per evidenced dimension. Each section contains at least one verbatim source quote and ends with its implication for the buyer.

## real_numbers

202 subjects total (123 cognitively normal, 79 AD cases, all age >65); a separate ADNI subset of 263 subjects has matched EPIC methylation and Affymetrix U219 gene expression.

> "Our study included samples from a total of 202 subjects
> (123 cognitively normal, 79 AD cases) ... matched whole blood
> DNA methylation, CSF biomarkers data measured on the same
> subjects and at the same clinical visits in the ADNI study."
> [ref: PMC10088180]

Which means for the buyer's project: the usable N for a joint methylation + CSF biomarker study is 202, with 79 AD cases; this is below typical EWAS power thresholds, so any replication study must budget for an independent cohort of at least equivalent size.
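Both rules for dimension sections (at least one verbatim quote, and ending with the implication) are mechanical enough to lint. The sketch below assumes two conventions taken from the example above: quotes are markdown blockquote lines that start with >, and the implication is the final paragraph, beginning "Which means for the buyer's project:". The helper names and that fixed prefix are assumptions.

```python
import re
from typing import Dict, List

# Assumed fixed prefix for the implication paragraph, taken from the example.
IMPLICATION_PREFIX = "Which means for the buyer's project:"


def split_dimensions(body: str) -> Dict[str, str]:
    """Map each '## <dimension>' heading to the text under it."""
    sections: Dict[str, str] = {}
    current = None
    buf: List[str] = []
    for line in body.splitlines():
        m = re.match(r"^##\s+(\S+)\s*$", line)
        if m:
            if current is not None:
                sections[current] = "\n".join(buf).strip()
            current, buf = m.group(1), []
        elif current is not None:
            buf.append(line)
    if current is not None:
        sections[current] = "\n".join(buf).strip()
    return sections


def check_section(text: str) -> List[str]:
    """Return a list of rule violations; an empty list means the section passes."""
    problems = []
    if not any(line.startswith(">") for line in text.splitlines()):
        problems.append("no verbatim source quote")
    last_para = text.strip().split("\n\n")[-1]
    if not last_para.startswith(IMPLICATION_PREFIX):
        problems.append("does not end with an implication")
    return problems
```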

The card contract

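The card: block is the API between the wiki and any consumer, whether the web app, the CLI, or an external agent. It has three fields: primary_signal (the headline evidence), action (the buyer's next step), and risk (the main caveat to check).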

The web app never parses prose. It reads frontmatter only. If the card is wrong, the user sees wrong data. The card is the contract.
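Consumers read only frontmatter, so a consumer can refuse to render an entity whose card is incomplete instead of displaying wrong data. The sketch below assumes the frontmatter has already been parsed into a dict. Card and card_from_frontmatter are hypothetical names; the three field names come from the example above.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any

# The three card fields, as they appear in the example frontmatter.
CARD_FIELDS = ("primary_signal", "action", "risk")


@dataclass(frozen=True)
class Card:
    primary_signal: str
    action: str
    risk: str


def card_from_frontmatter(frontmatter: Mapping) -> Card:
    """Build a Card from parsed frontmatter; raise if the contract is broken."""
    card: Any = frontmatter.get("card")
    if not isinstance(card, Mapping):
        raise ValueError("frontmatter has no card: block")
    # Treat absent or empty fields as a contract violation.
    missing = [name for name in CARD_FIELDS if not card.get(name)]
    if missing:
        raise ValueError(f"card is missing fields: {missing}")
    return Card(**{name: str(card[name]) for name in CARD_FIELDS})
```

Failing loudly here means a broken card surfaces as an error during compilation or rendering, not as silently wrong data shown to the user.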

Cross-links

Entities link to each other via [[slug]] references in the body and referenced_by arrays in the frontmatter. The compile/merge phase maintains back-references automatically.
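Maintaining back-references amounts to inverting the [[slug]] links. The following minimal sketch shows what a compile/merge step could compute, assuming entity bodies are available as a {slug: text} dict. compute_referenced_by is a hypothetical name. Links to slugs that do not exist are silently ignored here; a real compiler might prefer to report them.

```python
import re
from typing import Dict, List, Set

# Matches [[slug]] references in entity bodies.
WIKILINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def compute_referenced_by(bodies: Dict[str, str]) -> Dict[str, List[str]]:
    """Given {slug: markdown_body}, return {slug: sorted slugs that link to it}."""
    referenced_by: Dict[str, Set[str]] = {slug: set() for slug in bodies}
    for source, body in bodies.items():
        for target in WIKILINK.findall(body):
            target = target.strip()
            # Skip self-links and links to entities that are not in the wiki.
            if target != source and target in referenced_by:
                referenced_by[target].add(source)
    return {slug: sorted(srcs) for slug, srcs in referenced_by.items()}
```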

The graph emerges from these links. Investigators connect to institutions. Institutions connect to cohorts. Cohorts connect to platforms. A query like "who else has EPIC methylation data on AD?" traverses these links -- it does not re-read papers.
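A query like this one can be answered using only frontmatter and links. Below is a one-hop sketch that assumes entities are already loaded into memory with their disease_area, modality, and link sets. The Entity class, the who_has function, and the "institution" type string are all hypothetical. Matching is plain case-insensitive substring search, nothing more sophisticated.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Entity:
    slug: str
    entity_type: str                                  # frontmatter "type", e.g. "data_opportunity"
    disease_area: List[str] = field(default_factory=list)
    modality: List[str] = field(default_factory=list)
    links: Set[str] = field(default_factory=set)      # [[slug]] refs plus referenced_by


def who_has(graph: Dict[str, Entity], modality_term: str, disease_term: str,
            neighbour_type: str) -> Dict[str, List[str]]:
    """For each cohort matching both terms, list linked entities of neighbour_type."""
    result: Dict[str, List[str]] = {}
    for ent in graph.values():
        if ent.entity_type != "data_opportunity":
            continue
        # Case-insensitive substring match on the frontmatter lists.
        has_modality = any(modality_term.lower() in m.lower() for m in ent.modality)
        has_disease = any(disease_term.lower() in d.lower() for d in ent.disease_area)
        if has_modality and has_disease:
            # One hop along the links; no paper is re-read.
            result[ent.slug] = sorted(
                s for s in ent.links
                if s in graph and graph[s].entity_type == neighbour_type
            )
    return result
```

For example, who_has(graph, "EPIC methylation", "Alzheimer", "institution") would list, for each matching cohort, the institutions linked to it.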