Evidence standard

Every fact in the wiki carries three things: a verbatim source quote, a source ID, and an implication. Missing any one of the three -- drop the fact.

The rule

  1. Verbatim source quote. A blockquote from the original paper or trial record. No paraphrase. No "approximately as stated." The exact text the agent read.
  2. Source ID. PMC12345, PMID:12345, NCT12345678, DOI:10.xxxx/..., or a full URL. The ID must be retrievable.
  3. Implication. A sentence that says what this means for the buyer's project. If you cannot finish that sentence, the fact is noise.

Example

Bad

The cohort has longitudinal data. // This is a fact. It has no source quote, no ID, and no implication. // It tells the buyer nothing they can act on. Drop it.

Good

ADNI Phase 1 collected plasma metabolomics at baseline and 12-month follow-up across 819 subjects (352 MCI, 181 AD, 286 CN). > "Longitudinal plasma samples were collected at baseline and > 12-month follow-up for 819 ADNI Phase 1 participants" > [ref: PMC10103184] Which means for the buyer's project: the 12-month paired design lets a buyer model within-subject trajectories, but the single follow-up limits detection of non-linear progression patterns that require quarterly or semi-annual sampling.

The quote is verbatim. The source is traceable. The implication names a consequence the buyer cares about -- and names the limitation.

What this prevents

Authority drift

An early extraction makes a soft claim. A downstream summary repeats it. By the third pass, the soft claim is presented as fact. The rule blocks this because each pass must re-cite the original quote. No quote, no claim.

Implication bloat

Every fact must carry an implication. The compiler cannot accumulate facts that have no consequence. Noise is filtered at the source, not the sink. A hundred facts with implications is more useful than a thousand facts without.

Source-hostile reuse

If a paper is retracted or corrected, every claim derived from it is discoverable via provenance.sources. Grep for the PMC ID and you find every entity that depends on it. No claim is orphaned from its source.

Honest labels

Not everything is verified. The system uses four labels to mark the status of each claim.

verified

Directly quoted from a primary source. The strongest label.

inferred

Derived from the source but not directly quoted. The derivation is stated.

open_question

The information is missing. The gap is flagged for follow-up. Lint queues it.

blocked

The information is needed but cannot be obtained from public sources. Requires institutional contact.

Never smooth over missing checks. A gap labeled open_question is more valuable than a gap disguised as a fact.