Provenance and transparency

The system does not take sides between brokers, hospitals, and direct sources. It scores provenance quality regardless of channel.

Provenance over bypass

A broker listing with fully disclosed provenance scores higher than a direct source with an undocumented protocol. A hospital biobank with detailed collection-protocol documentation scores higher than a commercial vendor with shallow chain documentation, even if the vendor is faster and cheaper.

The score axis is provenance depth, not channel type. Channels are surfaced as opportunity_type -- published_cohort, hospital_inventory_signal, surplus_trial_samples, broker_listed_inventory, biobank_self_reported, bounty_bundle -- but they do not weight the score.
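As a minimal sketch of what "channels do not weight the score" could look like in practice, the snippet below ranks candidates purely by provenance depth and treats opportunity_type as a validated label. The `Candidate` record and the `rank` function are hypothetical names introduced for illustration; only the six opportunity_type values come from the text above.

```python
from dataclasses import dataclass

# Channel labels taken from the text. They are surfaced as metadata only.
OPPORTUNITY_TYPES = {
    "published_cohort",
    "hospital_inventory_signal",
    "surplus_trial_samples",
    "broker_listed_inventory",
    "biobank_self_reported",
    "bounty_bundle",
}


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record shape; the real entity schema is not shown here."""
    name: str
    opportunity_type: str
    provenance_depth: float

    def __post_init__(self) -> None:
        if self.opportunity_type not in OPPORTUNITY_TYPES:
            raise ValueError(f"unknown opportunity_type: {self.opportunity_type}")


def rank(candidates: list[Candidate]) -> list[Candidate]:
    # The sort key reads provenance_depth only; opportunity_type never enters it.
    return sorted(candidates, key=lambda c: c.provenance_depth, reverse=True)


ranked = rank([
    Candidate("self-reported biobank", "biobank_self_reported", 0.30),
    Candidate("broker listing", "broker_listed_inventory", 0.71),
])
```

In this sketch the broker listing ranks first because its depth is higher, not because of its channel.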

Gaps are first-class data

When the wiki cannot trace a sample back to its origin, the entity carries a low provenance_depth and the missing dimensions are listed in the "Open questions" section. The buyer sees the gap explicitly.

The system never invents provenance. It never fills a gap with a plausible guess. An entity with depth 0.3 and honest gaps is more useful than an entity with depth 0.8 and fabricated dimensions. Gaps are data. Lint queues them for follow-up.
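One way to picture gaps as data is a function that reports every dimension lacking sourced evidence and never substitutes a default. This is an illustrative sketch, not the wiki's actual lint code: `open_questions` and the `evidence` mapping are assumed names, and the example evidence string is a placeholder. The dimension names are the 21 listed later in this document.

```python
DIMENSIONS = [
    "Real numbers",
    "Sample usability",
    "Longitudinal structure",
    "Confounders and exposures",
    "Demographic composition",
    "Co-modalities and multi-omics value",
    "Effect sizes and model performance",
    "Replication and validation",
    "Access and consent scope",
    "Multi-site recruitment",
    "Negative results",
    "Sponsor and funding",
    "Published analysis code",
    "Collection protocol detail",
    "Biospecimen retention and types",
    "Comparative positioning",
    "Regulatory and ethical framework",
    "Data sharing infrastructure",
    "Contact and collaboration",
    "Specimen fitness for assay",
    "Cost and timeline signals",
]


def open_questions(evidence: dict[str, str]) -> list[str]:
    """Return dimensions with no sourced evidence, in canonical order.

    Missing or empty entries are reported as gaps; nothing is filled in.
    """
    return [d for d in DIMENSIONS if not evidence.get(d, "").strip()]


# Hypothetical entity: one dimension sourced, one present but empty.
gaps = open_questions({
    "Real numbers": "<verbatim quote from the source>",
    "Negative results": "",
})
```

Here `gaps` holds the 20 dimensions still unsourced, including "Negative results", which is what a buyer would see under "Open questions".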

Negative results count

A cohort that documents a failed analysis is more valuable than a cohort that is silent. Negative results -- a biomarker panel that showed no signal, an assay that failed on degraded specimens, a confounding variable that washed out an effect -- carry the same evidence standard as positive results: a verbatim quote, a source ID, and the implication.
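The shared evidence standard can be sketched as a single check applied to every finding regardless of its sign. The `Finding` record and `meets_evidence_standard` function are hypothetical names for illustration, and the field values below are placeholders rather than real quotes or source IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical record; the three required fields mirror the standard above."""
    quote: str          # verbatim quote from the source
    source_id: str      # identifier of the source document
    implication: str    # what the finding means for a buyer
    negative: bool = False


def meets_evidence_standard(finding: Finding) -> bool:
    # Same rule for positive and negative results: all three fields must be non-empty.
    return all(s.strip() for s in (finding.quote, finding.source_id, finding.implication))


negative_ok = Finding(
    quote="<verbatim quote reporting no separation>",
    source_id="<source id>",
    implication="<what the null result rules out>",
    negative=True,
)
missing_source = Finding(quote="<verbatim quote>", source_id="", implication="<implication>")
```

In this sketch the `negative` flag is deliberately ignored by the check, so a null result is accepted or rejected on exactly the same terms as a positive one.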

This prevents buyers from re-discovering dead ends. If the wiki records that plasma NfL showed no separation between AD and MCI in a particular cohort, the next buyer asking about NfL sees that finding before investing in the same experiment.

Both sides

The same wiki serves demand-side buyers and supply-side institutions. A buyer query reads the wiki and scores candidates. A supply-side onboarding reads the same wiki and produces a catalog listing draft for the institution to review.

There is no separate "buyer view" and "supplier view" of the data. One wiki. Institutions use cards.onboarding_view to see completeness, gaps, and demand signals. Buyers use cards.buyer_view (or the default card:) to see signal, action, and risk.

The 21 dimensions

The extract skill checks each paper against 21 intelligence dimensions. Provenance depth is the fraction of those dimensions that are covered. The dimensions are:

  1. Real numbers
  2. Sample usability
  3. Longitudinal structure
  4. Confounders and exposures
  5. Demographic composition
  6. Co-modalities and multi-omics value
  7. Effect sizes and model performance
  8. Replication and validation
  9. Access and consent scope
  10. Multi-site recruitment
  11. Negative results
  12. Sponsor and funding
  13. Published analysis code
  14. Collection protocol detail
  15. Biospecimen retention and types
  16. Comparative positioning
  17. Regulatory and ethical framework
  18. Data sharing infrastructure
  19. Contact and collaboration
  20. Specimen fitness for assay
  21. Cost and timeline signals

An entity with 10 of 21 dimensions covered has a depth of 0.48 (10/21). An entity with 15 covered has a depth of 0.71 (15/21). The median across the current wiki is around 0.45. Lint flags entities below 0.3 for priority re-compilation.
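The arithmetic above can be reproduced with a short sketch. The function names and the representation of covered dimensions as a set of 1-based indices are assumptions for illustration; the total of 21 and the 0.3 lint threshold come from the text.

```python
TOTAL_DIMENSIONS = 21
LINT_THRESHOLD = 0.3  # entities below this depth are flagged


def provenance_depth(covered: set[int]) -> float:
    """Fraction of the 21 dimensions covered, given 1-based dimension indices."""
    valid = {d for d in covered if 1 <= d <= TOTAL_DIMENSIONS}
    return len(valid) / TOTAL_DIMENSIONS


def needs_priority_recompilation(depth: float) -> bool:
    # Strictly below the threshold, matching "below 0.3" in the text.
    return depth < LINT_THRESHOLD


ten = provenance_depth(set(range(1, 11)))      # 10 of 21 covered
fifteen = provenance_depth(set(range(1, 16)))  # 15 of 21 covered
six = provenance_depth(set(range(1, 7)))       # 6 of 21 covered
```

Rounded to two places, `ten` is 0.48 and `fifteen` is 0.71. An entity with 6 dimensions covered sits at about 0.29 and would be flagged, while one with 7 (about 0.33) would not.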