Provenance and transparency

The system does not take sides between brokers, hospitals, and direct sources. It scores provenance quality regardless of channel.

Provenance over bypass

A broker listing with fully disclosed provenance scores higher than a direct source with an undocumented protocol. A hospital biobank with detailed collection-protocol documentation scores higher than a commercial vendor with shallow chain-of-custody documentation, even if the vendor is faster and cheaper.

The score axis is provenance depth, not channel type. Each candidate's channel is surfaced as its opportunity_type (published_cohort, hospital_inventory_signal, surplus_trial_samples, broker_listed_inventory, biobank_self_reported, or bounty_bundle), but the channel carries no weight in the score.
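A minimal sketch of channel-agnostic ranking, assuming each candidate carries an opportunity_type string and a precomputed provenance_depth between 0 and 1. The Candidate class, rank_candidates function, and the depth values in the usage are hypothetical; only the opportunity_type values come from the text above. The channel is kept on the record for display, but the sort key never reads it.

```python
from dataclasses import dataclass
from typing import Literal

# The opportunity_type values named in the text; the type alias itself is illustrative.
OpportunityType = Literal[
    "published_cohort",
    "hospital_inventory_signal",
    "surplus_trial_samples",
    "broker_listed_inventory",
    "biobank_self_reported",
    "bounty_bundle",
]


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record shape: a label, the channel it came through, and its depth.
    name: str
    opportunity_type: OpportunityType
    provenance_depth: float  # fraction of dimensions covered, 0.0 to 1.0


def rank_candidates(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates by provenance depth only; the channel is surfaced, not weighted."""
    return sorted(candidates, key=lambda c: c.provenance_depth, reverse=True)


if __name__ == "__main__":
    # Hypothetical depths, chosen only to show the ordering.
    ranked = rank_candidates([
        Candidate("self-reported biobank, undocumented protocol", "biobank_self_reported", 0.29),
        Candidate("broker listing, full provenance", "broker_listed_inventory", 0.81),
    ])
    for c in ranked:
        print(f"{c.provenance_depth:.2f}  {c.opportunity_type:<24} {c.name}")
```

Swapping the opportunity_type values between two candidates leaves their order unchanged, which is the property the paragraph above describes.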

Gaps are first-class data

When the wiki cannot trace a sample back to its origin, the entity carries a low provenance_depth and the missing dimensions are listed in the "Open questions" section. The buyer sees the gap explicitly.

The system never invents provenance. It never fills a gap with a plausible guess. An entity with depth 0.3 and honest gaps is more useful than an entity with depth 0.8 and fabricated dimensions. Gaps are data. Lint queues them for follow-up.
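A minimal sketch of how the "Open questions" section and the lint follow-up queue could be derived, assuming each entity stores, per dimension, either a source reference or None. The function names, the mapping shape, and the source ID in the usage are hypothetical. The key property is that None is reported as a gap and is never replaced with a default or inferred value.

```python
from typing import Optional

# Hypothetical shape: dimension name -> source reference, or None if untraced.
DimensionMap = dict[str, Optional[str]]


def open_questions(dimensions: DimensionMap) -> list[str]:
    """Return one open question per dimension that has no sourced evidence.

    Untraced dimensions are listed, never filled with a plausible guess.
    """
    return [
        f"{name}: not documented in any traced source"
        for name, evidence in dimensions.items()
        if evidence is None
    ]


def lint_queue(entities: dict[str, DimensionMap]) -> list[tuple[str, str]]:
    """Collect (entity_id, open question) pairs so every gap is queued for follow-up."""
    return [
        (entity_id, question)
        for entity_id, dims in entities.items()
        for question in open_questions(dims)
    ]


if __name__ == "__main__":
    entity = {
        "Sample usability": "src-001",  # hypothetical source ID
        "Collection protocol detail": None,
        "Sponsor and funding": None,
    }
    for q in open_questions(entity):
        print(q)
```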

Negative results count

A cohort that documents a failed analysis is more valuable than one that stays silent about it. Negative results (a biomarker panel that showed no signal, an assay that failed on degraded specimens, a confounding variable that washed out an effect) are held to the same evidence standard as positive results: verbatim quote, source ID, implication.

This prevents buyers from re-discovering dead ends. If the wiki records that plasma NfL showed no separation between AD and MCI in a particular cohort, the next buyer asking about NfL sees that finding before investing in the same experiment.
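A minimal sketch of a finding record that enforces the same three evidence fields for positive and negative results, plus a lookup that puts known dead ends first. Finding and prior_findings are hypothetical names, and the quote and source ID in the usage are placeholders, not real data.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Finding:
    # Positive and negative results share one record type and one evidence standard.
    topic: str  # e.g. a biomarker or assay name
    polarity: Literal["positive", "negative"]
    quote: str  # verbatim text copied from the source
    source_id: str
    implication: str

    def __post_init__(self) -> None:
        # Reject any finding missing its quote, source ID, or implication.
        for field_name in ("quote", "source_id", "implication"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} is required for every finding")


def prior_findings(findings: list[Finding], topic: str) -> list[Finding]:
    """Return every recorded finding on a topic (case-insensitive), negative ones first."""
    matches = [f for f in findings if f.topic.lower() == topic.lower()]
    # False sorts before True, so negative findings come first; sort is stable.
    return sorted(matches, key=lambda f: f.polarity != "negative")


if __name__ == "__main__":
    # Placeholder values only; a real record copies the quote verbatim from the source.
    nfl = Finding(
        topic="plasma NfL",
        polarity="negative",
        quote="<verbatim quote from the cohort paper>",
        source_id="<source ID>",
        implication="No AD vs MCI separation in this cohort; check before repeating the assay.",
    )
    for f in prior_findings([nfl], "Plasma NfL"):
        print(f.polarity, f.source_id, f.implication)
```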

Both sides

The same wiki serves demand-side buyers and supply-side institutions. A buyer query reads the wiki and scores candidates. A supply-side onboarding pass reads the same wiki and produces a draft catalog listing for the institution to review.

There is no separate "buyer view" and "supplier view" of the data. One wiki. Institutions use cards.onboarding_view to see completeness, gaps, and demand signals. Buyers use cards.buyer_view (or the default card:) to see signal, action, and risk.
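A minimal sketch of "one wiki, two cards", assuming an entity is a flat dict and each card is a projection that selects fields from it rather than a second copy of the data. The CARDS mapping, render_card, and the dict schema are hypothetical; only the card names and the fields each side sees come from the text. Defaulting to the buyer view is also an assumption.

```python
from typing import Any

# Hypothetical mapping from card name to the entity fields that card shows.
CARDS: dict[str, tuple[str, ...]] = {
    "onboarding_view": ("completeness", "gaps", "demand_signals"),
    "buyer_view": ("signal", "action", "risk"),
}


def render_card(entity: dict[str, Any], card: str = "buyer_view") -> dict[str, Any]:
    """Project one shared entity record onto a card; nothing is duplicated or stored."""
    return {field: entity.get(field) for field in CARDS[card]}


if __name__ == "__main__":
    # Hypothetical entity values for illustration.
    entity = {
        "completeness": 0.48,
        "gaps": ["Sponsor and funding"],
        "demand_signals": ["buyer query on plasma biomarkers"],
        "signal": "longitudinal plasma with documented collection protocol",
        "action": "request access terms",
        "risk": "consent scope not yet confirmed",
    }
    print(render_card(entity))
    print(render_card(entity, "onboarding_view"))
```

Because both cards read the same record, an update to the entity shows up on both sides at once.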

The 21 dimensions

The extract skill checks each paper against 21 intelligence dimensions. An entity's provenance depth is the fraction of these dimensions it covers. The dimensions are:

Real numbers
Sample usability
Longitudinal structure
Confounders and exposures
Demographic composition
Co-modalities and multi-omics value
Effect sizes and model performance
Replication and validation
Access and consent scope
Multi-site recruitment
Negative results
Sponsor and funding
Published analysis code
Collection protocol detail
Biospecimen retention and types
Comparative positioning
Regulatory and ethical framework
Data sharing infrastructure
Contact and collaboration
Specimen fitness for assay
Cost and timeline signals

An entity with 10 of 21 dimensions covered has a depth of 0.48. An entity with 15 has 0.71. The median across the current wiki is around 0.45. Lint flags entities below 0.3 for priority re-compilation.
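The arithmetic above can be checked with a short sketch, assuming depth is an unweighted fraction (covered / 21), that the two-decimal figures are rounding for display, and that the lint cutoff is strictly "below 0.3". DIMENSIONS, provenance_depth, and needs_recompilation are hypothetical names; the dimension list itself is the one given above.

```python
# The 21 dimensions listed above; the tuple name is hypothetical.
DIMENSIONS = (
    "Real numbers",
    "Sample usability",
    "Longitudinal structure",
    "Confounders and exposures",
    "Demographic composition",
    "Co-modalities and multi-omics value",
    "Effect sizes and model performance",
    "Replication and validation",
    "Access and consent scope",
    "Multi-site recruitment",
    "Negative results",
    "Sponsor and funding",
    "Published analysis code",
    "Collection protocol detail",
    "Biospecimen retention and types",
    "Comparative positioning",
    "Regulatory and ethical framework",
    "Data sharing infrastructure",
    "Contact and collaboration",
    "Specimen fitness for assay",
    "Cost and timeline signals",
)
LINT_THRESHOLD = 0.3  # assumed strict "below" cutoff


def provenance_depth(covered: set[str]) -> float:
    """Fraction of the 21 dimensions covered, unweighted."""
    unknown = covered - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    return len(covered) / len(DIMENSIONS)


def needs_recompilation(depth: float) -> bool:
    """True when lint should queue the entity for priority re-compilation."""
    return depth < LINT_THRESHOLD


if __name__ == "__main__":
    for n in (6, 7, 10, 15):
        depth = provenance_depth(set(DIMENSIONS[:n]))
        print(n, round(depth, 2), needs_recompilation(depth))
```

With 21 dimensions, 6 covered gives about 0.29 and is flagged, while 7 covered gives about 0.33 and is not, so the 0.3 cutoff amounts to flagging entities that cover 6 or fewer dimensions.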