Source

Find cohorts, institutions, or sourcing paths matching a scientific question.

Usage

CLI

vcro source "AD plasma metabolomics, longitudinal, n>=200"

Slash command

/source AD plasma metabolomics, longitudinal, n>=200

How it works

Seven gates. The orchestrator parses the request inline, confirms with the user once, then runs research, scoring, and delivery autonomously.

Parse + Confirm Gate 0-1
Research Gate 2-3
Score Gate 4
Deliver Gate 5
Compile Gate 6 (background)

Parse + Confirm (Gates 0-1) — the orchestrator parses the request inline into request.json and presents a summary with sharpening questions. This is the only mandatory stop. Once confirmed, everything runs autonomously.
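A minimal sketch of what the parsed brief might contain. The field names here are illustrative assumptions, not the actual request.json schema:

```python
import json

# Hypothetical shape of the parsed brief -- field names are
# illustrative, not the real request.json schema.
request = {
    "query": "AD plasma metabolomics, longitudinal, n>=200",
    "disease": "Alzheimer's disease",
    "specimen": "plasma",
    "assay": "metabolomics",
    "design": "longitudinal",
    "min_n": 200,
    "confirmed": True,  # set after the single mandatory stop (Gate 1)
}

with open("request.json", "w") as f:
    json.dump(request, f, indent=2)
```

Once `confirmed` is recorded, no further user interaction is required for the remaining gates.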

Research (Gates 2-3) — parallel multi-source research: wiki scan, PubMed, Europe PMC, ClinicalTrials.gov, and institutional pages. External search results become search_lead candidates that can be scored immediately, without compilation. All results merge into a single candidates list.
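The merge step can be sketched roughly as follows. The record fields and dedup key are assumptions; the point is that wiki hits and search_lead hits land in one list:

```python
# Sketch of merging parallel research results into one candidates
# list; record fields ("id", "kind") are assumptions.
def merge_candidates(*source_batches):
    """Merge per-source hits, deduplicating by id. External hits
    keep their search_lead tag so they can be scored without
    waiting for wiki compilation."""
    merged = {}
    for batch in source_batches:
        for hit in batch:
            merged.setdefault(hit["id"], hit)
    return list(merged.values())

wiki_hits = [{"id": "wiki:cohort-A", "kind": "wiki_entry"}]
pubmed_hits = [{"id": "PMC0000001", "kind": "search_lead"}]
epmc_hits = [{"id": "PMC0000001", "kind": "search_lead"}]  # same paper, deduplicated

candidates = merge_candidates(wiki_hits, pubmed_hits, epmc_hits)
print(len(candidates))  # 2
```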

Score (Gate 4) — evaluates each candidate on three independent axes: Scale, Cost, and Quality. There is no composite score; the buyer chooses their own weighting.
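A sketch of why the three axes stay separate: the workflow stores per-axis scores only, and any combination happens on the buyer's side with whatever weighting they prefer. Score ranges and field names here are assumptions:

```python
# Hypothetical scored candidate -- the workflow records three
# independent axis scores and never a composite.
scored = {
    "candidate": "cohort-A",
    "scale": 0.9,    # cohort size relative to the requested n
    "cost": 0.6,     # access and licensing burden
    "quality": 0.4,  # pre-analytical and documentation risk
}

def buyer_rank(candidate, weights):
    """The buyer, not the workflow, chooses the weighting."""
    return sum(candidate[axis] * w for axis, w in weights.items())

# A quality-focused buyer and a scale-focused buyer can rank the
# same scored candidates differently.
quality_first = buyer_rank(scored, {"scale": 0.2, "cost": 0.2, "quality": 0.6})
scale_first = buyer_rank(scored, {"scale": 0.6, "cost": 0.2, "quality": 0.2})
```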

Deliver (Gate 5) — assembles the recommendation from the scored candidates and writes recommendation.md and listings.jsonl for the web app.
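A minimal sketch of the listings.jsonl projection, assuming one card per scored candidate; the card fields are illustrative, not the web app's actual schema:

```python
import json

# Hypothetical projection of scored candidates into one JSONL card
# per line for the web app. Field names are assumptions.
scored_candidates = [
    {"candidate": "cohort-A", "scale": 0.9, "cost": 0.6, "quality": 0.4},
    {"candidate": "cohort-B", "scale": 0.5, "cost": 0.8, "quality": 0.7},
]

with open("listings.jsonl", "w") as f:
    for c in scored_candidates:
        card = {
            "title": c["candidate"],
            "scores": {axis: c[axis] for axis in ("scale", "cost", "quality")},
        }
        f.write(json.dumps(card) + "\n")
```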

Compile (Gate 6) — runs in the background after delivery. Papers found during research are compiled into the wiki for future queries. There is no "compile first, then score" path; search results are scored immediately.

Wiki-first, search-augmented

The source workflow reads the wiki first, then augments with external search. Search results are immediately usable as search_lead candidates — they get scored alongside wiki candidates without waiting for compilation.

The orchestrator does not ask permission to search or ingest. The user's request is the consent. It logs every search query and decision to disk.

Intent gate

If understand classifies the intent as commission (the buyer wants specimens, not existing data), the workflow switches to Bounty automatically. The verb does not matter -- "find me 50 AD CSF samples" is a commission request regardless of the word "find."

What gets written to disk

store/queries/2026-04-08_ad-plasma-metab/
  request.json              # Structured brief from understand
  search_history.jsonl      # Every external search query + hit count
  candidates.json           # Entities that passed discover
  scored_candidates.json    # Three-axis scores per candidate
  recommendation.md         # The deliverable
  listings.jsonl            # Card projections for the web app
  ingest_shortlist.md       # PMC/NCT list if ingest was triggered

Every query is reproducible from its artifacts. The search history captures the exact queries run, so you can audit why a candidate was or was not found.
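An audit of that kind can be sketched as a pass over search_history.jsonl. The record fields (source, query, hits) are assumptions about the log format:

```python
import json

# Sketch of auditing a query's search history: list every search
# that returned nothing, to explain why a candidate was not found.
def zero_hit_searches(path):
    empty = []
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            if rec["hits"] == 0:
                empty.append((rec["source"], rec["query"]))
    return empty

# Write a tiny illustrative history, then audit it.
with open("search_history.jsonl", "w") as f:
    f.write(json.dumps({"source": "pubmed", "query": "AD plasma metabolomics", "hits": 12}) + "\n")
    f.write(json.dumps({"source": "clinicaltrials", "query": "AD plasma metabolomics", "hits": 0}) + "\n")

print(zero_hit_searches("search_history.jsonl"))
```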

Example

$ vcro source "NSCLC FFPE RNA-seq cohorts, no neoadjuvant, n>=300, commercial use"

Source workflow. Wiki had 3 matches, 1 with commercial-use unknown.
Recommendation at store/queries/2026-04-08_nsclc-ffpe-rnaseq/recommendation.md.
All 3 candidates scored. TCGA-LUAD leads on Scale (n=522) but
block age >5yr is a pre-analytical risk. Commercial-use clause
on the Broad cohort is unverified -- flagged in Quality axis.