Methodology

How the score is built.

Every intervention gets three 0–10 axes — Evidence, Benefit, Safety — blended into a weighted composite, then mapped to a 14-grade letter scale. Same rubric across every category, so a vitamin sits on the same scale as a prescription drug.

The formula

Composite = 0.45·Ev + 0.15·Bn + 0.40·Sf
(each axis 0–10 → composite 0–10)

Evidence carries the heaviest weight because the brand promise is evidence-first. Safety is next because a large effect with bad tolerability or serious downside should be pulled down visibly. Benefit still matters, but it cannot dominate the grade on its own. There are no hidden caps or vetoes: a compound can show high Benefit and low Safety at the same time, and the page-level breakdown is the source of truth for that tradeoff. Cost and legal access aren't scored at all: they vary too much by country, insurer, and supplier to belong in a universal grade. Each compound page still notes its legal status as context — it just doesn't move the letter grade.
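As a rough illustration of the formula above, here is a minimal Python sketch. The weights come straight from the formula; the function name, the range check, and the two example compounds are hypothetical and only there to show how a high Benefit score can coexist with a middling composite when Safety is low.

```python
# Weights copied from the composite formula; the names below are illustrative.
WEIGHTS = {"evidence": 0.45, "benefit": 0.15, "safety": 0.40}


def composite(evidence: float, benefit: float, safety: float) -> float:
    """Blend the three 0 to 10 axes into a 0 to 10 composite."""
    axes = {"evidence": evidence, "benefit": benefit, "safety": safety}
    for name, value in axes.items():
        # Assumption: out-of-range inputs are rejected rather than clamped.
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    return sum(WEIGHTS[name] * value for name, value in axes.items())


# Hypothetical compound: strong evidence, very large benefit, high risk.
risky = composite(evidence=8, benefit=9, safety=3)
# Hypothetical compound: same evidence, moderate benefit, very low risk.
gentle = composite(evidence=8, benefit=5, safety=9)

print(round(risky, 2), round(gentle, 2))  # prints 6.15 7.95
```

With identical Evidence scores, the safer compound ends up 1.8 points higher despite the smaller effect, which is the intended behaviour of a 40% Safety weight against a 15% Benefit weight.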

Evidence (45%)

How much is actually known about this in humans?

9–10 — Very strong: Multiple high-quality RCTs, robust meta-analyses, or decades of human outcome data.
7–8 — Strong: At least one large, well-run RCT plus convergent secondary evidence.
5–6 — Moderate: One solid RCT, or several consistent observational studies.
3–4 — Weak / mixed: Small trials, conflicting results, or thin data.
0–2 — Preclinical or none: In-vitro, animal-only, anecdotal, or no usable data.

Benefit (15%)

If the claim is correct, how big is the effect?

9–10 — Very large: Mortality or disease-incidence shifts, or biomarker changes that translate clinically.
7–8 — Large: Reproducible effect sizes that show up in real-world outcomes.
5–6 — Moderate: Real but modest. Often only on surrogate biomarkers.
3–4 — Small: Detectable in trials, hard to feel in life.
0–2 — Negligible: Within noise or null on hard endpoints.

Safety (40%)

Inverted from risk: higher score = safer.

9–10 — Very low risk: No meaningful safety signal at reasonable doses.
7–8 — Low: Generally well-tolerated. Minor side-effects in a minority.
5–6 — Moderate: Real side-effect profile, drug interactions, or monitoring requirements.
3–4 — High: Serious adverse events at therapeutic doses. Clinician-supervised territory.
0–2 — Severe: Fatal-potential or unacceptable harm profile.

What about cost and legality?

Neither feeds the score. How hard a compound is to obtain — prescription vs. over-the-counter, cash price, schedule status — depends on where you live, your insurer, and your supplier, so it can't sit inside a universal grade without distorting it. Each compound page records its legal status as a note for context; it just doesn't change the letter grade.

From composite to letter grade

The 0–10 composite gets mapped to a 14-grade letter scale. Most grades cover a 0.6-point band; the three A grades (A- through A+) are tightened to 0.3 points because the best-evidenced interventions cluster near the top of the scale. S covers everything from 8.5 up; F catches everything below 2.2.

S ≥ 8.5 Top of class. The most robustly evidenced interventions in the database — real effect, low risk.

A+ 8.2 – 8.5 Exceptional. Multiple converging high-quality trials with sizeable effect.

A 7.9 – 8.2 Strong overall. Well-evidenced, meaningful effect, manageable risk.

A- 7.6 – 7.9 Strong with one weaker axis (modest effect size or a thinner evidence base).

B+ 7.0 – 7.6 Solid case. Most axes good; one notable trade-off.

B 6.4 – 7.0 Solid. Reasonable evidence, modest effect, some friction.

B- 5.8 – 6.4 Solid lower edge. Defensible but dependent on context.

C+ 5.2 – 5.8 Limited. Mixed evidence, small effect, or noticeable risk.

C 4.6 – 5.2 Limited. One or two real weaknesses.

C- 4.0 – 4.6 Limited lower edge. Hard to justify outside narrow use cases.

D+ 3.4 – 4.0 Weak. Thin data or unfavourable risk/benefit.

D 2.8 – 3.4 Weak. Multiple axes underperform.

D- 2.2 – 2.8 Weak lower edge. Borderline avoid.

F < 2.2 Avoid or defer. Evidence absent, harms outweigh, or both.
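The table above can be read as a simple threshold lookup. The sketch below is one way to write it, under two assumptions that the table implies but doesn't state outright: each band includes its lower bound and excludes its upper bound (consistent with S ≥ 8.5 and F < 2.2), and the composite is rounded to six decimals only to absorb floating-point noise. The function and constant names are hypothetical.

```python
# Lower bound of each grade band, highest first, copied from the table.
GRADE_FLOORS = [
    (8.5, "S"),
    (8.2, "A+"), (7.9, "A"), (7.6, "A-"),
    (7.0, "B+"), (6.4, "B"), (5.8, "B-"),
    (5.2, "C+"), (4.6, "C"), (4.0, "C-"),
    (3.4, "D+"), (2.8, "D"), (2.2, "D-"),
]


def letter_grade(composite: float) -> str:
    """Map a 0 to 10 composite to one of the 14 letter grades."""
    if not 0 <= composite <= 10:
        raise ValueError(f"composite must be between 0 and 10, got {composite}")
    # Assumption: rounding guards against float error (e.g. 8.499999999),
    # it is not a rounding policy stated by the methodology.
    score = round(composite, 6)
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "F"  # everything below 2.2


print(letter_grade(8.5), letter_grade(8.49), letter_grade(2.19))  # prints S A+ F
```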

Toxins use a separate scale

Toxins (heavy metals, pollutants, lifestyle exposures) aren't on a benefit ladder, because there's no upside to maximize. We score them on avoidance priority instead: harm magnitude, evidence of that harm, and exposure prevalence. A higher priority means reducing exposure deserves more attention.

Priority = 0.40·Magnitude + 0.30·Evidence + 0.30·Prevalence

CRITICAL ≥ 8.5 Population-level harm, strong evidence. Reduce exposure now.

HIGH ≥ 6.5 Significant harm with broad exposure. Worth active mitigation.

MODERATE ≥ 4.0 Real harm in many settings. Worth attention.

LOW ≥ 2.0 Limited or context-dependent harm.

MINIMAL ≥ 0.0 Weak or speculative harm signal.
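The toxin scale works the same way as the main one: a weighted sum, then a threshold lookup. The sketch below assumes each input is scored 0 to 10 like the main axes (the tier cut-offs suggest this, but it isn't stated explicitly). The function names and the example exposure are hypothetical.

```python
# Tier floors, highest first, copied from the list above.
TIER_FLOORS = [
    (8.5, "CRITICAL"),
    (6.5, "HIGH"),
    (4.0, "MODERATE"),
    (2.0, "LOW"),
    (0.0, "MINIMAL"),
]


def avoidance_priority(magnitude: float, evidence: float, prevalence: float) -> float:
    """Weighted avoidance priority; inputs assumed to be on a 0 to 10 scale."""
    for name, value in (("magnitude", magnitude), ("evidence", evidence),
                        ("prevalence", prevalence)):
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    return 0.40 * magnitude + 0.30 * evidence + 0.30 * prevalence


def priority_tier(priority: float) -> str:
    """Return the first tier whose floor the priority reaches."""
    score = round(priority, 6)  # assumption: absorbs float noise only
    for floor, tier in TIER_FLOORS:
        if score >= floor:
            return tier
    raise ValueError(f"priority must be at least 0, got {priority}")


# Hypothetical exposure: severe, well-documented harm, moderate prevalence.
p = avoidance_priority(magnitude=9, evidence=9, prevalence=6)
print(round(p, 2), priority_tier(p))  # prints 8.1 HIGH
```

In this example, severe and well-documented harm still lands in HIGH rather than CRITICAL because prevalence holds the sum below 8.5.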

What the score isn't

It's a running synthesis, not a prescription. Three big caveats:

Personalisation dominates. A compound that's top-tier for population-level outcomes can be a terrible fit for you specifically. Your labs, your genetics, your other medications — all of that trumps a ranking.
Dose and context matter more than the grade. Most compounds here can be abused into harm or under-dosed into uselessness. The dosing column is a first pass, not a protocol.
Evidence moves. Grades update as new trials land and as we re-read the old ones. The archive tracks every movement.

Who does the scoring

I do. I'm not a clinician. I read the literature obsessively and call scores based on what I read. Every score links to the sources that produced it, so you can disagree with me and see exactly where. I update scores when readers point out studies I missed.

Full context: who I am · disclaimer.