Methodology

How the score is built.

Every intervention is scored on three 0–10 axes (Evidence, Benefit, Safety), which are blended into a weighted composite and then mapped to a 14-grade letter scale. The same rubric applies across every category, so a vitamin sits on the same scale as a prescription drug.

The formula

Composite = 0.45·Ev + 0.15·Bn + 0.40·Sf
(each axis 0–10 → composite 0–10)

Evidence carries the heaviest weight because the brand promise is evidence-first. Safety is next because a large effect with bad tolerability or a serious downside should be pulled down visibly. Benefit still matters, but it cannot dominate the grade on its own. There are no hidden caps or vetoes: a compound can show high Benefit and low Safety at the same time, and the page-level breakdown is the source of truth for that tradeoff. Cost and legal access aren't scored at all: they vary too much by country, insurer, and supplier to belong in a universal grade.
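As a rough illustration, the formula can be sketched in a few lines of Python. The function name, the range check, and the example axis scores are assumptions made for this sketch; the scores are invented, not taken from any compound page.

```python
# Weights exactly as stated in the formula above.
WEIGHTS = {"evidence": 0.45, "benefit": 0.15, "safety": 0.40}


def composite(evidence: float, benefit: float, safety: float) -> float:
    """Blend the three 0-10 axes into a 0-10 composite."""
    axes = {"evidence": evidence, "benefit": benefit, "safety": safety}
    for name, value in axes.items():
        # The range check is an assumption of this sketch, not a stated rule.
        if not 0.0 <= value <= 10.0:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    return sum(WEIGHTS[name] * value for name, value in axes.items())


# Hypothetical compounds with the same Evidence score.
high_benefit_low_safety = composite(evidence=7.0, benefit=9.0, safety=3.0)
modest_benefit_high_safety = composite(evidence=7.0, benefit=4.0, safety=9.0)

print(f"{high_benefit_low_safety:.2f}")     # 5.70
print(f"{modest_benefit_high_safety:.2f}")  # 7.35
```

With Evidence held equal, the safer compound with the smaller effect comes out well ahead, which is the behaviour the weights are meant to produce.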

Evidence (45%)

How much is actually known about this in humans?

Benefit (15%)

If the claim is correct, how big is the effect?

Safety (40%)

Inverted from risk: higher score = safer.

What about cost and legality?

Neither feeds the score. How hard a compound is to obtain — prescription vs. over-the-counter, cash price, schedule status — depends on where you live, your insurer, and your supplier, so it can't sit inside a universal grade without distorting it. Each compound page records its legal status as a note for context; it just doesn't change the letter grade.

From composite to letter grade

The 0–10 composite gets mapped to a 14-grade letter scale. Most grades cover a 0.6-point band; the three A grades (A- through A+) are tightened to 0.3 points because the best-evidenced interventions cluster near the top of the scale. S covers everything from 8.5 up; F catches everything below 2.2. Reading the bands consistently with S ≥ 8.5 and F < 2.2, a score that lands exactly on a boundary takes the higher grade.

S ≥ 8.5 Top of class. The most robustly evidenced interventions in the database — real effect, low risk.
A+ 8.2 – 8.5 Exceptional. Multiple converging high-quality trials with sizeable effect.
A 7.9 – 8.2 Strong overall. Well-evidenced, meaningful effect, manageable risk.
A- 7.6 – 7.9 Strong with one weaker axis (modest effect size or a thinner evidence base).
B+ 7.0 – 7.6 Solid case. Most axes good; one notable trade-off.
B 6.4 – 7.0 Solid. Reasonable evidence, modest effect, some friction.
B- 5.8 – 6.4 Solid lower edge. Defensible but dependent on context.
C+ 5.2 – 5.8 Limited. Mixed evidence, small effect, or noticeable risk.
C 4.6 – 5.2 Limited. One or two real weaknesses.
C- 4.0 – 4.6 Limited lower edge. Hard to justify outside narrow use cases.
D+ 3.4 – 4.0 Weak. Thin data or unfavourable risk/benefit.
D 2.8 – 3.4 Weak. Multiple axes underperform.
D- 2.2 – 2.8 Weak lower edge. Borderline avoid.
F < 2.2 Avoid or defer. Evidence absent, harms outweigh, or both.
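The table above can be read as a simple lookup. The sketch below assumes each band includes its lower bound and excludes its upper bound, which matches S ≥ 8.5 and F < 2.2 but is not spelled out for the middle grades. It compares the raw composite without rounding, since no rounding rule is stated. The names `GRADE_FLOORS` and `letter_grade` are hypothetical.

```python
# (lower bound, grade) pairs, highest first, copied from the table above.
GRADE_FLOORS = [
    (8.5, "S"), (8.2, "A+"), (7.9, "A"), (7.6, "A-"),
    (7.0, "B+"), (6.4, "B"), (5.8, "B-"),
    (5.2, "C+"), (4.6, "C"), (4.0, "C-"),
    (3.4, "D+"), (2.8, "D"), (2.2, "D-"),
]


def letter_grade(composite: float) -> str:
    """Return the first grade whose lower bound the composite reaches."""
    for floor, grade in GRADE_FLOORS:
        if composite >= floor:
            return grade
    return "F"  # everything below 2.2


print(letter_grade(8.5))   # S
print(letter_grade(7.35))  # B+
print(letter_grade(2.19))  # F
```

Because composites are weighted sums of floats, a value that should sit exactly on a boundary can land a hair below it; a real implementation might round to a fixed number of decimals before the lookup.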

Toxins use a separate scale

Toxins (heavy metals, pollutants, lifestyle exposures) aren't on a benefit ladder — there's no upside to maximize. We score them on avoidance priority instead: harm magnitude, evidence of that harm, and exposure prevalence. Higher priority means worth more attention to reduce exposure.

Priority = 0.40·Magnitude + 0.30·Evidence + 0.30·Prevalence

CRITICAL ≥ 8.5 Population-level harm, strong evidence. Reduce exposure now.
HIGH ≥ 6.5 Significant harm with broad exposure. Worth active mitigation.
MODERATE ≥ 4.0 Real harm in many settings. Worth attention.
LOW ≥ 2.0 Limited or context-dependent harm.
MINIMAL ≥ 0.0 Weak or speculative harm signal.
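The toxin scale works the same way and can be sketched as follows. This assumes the three toxin axes are also scored 0–10, which the thresholds suggest but the text doesn't state outright. The function names and the example scores are hypothetical.

```python
def avoidance_priority(magnitude: float, evidence: float, prevalence: float) -> float:
    """Weighted avoidance priority, using the weights from the formula above."""
    return 0.40 * magnitude + 0.30 * evidence + 0.30 * prevalence


# (lower bound, tier) pairs, highest first, copied from the tier list above.
TIER_FLOORS = [
    (8.5, "CRITICAL"),
    (6.5, "HIGH"),
    (4.0, "MODERATE"),
    (2.0, "LOW"),
    (0.0, "MINIMAL"),
]


def priority_tier(priority: float) -> str:
    """Return the first tier whose lower bound the priority reaches."""
    for floor, tier in TIER_FLOORS:
        if priority >= floor:
            return tier
    raise ValueError(f"priority must be non-negative, got {priority}")


# Invented exposure: severe harm, strong evidence, moderate prevalence.
score = avoidance_priority(magnitude=9.0, evidence=8.0, prevalence=6.0)
print(f"{score:.1f}", priority_tier(score))  # 7.8 HIGH
```

Note that a severe, well-documented harm can still fall short of CRITICAL when few people are exposed, because prevalence carries 30% of the weight.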

What the score isn't

It's a running synthesis, not a prescription. Three big caveats:

Who does the scoring

I do. I'm not a clinician. I read the literature obsessively and call scores based on what I read. Every score links to the sources that produced it, so you can disagree with me and see exactly where. I update scores when readers point out studies I missed.

Full context: who I am · disclaimer.