Stateazy · 25 min read · 12 April 2026

The Drug Works. So What?

A p-value of 0.001 means nothing if the effect is too small to matter. The critical distinction between statistical and clinical significance.


The Problem First

You're a cardiology resident. Grand rounds. The presenter shows the headline result of a mega-trial:

"Novel antihypertensive Agent Z reduced systolic BP by 1.8 mmHg compared to placebo (95% CI: 0.9 to 2.7, p < 0.001). The reduction was highly statistically significant."

The audience murmurs approvingly. p < 0.001. Three zeros. That's impressive.

Then the visiting professor from the back row asks one question:

"1.8 millimetres of mercury. Would you change your prescription pad for that?"

Silence.

1.8 mmHg. Your home BP monitor can't even detect that reliably. The difference between sitting and standing shifts your BP more than that. The caffeine in the tea the patient drank before the clinic visit shifts it more than that. The entire effect of this "highly significant" drug is smaller than the noise in your measurement tool.

But p < 0.001. Three zeros. The drug "works."

This is the most dangerous sentence in medicine: "The result was statistically significant." Not because it's wrong — it's technically correct. But because it answers a question nobody is actually asking. The statistician asks: "Is the effect real?" The patient asks: "Will I feel better?" The doctor asks: "Should I prescribe this?" Statistical significance answers the first question. It is silent on the other two.

And the gap between "real" and "worth it" is where patients get harmed.


Before the Terms — What Are We Actually Asking?

When you test a drug, you're really asking three separate questions. The disaster happens when you answer only one and pretend you've answered all three.

Question 1: "Is the effect REAL?" → Is the observed difference genuine, or could it be random noise? → This is what the p-value and statistical significance answer.

Question 2: "Is the effect BIG ENOUGH TO MATTER?" → Even if it's real, does it change anything for the patient? → This is clinical significance.

Question 3: "Is the effect WORTH THE COST?" → Even if it's real and big enough, do the benefits outweigh the harms, the expense, the inconvenience? → This is practical significance / clinical utility.

Three questions. Three different answers. One word — "significant" — that English speakers use for all three. And therein lies the catastrophe.


Word Surgery: "Significant"

"Significant"

Root: Latin significare = signum (sign, mark) + facere (to make, to do) → "to make a sign" / "to point at something" / "to indicate"

Literal meaning: "making a sign that something is there"

In everyday English: "important, meaningful, substantial, consequential"

  • "A significant pay raise" = a raise big enough to matter
  • "A significant relationship" = a relationship that means something
  • "No significant damage" = damage too small to worry about

In statistics: "unlikely to be due to chance alone (given α)"

  • A "significant" result might be a 0.1 mmHg BP reduction
  • A "significant" result might be a 0.01% mortality difference
  • A "significant" result might be a finding nobody would act on

The collision: When a paper says "statistically significant reduction in mortality," the doctor's brain processes "significant" in its EVERYDAY sense — important, substantial, worth acting on. The paper means it in its STATISTICAL sense — unlikely to be zero. These are not the same thing. They are not even close.

Aha: "Significant" in statistics = "we detected a signal." "Significant" in English = "the signal matters." Detection ≠ importance. A metal detector at the airport beeps for a belt buckle. The beep is real (statistically significant). The belt buckle is not a weapon (not clinically significant).

"Clinical"

Root: Greek klinikos = "of or at the bedside" → From kline = "bed" / "couch" (where the sick person lies)

Literal meaning: "at the bedside" → "relevant to patient care"

So "clinical significance" literally = "significance at the bedside" — does this result matter when you're standing next to the patient?

"Statistical"

Root: German Statistik (1749, coined by Gottfried Achenwall) → from Latin status = "state, condition" → Originally: "the science of data about the state (government)"

Literal meaning: "relating to the collection and analysis of numerical data"

"Statistical significance" literally = "significance according to the numbers" — the data says something is there.

The Naming Collision

| What the doctor means by "significant" | What the statistician means by "significant" |
|---|---|
| Important | Detectable |
| Meaningful | Non-zero |
| Worth acting on | Unlikely by chance |
| Changes my practice | Passes the α threshold |
| Helps the patient | The test statistic exceeds the critical value |

Same word. Different planets. And nobody put a warning label on the word when statistics borrowed it from English. Fisher used "significant" in 1925 because it was a normal English word meaning "noteworthy." He didn't anticipate that 100 years later, the entire medical profession would confuse "noteworthy to a statistician" with "important to a patient."

Naming Family — The Four Types of Significance

| Type | Question | Who Decides | Example |
|---|---|---|---|
| Statistical significance | Is the effect non-zero? | The p-value (compared to α) | p = 0.001 for a 0.3 mmHg BP drop |
| Clinical significance | Is the effect big enough to help patients? | Clinicians, guidelines, MCIDs | Is 0.3 mmHg enough to reduce stroke? |
| Practical significance | Is it worth the money, effort, and risk? | Health economists, payers, patients | Does the benefit justify ₹5,000/month and side effects? |
| Personal significance | Does it matter to THIS patient? | The patient | "I'd rather not take another pill for 0.3 mmHg" |

The 2×2 Matrix — The Four Possible Outcomes

This is the framework you need tattooed on the inside of your eyelids.

| | Clinically significant (effect matters) | Clinically NOT significant (effect trivial) |
|---|---|---|
| Statistically significant (p < 0.05) | QUADRANT 1: The Ideal. Real effect that matters. Act on it. | QUADRANT 2: The Trap. Real but trivial effect. Do NOT act on it. |
| Statistically NOT significant (p ≥ 0.05) | QUADRANT 3: The Tragedy. Meaningful effect missed. Study was underpowered. | QUADRANT 4: The Null. No real effect, none missed. Move on. |

Quadrant 1: Statistically significant AND clinically significant

The dream. A large, well-powered study finds a real effect that matters to patients.

Example: SPRINT trial (2015). Intensive BP control (target < 120 mmHg) reduced major cardiovascular events by 25% (HR = 0.75, 95% CI: 0.64-0.89, p < 0.001). The effect is real (p < 0.001) AND the magnitude matters (25% reduction in heart attacks and strokes). Practice changed worldwide.

Quadrant 2: Statistically significant but NOT clinically significant

The trap. The most dangerous quadrant. The one that wastes billions in healthcare spending and exposes patients to side effects for nothing.

Example: The 1.8 mmHg BP reduction from the opening scenario. p < 0.001. Three zeros. But the effect is smaller than measurement noise. No guideline recommends starting a drug for 1.8 mmHg. No patient would benefit meaningfully. The drug "works" in the statistical sense and is useless in the clinical sense.

How this happens: Large sample sizes. With n = 20,000, you have so much statistical power that you can detect an effect of 0.5 mmHg as "significant." The p-value gets smaller as the sample gets bigger — even if the effect stays tiny. Statistical significance is a function of BOTH effect size AND sample size. Clinical significance is a function of effect size ALONE.

The formula:

test statistic ≈ (effect size / SD) × √n → the p-value falls as the test statistic rises

Double the sample → the test statistic grows → the p-value shrinks → the same trivial effect becomes "more significant." The effect hasn't changed. Your ability to detect it has. The microscope got stronger. The bacterium didn't get bigger.
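
A minimal sketch of that arithmetic in Python, not from the article: the 1.8 mmHg effect is the Agent Z figure from the opening scenario, and the 15 mmHg between-patient SD is an assumption. The effect never changes; only the sample size does, and the p-value collapses anyway.

```python
from math import sqrt
from scipy.stats import norm

effect = 1.8   # mmHg, the Agent Z effect from the opening scenario
sd = 15.0      # assumed between-patient SD of systolic BP

for n_per_arm in (50, 500, 5000, 20000):
    se = sd * sqrt(2 / n_per_arm)   # standard error of the between-group difference
    z = effect / se                 # test statistic grows in proportion to sqrt(n)
    p = 2 * norm.sf(z)              # two-sided p-value
    print(f"n per arm = {n_per_arm:>6}:  z = {z:5.2f},  p = {p:.3g}")
```

With 50 patients per arm the same 1.8 mmHg is "not significant" (p ≈ 0.55); with 20,000 per arm it is "highly significant". Nothing about the drug changed.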

Quadrant 3: NOT statistically significant but clinically significant

The tragedy. A real, important effect that the study missed because it was too small.

Example: A paediatric trial tests a promising leukaemia drug. Response rate: drug 78%, standard 55%. Difference = 23 percentage points. p = 0.09. n = 42.

"Not significant." The drug "doesn't work."

The truth: A 23 percentage point improvement in leukaemia response is enormous. It's potentially life-saving. But with only 42 children (ethical and practical constraints), the study didn't have enough power to push p below 0.05. The drug probably works. The study couldn't prove it.

The consequence: The drug isn't approved for children. Or development is abandoned. Or the company doesn't invest in a larger trial. Children die from a treatable cancer because a meaningful effect didn't cross an arbitrary statistical threshold.

How to detect you're in Quadrant 3: Look at the confidence interval. If the CI includes clinically meaningful effects (e.g., 95% CI for response difference: -3% to +49%), the study is saying: "We can't rule out a huge benefit OR a small harm." That's not evidence of no effect — it's evidence of insufficient data. The width of the CI tells you the study was too small.
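
To see how wide the interval gets at this sample size, here is a sketch using the leukaemia numbers above; the per-arm counts (21 children each) are an assumption, so the computed interval only approximates the CI quoted in the text.

```python
from math import sqrt

def wald_ci_diff(p1, n1, p2, n2, z=1.96):
    """Wald 95% CI for a difference in proportions (p1 - p2)."""
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff - z * se, diff + z * se

# 78% response on the drug vs 55% on standard care, ~21 children per arm (assumed split)
lo, hi = wald_ci_diff(0.78, 21, 0.55, 21)
print(f"Difference: 23 points, 95% CI roughly {lo:+.0%} to {hi:+.0%}")
# An interval this wide cannot separate "small harm" from "huge benefit":
# that is insufficient data, not evidence of no effect.
```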

Quadrant 4: NOT statistically significant, NOT clinically significant

The true null. A well-powered study that convincingly shows no meaningful effect.

Example: A large trial (n = 5,000) tests homeopathy for chronic pain. Pain reduction: 0.2 points on a 10-point VAS. 95% CI: -0.3 to 0.7. p = 0.42.

The effect is near zero, the CI is entirely within the range of clinical irrelevance, and the p-value is large. This is a well-conducted study showing that the treatment genuinely doesn't work. Not underpowered. Not ambiguous. Just null.


The MCID — The Number Nobody Teaches You

Word Surgery: MCID

MCID = Minimal Clinically Important Difference

  • Minimal = the smallest
  • Clinically = at the bedside
  • Important = that matters to the patient
  • Difference = between treatment and control

"The smallest difference between groups that a patient would actually notice or care about."

Why this matters: The MCID is the THRESHOLD for clinical significance, the way α = 0.05 is the threshold for statistical significance. If the observed effect exceeds the MCID → clinically significant. If it doesn't → clinically insignificant, regardless of the p-value.

How MCIDs Are Determined

| Method | How It Works | Example |
|---|---|---|
| Anchor-based | Compare the score change to a patient-reported global assessment ("Are you better/same/worse?") | Patients who say "a little better" improved by ~3 points on the HDRS → MCID ≈ 3 |
| Distribution-based | Use 0.5 × SD of baseline scores as the MCID | If baseline SD = 8, MCID ≈ 4 |
| Delphi consensus | Experts agree on what constitutes a meaningful change | Panel of rheumatologists agree: 20% improvement in ACR criteria = meaningful |
| Patient-derived | Ask patients: "What improvement would make the treatment worthwhile?" | Patients say: "I'd need at least 2 fewer migraine days per month" |

MCIDs for Common Outcomes

| Outcome Measure | Commonly Used MCID | Specialty |
|---|---|---|
| Systolic BP | ≥ 5 mmHg (some say ≥ 10 mmHg) | Cardiology |
| HbA1c | ≥ 0.5% | Endocrinology |
| Hamilton Depression Scale (HDRS) | ≥ 3 points | Psychiatry |
| Visual Analogue Scale (VAS) pain | ≥ 1.3 points (10-point scale) | Anaesthesia, Ortho |
| FEV1 | ≥ 100 mL (some say ≥ 200 mL) | Pulmonology |
| PASI-75 | 75% improvement (built into the outcome) | Dermatology |
| WOMAC | ≥ 10 points (or ≥ 15%) | Orthopaedics |
| EDSS | ≥ 1.0 point (or ≥ 0.5 at higher levels) | Neurology |
| 6-Minute Walk Test | ≥ 30 metres | Cardiology, Pulm |
| EQ-5D utility | ≥ 0.05 | Health economics |

The critical point: When a paper reports "statistically significant improvement" on any of these scales, your FIRST question should be: "Did the improvement exceed the MCID?" If not, the drug produced a detectable but imperceptible change. The patient won't feel better. The numbers changed. The patient didn't.


Real Examples — The Full Spectrum

Example 1: Statistically significant, clinically meaningless

ACCORD Blood Pressure Trial (2010)

  • Intensive BP control (< 120 mmHg) vs standard (< 140 mmHg) in diabetics
  • Primary composite outcome: HR = 0.88, 95% CI: 0.73-1.06, p = 0.20 (primary endpoint MISSED)
  • BUT: Stroke reduction: HR = 0.59, 95% CI: 0.39-0.89, p = 0.01

Wait — stroke reduction WAS significant. So this is Quadrant 1?

Not so fast. The absolute stroke reduction was 0.21% per year (1.06% vs 0.85%). NNT = 476 per year. You need to treat 476 diabetics aggressively for an entire year to prevent one stroke. Meanwhile, the intensive arm had significantly MORE serious adverse events (hypotension, syncope, hyperkalaemia, renal failure).

The p-value: 0.01. "Significant." The clinical reality: Treat 476 patients. Prevent 1 stroke. Cause multiple serious adverse events in the other 475.

The statistical significance was real. The clinical trade-off was unfavourable.
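
The arithmetic behind that NNT, as a minimal sketch using only the stroke rates quoted above:

```python
control_rate = 0.0106    # 1.06% strokes per year, standard BP control
intensive_rate = 0.0085  # 0.85% strokes per year, intensive BP control

arr = control_rate - intensive_rate   # absolute risk reduction per year
nnt = 1 / arr                         # patient-years of treatment per stroke prevented
print(f"ARR = {arr:.2%}/year, NNT ≈ {nnt:.0f}")
# ARR = 0.21%/year, NNT ≈ 476
```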

Example 2: Statistically insignificant, clinically revolutionary

Streptomycin for Tuberculosis — The First RCT (1948)

The MRC streptomycin trial is celebrated as the first randomised controlled trial. Results:

  • Streptomycin group: 7% mortality at 6 months
  • Control group: 27% mortality at 6 months
  • Difference: 20 percentage points

By any modern standard, this is a massive, practice-changing effect. But the trial had only 107 patients. The p-value was significant in this case (p < 0.01), but it could easily have been non-significant with a slightly smaller sample or a slightly less dramatic effect.

The teaching point: If the effect had been 15% vs 22% (still clinically meaningful) in a trial of 50 patients, p might have been 0.15. "Not significant." And a life-saving antibiotic might have been dismissed.

The early history of medicine is full of treatments that "worked" clinically but couldn't cross p < 0.05 in small trials. We were lucky that streptomycin's effect was so large that even a tiny trial caught it.

Example 3: The antidepressant crisis

Kirsch et al. (2008) — The Emperor's New Drugs

Meta-analysis of all FDA-submitted data for 6 major antidepressants:

  • Mean drug-placebo difference: 1.8 points on the Hamilton Depression Rating Scale
  • MCID for HDRS: ≥ 3 points (NICE criterion)
  • Statistical significance: p < 0.001 across nearly all trials

The interpretation: Antidepressants produce a REAL effect (statistically significant). The effect is SMALLER than what patients can perceive (below MCID). The drugs "work" in that they shift a number on a scale. They don't "work" in that most patients can't tell the difference between the drug and the placebo.

The counter-argument: The AVERAGE effect is below MCID. But averages hide variation. SOME patients respond dramatically (the true responders). Others don't respond at all (pulling the average down). The average being below MCID doesn't mean NO patient benefits — it means the average patient doesn't benefit enough to notice.

The unresolved question: Is it ethical to prescribe a drug with side effects (weight gain, sexual dysfunction, withdrawal) to 100 patients when only 20-30 will experience a meaningful benefit and you can't predict who they are? Statistical significance says yes. Clinical significance says "it's complicated." Patient autonomy says "tell them the truth and let them decide."

Example 4: Large effect, missed by small trial

Hypothermia for Neonatal Encephalopathy — Early Trials

Before the landmark CoolCap and TOBY trials, several small studies tested therapeutic hypothermia for birth asphyxia:

  • Small pilot studies showed death/disability rates of ~50% (hypothermia) vs ~65% (control)
  • 15 percentage point absolute reduction
  • But with n = 30-50, p-values ranged from 0.08 to 0.20

"Not significant." Development nearly stalled.

Then the larger trials (n = 200+) confirmed the effect: NNT = 6-8. Treat 6-8 babies with cooling to prevent one death or case of severe disability. One of the most effective interventions in neonatal medicine.

The cost of requiring statistical significance in small trials: Years of delay. Babies who could have been cooled but weren't. The clinical significance was always there. The statistical significance needed a bigger sample to appear.


Why This Confusion Persists — The 5 Structural Causes

Cause 1: One word, two meanings

"Significant" means "important" in English and "detectable" in statistics. As long as we use the same word for both concepts, the conflation is inevitable. The ASA's 2019 proposal to retire the phrase "statistically significant" was an attempt to break this linguistic trap. It failed because the word is too entrenched.

Cause 2: Journals filter on p-values, not effect sizes

Publication bias: journals preferentially publish studies with p < 0.05. Effect size doesn't determine publishability — p-value does. This creates a literature enriched for "significant" findings regardless of whether the effects are clinically meaningful. The incentive structure rewards statistical significance and is indifferent to clinical significance.

Cause 3: p-values are easy. MCIDs are hard.

Calculating a p-value is mechanical. Determining the MCID for an outcome requires clinical judgment, patient input, and anchor studies. Many outcomes don't HAVE established MCIDs. When the MCID is unknown, researchers default to the only threshold available: p < 0.05. The easy metric displaces the important metric.

Cause 4: Sample sizes keep growing

Modern mega-trials enrol 10,000-50,000 patients. At these sample sizes, effects of 0.1% are statistically significant. The bigger the trial, the smaller the detectable effect, the larger the gap between statistical and clinical significance. Ironically, our most powerful studies are the most susceptible to producing "significant" but meaningless results.

Cause 5: Nobody fails a viva for ignoring clinical significance

Examiners ask: "Is p = 0.03 significant?" Expected answer: "Yes, because p < 0.05." Full marks.

Nobody asks: "Is a 0.4 mmHg BP reduction clinically significant?" Because that requires knowing MCIDs, understanding clinical context, and making a judgment. The exam system rewards threshold thinking and never tests clinical judgment. Students optimise for exams. The confusion perpetuates.


Branch-by-Branch — Where This Kills Patients

General Medicine

The scenario: JUPITER trial (2008). Rosuvastatin in people with normal LDL but elevated CRP.

  • Relative risk reduction for MI: 54% (HR = 0.46, p < 0.001). Sounds enormous.
  • Absolute risk reduction: 0.35% per year (0.77% vs 0.42%).
  • NNT per year: 286.

The marketing version: "54% reduction in heart attacks! Highly significant!"

The clinical reality: Treat 286 healthy people with a statin for a year — with its costs, side effects (myalgia, diabetes risk, liver monitoring) — to prevent one heart attack. The relative risk reduction is statistically significant AND sounds clinically impressive. The absolute risk reduction reveals the clinical significance is marginal in low-risk populations.

The lesson: Relative risk reduction is ALWAYS statistically significant when p < 0.05. But it hides the baseline risk. A 50% relative reduction from 0.002% to 0.001% is "significant" and absurd. Always demand the absolute numbers.
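
The same conversion, sketched with the JUPITER event rates quoted above. (The 54% headline figure is the hazard-ratio reduction for MI, so the simple risk ratio computed here comes out slightly lower; the point is the gap between the relative and absolute framings, not the exact percentage.)

```python
placebo_rate = 0.0077   # events per year, placebo (rate quoted above)
statin_rate = 0.0042    # events per year, rosuvastatin

rrr = 1 - statin_rate / placebo_rate   # relative risk reduction: sounds dramatic
arr = placebo_rate - statin_rate       # absolute risk reduction: the honest number
nnt = 1 / arr                          # people treated per year per event prevented

print(f"Relative reduction ≈ {rrr:.0%}, absolute reduction = {arr:.2%}/year, NNT ≈ {nnt:.0f}")
```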


Surgery

The scenario: A meta-analysis of robotic vs laparoscopic prostatectomy. 12 studies. 15,000 patients.

  • Positive surgical margin rate: robotic 13.5% vs laparoscopic 15.2%
  • Difference: 1.7 percentage points
  • p = 0.008

"Statistically significant advantage for robotic surgery."

The clinical question: Is a 1.7% absolute difference in margin rates worth the ₹3-5 lakh additional cost of robotic surgery? Does 1.7% translate to meaningful differences in recurrence or survival? (No long-term data says it does.)

The hospital administrator's version: "Our new robot produces significantly better cancer outcomes." (Based on p = 0.008 for a 1.7% difference that has no proven survival benefit.)


Paediatrics

The scenario: A trial of a new formula milk vs breastfeeding. Cognitive scores at age 5:

  • Formula group: IQ 98.2
  • Breastfed group: IQ 100.4
  • Difference: 2.2 IQ points
  • p = 0.02

"Statistically significant cognitive advantage of breastfeeding."

The MCID for IQ: Generally considered 5-7 points for a "meaningful" difference. 2.2 points is within the measurement error of most IQ tests. A child cannot be meaningfully distinguished as "smarter" or "less smart" based on 2.2 IQ points.

The harm: A mother who couldn't breastfeed (medical reasons, work constraints, supply issues) reads the headline and carries guilt for years. The "significant" p-value weaponised a trivial difference into a parenting judgment.


Obstetrics

The scenario: A trial of a new tocolytic for preterm labour.

  • Prolongation of pregnancy: drug 52.3 hours, placebo 48.1 hours
  • Difference: 4.2 hours
  • p = 0.04

"Statistically significant prolongation of pregnancy."

The clinical reality: 4.2 hours. The drug kept the baby inside for 4 extra hours. Does 4 hours change neonatal outcomes? Does it allow time for a full course of antenatal steroids (which requires 48 hours)? If the mother is already at 50 hours, 4 more doesn't help. If she's at 2 hours, 4 more doesn't get her to 48.

The MCID for tocolysis: Generally considered 48 hours (enough for a full steroid course) or 7 days (enough for transfer and preparation). 4.2 hours doesn't meet either threshold.

The drug: Gets approved based on p = 0.04. Gets prescribed. Has side effects (tachycardia, pulmonary oedema with some tocolytics). Buys 4 hours. The statistical significance justified a drug. The clinical significance didn't.


Psychiatry

The scenario: New anxiolytic. Hamilton Anxiety Scale (HAM-A) improvement:

  • Drug: 12.4 points improvement
  • Placebo: 10.1 points improvement
  • Difference: 2.3 points
  • p = 0.003
  • MCID for HAM-A: ≈ 4 points

The pattern: This is the antidepressant story all over again. The drug produces a real, detectable, replicable effect that is BELOW the threshold patients can perceive. The p-value is beautiful. The patient can't tell whether they got the drug or the sugar pill.

The systemic problem in psychiatry: Psychiatric outcomes are subjective scales (HDRS, HAM-A, PANSS, MADRS). The placebo response in psychiatry is enormous (30-50% improvement is common with placebo). The drug-placebo DIFFERENCE is therefore small even when the drug has a real pharmacological effect. Combined with large sample sizes, this produces a literature full of "significant" findings with sub-MCID effects.

The result: Patients take medications with real side effects for statistically significant but imperceptible benefits. The p-value approved the drug. The patient's lived experience says "I don't feel different."


Community Medicine / PSM

The scenario: A district-level nutrition intervention. Childhood stunting prevalence:

  • Intervention district: 31.2%
  • Control district: 33.8%
  • Difference: 2.6 percentage points
  • p = 0.04

"Statistically significant reduction in stunting."

The MCID for a population-level intervention: Debatable, but a 2.6 percentage point reduction in a district of 500,000 children = ~13,000 fewer stunted children. At population scale, even small percentage differences represent large absolute numbers.

Wait — this is the OPPOSITE lesson. At population level, effects that are clinically insignificant for an INDIVIDUAL (2.6% doesn't help any single child much) can be clinically significant for a POPULATION (13,000 fewer stunted children is massive).

The nuance: Clinical significance depends on the LEVEL OF ANALYSIS. For the individual patient: "Does this help ME?" For the population: "Does this shift the curve?" A 2.6% stunting reduction is meaningless for an individual child. It's transformative for a district. Statistical significance and clinical significance both need a referent — significant FOR WHOM?
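
The level-of-analysis point in two lines, as a sketch; the 500,000 figure is the district population assumed in the text.

```python
children_in_district = 500_000   # under-fives in the district (as assumed above)
reduction = 0.338 - 0.312        # 2.6 percentage points

print(f"For one child: stunting risk falls by {reduction:.1%} -- barely perceptible")
print(f"For the district: ~{reduction * children_in_district:,.0f} fewer stunted children")
```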


Orthopaedics

The scenario: Total knee replacement. A new implant design vs standard.

  • Knee Society Score at 2 years: new 88.4 vs standard 85.7
  • Difference: 2.7 points
  • p = 0.01
  • MCID for Knee Society Score: ≈ 5-8 points

"Statistically significant superiority of the new implant."

The reality: 2.7 points on a scale where patients need ≥ 5-8 points to NOTICE a difference. The new implant costs ₹80,000 more. It has no long-term survivorship data. It requires surgeon retraining.

The marketing: "Clinically proven superiority, p = 0.01." The word "clinically" is doing fraudulent work here. The proof is statistical. The clinical significance is absent.


How to Read a Paper — The 5-Step Clinical Significance Checklist

Every time you read a result reported as "statistically significant," run through this:

Step 1: What's the effect size?

Not the p-value. The EFFECT SIZE. Mean difference, risk ratio, hazard ratio, absolute risk reduction. If the paper buries the effect size and leads with the p-value, be suspicious. Papers that lead with p-values are often hiding small effects behind impressive-sounding probabilities.

Step 2: What's the MCID for this outcome?

Does the effect exceed the threshold patients would notice? If no established MCID exists, use clinical judgment: would YOU notice this difference? Would your patient care about it?

Step 3: What's the confidence interval?

The CI tells you the range of plausible effect sizes. If the ENTIRE CI is below the MCID, the effect is statistically significant but almost certainly clinically insignificant. If the CI SPANS the MCID (lower bound below, upper bound above), you're uncertain — the true effect might or might not be clinically meaningful.
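
One way to operationalise Step 3, as a minimal sketch; the function name and thresholds are illustrative, not from any library.

```python
def ci_vs_mcid(ci_low, ci_high, mcid):
    """Judge clinical significance from where the CI sits relative to the MCID."""
    if ci_high < mcid:
        return "Entire CI below MCID: even if real, the effect is almost certainly trivial"
    if ci_low >= mcid:
        return "Entire CI at or above MCID: the effect is clinically meaningful"
    return "CI spans the MCID: clinical significance uncertain; more data needed"

# Agent Z from the opening scenario: 1.8 mmHg (95% CI 0.9 to 2.7), systolic BP MCID ~5 mmHg
print(ci_vs_mcid(0.9, 2.7, mcid=5))
```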

Step 4: What's the absolute risk reduction / NNT?

For binary outcomes (death, MI, stroke), convert relative measures to absolute. RR = 0.70 sounds impressive. ARR = 0.3% and NNT = 333 puts it in perspective.

Step 5: What's the harm?

Every treatment has side effects. A statistically significant but clinically marginal benefit must be weighed against DEFINITE harms (drug side effects, surgical complications, cost, inconvenience). The NNT must be compared to the NNH (Number Needed to Harm). If NNT = 200 and NNH = 50, you're harming 4 patients for every 1 you help.
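
And Step 5, sketched with the hypothetical NNT and NNH figures used above:

```python
nnt = 200   # treat 200 patients to help one (hypothetical, as in the text)
nnh = 50    # treat 50 patients to harm one

print(f"Per 200 patients treated: 1 helped, {200 // nnh} harmed")
print(f"Patients harmed per patient helped: {nnt / nnh:.0f}")
```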


The 6 Ways Not Knowing This Destroys You

1. You prescribe drugs that don't help patients

You see p < 0.001 and start prescribing. The drug has real side effects. The benefit is 1 mmHg. Your patient gets the side effects and doesn't get meaningful benefit. You've traded certainty of harm for statistical significance of benefit.

2. You dismiss treatments that could save lives

A small trial shows a 20% mortality reduction with p = 0.08. You dismiss it: "Not significant." Patients die while waiting for a larger trial that may never be funded because the first trial "failed."

3. You can't evaluate guidelines critically

Guidelines say: "Recommend Drug X (Level of Evidence A, Grade 1)." You don't check whether the evidence is statistically significant for a clinically significant effect, or statistically significant for a clinically meaningless effect. You follow the guideline blindly. Guidelines built on Quadrant 2 evidence (significant but trivial) are recommending drugs that don't help.

4. You can't communicate with patients honestly

Patient: "Does this drug work?"

Bad answer: "Yes, the study showed p < 0.001." (Patient hears: it works really well.)

Good answer: "The drug lowers your BP by about 2 points. Your BP varies by more than that between morning and evening. Whether that 2-point drop prevents anything long-term is uncertain. The side effects are X. What would you like to do?"

5. You fall for pharmaceutical framing

Drug companies ALWAYS report relative risk reductions (sounds big) and p-values (sounds scientific). They rarely volunteer absolute risk reductions (sounds small) or NNTs (sounds unimpressive). Without understanding the gap between statistical and clinical significance, you're defenceless against selective reporting.

6. You can't do a proper journal club

The purpose of journal club is to decide: should we change practice? That question requires evaluating clinical significance, not just statistical significance. A journal club that only asks "was it significant?" and stops there has failed its educational purpose.


The One Thing to Remember

Statistical significance tells you the SIGNAL is real. Clinical significance tells you the signal MATTERS.

A smoke detector that goes off when someone lights a candle is statistically significant — it correctly detected smoke. But you don't evacuate the building for a candle. The detection is real. The response would be insane.

The p-value is your smoke detector. It beeps when it detects something. Your job as a clinician is to look at WHAT it detected — a house fire or a birthday candle — before you pull the alarm.

"Statistically significant" is a statement about the DATA. "Clinically significant" is a statement about the PATIENT. The data is not the patient. The patient is not the data. The doctor who confuses the two will prescribe drugs that work on paper and fail at the bedside.

Fisher designed the p-value to make scientists think harder. We used it to think less. The result was a medical literature full of "significant" findings that don't help patients, and a generation of doctors trained to worship a threshold instead of exercising judgment.

The resident who sees p < 0.001 and asks "What's the effect size? Does it exceed the MCID? What's the NNT? What's the NNH?" — that resident practises medicine.

The resident who sees p < 0.001 and says "Highly significant" — that resident practises arithmetic.

One of them will help patients. The other will cite p-values. They are not the same thing.