Why Are There Two Words for "How Spread Out Is My Data" — And Why Is One of Them Squared?
The Problem First
You're a community medicine resident writing your thesis. You've measured haemoglobin levels in 200 pregnant women. Your statistician friend says:
"Report the mean and standard deviation."
You do: Mean = 10.2 g/dL, SD = 1.8 g/dL.
Your examiner asks: "What's the variance?"
You pause. You know SD = 1.8. So... variance = 1.8² = 3.24... g/dL²?
Wait. g/dL squared? What is a "squared gram per decilitre"? You can't measure haemoglobin in squared units. Nobody's blood is 3.24 grams-squared-per-decilitre-squared of anything. The number has no clinical meaning.
So why does variance exist? Why would anyone square the deviations, get an uninterpretable number in squared units, and then take the square root to get back to interpretable units (SD)? Why not just skip the squaring and use the average distance from the mean directly?
The answer involves a 200-year-old mathematical decision by Gauss, a property that makes the entire statistical framework possible, and a naming collision that confuses every student who encounters it.
Word Surgery: "Deviation"
"Deviation"
Root: Latin de- (away from) + via (road, path, way) → deviare = "to go off the road" / "to wander from the path"
Literal meaning: "the amount by which something has gone off the road"
In statistics: How far a single value has wandered from the mean. If your mean haemoglobin is 10.2 and a patient has 8.4, that patient's deviation is 8.4 - 10.2 = -1.8 g/dL. She's wandered 1.8 units below the path.
→ Aha: Every data point is a traveller. The mean is the road. The deviation is how far each traveller has strayed from the road. Some wander left (negative deviation), some wander right (positive). The question is: on average, how far does a typical traveller stray?
Naming Family of "Deviation"
| Term | What It Measures | Formula |
|---|---|---|
| Deviation | One patient's distance from the mean | xᵢ - x̄ |
| Absolute deviation | Same, but ignoring direction (sign) | \|xᵢ - x̄\| |
| Mean absolute deviation (MAD) | Average of all absolute deviations | Σ\|xᵢ - x̄\| / n |
| Standard deviation (SD) | "Standardised" average deviation (via the squaring route) | √(Σ(xᵢ - x̄)² / (n-1)) |
| Interquartile range (IQR) | Spread of the middle 50% | Q3 - Q1 |
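The whole family can be computed in a few lines; a minimal Python sketch with illustrative haemoglobin values (the numbers are made up for the demonstration):

```python
import statistics

hb = [8.4, 9.6, 10.2, 10.8, 12.0]            # illustrative haemoglobin values (g/dL)
mean = statistics.mean(hb)                    # the "road" every traveller is measured against

deviations = [x - mean for x in hb]           # signed distance of each value from the mean
mad = sum(abs(d) for d in deviations) / len(hb)   # mean absolute deviation
sd = statistics.stdev(hb)                     # sample SD (uses the n-1 denominator)

quartiles = statistics.quantiles(hb, n=4)     # [Q1, Q2, Q3]
iqr = quartiles[2] - quartiles[0]             # spread of the middle 50%

print(round(sum(deviations), 10))             # signed deviations always sum to ~0
print(round(mad, 2), round(sd, 2), round(iqr, 2))
```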
The confusing part: "Deviation" appears in both the raw concept (each point's distance from mean) AND the summary statistic (standard deviation). A student hearing "deviation" doesn't know if you mean the individual distance or the global summary. Context is everything.
Word Surgery: "Variance"
"Variance"
Root: Latin variare = "to change, to alter, to make different" → variantia = "the state of being different" / "changeableness"
Literal meaning: "how much things differ from each other" / "the degree of changeableness"
In statistics: The average of the squared deviations from the mean.
Variance (s²) = Σ(xᵢ - x̄)² / (n-1)
→ So "variance" literally = "the degree to which values VARY" — measured in squared units because the deviations were squared before averaging.
Why "Variance" and Not "Mean Squared Deviation"?
Because Ronald Fisher named it.
In 1918, Fisher introduced the term "variance" in his paper "The Correlation Between Relatives on the Supposition of Mendelian Inheritance." Before Fisher, people used phrases like "mean square deviation" or "mean square error." These were accurate but clunky.
Fisher wanted a single, clean noun. He wrote:
"It is... desirable in analysing the causes of variability to deal with the square of the standard deviation as the measure of variability. We shall term this quantity the Variance."
Fisher chose "variance" because:
- It's short (one word vs three)
- It comes from "vary" — the concept it measures
- It has a formal, technical sound appropriate for a mathematical quantity
- It distinguished the squared measure (variance) from the unsquared measure (standard deviation)
Fisher capitalised "Variance" in his original paper. He was introducing a NEW term and wanted to signal its importance. Later usage dropped the capitalisation.
→ Aha: "Standard Deviation" was Pearson's term (1893). "Variance" was Fisher's term (1918). Two different statisticians, 25 years apart, naming two different versions of the same underlying concept. SD came first. Variance was reverse-engineered from SD by squaring it — and then given a separate name because Fisher needed it as a standalone concept for ANOVA.
Naming Family of "Variance"
| Term | Symbol | What It Is | Units |
|---|---|---|---|
| Variance | s², σ² | Average squared deviation | Original units SQUARED (g²/dL², mmHg², kg²) |
| Standard Deviation | s, σ | Square root of variance | Original units (g/dL, mmHg, kg) |
| Coefficient of Variation | CV | SD divided by mean, expressed as % | Unitless (%) |
| Mean Square | MS | Variance as used inside ANOVA | Same as variance, different context |
| Sum of Squares | SS | Total of all squared deviations (before dividing by n-1) | Original units squared |
Why Square? — The 200-Year-Old Decision
This is the question every student asks. The answer has three layers.
Layer 1: The Cancellation Problem
If you just average the raw deviations (without squaring or taking absolute values), they cancel out to exactly zero. Always. By definition.
Example: Values: 4, 6, 8. Mean = 6.
| Value | Deviation (Value - Mean) |
|---|---|
| 4 | -2 |
| 6 | 0 |
| 8 | +2 |
| Sum | 0 |
The negatives and positives cancel perfectly. The mean deviation is always zero. Useless as a measure of spread.
Two solutions existed:
Solution A: Take absolute values (ignore the sign).
MAD = (|−2| + |0| + |+2|) / 3 = (2 + 0 + 2) / 3 = 1.33
This works. It's intuitive. It's in the original units.
Solution B: Square the deviations (squaring makes every deviation non-negative, so the cancellation disappears).
Variance = (4 + 0 + 4) / 2 = 4.0 (using n-1)

SD = √4 = 2.0
This also works. But it required an extra step (square root to get back to original units).
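Both solutions can be checked side by side in a few lines of Python (a sketch of the worked example above):

```python
values = [4, 6, 8]
n = len(values)
mean = sum(values) / n                       # 6.0

deviations = [x - mean for x in values]      # [-2.0, 0.0, 2.0]
assert sum(deviations) == 0                  # the cancellation problem: raw deviations sum to zero

# Solution A: absolute values
mad = sum(abs(d) for d in deviations) / n

# Solution B: square, average, then square-root back to original units
variance = sum(d ** 2 for d in deviations) / (n - 1)
sd = variance ** 0.5

print(round(mad, 2), variance, sd)           # 1.33 4.0 2.0
```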
So why did Gauss choose squaring (Solution B) over absolute values (Solution A)?
Layer 2: Gauss's Mathematical Reasons (1809)
Carl Friedrich Gauss, working on predicting planetary orbits from noisy measurements, needed to find the "best" estimate of a planet's position from multiple observations. He chose squaring for three specific mathematical reasons:
Reason 1: Differentiability.
Squared functions (x²) are smooth and differentiable everywhere. Absolute value functions (|x|) have a sharp corner at zero — they're not differentiable at the point where x = 0.
Why does this matter? Gauss was using calculus to minimise the total error (find the estimate that minimises the spread). With squared deviations, you set the derivative to zero and solve — clean, elegant calculus. With absolute deviations, the derivative doesn't exist at zero, and you need more complex optimisation methods that didn't exist in 1809.
Squaring was chosen because it made the maths solvable with 19th-century tools. If Gauss had had a computer, he might have chosen absolute values.
Reason 2: Connection to the Gaussian (Normal) Distribution.
The bell curve's probability density function contains σ² (the variance) in its exponent:
f(x) = (1 / (σ√(2π))) × e^(−(x−μ)² / (2σ²))
The squaring in the variance mirrors the squaring in the bell curve formula. This isn't coincidence — Gauss derived the bell curve BY assuming that the best measure of spread is the one that minimises the sum of squared deviations. The normal distribution and the variance are mathematically married. Each defines the other.
If he had used absolute deviations, the resulting distribution would be the Laplace distribution (a different bell-shaped curve with sharper peaks and fatter tails). We'd be living in a different statistical universe.
Reason 3: Additivity.
This is the most important reason for modern statistics, and the one Fisher exploited to build ANOVA.
Variances of independent variables ADD.
If X has variance 4 and Y has variance 9, and X and Y are independent, then:
Var(X + Y) = Var(X) + Var(Y) = 4 + 9 = 13
This is called the Bienaymé identity (Irénée-Jules Bienaymé, 1853).
Standard deviations DON'T add this way:
SD(X + Y) ≠ SD(X) + SD(Y)

SD(X + Y) = √(4 + 9) = √13 ≈ 3.61, not 2 + 3 = 5
Mean absolute deviations DON'T add this way either.
This additivity is not just a mathematical curiosity. It's the FOUNDATION of:
- ANOVA (Analysis of Variance): total variance = between-group variance + within-group variance. This partition ONLY works because variances add.
- Regression: variance explained + variance unexplained = total variance.
- Combining independent measurements: when you combine data from two labs or two studies, you can add variances to get the combined variability.
- Standard error: SEM = SD/√n works because the variance of the mean = σ²/n (variance divided by sample size). If variance weren't additive, this formula wouldn't hold.
Without additivity, we'd have no ANOVA, no regression R², no standard error, no confidence intervals, no meta-analysis. The entire statistical framework collapses. THAT is why we square.
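Additivity is easy to verify by simulation; a sketch using the variances 4 and 9 from above (the sample size and seed are arbitrary):

```python
import random
import statistics

random.seed(42)
N = 200_000

xs = [random.gauss(0, 2) for _ in range(N)]  # Var(X) = 4  (SD 2)
ys = [random.gauss(0, 3) for _ in range(N)]  # Var(Y) = 9  (SD 3)
sums = [x + y for x, y in zip(xs, ys)]

var_sum = statistics.variance(sums)          # close to 4 + 9 = 13
sd_sum = var_sum ** 0.5                      # close to √13 ≈ 3.61, not 2 + 3 = 5

print(round(var_sum, 1), round(sd_sum, 2))
```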
Layer 3: Fisher's Exploitation (1918-1925)
Fisher saw what Gauss had built and realised: if variance is additive, I can decompose the total variability in data into components.
Which component comes from the treatment? Which comes from random error? Which comes from blocking? Which comes from the interaction?
This decomposition is Analysis of Variance (ANOVA) — Fisher's greatest invention (1918-1925). The name says it: Analysis of VARIANCE. Not analysis of standard deviation. Not analysis of spread. VARIANCE. Because only variance is additive and decomposable.
Word Surgery: "Analysis of Variance" (ANOVA)
Analysis: Greek analysis = ana- (up, back, throughout) + lysis (loosening, breaking apart) → "breaking something apart throughout" / "decomposing into components"
Of Variance: "of the spread-measured-in-squared-units"
→ So ANOVA literally = "breaking apart the squared spread into its sources."
→ Aha: You have total variability in your data. ANOVA breaks it apart into: "variability caused by the treatment" and "variability caused by random noise." If the treatment chunk is big enough relative to the noise chunk, the treatment "works."
This decomposition is only possible because variance (squared units) is additive. SD is not. MAD is not. ONLY variance.
That's why we suffer through squared units that no one can interpret clinically. The squared units are the PRICE we pay for the mathematical property (additivity) that makes all of statistical testing possible.
So What's the Actual Relationship?
The Simple Version
Variance = SD²

SD = √Variance
Variance is the engine. SD is the dashboard display.
You never see the engine directly (variance in squared units is uninterpretable). You see the dashboard (SD in original units, which you can understand). But the engine does all the work underneath: ANOVA, regression, standard error, confidence intervals — all computed using variance, then converted to SD or SE for reporting.
The Complete Map
1. Individual deviations: xᵢ - x̄
2. Square each one → Squared deviations: (xᵢ - x̄)²
3. Sum them all → Sum of Squares: SS = Σ(xᵢ - x̄)²
4. Divide by (n-1) [Bessel's correction] → Variance: s² = SS / (n-1)
5. Take the square root → Standard Deviation: s = √(s²)
6. Divide by √n → Standard Error: SEM = s / √n
7. Multiply by the critical value → Confidence Interval: Mean ± z* × SEM
Every row in this chain depends on the row above it. The CI that appears in every paper you read traces its ancestry back to individual deviations through variance. Cut variance out and the chain breaks at step 4.
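The chain can be traced end to end in code; a sketch with three illustrative haemoglobin values chosen to give mean 10.2 and SD 1.8 (z* = 1.96 for a 95% CI):

```python
import math

data = [8.4, 10.2, 12.0]                      # illustrative Hb values (g/dL)
n = len(data)
mean = sum(data) / n                          # 10.2

ss = sum((x - mean) ** 2 for x in data)       # Sum of Squares
variance = ss / (n - 1)                       # Bessel's correction → 3.24 g²/dL²
sd = math.sqrt(variance)                      # back to g/dL → 1.8
sem = sd / math.sqrt(n)                       # standard error of the mean
ci = (mean - 1.96 * sem, mean + 1.96 * sem)   # 95% confidence interval

print(round(variance, 2), round(sd, 2))
print(round(ci[0], 2), round(ci[1], 2))
```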
The Dictionary Confusion — Why These Words Collide
"Deviation" in Everyday Language
| Context | "Deviation" means... |
|---|---|
| Navigation | Going off course ("deviate from the flight path") |
| Psychology | Abnormal behaviour ("deviant behaviour") |
| Engineering | Departure from specification ("manufacturing deviation") |
| Medicine | A shift from normal ("axis deviation on ECG") |
| Statistics | Distance from the mean (one value) OR the standardised summary of all distances (SD) |
The collision: In everyday language, "deviation" implies something WRONG — a departure from what SHOULD be. In statistics, deviation is neutral — it's just the distance from the mean. A patient with Hb 12.5 (above the mean of 10.2) has a positive deviation. Nothing is "wrong" with her — she's just above average.
Students hear "standard deviation" and their brain echoes "standard error/defect/problem." The word carries a negative valence it shouldn't.
"Variance" in Everyday Language
| Context | "Variance" means... |
|---|---|
| Legal | A permit to deviate from a rule ("zoning variance") |
| Business | Difference between planned and actual ("budget variance") |
| Everyday | Disagreement ("at variance with each other") |
| Statistics | Average squared deviation from the mean |
The collision: Everyday "variance" means "difference" or "disagreement." Statistical variance means "average squared spread." When someone says "there's high variance in the data," a clinician hears "the data disagrees with itself" (not quite right) rather than "the data points are widely scattered around the mean" (the actual meaning).
The SD/Variance Confusion Specifically
Why do students mix them up?
- They measure the same thing (spread) in different units (original vs squared)
- They have different names that don't obviously signal their relationship
- SD is reported in papers but variance is used in formulas — students encounter SD in reading and variance in coursework, never connecting them
- "Standard deviation" is 2 words and "variance" is 1 word for what students think is the same thing. "Why does the same concept have two names?" — because they're NOT the same concept. They're parent and child.
The Regulatory Dimension
Where FDA Uses Variance (Not SD)
1. ANOVA-Based Primary Analyses
When a pivotal trial uses ANOVA or ANCOVA as the primary analysis:
F = Mean Square (Treatment) / Mean Square (Error) = Variance between groups / Variance within groups
The F-test ratio IS a ratio of variances. Not SDs. Not MADs. Variances. The entire statistical test for whether a drug works in a multi-arm trial is a comparison of two variances.
ICH E9 specifies ANOVA/ANCOVA as the standard analysis for continuous endpoints. Every time FDA evaluates a pivotal trial's primary continuous endpoint, they're looking at a variance decomposition.
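To see that F really is variance over variance, a one-way ANOVA can be done by hand; a sketch on three hypothetical treatment arms (all values invented):

```python
import statistics

# Hypothetical endpoint values in three treatment arms
groups = [
    [10.1, 10.5, 9.8, 10.4],
    [11.2, 11.6, 11.0, 11.4],
    [9.2, 9.6, 9.0, 9.4],
]
k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-group sum of squares: how far group means stray from the grand mean
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (k - 1)            # "Mean Square" = a variance

# Within-group sum of squares: scatter of patients around their own group mean
ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
ms_within = ss_within / (n_total - k)        # another variance

F = ms_between / ms_within                   # the F-test is variance / variance
print(round(ms_between, 3), round(ms_within, 3), round(F, 1))
```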
2. Homogeneity of Variance (Levene's Test)
Word Surgery: "Homoscedasticity"
Root: Greek homos (same) + skedasis (scattering) → "same scattering" = the spread of data is equal across groups
Cousin: Heteroscedasticity = heteros (different) + skedasis → "different scattering" = unequal spread
ANOVA assumes homoscedasticity — that the variance is the same in every treatment group. If Drug A's group has variance 4 and Drug B's group has variance 25, the F-test becomes unreliable.
FDA reviewers check: "Was homogeneity of variance assessed?" If variances are unequal, the SAP should pre-specify Welch's ANOVA (which doesn't assume equal variances) or a transformation.
The check is on VARIANCE equality, not SD equality. Because the F-test uses variances, and it's the variance assumption that must hold.
3. Sample Size Calculations
The formula:
n = (Zα + Zβ)² × 2σ² / δ²
Note: σ² (variance) in the formula, not σ. The sample size depends on variance. When a sponsor tells FDA "we assumed SD = 10 for our sample size calculation," the formula actually uses SD² = 100 (the variance).
Why variance in the formula? Because the derivation of sample size formulas comes from the distribution of the test statistic, which depends on the variance (through the standard error, which is σ/√n = √(σ²/n)). Variance is the natural unit of the derivation. SD is the reporting convenience.
Real consequence: If you assume SD = 10 (variance = 100) but the true SD is 14 (variance = 196), your variance estimate is off by 96%, not 40%. Errors in SD estimates are AMPLIFIED when squared to variance. A modest underestimate of SD becomes a dramatic underestimate of variance, leading to a catastrophically underpowered trial.
This is why ICH E9 requires: "The assumptions underlying the sample size calculation should be documented and justified." That justification must address the variance estimate, not just the SD.
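A sketch of the calculation, using z_α = 1.96 (two-sided α = 0.05) and z_β = 0.84 (80% power), with an assumed clinically meaningful difference δ = 5:

```python
import math

def n_per_group(sd, delta, z_alpha=1.96, z_beta=0.84):
    """Per-group sample size for comparing two means: n = (zα + zβ)² × 2σ² / δ²."""
    return math.ceil((z_alpha + z_beta) ** 2 * 2 * sd ** 2 / delta ** 2)

n_assumed = n_per_group(sd=10, delta=5)   # planning assumption: SD = 10 (variance 100)
n_true = n_per_group(sd=14, delta=5)      # reality: SD = 14 (variance 196)

# The 40% SD error becomes a 96% variance error, so n nearly doubles
print(n_assumed, n_true)                  # 63 123
```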
4. Random Effects Models (Variance Components)
Modern clinical trials use mixed-effects models (MMRM) that partition total variance into:
- Between-subject variance (σ²_between): how much patients differ from each other
- Within-subject variance (σ²_within): how much the same patient varies over time
- Residual variance (σ²_error): unexplained noise
These variance components cannot be decomposed using SD. The decomposition ONLY works with variance because of the additivity property:
σ²_total = σ²_between + σ²_within + σ²_error
FDA evaluates these components to understand: Is the drug effect real (treatment variance), or is it drowned in patient-to-patient variation (between-subject variance) or measurement noise (residual variance)?
5. Bioequivalence — Within-Subject Variance
For generic drug approval, the key number is within-subject variance (σ²_w) — how much the same person's drug levels vary between occasions.
- Low σ²_w → narrow 90% CI → easy to prove bioequivalence with small n
- High σ²_w → wide 90% CI → need large n to prove bioequivalence
FDA classifies drugs as "highly variable" when within-subject CV > 30% (approximately σ²_w > 0.09 on log scale). These drugs get special regulatory allowance: widened bioequivalence limits (69.84-143.19% instead of 80-125%) because the intrinsic variance makes standard limits unfairly strict.
The regulatory decision to widen BE limits is based on VARIANCE magnitude. A drug's regulatory pathway — standard vs highly variable — is determined by its pharmacokinetic variance.
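The CV-to-variance conversion uses the standard lognormal identity σ²_w = ln(1 + CV²); a sketch (the 30% threshold is the highly-variable cut-off, the function name is mine):

```python
import math

def within_subject_log_variance(cv):
    """Log-scale within-subject variance implied by a coefficient of variation."""
    return math.log(1 + cv ** 2)

print(round(within_subject_log_variance(0.30), 3))   # CV = 30% → σ²_w ≈ 0.086 (the ~0.09 threshold)
```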
Branch-by-Branch — Where Variance vs Deviation Bites You
General Medicine
The scenario: You're comparing two diabetes drugs. Both show HbA1c reduction of 0.9%.
Drug A: SD = 0.3% → Variance = 0.09%²

Drug B: SD = 1.5% → Variance = 2.25%²
What the variance difference reveals:
Drug A: every patient improved by roughly 0.6-1.2%. Consistent, predictable response. Low variance → the drug works similarly for everyone.
Drug B: some patients improved by 2.5% (spectacular), others WORSENED by 0.6% (harmful). High variance → the drug is a lottery.
The clinical decision: Drug A (consistent modest effect) may be preferable to Drug B (inconsistent dramatic-or-harmful effect), even though the mean effect is identical. Variance is the measure of predictability. SD tells you this in clinical units. Variance tells the statistical machinery this when computing ANOVA, F-tests, and CIs.
A paper that reports only "mean HbA1c reduction 0.9%" without the SD (or variance) is hiding half the story — the most clinically important half.
Surgery
The scenario: Quality control of surgical outcomes. Your hospital tracks 30-day mortality after CABG surgery.
National average mortality: 2.0%, SD = 0.8%.
Your hospital: mortality = 3.1%.
Is your hospital underperforming? You need to know if 3.1% is within the expected range of variation or an outlier.
The calculation uses variance:
z = (Your rate - National mean) / SD = (3.1 - 2.0) / 0.8 = 1.375
At z = 1.375 → p ≈ 0.17. Your hospital is within normal variation (not significantly different from the national average).
But if the national SD were 0.3% (variance = 0.09):
z = (3.1 - 2.0) / 0.3 = 3.67
At z = 3.67 → p < 0.001. Your hospital is a significant outlier. Investigate immediately.
Same mortality rate. Same national mean. Different variance → completely different conclusion. The variance of the benchmarking distribution determines whether your hospital is flagged or not.
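Both scenarios can be computed with nothing beyond the normal CDF; a sketch (the helper name is mine; the two-sided p-value uses the error function):

```python
import math

def benchmark(rate, national_mean, national_sd):
    """z-score and two-sided normal p-value for a hospital against a national benchmark."""
    z = (rate - national_mean) / national_sd
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

z_wide, p_wide = benchmark(3.1, 2.0, 0.8)      # national SD 0.8 (variance 0.64)
z_narrow, p_narrow = benchmark(3.1, 2.0, 0.3)  # national SD 0.3 (variance 0.09)

print(round(z_wide, 3), round(p_wide, 2))      # 1.375 0.17 → within normal variation
print(round(z_narrow, 2), p_narrow < 0.001)    # 3.67 True → significant outlier
```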
Paediatrics
The scenario: Growth monitoring. Weight-for-age z-scores.
A z-score IS a deviation measured in SD units:
z = (Child's weight - Population mean) / Population SD
Word Surgery: "z-score"
"z": The letter z was chosen by convention (likely from German Zufall = chance/random, or simply the last letter of the alphabet for the "standard" variable — the exact origin is debated).
"Score": English, from Old Norse skor = a notch, a tally mark.
→ "z-score" = "a tally of how many standard deviations from the mean"
A z-score of -2.0 means the child weighs 2 SDs below the population mean. This corresponds to approximately the 2.3rd percentile.
The z-score REQUIRES the SD (which requires the variance). Without knowing the population variance, you cannot compute z-scores. Without z-scores, growth charts don't work. Without growth charts, paediatric nutritional assessment collapses.
Every time you plot a child on a growth chart, you're using variance. You just don't see it because it's been pre-computed into the chart's percentile curves.
Obstetrics
The scenario: Nuchal translucency (NT) screening. The NT measurement has a known distribution at each gestational age with a specific mean and SD.
An NT of 3.0 mm at 12 weeks. Is this abnormal?
Mean NT at 12 weeks: 1.5 mm, SD: 0.5 mm.
z = (3.0 - 1.5) / 0.5 = 3.0
3 SDs above the mean → 99.87th percentile → abnormal → counsel for karyotyping.
But what if the SD is wrong? If the reference study used a population with variance 0.25 mm² (SD = 0.5) but your population has variance 0.64 mm² (SD = 0.8):
z = (3.0 - 1.5) / 0.8 = 1.875
Now it's only the 97th percentile. Still elevated, but the clinical urgency is different.
The variance of the reference population directly determines the threshold for "abnormal." Using reference ranges derived from a population with different variance than yours → miscalibrated screening → either too many false positives (unnecessary anxiety and invasive testing) or too many false negatives (missed abnormalities).
Psychiatry
The scenario: Antidepressant trial. MMRM analysis.
The MMRM output provides:
- Treatment effect estimate: -3.2 points on HAM-D (drug better than placebo)
- Variance of the treatment effect: 1.44 points²
- Standard error: √1.44 = 1.2 points
- 95% CI: -3.2 ± 1.96 × 1.2 = -5.6 to -0.8
- p-value: 0.009
Every number in this chain flows from the variance estimate. The SE is √variance. The CI uses the SE. The p-value uses the SE. Get the variance wrong and everything downstream is wrong.
Where it goes wrong: If the MMRM model misspecifies the covariance structure (the pattern of variances and correlations across visits), the variance of the treatment effect is wrong. An unstructured covariance matrix may give variance = 1.44, while an AR(1) structure gives variance = 2.56 (SE = 1.6, CI = -6.3 to -0.1, p = 0.04).
Same data. Same drug. Different variance model → different CI → different p-value → different conclusion.
This is why ICH E9 requires pre-specification of the covariance structure in the SAP, and why sensitivity analyses using alternative structures are mandatory.
Community Medicine / PSM
The scenario: Cluster-randomised trial of a sanitation intervention across 30 villages.
In individual-level analysis: variance of diarrhoea incidence = 4.5 episodes²/year².
But patients within the same village are more similar to each other than patients from different villages (shared water source, same sanitation conditions). This creates clustering — and the relevant variance has two components:
- Between-village variance: σ²_between = 2.1
- Within-village variance: σ²_within = 2.4
- Total variance: 2.1 + 2.4 = 4.5 (additivity!)
Word Surgery: "Intraclass Correlation Coefficient" (ICC)
Root: Intra- (Latin: within) + class (Latin classis: a group, a division) + correlation coefficient
→ "the correlation between members of the same group"
ICC = σ²_between / (σ²_between + σ²_within) = 2.1 / 4.5 = 0.47
An ICC of 0.47 means 47% of the total variance is BETWEEN villages (not between individuals). Ignoring this clustering and analysing as if all patients are independent would dramatically underestimate the variance of the treatment effect → falsely narrow CI → inflated Type I error.
The design effect (correction for clustering) = 1 + (m-1) × ICC, where m = average cluster size.
If m = 30 villagers per village: Design effect = 1 + 29 × 0.47 = 14.6
You need 14.6 times more patients than a non-clustered trial to achieve the same power. A sample size calculation that ignores variance components would underpower the trial by a factor of 14.6.
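The arithmetic, as a sketch (variance components from the example above):

```python
var_between = 2.1    # between-village variance component
var_within = 2.4     # within-village variance component

icc = var_between / (var_between + var_within)  # 2.1 / 4.5 ≈ 0.467
m = 30                                          # average cluster size (villagers per village)
design_effect = 1 + (m - 1) * icc               # ≈ 14.5 unrounded; ≈ 14.6 if the ICC is rounded to 0.47 first

print(round(icc, 2), round(design_effect, 1))
```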
This is the most dramatic practical consequence of variance decomposition in all of public health research. Ignore the between-cluster variance → your trial is 15× underpowered → you "fail to find an effect" → a life-saving sanitation intervention gets defunded.
Orthopaedics
The scenario: Inter-rater reliability of radiographic measurements.
Three orthopaedic surgeons measure Cobb angle on 50 scoliosis X-rays. You need to assess agreement.
The analysis decomposes total variance into:
- Between-patient variance (σ²_patient): real differences in scoliosis severity → this is the SIGNAL
- Between-rater variance (σ²_rater): systematic differences between surgeons (one reads consistently higher) → this is BIAS
- Residual variance (σ²_error): random measurement error → this is NOISE
σ²_total = σ²_patient + σ²_rater + σ²_error
ICC = σ²_patient / σ²_total
If ICC = 0.90 → 90% of total variance is due to real patient differences, only 10% is rater disagreement + noise → excellent reliability.
If ICC = 0.50 → half the variance is noise/rater bias → you can't trust the measurements → surgical decisions based on Cobb angle (brace vs surgery at 40°) are unreliable.
Without variance decomposition, you can't separate real measurement from measurement error. And in orthopaedics, treatment decisions (surgery vs conservative) often rest on specific numerical thresholds (Cobb angle > 40°, limb length discrepancy > 2 cm). If the measurement variance is too high relative to these thresholds, the measurements are clinically useless.
The 6 Ways Not Knowing Variance vs Deviation Destroys You
1. You can't understand ANOVA — the most common analysis in multi-arm trials
ANOVA = Analysis of VARIANCE. The F-statistic is a ratio of variances. If you don't understand variance, ANOVA is a black box. You push data in, a p-value comes out, and you have no idea what happened inside.
When your examiner asks "explain the ANOVA table," they're asking you to read a variance decomposition. SS (Sum of Squares), df (degrees of freedom), MS (Mean Square = variance), F (ratio of variances). Every column is about variance.
2. You misinterpret sample size calculations
Sample size formulas use σ² (variance), not σ (SD). Underestimating SD by 20% means underestimating variance by 36% (because 0.8² = 0.64, not 0.80). The sample size error is amplified by squaring. A "small" error in SD becomes a "large" error in the sample size calculation.
3. You can't evaluate cluster-randomised trials
Community interventions, school-based programmes, village-level trials — all clustered. Without understanding between-cluster and within-cluster variance components, you can't evaluate whether the trial was adequately powered, whether the analysis accounted for clustering, or whether the results are reliable.
4. You report SD when you should report variance (or vice versa)
Report SD in clinical descriptions: "BP was 128 ± 14 mmHg." (Readers can interpret 14 mmHg.) Use variance in statistical calculations: ANOVA tables, sample size formulas, random effects models. Never report variance to clinicians: "Variance of BP was 196 mmHg²" is uninterpretable.
Mixing them up — reporting variance where SD is expected, or using SD in a formula that requires variance — produces wrong numbers that are hard to catch, because they land in a plausible range: a squaring applied (or omitted) in the wrong place cascades silently through every downstream calculation.
5. You don't understand why SD can't be simply "averaged" across studies
In a meta-analysis, you can't average the SDs from two studies to get the combined SD. You CAN average variances (with weighting). This is because variances are additive and SDs are not.
Combined variance: σ²_combined = (n₁σ₁² + n₂σ₂²) / (n₁ + n₂)

Combined SD: σ_combined = √(combined variance)
If you average the SDs directly: SD_wrong = (σ₁ + σ₂)/2 → this gives the wrong answer. The error can be small or large depending on how different σ₁ and σ₂ are.
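A sketch of both routes with made-up numbers (equal-n weighting, as in the formula above; real meta-analyses typically use n-1 or inverse-variance weights):

```python
import math

n1, sd1 = 100, 2.0
n2, sd2 = 100, 6.0

# Correct route: combine the variances, then square-root
var_combined = (n1 * sd1 ** 2 + n2 * sd2 ** 2) / (n1 + n2)   # (400 + 3600) / 200 = 20
sd_combined = math.sqrt(var_combined)                        # √20 ≈ 4.47

# Wrong route: average the SDs directly
sd_wrong = (sd1 + sd2) / 2                                   # 4.0 — an underestimate

print(round(sd_combined, 2), sd_wrong)
```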
6. You can't participate in any statistical discussion that goes deeper than means and p-values
The moment the discussion touches ANOVA tables, variance components, random effects, design effects, ICC, or covariance structures — you're lost. These are not exotic topics. They're the standard analysis methods for most clinical trials. And they all speak the language of variance, not SD.
The One Thing to Remember
Deviation is the distance from the road. Variance is the SQUARED distance from the road, averaged across all travellers. Standard deviation is the square root of variance — the same information, returned to interpretable units.
Variance exists because squaring gives it a magical property: additivity. Variances add. SDs don't. This one property — discovered by Gauss, exploited by Fisher — is why ANOVA works, regression works, standard errors work, confidence intervals work, and meta-analysis works.
You report SD because humans understand "the average haemoglobin varies by ±1.8 g/dL." You compute with variance because mathematics requires "the average squared variation is 3.24 g²/dL²." The first is for your clinician brain. The second is for the formulas underneath every test you'll ever run.
They're the same information in different costumes. SD is variance in a lab coat, ready for clinical rounds. Variance is SD in overalls, doing the heavy lifting in the machine room.
You need both. You just need them in different places.