Why Can't You Just Study Everyone?
The Problem First
You're an orthopaedics resident. You want to know: does early mobilisation after TKR reduce DVT rates?
Simple answer: check EVERY patient who ever had a TKR. Every hospital. Every country. Every decade. Count the DVTs.
That would give you the truth. The real number. The god-level answer.
But you can't. You'll be dead before you finish collecting data. So you do what every researcher does — you pick some patients, study them, and hope they represent everyone.
That "everyone" is the population. That "some" is the sample.
And the entire edifice of medical research — every p-value, every confidence interval, every drug approval — rests on one terrifying assumption:
That your "some" behaves like the "everyone."
If it doesn't? Every conclusion you drew is garbage.
The Concept — No Jargon First
Think of it this way:
You want to know what the average Indian eats for breakfast. You could:
- Option A: Ask all 1.4 billion Indians. (Population.) Perfect answer. Impossible task.
- Option B: Ask 2,000 Indians, carefully chosen from different states, ages, income levels. (Sample.) Imperfect answer. Doable task.
The entire game of statistics is: how do I make Option B's answer as close to Option A's answer as possible?
Now let's rip open every term you'll encounter.
Term Deconstruction: Population
Word Surgery
Popul- (Latin populus = "the people") + -ation (act/state of)
Literal meaning → "the state of being all the people"
Why This Name?
The word came straight from census-taking in ancient Rome. When Romans wanted to tax citizens, they counted the populus. Statisticians borrowed it in the 1700s — they were literally helping governments count people for taxation and military conscription. The word stuck even when we started applying it to non-human groups (bacterial colonies, implant failure rates).
The "Aha" Bridge
So "population" literally = the totality of people (or things) you care about. When a paper says "population," it means: the ENTIRE universe this question applies to. Not 500 patients. Not one hospital. EVERYONE the conclusion is supposed to cover.
Naming Family
Populace (the common people), popular (of the people), public (from publicus, contraction of populicus). All from the same root: the collective mass.
Term Deconstruction: Sample
Word Surgery
Sample (Old French essample, from Latin exemplum = "something taken out as an example")
Literal meaning → "a piece taken out to represent the whole"
Why This Name?
Medieval cloth merchants would cut a small swatch from a bolt of fabric so buyers could see the quality without unrolling the whole thing. That swatch was an essample. Statistics borrowed this beautifully — your 200 patients are a swatch cut from the fabric of all possible patients. The quality of that swatch determines whether the buyer (you) gets fooled.
The "Aha" Bridge
So "sample" literally = a small piece meant to represent the whole bolt. The sample is only as good as the cut. If you snip from only the best part of the fabric, the buyer thinks the whole bolt is premium. That's sampling bias.
Naming Family
Example (same root — something pulled out to show), exemplar, exemplify. Also: sampling frame (the list you sample from), sampling error (the gap between your swatch and reality).
Now the Terms — The Full Table
| Term | What It Really Means | Symbol |
|---|---|---|
| Population | The ENTIRE group you want to draw conclusions about | N |
| Sample | The subset you actually study | n |
| Parameter | The true value in the population (you almost never know this) | μ, σ, π |
| Statistic | The estimate from your sample (this is what papers report) | x̄, s, p |
| Sampling | The method you use to pick your subset | — |
| Inference | Using sample data to make claims about the population | — |
Term Deconstruction: Parameter vs. Statistic
Parameter
Word Surgery
Para- (Greek para = "beside, alongside") + -meter (Greek metron = "measure")
Literal meaning → "a measure that stands alongside" — i.e., a fixed reference measurement
Why This Name?
In mathematics, a parameter is a constant that defines the shape of a curve. Statisticians adopted it because the population's true mean, true proportion, true standard deviation are fixed constants — they don't change with your study. They exist whether you measure them or not. They stand alongside reality, unchanging. You just can't see them directly.
The "Aha" Bridge
So "parameter" literally = the fixed measure that defines the real shape of things. It's the truth. μ (population mean) is a parameter. You'll never touch it. You can only estimate it.
Statistic
Word Surgery
Statist- (German Statistik, from Latin status = "state/condition of the state") + -ic (pertaining to)
Literal meaning → "pertaining to the state" — originally meant data collected for governance
Why This Name?
The word Statistik was coined by Gottfried Achenwall in 1749. He was a German professor who studied "state arithmetic" — the numbers governments needed to run countries (population counts, tax revenues, army sizes). Over time the word's meaning shifted to "a number derived from data." So a statistic is not truth — it's an estimate your data gives you. It changes every time you take a new sample.
The "Aha" Bridge
So "statistic" = a number from your data. "Parameter" = the number from reality. Papers report statistics. God knows parameters. Every mean, every proportion, every hazard ratio in a paper is a sample statistic pretending to be a population parameter.
Naming Family
Statistics (the field), statistical (adjective), state (same root). Contrast: parameter (fixed, unknowable) vs statistic (estimated, variable).
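The parameter-vs-statistic gap is easy to see in a simulation. A minimal Python sketch, where the population values, sample size, and number of studies are all invented for illustration:

```python
import random

random.seed(42)

# A hypothetical "population": 100,000 simulated HbA1c-like values.
# Its mean is a PARAMETER: fixed, but in real life unknowable.
population = [random.gauss(7.5, 1.2) for _ in range(100_000)]
mu = sum(population) / len(population)

# Each "study" computes a STATISTIC from its own sample of n = 200.
# Different samples give different statistics; the parameter never moves.
for study in range(3):
    sample = random.sample(population, 200)
    x_bar = sum(sample) / len(sample)
    print(f"Study {study + 1}: sample mean = {x_bar:.3f} (true mean = {mu:.3f})")
```

Run it and the three printed sample means all differ from each other and from the true mean: three papers, three statistics, one parameter.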
Term Deconstruction: Sampling
Word Surgery
Sample + -ing (the act of taking samples)
Direct meaning → "the process of selecting a piece to represent the whole"
Why This Name?
Self-explanatory once you have "sample." But the key insight is in WHY it became a field of study. In the early 1900s, Arthur Bowley and others realised that HOW you pick your swatch matters more than how big the swatch is. A small, well-chosen sample beats a huge, badly-chosen one. This sparked the entire field of sampling theory.
The "Aha" Bridge
So "sampling" = the art and science of cutting the right swatch. Bad sampling = bad swatch = wrong conclusions, no matter how expensive your lab is.
Term Deconstruction: Inference
Word Surgery
In- (Latin = "into") + -fer (Latin ferre = "to carry") + -ence (state/process)
Literal meaning → "the process of carrying meaning inward" — i.e., carrying conclusions from what you observed into what you haven't observed
Why This Name?
Logicians used inferre to mean "to deduce" — literally carrying meaning from known premises into unknown conclusions. Statistical inference is the same: you carry the results from your sample (known) into claims about the population (unknown). It's an intellectual leap, and the whole field of inferential statistics exists to quantify how risky that leap is.
The "Aha" Bridge
So "inference" literally = carrying meaning from the seen to the unseen. Every time you say "this drug works" based on a trial of 300 patients, you're inferring — carrying the result from 300 to 300 million. The confidence interval tells you how shaky that bridge is.
Naming Family
Infer (to conclude), refer (to carry back), transfer (to carry across), differ (to carry apart). All from ferre = to carry.
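How shaky the bridge is can be quantified. A hedged sketch in Python, using a made-up trial of 300 patients and the standard normal-approximation confidence interval for a proportion (the true response rate of 0.6 is an assumption of the simulation, not data):

```python
import math
import random

random.seed(1)

# Hypothetical trial: 300 patients, each "responds" with true probability 0.6.
# That 0.6 is the population parameter we pretend not to know.
responses = [random.random() < 0.6 for _ in range(300)]
p_hat = sum(responses) / len(responses)          # the sample statistic

# 95% confidence interval (normal approximation): the quantified leap
# from 300 observed patients to the unobserved population.
se = math.sqrt(p_hat * (1 - p_hat) / len(responses))
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"Observed response rate: {p_hat:.3f}, 95% CI: ({lo:.3f}, {hi:.3f})")
```

The interval is the honest version of the inference: not "the drug works in 60%", but "our swatch is consistent with a population value somewhere in this range."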
The Analogy That Makes It Stick
You're cooking a huge pot of dal. You want to check if there's enough salt.
- Population = the entire pot
- Sample = the one spoonful you taste
But here's the catch:
- If you stir well before tasting → your spoonful represents the whole pot → good sample
- If you scoop from the top without stirring → you get the watery part, miss the settled masala → biased sample
- If you taste one grain of dal → too small to judge → inadequate sample size
Stirring = randomisation. Spoonful size = sample size. Tasting = measurement.
Every sampling mistake in medical research maps to a dal-tasting mistake.
Branch-by-Branch — Where This Bites You
General Medicine
The trap: A paper says "Metformin reduces HbA1c by 1.2% in type 2 diabetes."
But the sample was 200 patients from a tertiary diabetes centre — highly motivated, compliant, mostly urban, BMI 28-32.
Your patient? A 65-year-old from a rural PHC, BMI 38, poor compliance, multiple comorbidities.
The sample doesn't represent YOUR patient's population. That 1.2% might be 0.4% in real-world practice. This is why effectiveness ≠ efficacy.
Surgery
The trap: "Laparoscopic approach has 2% complication rate for cholecystectomy."
Sample: 500 cases from a high-volume centre with fellowship-trained surgeons.
Your setting? District hospital, general surgeon doing 3 lap choles a month.
The population the study claims to represent (all cholecystectomy patients) is not the population it actually represents (high-volume expert centres). Quote that 2% in your setting and you'll understate the true risk when taking consent.
Paediatrics
The trap: Drug dosing studies in children are done on samples of 30-50 kids. Ethical constraints make large paediatric samples nearly impossible.
But children aren't small adults. A sample of 40 kids aged 2-12 from one European country is being used to dose a baby in rural India.
Small sample + narrow population = your patient may not be in the universe this sample represents.
Obstetrics
The trap: "Induction at 39 weeks reduces C-section rates" (ARRIVE trial).
Sample: Low-risk nulliparous women in US academic centres. Excluded: BMI >40, prior uterine surgery, fetal anomalies, multiples.
Your patient has GDM, BMI 36, and is at a primary health centre with one OT.
The ARRIVE trial population is NOT your population. Applying its conclusion blindly could be dangerous.
Psychiatry
The trap: Antidepressant trials consistently exclude suicidal patients, substance use disorders, and personality disorders. The sample is "clean" depression.
But 40-60% of your real patients have comorbid substance use or personality features.
The population in the trial exists in textbooks. Your population exists in OPD. The gap between them is where treatment failures hide.
Community Medicine / PSM
The trap: A prevalence survey says "diabetes prevalence in India is 11.8%."
Sample: 10,000 people from 8 metros.
But India's rural population (65%) was barely sampled. Tribal areas? Zero representation.
You're making national health policy based on a sample that represents urban India. The "population" in the title and the population the sample actually represents are different things.
Orthopaedics
The trap: Implant survival studies for total hip replacement.
Sample: 300 patients from a Nordic joint registry. Mean age 68. BMI 26. Low activity level.
Your patient: 52-year-old Indian farmer. BMI 22. Squats for toilet. Walks 5km daily.
The implant's 15-year survival in sedentary Scandinavians tells you nothing about survival in a high-demand Indian patient. Different population, different wear pattern, different outcome.
Radiology / Pathology
The trap: Diagnostic accuracy studies.
"Sensitivity of MRI for ACL tears: 95%."
Sample: Prospective study in a sports medicine centre with 3T MRI and MSK fellowship radiologists.
Your setting: 1.5T MRI, general radiologist, non-sports population with chronic knee pain.
The sample's sensitivity was measured under ideal conditions. Your population's sensitivity will be lower. If you quote 95% sensitivity to your clinician, you're overconfident.
The 4 Ways Not Knowing This Destroys You
1. You generalise results to the wrong patients
A trial on 40-year-old men gets applied to 70-year-old women. The sample never included them. The population the paper claims to represent and the population it actually represents are different — and you didn't notice because you never checked the Methods section.
This is a failure of generalisability — a term worth deconstructing.
Term Deconstruction: Generalisability (External Validity)
Word Surgery
Generalis- (Latin generalis = "relating to the whole kind/genus") + -ability (capacity to)
Literal meaning → "the capacity to apply to the whole kind, not just the specific instance"
Why This Name?
From genus (Latin = "kind, type, race"). When you generalise, you extend a finding from your specific sample to the general kind. Scientists chose this word because the fundamental question is: does this result belong only to these 200 patients, or to the entire genus of patients with this condition?
The "Aha" Bridge
So "generalisability" literally = can this move from the specific to the general? From your swatch to the whole bolt? A study with high generalisability = a well-cut swatch. A study with low generalisability = you're showing silk when the bolt is burlap.
Naming Family
General (of the genus), genus (a kind), generic (belonging to the kind), generate (to bring into the kind). Contrast: specific (from species = a particular sort within the genus).
2. You don't check HOW the sample was chosen
There are good samples and garbage samples. Let's deconstruct each method:
| Sampling Method | What It Is | Verdict |
|---|---|---|
| Simple random | Every person has equal chance of selection | Gold standard |
| Stratified | Divide population into subgroups, sample from each | Better than simple random for heterogeneous populations |
| Systematic | Every kth person | OK if no hidden pattern in the list |
| Convenience | Whoever walks into your OPD | Garbage for generalisation, fine for pilot studies |
| Purposive | Hand-picked to match criteria | Useful for qualitative, terrible for quantitative |
| Snowball | One participant recruits the next | For hidden populations (drug users, rare diseases) |
Most Indian thesis research uses convenience sampling and then writes "the results can be generalised to..." No, they can't. A convenience sample generalises to nothing except itself.
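The first three methods in the table can be sketched in a few lines of Python. The patient list and the sex stratum here are invented purely for illustration:

```python
import random

random.seed(0)

# Hypothetical sampling frame: 1,000 patients, each tagged with a stratum.
patients = [{"id": i, "sex": random.choice(["M", "F"])} for i in range(1000)]
n = 100

# Simple random: every patient has an equal chance of selection.
simple = random.sample(patients, n)

# Stratified: divide into layers, then sample proportionally within each.
stratified = []
for sex in ("M", "F"):
    layer = [p for p in patients if p["sex"] == sex]
    k = round(n * len(layer) / len(patients))
    stratified.extend(random.sample(layer, k))

# Systematic: every k-th patient from a random starting point.
step = len(patients) // n          # k = 10
start = random.randrange(step)
systematic = patients[start::step]

print(len(simple), len(stratified), len(systematic))
```

Convenience sampling needs no code: it's `patients_who_showed_up_today`, and that's exactly the problem.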
Now let's crack open the names of each one:
Term Deconstruction: Random (as in Random Sampling)
Word Surgery
Random (Old French randon = "great speed, impetuosity, rushing" — from randir = "to gallop")
Literal meaning → "moving at great speed without aim or direction"
Why This Name?
Originally described a horse galloping without a rider controlling it — going wherever it pleased. By the 1600s, English used "at random" to mean "without deliberate choice." Mathematicians formalised it: a random process is one where no individual outcome can be predicted in advance, and every outcome has a known probability.
The "Aha" Bridge
So "random" literally = galloping without direction. Random sampling = letting the selection gallop freely, so no human bias steers it toward certain patients. It's NOT haphazard (which has no structure at all) — it's a carefully structured process in which no one can steer the selection of any particular individual.
Naming Family
Randomise (to make random), randomisation (the process), random variable, pseudo-random (computer-generated, looks random but isn't truly). Contrast: haphazard (from hap = luck — genuinely unstructured).
Term Deconstruction: Stratified (Sampling)
Word Surgery
Strat- (Latin stratum = "something spread/laid down" → a layer) + -ified (made into)
Literal meaning → "made into layers"
Why This Name?
Geologists used strata for rock layers. Statisticians borrowed it: you divide the population into layers (age groups, income brackets, disease severity) and sample from EACH layer. Jerzy Neyman formalised optimal allocation in stratified sampling in 1934.
The "Aha" Bridge
So "stratified" literally = layered. You're slicing the population like geological layers and taking a core sample from each. This guarantees every layer is represented, unlike simple random where some layers might be missed by chance.
Naming Family
Stratum (one layer), strata (multiple layers), stratosphere (upper layer of atmosphere), substrate (layer underneath). Also: stratified randomisation in RCTs = randomising within layers.
Term Deconstruction: Systematic (Sampling)
Word Surgery
System- (Greek systema = "organized whole, composed of parts" → from syn- "together" + histanai "to set up") + -atic (pertaining to)
Literal meaning → "pertaining to an organised arrangement of parts"
Why This Name?
You set up a SYSTEM: pick a starting point, then take every k-th person from a list. It's systematic because it follows a rigid, repeatable pattern. Unlike random (no pattern) or convenience (no rules), systematic sampling has a clear rule that anyone could replicate.
The "Aha" Bridge
So "systematic" literally = following a system. Every 5th chart, every 10th house, every 3rd patient on the list. The danger? If the list itself has a hidden pattern that matches your interval (e.g., every 5th bed in a ward is near the window), you get a biased sample.
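That hidden-pattern trap is easy to demonstrate. Here a made-up ward list repeats every 5 beds, with every 5th bed a "window" bed:

```python
# Hypothetical ward list with a hidden period of 5:
# beds 0, 5, 10, ... are window beds (different exposure, different risk).
beds = ["window" if i % 5 == 0 else "inner" for i in range(100)]

# Systematic sampling with k = 5, starting at bed 0:
# the interval matches the hidden pattern, so we get ONLY window beds.
every_5th = beds[0::5]
print(every_5th.count("window") / len(every_5th))   # 1.0 — 100% window beds

# The true proportion of window beds is only 20%.
print(beds.count("window") / len(beds))             # 0.2
```

Your "system" sampled 100% window beds from a ward that is 20% window beds, and it did so with perfect, replicable consistency. That's the danger: systematic sampling fails silently when the list's period matches your interval.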
Term Deconstruction: Convenience (Sampling)
Word Surgery
Con- (Latin = "together") + -venience (from venire = "to come")
Literal meaning → "coming together easily" → what comes to hand easily
Why This Name?
The researcher samples whoever is convenient — whoever comes together in one place at one time. No effort, no system, no randomisation. It's named for its defining feature: ease.
The "Aha" Bridge
So "convenience" literally = whatever comes together easily. Your OPD patients today are convenient. That's not the same as representative. The patients who show up on a Tuesday morning at a tertiary hospital are a very specific, self-selected group. They "conveniently" showed up — but they don't represent anyone else.
Term Deconstruction: Purposive (Sampling)
Word Surgery
Purpos- (Old French porposer = "to put forth" → Latin pro- "forth" + ponere "to place") + -ive (tending toward)
Literal meaning → "tending toward a placed-forth aim" → selected on purpose
Why This Name?
The researcher deliberately places forth specific criteria and selects only participants who meet them. It's sampling with a purpose — you're hand-picking, not letting chance decide. Common in qualitative research where you want specific perspectives, not statistical representativeness.
The "Aha" Bridge
So "purposive" literally = done with a deliberate aim. You pick who you want to study. The upside: you get exactly the voices you need. The downside: you get exactly the voices you chose, and nothing else.
Term Deconstruction: Snowball (Sampling)
Word Surgery
Snowball (English metaphor = a ball of snow that grows as it rolls downhill)
Literal meaning → "growing by accumulation as it moves"
Why This Name?
Sociologist Leo Goodman formalised this in 1961 as "snowball sampling." You find one participant → they refer you to another → who refers you to another → the sample grows like a snowball rolling downhill. It was designed for hidden or hard-to-reach populations: injecting drug users, undocumented immigrants, men who have sex with men in criminalised settings.
The "Aha" Bridge
So "snowball" literally = the sample grows by rolling through networks. Each participant is the snow that picks up the next layer. The problem: you only reach people within the SAME social network. Isolated individuals get missed entirely.
Naming Family
Chain-referral sampling (formal name), respondent-driven sampling (RDS — a mathematical improvement on snowball). Also: network sampling, link-tracing.
3. You confuse a large sample with a representative sample
n = 10,000 from one hospital is LESS representative than n = 500 from 20 hospitals across 10 states.
Size ≠ representativeness. A huge biased sample is just a huge bias.
Analogy: Tasting 50 spoons from the top of an unstirred pot doesn't tell you more than 1 spoon from a well-stirred pot. Volume of tasting doesn't fix bad stirring.
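A quick simulation makes the point concrete. The two subpopulations and their means below are invented for illustration:

```python
import random

random.seed(7)

# Hypothetical country: 70% rural (mean HbA1c 8.0), 30% urban (mean 7.0).
rural = [random.gauss(8.0, 1.0) for _ in range(70_000)]
urban = [random.gauss(7.0, 1.0) for _ in range(30_000)]
everyone = rural + urban
true_mean = sum(everyone) / len(everyone)            # about 7.7

# Huge but biased: n = 10,000, all from one urban hospital.
biased = random.sample(urban, 10_000)
biased_mean = sum(biased) / len(biased)

# Small but random: n = 500 drawn from the whole population.
fair = random.sample(everyone, 500)
fair_mean = sum(fair) / len(fair)

print(f"truth {true_mean:.2f} | n=10,000 biased {biased_mean:.2f} "
      f"| n=500 random {fair_mean:.2f}")
```

The 10,000-patient sample lands near 7.0 and stays there no matter how big it gets; the 500-patient random sample lands near the truth. More data from the wrong swatch just estimates the wrong number more precisely.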
4. You can't evaluate external validity
Every paper has two questions:
- Internal validity: Was the study done correctly? (Design, bias, analysis)
- External validity: Do the results apply outside this study? (Sample → population generalisability)
Let's break these open:
Term Deconstruction: Internal Validity
Word Surgery
Internal (Latin internus = "within") + Validity (Latin validus = "strong, powerful")
Literal meaning → "the strength of what's within" → how strong the study is on its own terms
Why This Name?
Donald Campbell and Julian Stanley coined this pair (internal/external validity) in their landmark 1963 monograph on experimental design. Internal validity asks: within the walls of this study, are the conclusions trustworthy? Is the experiment strong internally?
The "Aha" Bridge
So "internal validity" literally = the inner strength of the study. Did they control confounders? Was there selection bias? Was measurement accurate? A study can be internally flawless (perfect RCT design) and externally useless (sample so narrow it applies to nobody).
Term Deconstruction: External Validity
Word Surgery
External (Latin externus = "outside") + Validity (Latin validus = "strong")
Literal meaning → "the strength of what's outside" → how strong the conclusions are when applied beyond the study
Why This Name?
Campbell and Stanley's complement to internal validity. External asks: outside these study walls, do the results still hold? Can you carry them to other patients, other settings, other times?
The "Aha" Bridge
So "external validity" literally = does the study's strength survive outside its own walls? This is the sample-to-population question. If your sample was only 40-year-old Japanese men, the study's external validity for 70-year-old Indian women is approximately zero.
Naming Family
Internal vs. External (within vs. outside), ecological validity (does it work in real-world settings?), transferability (the qualitative research equivalent), generalisability (synonym for external validity in common usage).
If you don't understand sample vs population, you can only assess internal validity. You'll read a perfectly designed RCT and assume it applies to your patient — when the sample was so restrictive that your patient would never have been enrolled.
The One Thing to Remember
Every number in a medical paper is a sample statistic pretending to be a population parameter.
Before you believe it, ask:
"Who was actually studied, and is my patient anything like them?"
That's the sample-to-population question. That's the foundation everything else in statistics sits on. Get this wrong, and nothing downstream — no p-value, no confidence interval, no meta-analysis — can save you.