📊 Statistics Calculator
Calculate mean, median, mode, standard deviation, variance, quartiles, and more. Includes box plot visualization and outlier detection for comprehensive statistical analysis.
Mean (Average): The sum of all values divided by the count. Most affected by extreme values (outliers). Best used when data is normally distributed without significant outliers.
Median: The middle value when data is sorted. Largely unaffected by outliers, so it is a better choice than the mean for skewed distributions or data with outliers. For datasets with an even number of values, it's the average of the two middle values.
Mode: The most frequently occurring value(s). Useful for categorical data or finding the most common value. A dataset can have no mode, one mode (unimodal), or multiple modes (bimodal, multimodal).
Range: The difference between the maximum and minimum values. Simple measure of spread but sensitive to outliers.
Variance: The average of squared differences from the mean. Measures how spread out the data is. Units are squared, making interpretation less intuitive.
Standard Deviation: The square root of variance. Most commonly used measure of spread. Same units as the original data. About 68% of data falls within 1 standard deviation of the mean in normal distributions.
Interquartile Range (IQR): The difference between Q3 and Q1. Contains the middle 50% of data. Resistant to outliers and useful for identifying them.
Q1 (First Quartile): 25% of data falls below this value.
Q2 (Second Quartile/Median): 50% of data falls below this value.
Q3 (Third Quartile): 75% of data falls below this value.
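As a rough illustration, the measures defined above can be computed with Python's standard `statistics` module. The dataset is hypothetical, and the `"inclusive"` quartile method is an assumption: quartile conventions differ between tools, so this calculator may report slightly different Q1 and Q3 values for the same data.

```python
import statistics

# Hypothetical sample data, chosen so the results are easy to check by hand.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)            # sum of values / count
median = statistics.median(data)        # middle value of the sorted data
modes = statistics.multimode(data)      # list of the most frequent value(s)
data_range = max(data) - min(data)      # maximum minus minimum

sample_var = statistics.variance(data)  # divides by n - 1 (sample variance)
pop_var = statistics.pvariance(data)    # divides by n (population variance)
sample_sd = statistics.stdev(data)      # square root of the sample variance

# Quartiles: three cut points that split the sorted data into four parts.
# method="inclusive" is an assumption; other tools may use another convention.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(f"mean={mean}, median={median}, modes={modes}, range={data_range}")
print(f"sample variance={sample_var}, sample SD={sample_sd:.3f}")
print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
```

Note the two variance functions: whether to divide by n or by n - 1 depends on whether the data is a whole population or a sample from one.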
Box plots visualize the five-number summary: minimum, Q1, median, Q3, and maximum. They make it easy to see the distribution shape, central tendency, and identify potential outliers.
Outliers are data points that differ significantly from other observations. They can indicate measurement errors, data entry errors, or genuinely exceptional cases.
This calculator identifies outliers using the IQR method, which flags values that fall far outside the middle 50% of the data.
Mean, median, and mode all describe the "center" of a dataset, but they're appropriate for different situations. Choosing the wrong measure can seriously mislead your analysis.
Consider home prices in a neighborhood: $200K, $220K, $210K, $250K, $215K, $1,800K (one mansion). Mean: about $482K, which says nothing useful about a typical home. Median: $217.5K (the average of the two middle values, $215K and $220K), which accurately represents the middle of the market.
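The arithmetic for the home price example can be checked with a short sketch. The variable names are illustrative only; the prices are the six listed above, in thousands of dollars.

```python
import statistics

# Home prices from the example, in thousands of dollars.
prices_k = [200, 220, 210, 250, 215, 1800]

mean_price = statistics.mean(prices_k)      # 2895 / 6 = 482.5
median_price = statistics.median(prices_k)  # average of 215 and 220 = 217.5

# The same neighborhood without the mansion.
typical_k = [p for p in prices_k if p != 1800]
mean_typical = statistics.mean(typical_k)      # 1095 / 5 = 219
median_typical = statistics.median(typical_k)  # 215

print(f"with mansion:    mean={mean_price}, median={median_price}")
print(f"without mansion: mean={mean_typical}, median={median_typical}")
```

Dropping one house moves the mean from 482.5 to 219 but the median only from 217.5 to 215, which is what "resistant to outliers" means in practice.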
This is why median household income is reported instead of mean income: a few billionaires would inflate the mean dramatically. The median gives a more accurate picture of what a typical household earns.
Mode is essential for categorical (non-numeric) data, where mean and median are meaningless. If you survey shoe size preferences, the mode (the most popular size) is the most meaningful summary. In continuous data, several modes (a multimodal distribution, with multiple peaks) often reveal subgroups in your data.
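A minimal sketch of finding the mode of categorical data, using hypothetical survey responses stored as text labels:

```python
import statistics

# Hypothetical survey answers: shoe sizes recorded as labels, not numbers.
responses = ["9", "10", "9", "8", "10", "9", "11"]
most_common = statistics.mode(responses)  # the single most frequent label

# When several labels tie, multimode returns all of them,
# in the order they first appear in the data.
shirt_sizes = ["S", "M", "M", "L", "L"]
tied_modes = statistics.multimode(shirt_sizes)

print(most_common)  # 9
print(tied_modes)   # ['M', 'L']
```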
If your data has outliers or is clearly skewed (income, prices, response times), use the median. If data is roughly symmetric with no extreme outliers (heights, test scores in a class), use the mean. The mean is also appropriate when every value should carry equal weight in a decision.
Two datasets can have identical means but completely different distributions. Standard deviation (SD or s) quantifies how spread out data points are from the mean. A small SD means data clusters tightly around the average; large SD means data is highly variable.
Example: test scores. Class A: 70, 72, 68, 73, 71. Mean = 70.8, sample SD ≈ 1.9 (very consistent). Class B: 40, 60, 90, 85, 79. Mean = 70.8, sample SD ≈ 20.6 (widely varied). Same mean, completely different teaching outcomes.
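The two classes can be compared directly. This sketch prints both the sample SD (dividing by n - 1) and the population SD (dividing by n), since the two conventions give different numbers for small datasets.

```python
import statistics

class_a = [70, 72, 68, 73, 71]
class_b = [40, 60, 90, 85, 79]

summary = {}
for name, scores in (("A", class_a), ("B", class_b)):
    summary[name] = {
        "mean": statistics.mean(scores),
        "sample_sd": statistics.stdev(scores),       # divides by n - 1
        "population_sd": statistics.pstdev(scores),  # divides by n
    }

for name, stats in summary.items():
    print(
        f"Class {name}: mean={stats['mean']}, "
        f"sample SD={stats['sample_sd']:.2f}, "
        f"population SD={stats['population_sd']:.2f}"
    )
```

Both classes have a mean of 70.8, but the sample SD is about 1.92 for Class A and about 20.63 for Class B.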
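For data that follows a normal distribution (bell curve), the empirical rule states that about 68% of values fall within ±1 SD of the mean, about 95% within ±2 SD, and about 99.7% within ±3 SD.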
This is why "within 2 standard deviations" is often used as a benchmark for "normal" in quality control and scientific research. Values beyond ±3 SD are extremely unusual (about 0.3% probability) and often warrant investigation as potential outliers or errors.
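The percentages behind the empirical rule can be reproduced from the standard normal distribution. This is a small sketch using `statistics.NormalDist`; the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k: float) -> float:
    """Fraction of a normal distribution within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    inside = share_within(k)
    print(f"within ±{k} SD: {inside:.2%}   beyond: {1 - inside:.2%}")
```

The exact shares are roughly 68.27%, 95.45%, and 99.73%, so "68 / 95 / 99.7" is a convenient rounding and applies only to data that is approximately normal.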
Variance is simply SD squared (SD²). Both measure spread, but SD is expressed in the same units as your original data (dollars, meters, points), making it more interpretable. Variance is useful in mathematical calculations (such as ANOVA) where squared units are needed. For communication and interpretation, always report SD.
Z-scores (z = (x - mean) / SD) standardize values from any distribution to a common scale. A z-score of +2 means the value is 2 standard deviations above the mean; in a normal distribution, only about 2.3% of values lie that far above the mean. Z-scores let you compare values from completely different scales.
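A short sketch of the z-score formula, comparing two results measured on different scales. The exam means and SDs are hypothetical numbers chosen for illustration.

```python
from statistics import NormalDist

def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exams with different scales.
z_exam_1 = z_score(85, mean=70, sd=10)     # 1.5
z_exam_2 = z_score(620, mean=500, sd=100)  # 1.2

# Share of a normal distribution lying above z = +2.
upper_tail = 1 - NormalDist().cdf(2)

print(f"exam 1: z={z_exam_1}, exam 2: z={z_exam_2}")
print(f"share above z=+2: {upper_tail:.2%}")
```

Although 620 is the larger raw number, the score of 85 is the stronger result relative to its own distribution.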
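The Interquartile Range (IQR) method is a robust, widely used standard for outlier detection. Calculate IQR = Q3 - Q1, then, by the usual convention, flag any value below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR as a potential outlier.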
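The IQR method can be sketched as follows. Two things here are assumptions rather than details taken from this calculator: the conventional fence multiplier of 1.5 (Tukey's rule) and the `"inclusive"` quartile method. The function name `iqr_outliers` is illustrative.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using IQR fences.

    k=1.5 is the conventional multiplier and is an assumption here.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers

# Home prices in thousands of dollars, including one mansion.
prices_k = [200, 220, 210, 250, 215, 1800]
lower, upper, flagged = iqr_outliers(prices_k)

print(f"fences: {lower} to {upper}")  # fences: 164.375 to 289.375
print(f"outliers: {flagged}")         # outliers: [1800]
```

A different quartile convention shifts the fences, so borderline values may be flagged by one tool and not by another.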
Do not automatically remove outliers! First, investigate: Is it a data entry error? (Fix or remove.) Is it a legitimate extreme value? (Keep it; it's real.) Is it from a different population? (May need to model separately.) Removing legitimately extreme values can bias your analysis so that it no longer reflects reality.
When outliers persist after investigation, consider: reporting results with and without outliers; using median instead of mean; applying logarithmic transformation if data spans orders of magnitude; or using robust statistical methods designed to handle outliers.
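One way to report results with and without an outlier is sketched below, using hypothetical response times. The geometric mean stands in for the log-transformation option: it is the mean computed on the log scale and converted back to the original units.

```python
import statistics

# Hypothetical response times in milliseconds; assume 9000 was
# investigated and confirmed as a genuine slow request.
times_ms = [120, 135, 150, 140, 9000]
without_outlier = [t for t in times_ms if t != 9000]

report = {
    "mean_with": statistics.mean(times_ms),                # 1909
    "mean_without": statistics.mean(without_outlier),      # 136.25
    "median_with": statistics.median(times_ms),            # 140
    "median_without": statistics.median(without_outlier),  # 137.5
    # Mean on the log scale, back-transformed to milliseconds.
    "geometric_mean": statistics.geometric_mean(times_ms),
}

for label, value in report.items():
    print(f"{label}: {value:.1f}")
```

The median barely changes when the outlier is dropped, while the mean changes by more than a factor of ten; the geometric mean lands between the two.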
The box plot (box-and-whisker plot) visually shows median, quartiles, and outliers at a glance. Our statistics calculator generates one automatically. A box plot should be your first step when exploring a new dataset. It immediately reveals skewness, spread, and outliers before you even calculate a single statistic.
Understanding statistical concepts is essential for interpreting data correctly. Here are the most commonly used critical values and distribution thresholds:
| Confidence Level | Significance Level (α) | Z-score (two-tailed) | Common Use |
|---|---|---|---|
| 80% | α = 0.20 | ±1.282 | Exploratory analysis |
| 90% | α = 0.10 | ±1.645 | Business decisions |
| 95% | α = 0.05 | ±1.960 | Standard in most research |
| 98% | α = 0.02 | ±2.326 | Clinical studies |
| 99% | α = 0.01 | ±2.576 | Pharmaceutical trials |
| 99.7% | α = 0.003 | ±3.000 (3-sigma) | Manufacturing / Six Sigma |
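The two-tailed critical values in this table can be reproduced from the inverse CDF of the standard normal distribution. This is a sketch using `statistics.NormalDist`; the helper name `two_tailed_z` is illustrative.

```python
from statistics import NormalDist

def two_tailed_z(confidence: float) -> float:
    """Critical z such that `confidence` of a normal distribution lies within ±z."""
    alpha = 1 - confidence
    # Half of alpha sits in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.80, 0.90, 0.95, 0.98, 0.99, 0.997):
    print(f"{confidence:.1%}: ±{two_tailed_z(confidence):.3f}")
```

The 99.7% row comes out slightly below 3 (about 2.97) because 99.7% is itself a rounding: exactly ±3 SD covers about 99.73% of a normal distribution.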

| Measure | Definition | Best Used When |
|---|---|---|
| Mean | Sum ÷ count | Data has no outliers, symmetric distribution |
| Median | Middle value when sorted | Outliers present (incomes, home prices) |
| Mode | Most frequent value | Categorical data, bimodal distributions |
| Std Deviation | Typical distance from the mean (square root of variance) | Measuring variability around the mean |