Standard deviation is one of the most important statistical tools. It helps statisticians and researchers in many fields quantify the typical amount of variability present in a data set. In simpler terms, standard deviation is a measure of the spread of a data set, derived from the mean value. Because it takes every data value into account, it is generally a more informative measure of spread than those that rely on only a few values.
What is Standard Deviation
Variability refers to the dispersion of values in a data set. In experimental data, this variability arises in part from the random error that occurs while performing the experiments used for data collection. There are four main measures of variability in a data set:
- Range
- Interquartile range
- Variance
- Standard Deviation
Range is the difference between the lowest and the highest values of a data set. The interquartile range, on the other hand, is the difference between the upper quartile and the lower quartile. The range depends only on the two extreme values, while the interquartile range takes into account the spread of the interior values only, which makes it particularly useful when extreme outliers are present in the data. The standard deviation is thus an alternative means of measuring dispersion, better suited to describing the spread of the entire data set in the absence of any extreme values.
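To see how differently these measures react to an extreme value, here is a minimal Python sketch. The data values and the outlier of 500 are illustrative assumptions, and the helper names value_range and interquartile_range are made up for this example. Quartiles can be defined in several ways; this sketch simply uses the default method of Python's statistics.quantiles, so other tools may report slightly different quartiles.

```python
import statistics


def value_range(data):
    """Difference between the highest and the lowest value."""
    return max(data) - min(data)


def interquartile_range(data):
    """Difference between the upper and the lower quartile.

    Uses the default ('exclusive') method of statistics.quantiles;
    other quartile conventions can give slightly different results.
    """
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return q3 - q1


# Illustrative values only, not taken from a real experiment.
data = [49, 56, 55, 68, 61, 57, 61, 52, 63]
extreme = data + [500]  # the same values plus one extreme outlier

print("range:", value_range(data), "->", value_range(extreme))
print("IQR:  ", interquartile_range(data), "->", interquartile_range(extreme))
print("SD:   ", statistics.pstdev(data), "->", statistics.pstdev(extreme))
```

With the outlier added, the range and the standard deviation grow sharply while the interquartile range changes only a little, which is why the interquartile range is preferred when extreme outliers are present.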
This article covers everything you need to know about standard deviation, so keep reading!
Standard Deviation Definition
Standard deviation (SD) is a statistical tool for measuring the spread of all the data values by finding how far each data value lies from the mean value. The mean value is calculated by taking the sum of all the values in the data set divided by the total number of values present in that data set.
Standard Deviation Formula
The standard deviation of a set of experimental data can be calculated by the formula given below:
σ = √( ∑(xi - x̅)² / N )
The symbol σ represents standard deviation, ∑ indicates the sum over all the values, xi represents an individual data value, x̅ stands for the mean of the values, and N is the total number of values present in the data set.
The expression ∑(xi - x̅)² / N is itself another statistical parameter called variance. Variance represents the mean of the squared distances from the mean. Standard deviation can thus alternatively be defined as the positive square root of variance. Let us understand how to calculate standard deviation more clearly with an example.
How to find Standard Deviation
Example: Taking a small data set of 9 values, the following step-by-step guide can be used to find its standard deviation.
Data set : | 49 | 56 | 55 | 68 | 61 | 57 | 61 | 52 | 63 |
Step I: Find the mean of these values by dividing their sum by the number of values.
x̅ = ∑xi / N = (49 + 56 + 55 + 68 + 61 + 57 + 61 + 52 + 63) / 9 = 522 / 9 = 58
Step II: Now calculate (xi - x̅)² for each individual value by subtracting the mean from the data value and then squaring the result.
xi | xi - x̅ | (xi - x̅)² |
49 | 49 - 58 = -9 | (-9)² = 81 |
56 | 56 - 58 = -2 | (-2)² = 4 |
55 | 55 - 58 = -3 | (-3)² = 9 |
68 | 68 - 58 = 10 | (10)² = 100 |
61 | 61 - 58 = 3 | (3)² = 9 |
57 | 57 - 58 = -1 | (-1)² = 1 |
61 | 61 - 58 = 3 | (3)² = 9 |
52 | 52 - 58 = -6 | (-6)² = 36 |
63 | 63 - 58 = 5 | (5)² = 25 |
Step III: Take the sum of all the (xi - x̅)² values to obtain the expression ∑(xi - x̅)² of the standard deviation formula.
∑(xi - x̅)² = 81 + 4 + 9 + 100 + 9 + 1 + 9 + 36 + 25 = 274
Step IV: Calculate variance by dividing the value obtained in Step III by N, i.e., 9 for this data set.
Variance = 274 / 9 = 30.44
Step V: Calculate standard deviation in the final step by taking the positive square root of the variance.
σ = √30.44 = 5.52
Result: This value of standard deviation indicates that the values in the data set typically deviate from the mean by about 5.52 units.
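The five steps above can be sketched in a few lines of Python. This is only an illustration of the hand calculation, using the population form of the formula (dividing by N); the variable names are chosen for this example.

```python
import math

data = [49, 56, 55, 68, 61, 57, 61, 52, 63]

# Step I: mean of the values
n = len(data)
mean = sum(data) / n

# Step II: squared deviation of each value from the mean
squared_deviations = [(x - mean) ** 2 for x in data]

# Step III: sum of the squared deviations
sum_of_squares = sum(squared_deviations)

# Step IV: variance (population form, dividing by N)
variance = sum_of_squares / n

# Step V: standard deviation is the positive square root of the variance
std_dev = math.sqrt(variance)

print(f"mean = {mean}")
print(f"sum of squared deviations = {sum_of_squares}")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")
```

Running it should reproduce the values of the worked example: a mean of 58, a sum of squared deviations of 274, a variance of about 30.44 and a standard deviation of about 5.52.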
You can also use your data set, its mean value and its standard deviation to plot a bell-shaped curve that represents the distribution of your data graphically. Detailed guidance on how to draw a bell curve in MS Excel is provided in a separate article.
The normal distribution curve of the data set given in the example above, plotted in MS Excel, is shown in Figure 1.
Where to apply Standard Deviation
The concept of standard deviation is most applicable to normal distributions. A normal distribution is a symmetrical distribution of data without any extreme outliers. A normal distribution curve (as shown in Figure 2 below) reflects the clustering of most of the data points around the central region. Standard deviation is useful for showing how far, on average, the data spread out from the center of the distribution.
Standard deviation is a valuable statistical quantity, particularly in the scientific world, because many scientific variables are approximately normally distributed. The concept holds a significant position in analytical chemistry, where samples are drawn from a larger population. Calculating standard deviation is a means of indicating the precision of the results obtained from an analysis.
A higher standard deviation value means a wider spread of the data around the mean value and thus a lower precision. In contrast, a small standard deviation value implies that the values cluster close to the mean value and thus indicates greater precision of the analytical results. A standard deviation value lower than 2 is sometimes quoted as a mark of good precision for an analytical method or instrument, although any such threshold depends on the units and scale of the measurement.
Figure 2 above shows that the curve obtained from a highly precise data set has a high peak and a small spread. In contrast, the curve obtained from a low-precision data set is relatively flat and more widely spread.
The empirical rule in statistics is often used to indicate where most of the values in a normally distributed data set lie with reference to the standard deviation and the mean: about 68% of the values fall within one standard deviation of the mean, about 95% within two and about 99.7% within three.
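The percentages usually quoted for the empirical rule (roughly 68%, 95% and 99.7% of values within one, two and three standard deviations of the mean) can be checked numerically. The sketch below assumes an ideal normal distribution, for which the fraction within k standard deviations equals erf(k / √2); the function name fraction_within is made up for this illustration.

```python
import math


def fraction_within(k):
    """Fraction of an ideal normal distribution lying within
    k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {fraction_within(k):.1%}")
```

Real data sets, especially small ones, will only approximately follow these fractions.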
What is the difference between Sample SD and Population SD
A slight variation can be made when applying the formula for calculating standard deviation, depending on the data set available. If the data set is considered a population in its own right, then the term N (total number of data points) is used in the formula, as in the calculations above. In the case of data sampling, i.e., obtaining some data points as a sample from a larger population, the term N is replaced with N-1 in the formula, as shown below.
s = √( ∑(xi - x̅)² / (N-1) )
N-1 represents the degrees of freedom. Because the sample mean x̅ is itself calculated from the sample, only N-1 of the deviations from it are free to vary: once N-1 of them are known, the last one is fixed, since the deviations must sum to zero. Dividing by N-1 instead of N compensates for the tendency of a sample to underestimate the spread of the population. In simpler terms, population SD is a fixed value calculated from every individual data point in the population. Sample SD, on the other hand, is calculated from a subset of data points (ideally a representative one) drawn from a large population. So which type of standard deviation calculation is more valuable for the analytical chemist we talked about earlier? Sample SD, because experimental analyses are almost always performed on a chosen sample from a larger population.
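As a quick illustration of the difference, Python's standard statistics module provides both forms: pstdev divides by N (population SD) and stdev divides by N-1 (sample SD). The sketch below applies both to the nine values of the worked example; treating those values as a sample is an assumption made only for comparison.

```python
import statistics

data = [49, 56, 55, 68, 61, 57, 61, 52, 63]

population_sd = statistics.pstdev(data)  # divides by N
sample_sd = statistics.stdev(data)       # divides by N - 1

print(f"population SD = {population_sd:.2f}")
print(f"sample SD     = {sample_sd:.2f}")
```

The sample SD is always slightly larger than the population SD for the same values, and the gap shrinks as N grows.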
What is Relative Standard Deviation
The relative standard deviation (RSD) is a special form of standard deviation. It expresses the standard deviation relative to the size of the mean, which makes it convenient for comparing the spread of data sets that have different means or units. RSD is often reported as a percentage: the standard deviation is multiplied by 100 and the product is then divided by the absolute value of the mean of the data, as shown in the example below.
%RSD = (SD × 100) / |x̅|
SD = 5.52
x̅ = 58
%RSD = (5.52 × 100) / 58 = 9.52%
Result: The relative standard deviation of the data is 0.095, or 9.52% (in terms of %RSD), i.e., the standard deviation of the data amounts to 9.52% of its mean.
Interpreting results in terms of the relative standard deviation is important as a measure of repeatability, i.e., when an experiment is carried out on the same sample on multiple occasions, for instance in performing bioassays.
RSD is quite similar to another statistical term, the coefficient of variation (CV%). The two differ in their calculation, however: CV% is calculated by dividing SD by the mean value itself, while RSD uses only the absolute (non-negative) value of the mean. By this definition, the relative standard deviation is never negative (just like SD itself), while the coefficient of variation can be either positive or negative.
Additionally, the relative standard deviation cannot be used as a measure of spread when the mean of a data set is equal to zero (which is possible in certain cases), because the calculation would require dividing by zero.
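The distinction between RSD and the coefficient of variation, and the zero-mean restriction, can be sketched in Python as follows. The function names are made up for this illustration, the population SD (dividing by N) is assumed, and the CV definition follows the one given in this article.

```python
import statistics


def relative_standard_deviation(data):
    """SD divided by the absolute value of the mean (multiply by 100 for %RSD)."""
    mean = statistics.fmean(data)
    if mean == 0:
        raise ValueError("RSD is undefined when the mean is zero")
    return statistics.pstdev(data) / abs(mean)


def coefficient_of_variation(data):
    """SD divided by the mean itself, so the sign of the mean is kept."""
    mean = statistics.fmean(data)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(data) / mean


data = [49, 56, 55, 68, 61, 57, 61, 52, 63]
negated = [-x for x in data]  # same spread, negative mean

print(f"%RSD = {100 * relative_standard_deviation(data):.2f}%")
print(f"RSD of negated data = {relative_standard_deviation(negated):.3f}")
print(f"CV of negated data  = {coefficient_of_variation(negated):.3f}")
```

Because the code works with the unrounded standard deviation, it reports a %RSD of about 9.51%, slightly different from the 9.52% obtained by hand from the rounded value 5.52.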
Conclusion
Calculating standard deviation by hand can be a challenging task, especially when real-world examples with complex data sets are involved. Statisticians today therefore use well-integrated software and spreadsheets to calculate standard deviation. Nevertheless, a basic understanding of the principles involved in these calculations is very important, and that is what you have learnt through this article.
Exercise
Test yourself by applying the knowledge you gained in this article to answer the following set of questions:
Q.1) Apply the standard deviation formula (population form, dividing by N) to calculate SD for the data sets given below:
a) Data Set : | 4 | 12 | -2 | 7 | 0 | 9 |
(Answer: SD=4.90)
b) Data Set : | 15.2 | 12.3 | 5.7 | 4.3 | 11.2 | 2.5 | 8.7 |
(Answer: SD= 4.28)
Q.2) The standard deviation of 10 values from a data set is 2.8. The sum of the squares of these 10 values is 92.8. Use the standard deviation formula to determine the mean value for this data set.
(Answer: Mean=± 1.2)
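Q.2 relies on an identity that follows from expanding the square in the standard deviation formula: the variance equals the mean of the squared values minus the square of the mean. A worked sketch, assuming the population form (dividing by N):

```latex
\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{N} = \frac{\sum x_i^2}{N} - \bar{x}^2
\quad\Rightarrow\quad
\bar{x}^2 = \frac{92.8}{10} - (2.8)^2 = 9.28 - 7.84 = 1.44
\quad\Rightarrow\quad
\bar{x} = \pm 1.2
```

The sign of the mean cannot be determined from the squares alone, which is why the answer is given as ±1.2.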
Q.3) Calculate the relative standard deviation (RSD) for the data provided in Q.1 part a.
(Answer: RSD=0.98 or 98%)