Calculate Standard Deviation with C#

Standard deviation measures how spread out the values in a population or sample are around their mean. The higher the standard deviation, the more the values in the dataset vary.
The calculation is tedious and time-consuming to do by hand, so it's easier to calculate the standard deviation using a spreadsheet program or a calculation feature offered by other software.
In this blog post, I'll show you how to write a small program that takes a dataset, or list of values, and finds the standard deviation of that dataset.
How to Do the Manual Process
These are the steps for finding standard deviation:
Find the mean, or average, of the dataset
For each datapoint, find the distance between the datapoint value and the mean, then square that distance
Find the sum of the squared distances from the previous step
Divide the sum by the number of datapoints that you have. If the dataset is a sample of a population, subtract one from your count of datapoints before dividing
Finally, take the square root of the result
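For reference, the steps above correspond to the usual textbook formulas. The symbols here are conventional notation that I'm assuming for illustration (they don't appear elsewhere in this post): x_i is a datapoint, N or n is the number of datapoints, and mu or x-bar is the mean.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}
\quad \text{(population)}

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
\quad \text{(sample)}
```

As a quick worked example, take the values 2, 4, 4, 4, 5, 5, 7, 9. The mean is 40 / 8 = 5, the squared distances are 9, 1, 1, 1, 0, 0, 4, 16, and their sum is 32. Treated as a population, the standard deviation is the square root of 32 / 8, which is 2. Treated as a sample, it is the square root of 32 / 7, which is roughly 2.138.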
Writing the Solution
Let's write the code. Open Visual Studio and create a new solution. Next, create a class called StandardDeviation.
public class StandardDeviation
{
}
To find the standard deviation, we need a dataset to run our calculation on, so we'll create a constructor that accepts one. The parameter type should be the interface IEnumerable<double> so that we can accept any collection type that implements it, such as arrays and lists. We also want to store the result after the calculation so that other parts of our code can read it. We'll call this property Value and make it getter only.
public class StandardDeviation
{
    // Getter only: the value is assigned once, in the constructor.
    public double Value { get; }

    public StandardDeviation(IEnumerable<double> dataset, bool sample = false)
    {
    }
}
The first step in finding the standard deviation is getting the average, or mean, of all datapoints. But first, we need to create a method that will house our calculation steps. This method will be called CalculateStandardDeviation, and it will be called when the class is constructed.
public class StandardDeviation
{
    public double Value { get; }

    public StandardDeviation(IEnumerable<double> dataset, bool sample = false)
    {
        Value = CalculateStandardDeviation(dataset, sample);
    }

    private double CalculateStandardDeviation(IEnumerable<double> dataset, bool sample = false)
    {
        // The mean always divides by the full number of datapoints.
        int count = dataset.Count();
        double mean = dataset.Sum() / count;
        // Only the divisor for the squared distances drops by one for a sample.
        int datapointCount = sample ? count - 1 : count;
        // The remaining steps and the return value are added in the next snippets.
    }
}
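Depending on how the dataset was derived, we need to decide whether to divide by the full count of datapoints or by that count minus one, which is the correction used when the dataset is a sample. This choice only affects the divisor applied to the squared distances in a later step; the mean itself is always the sum divided by the full count. We added an additional argument, sample, to indicate whether the dataset is a sample.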
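To make that distinction concrete, here is a minimal sketch that keeps the two quantities apart. The class and method names (DivisorSketch, Mean, Divisor) are hypothetical and exist only for this illustration; they are not part of the StandardDeviation class.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class DivisorSketch
{
    // The mean always uses the full count of datapoints.
    public static double Mean(IEnumerable<double> dataset)
    {
        return dataset.Sum() / dataset.Count();
    }

    // Only the divisor for the squared distances depends on the sample flag.
    public static int Divisor(IEnumerable<double> dataset, bool sample)
    {
        int count = dataset.Count();
        return sample ? count - 1 : count;
    }
}
```

For the eight values 2, 4, 4, 4, 5, 5, 7, 9, Mean gives 5 whether or not the data is a sample, while Divisor gives 8 for a population and 7 for a sample.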
In the next step, we need to find the squared distance of each datapoint from the mean.
public class StandardDeviation
{
    public double Value { get; }

    public StandardDeviation(IEnumerable<double> dataset, bool sample = false)
    {
        Value = CalculateStandardDeviation(dataset, sample);
    }

    private double CalculateStandardDeviation(IEnumerable<double> dataset, bool sample = false)
    {
        List<double> squaredDistances = new List<double>();
        double meanSquaredDistances = 0;
        // The mean always divides by the full count; only the later divisor drops by one for a sample.
        int count = dataset.Count();
        double mean = dataset.Sum() / count;
        int datapointCount = sample ? count - 1 : count;

        foreach (double datapoint in dataset)
        {
            // Squared distance from the mean. Squaring is already non-negative, so Math.Abs is redundant but harmless.
            double distance = Math.Pow(Math.Abs(datapoint - mean), 2);
            squaredDistances.Add(distance);
        }
    }
}
This can be found by iterating over our datapoints and taking the difference between each datapoint value and the mean. Because a datapoint can fall on either side of the mean, that difference can be negative; raising it to the power of two makes every distance non-negative, so taking the absolute value first is not strictly necessary. Once we have the squared distance, we add it to a collection that holds all of our distances.
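The same step can also be written with LINQ instead of an explicit loop. This is only a sketch of an alternative, not the approach used in this post; the class and method names (SquaredDistanceSketch, SquaredDistances) are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class SquaredDistanceSketch
{
    public static List<double> SquaredDistances(IEnumerable<double> dataset)
    {
        // Materialize once so a lazy sequence is not enumerated repeatedly.
        List<double> values = dataset.ToList();

        // Note: Average() throws InvalidOperationException on an empty sequence.
        double mean = values.Average();

        // (x - mean) squared is non-negative for every x, so no Math.Abs is needed.
        return values.Select(x => (x - mean) * (x - mean)).ToList();
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 (mean 5), this returns 9, 1, 1, 1, 0, 0, 4, 16.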
Next, we'll sum our squared distances and divide by our number of datapoints (or the number of datapoints minus one for a sample). Finally, we take the square root of that mean of squared distances.
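// Namespaces needed for Math, IEnumerable<T>/List<T>, and the LINQ Count()/Sum() calls.
using System;
using System.Collections.Generic;
using System.Linq;

public class StandardDeviation
{
    public double Value { get; }

    public StandardDeviation(IEnumerable<double> dataset, bool sample = false)
    {
        Value = CalculateStandardDeviation(dataset, sample);
    }

    private double CalculateStandardDeviation(IEnumerable<double> dataset, bool sample = false)
    {
        List<double> squaredDistances = new List<double>();
        double meanSquaredDistances = 0;
        // The mean always divides by the full count; only the divisor below drops by one for a sample.
        int count = dataset.Count();
        double mean = dataset.Sum() / count;
        int datapointCount = sample ? count - 1 : count;

        foreach (double datapoint in dataset)
        {
            // Squared distance from the mean. Squaring is already non-negative, so Math.Abs is redundant but harmless.
            double distance = Math.Pow(Math.Abs(datapoint - mean), 2);
            squaredDistances.Add(distance);
        }

        // Mean of the squared distances (the variance), then its square root.
        meanSquaredDistances = squaredDistances.Sum() / datapointCount;
        return Math.Sqrt(meanSquaredDistances);
    }
}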
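As a quick way to sanity-check the result, here is a compact, self-contained restatement of the same steps as a static helper. The names (StdDevSketch, Compute) are hypothetical, and the input checks are my own assumption about how you might want to handle datasets that are too small; the class in this post does not perform them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class StdDevSketch
{
    public static double Compute(IEnumerable<double> dataset, bool sample = false)
    {
        if (dataset == null) throw new ArgumentNullException(nameof(dataset));

        // Materialize once so the sequence is enumerated a single time.
        List<double> values = dataset.ToList();

        // Divisor for the squared distances: n for a population, n - 1 for a sample.
        int divisor = sample ? values.Count - 1 : values.Count;

        // Assumption: reject inputs that would otherwise divide by zero.
        if (divisor <= 0)
            throw new ArgumentException("Not enough datapoints.", nameof(dataset));

        // The mean always uses the full count.
        double mean = values.Sum() / values.Count;
        double sumOfSquares = values.Sum(x => (x - mean) * (x - mean));
        return Math.Sqrt(sumOfSquares / divisor);
    }
}
```

With the values 2, 4, 4, 4, 5, 5, 7, 9, calling Compute with the default arguments returns 2, and calling it with sample set to true returns roughly 2.138. Arrays, lists, and LINQ queries can all be passed in, since each of them is an IEnumerable<double>.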
In this article, I showed how to code a straightforward standard deviation solution using C#. The solution takes in a dataset, and we declare whether the dataset is a population or a sample. It then finds the mean, calculates the squared distance of each datapoint from the mean, finds the mean of those squared distances, and finally takes the square root of that mean.
Thanks for checking out my blog post!