Histogram with normal distribution overlay in Excel

This tutorial walks you through plotting a histogram in Excel, overlaying a normal distribution bell curve on it, and then adding lines for the mean and standard deviations.

To produce my random normal samples I used the VBA function RandNormalDist by Mike Alexander. I created 200 samples with a target mean of 100 and standard deviation of 25, using RandNormalDist(100, 0.25).

The actual mean and standard deviation of the samples were 100.84 and 27.49 respectively.

Normal distribution

As a rough check that the samples are normally distributed, compare the mean, median and mode, which should all be approximately equal.

Mean 100.84
Median 99.5
Mode 89
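The same rough check can be sketched outside Excel. The snippet below is only an illustration: it uses Python's random.gauss as a stand-in for RandNormalDist, the seed is arbitrary, and the generated values are not the article's data, so the numbers will differ from those above.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the run is repeatable (an assumption, not from the article)

# 200 hypothetical samples, target mean 100 and standard deviation 25,
# rounded to whole numbers so that a mode is meaningful
samples = [round(random.gauss(100, 25)) for _ in range(200)]

mean = statistics.mean(samples)
median = statistics.median(samples)
mode = statistics.mode(samples)  # most frequent rounded value

print(f"mean={mean:.2f} median={median} mode={mode}")
```

With only 200 samples the mode is noisy, so expect it to sit further from the mean than the median does.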

Histogram

The first thing to do is produce the histogram. This is done by creating bins of a certain width and counting the frequency of the samples that fall in each bin.

Start by calculating the minimum (28) and maximum (184) and then the range (156). Using Sturges’ formula the number of bins is 9; using the square root method it is 15.

To get a bin width, divide the range (156) by the number of bins (9), which gives 17.33. Round this up to 20 to produce nice round bin widths.

Set up the bins so that they cover the minimum to the maximum, then use the Excel FREQUENCY function to count the samples in each bin. The FREQUENCY function must be entered as an array formula (Ctrl+Shift+Enter).
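As a quick check of the arithmetic above, here is a minimal sketch in Python. It assumes the sample size of 200 used later in the article and the common forms of the two rules (round up log2(n) and add 1 for Sturges, round up sqrt(n) for the square root method).

```python
import math

n = 200                       # number of samples
data_min, data_max = 28, 184  # minimum and maximum from the article
data_range = data_max - data_min  # 156

# Sturges' formula: ceil(log2(n)) + 1
sturges_bins = math.ceil(math.log2(n)) + 1
# Square root method: ceil(sqrt(n))
sqrt_bins = math.ceil(math.sqrt(n))

raw_width = data_range / sturges_bins  # about 17.33
bin_width = 20                         # rounded up by hand to a convenient value

print(sturges_bins, sqrt_bins, round(raw_width, 2))
```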

Bin Frequency
0 0
20 0
40 1
60 12
80 32
100 56
120 51
140 28
160 19
180 0
200 1
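To make the counting rule explicit: FREQUENCY puts a value into the first bin whose value is greater than or equal to it, and returns one extra count for anything above the last bin. The sketch below mimics that rule in Python. The function name and the sample values are hypothetical, chosen only to show where boundary values land.

```python
from bisect import bisect_left

def frequency(values, bins):
    """Count values per bin in the style of Excel's FREQUENCY:
    a value v is counted in the first bin b with v <= b, and values
    above the last bin go into one extra slot at the end."""
    counts = [0] * (len(bins) + 1)
    for v in values:
        counts[bisect_left(bins, v)] += 1
    return counts

bins = list(range(0, 201, 20))             # 0, 20, ..., 200
sample = [28, 45, 60, 61, 99.5, 100, 184]  # hypothetical values, not the article's data
counts = frequency(sample, bins)

for b, c in zip(bins, counts):
    print(b, c)
```

Note that 60 is counted in the 60 bin while 61 falls into the 80 bin, which is why each bin value is the upper edge of its interval.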

Using a column chart a histogram can be produced.

Histogram using standard column chart

Histogram using Scatter Chart

Overlaying a normal curve is a little trickier. The column chart above can’t be used, so the histogram must be produced using a scatter chart instead.

Select the data and produce a scatter chart with smooth lines.

Scatter chart as histogram

Select the chart and click on the ribbon menu, Layout, then Error Bars and then More Error Bars Options. Select Display Direction Minus, End Style No Cap and Error Amount Percentage 100%.

Scatter chart as histogram format error bars

This will produce a scatter chart with the following error bars.

Scatter chart as histogram with error bars

Increase the Line Style Width so that it starts looking like a histogram with no gaps.

Scatter chart as histogram using error bars

Finally, remove the scatter chart line.

Scatter chart as histogram using error bars finished

Normal Curve

For the normal curve the points need to be created first. The bell curve looks nice when it covers the full 6 standard deviations. 100 points will be created for a nice smooth curve.

Multiply the standard deviation (27.49) by 6 to get 164.96 (calculated from the unrounded standard deviation), then divide by 100 to get an increment of 1.6496. Start at the mean minus 3 standard deviations (18.36) and increase the value by 1.6496 each step, all the way up to the mean plus 3 standard deviations (183.32). There will actually be 101 points in total.

Now for each of those points the normal distribution shall be calculated using Excel’s NORMDIST function. The first parameter is the values we calculated, the second the mean, the third the standard deviation and the last should be FALSE as we don’t want cumulative (NORMDIST(Q1,100.84,27.49,FALSE)).
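The point generation and the non-cumulative NORMDIST call can be sketched in Python as follows. This is an illustration rather than the workbook's formulas: normal_pdf is a hand-written helper, and the standard deviation is the rounded 27.49 from the text, so the end points differ from the article's in the second decimal place.

```python
import math

mean, sd = 100.84, 27.49  # rounded values from the text
n_steps = 100             # 100 increments give 101 points

step = 6 * sd / n_steps
xs = [mean - 3 * sd + i * step for i in range(n_steps + 1)]

def normal_pdf(x, mu, sigma):
    """Normal density, the equivalent of NORMDIST(x, mu, sigma, FALSE)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

ys = [normal_pdf(x, mean, sd) for x in xs]

print(len(xs), round(xs[0], 2), round(xs[-1], 2))
```

The largest density value is only about 0.0145, which is why the unscaled curve is barely visible next to bar heights in the tens.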

If you plot the data you will notice a very flat normal distribution curve, barely visible as a bell curve because of the difference in scale. For our sample of 200 points with a bin width of 20, each sample contributes a rectangle 20 wide and 1 high to the histogram, so the total area of the histogram is 200 × 20 = 4000. The normal distribution has a total area of 1, so the normal curve must be scaled by 4000 (number of samples × bin width). This produces a nice bell-shaped normal curve over the histogram.
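The scale factor can be verified from the frequency table itself. A minimal sketch, using the frequencies listed earlier and the rounded mean and standard deviation from the text:

```python
import math

frequencies = [0, 0, 1, 12, 32, 56, 51, 28, 19, 0, 1]  # from the bin table
bin_width = 20
mean, sd = 100.84, 27.49

n_samples = sum(frequencies)            # 200
histogram_area = n_samples * bin_width  # each sample adds an area of bin_width x 1

# Height of the unscaled normal density at the mean
peak_density = 1 / (sd * math.sqrt(2 * math.pi))
scaled_peak = histogram_area * peak_density

print(n_samples, histogram_area, round(scaled_peak, 2))
```

The scaled peak comes out at roughly 58, in line with the tallest bars (56 and 51). For other data sets the same rule applies: scale factor = number of samples × bin width.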

Scatter chart as histogram with normal curve

Histogram correction

You may notice that the histogram and bell curve are a little out of sync. This is due to the way the bin values and frequencies are plotted. If you have a bin width of 20 and the bin value is 40, the corresponding frequency counts all values between 20 and 40. When you plot this point on a scatter chart, the bar is centred at 40 and extends half the bin width (10) either side, from 30 to 50. That’s why the histogram looks shifted to the right.

To fix this, create a corrected bin column with half the bin width (10) subtracted from each bin value, and use it for the x values when plotting the histogram.

Bin   Frequency   Corrected Bin   Frequency
0     0           -10             0
20    0           10              0
40    1           30              1
60    12          50              12
80    32          70              32
100   56          90              56
120   51          110             51
140   28          130             28
160   19          150             19
180   0           170             0
200   1           190             1
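The correction is a simple shift by half a bin width. The sketch below, with variable names chosen only for illustration, shows that a bar centred on the corrected value spans exactly the interval that FREQUENCY counted.

```python
bin_width = 20
bins = list(range(0, 201, bin_width))          # upper edges used with FREQUENCY
corrected = [b - bin_width / 2 for b in bins]  # bar centres for the scatter chart

# A bar centred on a corrected value covers the interval from b - 20 to b
spans = [(c - bin_width / 2, c + bin_width / 2) for c in corrected]

print(corrected[:3], spans[2])
```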

Scatter chart as histogram with normal curve corrected

Mean and Standard Deviation

Since it is a scatter chart, it is possible to add additional indicators including mean and standard deviation lines.

For each indicator, produce a single vertical line segment from 0 up to the height of the scaled bell curve at that x value, using the previous NORMDIST function multiplied by 4000. Each line is defined by two points with the same x value.

Line      X          Y
-3sigma   18.35943   0
          18.35943   0.644784
-2sigma   45.85295   0
          45.85295   7.855081
-1sigma   73.34648   0
          73.34648   35.20403
mean      100.84     0
          100.84     58.04164
1sigma    128.3335   0
          128.3335   35.20403
2sigma    155.827    0
          155.827    7.855081
3sigma    183.3206   0
          183.3206   0.644784
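The segment end points can be generated in a loop. This is a sketch under the same assumptions as the rest of the article (scale factor 4000, mean 100.84); it uses the rounded standard deviation 27.49, so the results differ slightly from the table, which was computed from the unrounded value. The labels and the helper name are illustrative only.

```python
import math

mean, sd = 100.84, 27.49
scale = 200 * 20  # number of samples x bin width

def scaled_height(x):
    """Scaled normal density: 4000 * NORMDIST(x, mean, sd, FALSE)."""
    z = (x - mean) / sd
    return scale * math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

segments = {}
for k in range(-3, 4):
    x = mean + k * sd
    label = "mean" if k == 0 else f"{k:+d}sigma"
    # Two points with the same x: the foot of the line and the curve height
    segments[label] = [(x, 0.0), (x, scaled_height(x))]

for label, (foot, top) in segments.items():
    print(label, round(foot[0], 2), round(top[1], 2))
```

Each pair of points is added to the scatter chart as its own series so that it draws as a separate vertical line.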

Scatter chart as histogram with normal curve, mean and standard deviation

Tidying up the colours results in the following final histogram with overlaid normal curve and mean and standard deviation indications.

Scatter chart as histogram with normal curve, mean and standard deviation final

Download the Excel workbook here.


Comments

4 responses to “Histogram with normal distribution overlay in Excel”

  1. Chris

    when you get to the normal curve how do you plot the scatter chart when you are already using a bar graph? I’m using excel 2013

  2. Chris

    Ahh I got it. Right click then select change chart type. Use a combo chart.

  3. Chris

    What is the formula for scaling the normal curve? Is the 200 you describe the number of samples, or the range of your data being 0-200?

    bin size^2 * #of samples?

    1. Austin

      I realize this post is pretty old, but I also would like to know this calculation. The one you proposed, bin size^2 * #of samples does not fit my data well at all.
