Exercise #14 - Benford's Law

The purpose of this exercise is to give you a little practice using arrays and to demonstrate Benford's Law.  Given a collection of naturally occurring numbers - for example, the lengths of rivers around the world, enrollment at UAA over the years, or the number of votes for a candidate by precinct - you might expect these numbers to have nothing in common.

But what if you looked at the distribution of the leading digit of these numbers?  In other words, count the number of times 0 is the leading digit, 1 is the leading digit, 2 is the leading digit, etc.  (0 is the leading digit only when the number itself is 0.)  A natural assumption is that all of these digits are equally likely, so each value from 0-9 should appear as the leading digit about 10% of the time.

What we find instead is quite remarkable - 1 is the leading digit about 30% of the time, and the probability drops to around 5% for the digit 9.  This distribution holds for all of these examples despite the disparate sources.  One application of this phenomenon is verification of voting records - if the number of votes by precinct doesn't match the expected distribution then that raises questions about vote tampering.
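The figures of roughly 30% and 5% agree with the usual statement of Benford's Law, in which a leading digit d from 1 to 9 occurs with probability log10(1 + 1/d).  The short Python sketch below only illustrates that formula; it is not part of the course project, and the function name benford_probability is made up for this example.

```python
import math

def benford_probability(digit):
    """Expected share of numbers whose leading digit is `digit` (1-9)."""
    if not 1 <= digit <= 9:
        raise ValueError("Benford's Law is stated for leading digits 1 through 9")
    return math.log10(1 + 1 / digit)

# Print the expected share for each leading digit as a percentage.
for d in range(1, 10):
    print(d, f"{benford_probability(d):.1%}")
```

Running it shows 30.1% for the digit 1 and 4.6% for the digit 9, and the nine values add up to 100%.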


In this exercise your task is to complete a program that uses an array to compute how often each digit 0-9 appears as the first digit in the following data sources:

  1. enrollments.txt  - Enrollment by course section at UAA in Spring 2010.  The original data is here.
  2. livejournal.txt - Number of new posts per day at LiveJournal.com.  Data from here.
  3. internethosts.txt - Number of hosts on the internet since 1981.  Data from here.  (Small sample size).

Start with this Visual Studio project, BenfordsLaw.zip, which already loads the file and counts the number of times each digit appears as the first digit, storing the counts in the array digitCount:

  digitCount[0] - Number of times 0 is the leading digit
  digitCount[1] - Number of times 1 is the leading digit
     ...
  digitCount[9] - Number of times 9 is the leading digit

The variable total counts up how many numbers were read in.
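The provided project already does this counting for you.  For readers curious how such a tally might work, here is a minimal Python sketch, not the project's actual code.  It assumes each line holds one non-negative whole number, which is a guess about the file format, and the names count_leading_digits and digit_count are hypothetical stand-ins that mirror digitCount.

```python
def count_leading_digits(lines):
    """Tally the first digit of each line; returns (digit_count, total)."""
    digit_count = [0] * 10   # one slot per possible leading digit 0-9
    total = 0
    for line in lines:
        text = line.strip()
        # Skip blank lines and anything that does not start with a digit.
        if not text or text[0] not in "0123456789":
            continue
        digit_count[int(text[0])] += 1
        total += 1
    return digit_count, total

sample = ["120", "19", "7", "0", "305", ""]
digit_count, total = count_leading_digits(sample)
print(digit_count, total)
```

With this sample, the digit 1 is counted twice (for 120 and 19), the digits 0, 3 and 7 once each, and total is 5 because the blank line is skipped.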

Your job is to compute the fraction of the time each digit appears as the leading digit and store it in the array yValues.  For example, if digitCount[0] is 30 and total is 100 then yValues[0] should contain 0.3 (that is, 30%).  If digitCount[1] is 20 and total is 100 then yValues[1] should contain 0.2.
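The arithmetic is one division per digit.  The Python sketch below shows the idea using made-up counts; compute_fractions and y_values are hypothetical names chosen to mirror yValues, and the project itself may be written in a different language.

```python
def compute_fractions(digit_count, total):
    """Return each count divided by total, as a list of floats."""
    if total == 0:
        # No numbers were read, so avoid dividing by zero.
        return [0.0] * len(digit_count)
    return [count / total for count in digit_count]

# Made-up counts for illustration only; they add up to 100.
digit_count = [30, 20, 10, 10, 10, 5, 5, 4, 3, 3]
total = sum(digit_count)
y_values = compute_fractions(digit_count, total)
print(y_values[0], y_values[1])
```

One thing to watch for: in C-family languages, dividing one integer by another discards the fractional part, so 30 / 100 gives 0 rather than 0.3.  If digitCount and total are integers in your project, convert one of them to a floating-point type before dividing.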

Once the yValues array is filled in, the program will draw a bar graph of the values.

Here is a completed program:  BenfordComplete.zip