Probability and Integration, I

The Java applet above simulates what might happen when a person throws darts at a dart board. Notice that most of the darts tend to land fairly close to the bull's eye but that some of them land fairly far away. Throwing a dart at a dart board is a complicated affair involving many different muscles. It is difficult to predict what any one throw will do but we can make predictions about many throws -- for example, we can't say with any degree of certainty whether any particular throw will land in the center circle but we might be able to say that out of 1,000 throws between 10 and 20 percent will likely land in the center circle.

The remainder of this chapter is about questions and statements involving probability -- situations like throwing darts at a dart board that involve an element of chance. We will answer questions like --

One way to investigate questions like this is experimentally -- by throwing a large number of darts at a dart board. For example, suppose that you wanted to know how likely you are to hit the center circle. You could buy a new dart board and throw 1,000 darts at the board. Then you could count the number of dart holes in the center circle. If there were 200 dart holes in the center circle, you could say that roughly 20% of the time your throw will land in the center circle. Of course, with all that practice you might improve.

Another way to answer questions involving probability is by computer-based simulation -- like the Java applet above that simulates 500 throws of a dart in just a few seconds. Computer-based simulations are fast but they are only simulations, and the results you obtain are only useful for making predictions about real-world problems when the underlying computer model is a good representation of reality. Your computer algebra window has a program that will simulate 1,000 flips of a fair coin -- one that is equally likely to come up "heads" or "tails." When you run this simulation you should get roughly 500 heads. This is a good prediction for what happens with a fair coin but it is absolutely useless for predicting what happens with a biased coin.
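
The coin-flip simulation described above can be sketched in a few lines of Python. This is only an illustrative stand-in, not the program in your computer algebra window; the function name count_heads and the parameter p_heads are made up for this sketch.

```python
import random

def count_heads(num_flips, p_heads=0.5, seed=None):
    """Simulate num_flips coin flips and return how many came up heads.

    p_heads is the probability of heads on each flip (0.5 models a fair coin).
    """
    rng = random.Random(seed)  # private generator, so a seed gives repeatable runs
    return sum(1 for _ in range(num_flips) if rng.random() < p_heads)

if __name__ == "__main__":
    print("fair coin:  ", count_heads(1000), "heads out of 1000")
    print("biased coin:", count_heads(1000, p_heads=0.8), "heads out of 1000")
```

Running the fair-coin model tells you nothing about the biased coin: the simulation is only as good as the value of p_heads built into it.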

In this module we will study another way to answer questions involving probability -- mathematical analysis of mathematical models involving chance. These methods are very important but they are subject to the same caveats as computer-based simulations. They are only useful for making predictions about real-world problems when the mathematical models are good representations of reality.

Basic Probability Theory

We begin by discussing probabilities in the simplest possible case -- when we are dealing with a situation, like tossing a coin, when there are only a finite number of possible results or outcomes. When you flip a coin once there are two possible outcomes -- heads and tails. The set, W, of possible outcomes is called the probability space -- for example,

W = {heads, tails}

Probability theory is most useful for investigating experiments or trials, like flipping a coin, that involve an element of chance and that can be repeated many times. The set, W, includes all the possible results for each trial. Although we cannot make predictions about the result of any one trial because there is an element of chance, we can make predictions about the results of many trials.

In this section the probability space, W, will be a finite set

W = {w1, w2, ... wn}

In this situation each possible outcome, wi, has an associated probability, pi, a number between zero and one. If pi is zero then the outcome, wi, never occurs. If pi is one then the outcome, wi, always occurs. Usually 0 < pi < 1 and we expect that the outcome wi will occur roughly pi * K times if we do K trials. For example if you flip a fair coin each of the two possible outcomes has probability 1/2 and if you flip such a coin 1,000 times you would expect roughly 500 heads and 500 tails.

Because the set W includes all possible outcomes, the probabilities sum to one -- that is,

p1 + p2 + ... + pn = 1
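
The claim that an outcome with probability pi occurs roughly pi * K times in K trials can be explored with a short simulation. The sketch below assumes a hypothetical three-outcome probability space with probabilities 0.5, 0.3, and 0.2; these numbers and the function name run_trials are chosen only for illustration.

```python
import random
from collections import Counter

# Hypothetical three-outcome probability space; the probabilities sum to one.
probabilities = {"w1": 0.5, "w2": 0.3, "w3": 0.2}

def run_trials(probabilities, num_trials, seed=None):
    """Run num_trials independent trials and count how often each outcome occurs."""
    rng = random.Random(seed)
    outcomes = list(probabilities)
    weights = [probabilities[w] for w in outcomes]
    draws = rng.choices(outcomes, weights=weights, k=num_trials)
    return Counter(draws)

if __name__ == "__main__":
    K = 10_000
    counts = run_trials(probabilities, K, seed=0)
    for outcome, p in probabilities.items():
        print(f"{outcome}: observed {counts[outcome]}, expected about {p * K:.0f}")
```

The observed counts will not match pi * K exactly, but for large K they should be close.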

Example

Consider the trial -- flipping a fair coin three times. There are eight possible outcomes:

heads heads heads   abbreviated HHH
heads heads tails   abbreviated HHT
heads tails heads   abbreviated HTH
heads tails tails   abbreviated HTT
tails heads heads   abbreviated THH
tails heads tails   abbreviated THT
tails tails heads   abbreviated TTH
tails tails tails   abbreviated TTT

Because the coin is fair, each of these outcomes is equally probable. Since there are eight possible outcomes and their probabilities must sum to one, the probability associated with each outcome is 1/8.

An event is a subset of W. For example, if you flip a coin three times the event of getting at least two tails is the subset

E = {HTT, THT, TTH, TTT}

The probability of an event is the sum of the probabilities of the outcomes in the event. We denote this P(E). In the example above P(E) = 1/2.

Notice that if A and B are disjoint events (that is, the sets A and B have no outcomes in common) then

\[ P(A \cup B) = P(A) + P(B) \]
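
The three-flip example can be checked by brute-force enumeration. The sketch below lists the eight outcomes, computes P(E) for the event of at least two tails, and checks that the probability of a union of two disjoint events is the sum of their probabilities. The events A and B (exactly two tails, exactly three tails) are chosen here only for illustration.

```python
from fractions import Fraction
from itertools import product

# All eight outcomes of three flips, written like "HTT".
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]
# The coin is fair, so every outcome has probability 1/8.
prob = {w: Fraction(1, len(outcomes)) for w in outcomes}

def P(event):
    """Probability of an event: the sum of the probabilities of its outcomes."""
    return sum((prob[w] for w in event), Fraction(0))

E = {w for w in outcomes if w.count("T") >= 2}   # at least two tails
A = {w for w in outcomes if w.count("T") == 2}   # exactly two tails
B = {w for w in outcomes if w.count("T") == 3}   # exactly three tails

if __name__ == "__main__":
    print(sorted(E), P(E))
    print(P(A | B) == P(A) + P(B))  # A and B have no outcomes in common
```

Exact fractions are used instead of floating-point numbers so that the comparisons are exact.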


  1. Suppose that you toss a fair coin four times. What is the probability space? What is the probability of each outcome? What is the probability of getting zero heads? exactly one head? exactly two heads? exactly three heads? all heads?

  2. Suppose that you toss a pair of fair dice. Even though dice are usually white with black spots it is easiest to think of one die as being white with black spots and the other as being black with white spots. We denote each possible outcome by (a, b) where a is the number of dots showing on the face up side of the white die and b is the number of dots showing on the face up side of the black die.

    What is the probability space? What is the probability of each possible outcome? What is the probability that the value of a toss (the sum a + b) is two? three? four? ... twelve?


Probability on R

We now turn to our first infinite probability space -- our first situation in which the set of outcomes is infinite. Suppose you are throwing darts at a one-dimensional dart board -- a set of the form [a, b].

In this situation it no longer makes sense to talk about the probability of hitting a single point x for experimental reasons. One way to estimate the probability of hitting a point, x, would be to run an experiment in which you threw a very large number, n, of darts at the dart board, counted the number of times you hit the point x, and then divided that number by n. If n were very large you would expect to get a good estimate of the probability of hitting x.


                                       number of hits
The probability of hitting x is about ----------------
                                             n

If the dart really were infinitely sharp -- so that you hit only one point with each shot -- these experiments would usually give zero as the result. Even though you might possibly hit x once in a great while, the number of hits divided by the number of shots would be very close to zero. Experimentally, if you have to "call your shot" in advance, the chance of making it is, for all practical purposes, zero.

Even though the probability of hitting any single point is zero we can make statements like the following.

Example

Suppose the probability space is the interval [a, b] and that darts are thrown in a completely unplanned, random way, so that each point is equally likely to be hit. This situation is called a uniform probability because all the points are similar from the probabilistic perspective. In this situation the probability of hitting a subinterval [c, d] is proportional to its length -- that is,


             d - c       1
P([c, d]) = ------- = ------- (d - c)
             b - a     b - a

For example, if you throw a dart at the dart board [0, 3] the probability of hitting the interval [1, 2] is one-third.
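
The uniform case is easy to test by simulation. The following sketch, with illustrative function names, throws darts uniformly at random at [a, b] and compares the fraction landing in [c, d] with the formula (d - c) / (b - a).

```python
import random

def estimate_interval_probability(a, b, c, d, num_throws, seed=None):
    """Estimate P([c, d]) for darts landing uniformly at random on [a, b]."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(num_throws) if c <= rng.uniform(a, b) <= d)
    return hits / num_throws

def exact_interval_probability(a, b, c, d):
    """The formula from the text: (d - c) / (b - a), for a <= c <= d <= b."""
    return (d - c) / (b - a)

if __name__ == "__main__":
    print("simulated:", estimate_interval_probability(0, 3, 1, 2, 100_000))
    print("exact:    ", exact_interval_probability(0, 3, 1, 2))
```

With 100,000 throws the simulated value should agree with one-third to about two decimal places.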

The equation above is the key to working with nonuniform probabilities. If the probability is uniform then we can determine the probability of hitting an interval by simple multiplication but if the probability is not uniform then simple multiplication must be replaced by integration.

Definition

A probability density function phi(x) on the real line is a function such that:

  1. phi(x) >= 0 for every real number x.

  2. \[ \int_{-\infty}^{+\infty} \phi(x)\,dx = 1 \]

Given a probability density function the probability of hitting an interval [a, b] is

\[ P([a, b]) = \int_a^b \phi(x)\,dx \]

The second property above represents mathematically the idea that the only possible outcomes are in the set, R, of all real numbers, so the probability of this set is one.
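
As a small worked illustration, consider a hypothetical density that is not used elsewhere in this module: phi(x) = 2x for 0 <= x <= 1 and phi(x) = 0 everywhere else. This function is never negative. Assuming the probability of an interval is the integral of phi over that interval, a direct computation checks that the total probability is one and finds the probability of hitting [0, 1/2]:

```latex
\int_{-\infty}^{+\infty} \phi(x)\,dx = \int_0^1 2x\,dx = \Bigl[x^2\Bigr]_0^1 = 1,
\qquad
P([0, 1/2]) = \int_0^{1/2} 2x\,dx = \Bigl[x^2\Bigr]_0^{1/2} = \frac{1}{4}.
```

The interval [0, 1/2] is half of the board but receives only a quarter of the probability, because this density is larger near 1 than near 0.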

Example

The most important probability density function is the normal probability density function

\[ \phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \]

shown in the graph at the right. This probability density function is the kind of probability density function that describes situations like throwing a dart at a one-dimensional dartboard with the bull's eye located at zero. Notice the following.
Missing graph

The book Multivariable Calculus, Linear Algebra, and Differential Equations in a Real and Complex World has a proof that

\[ \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx = 1 \]

The proof is also available as an Adobe Acrobat PDF file. You may also want to verify this theorem in your CAS window.
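
If you do not have a CAS at hand, a rough numerical check is possible in Python. The sketch below assumes the normal probability density function has the standard form exp(-x^2 / 2) / sqrt(2 pi) and approximates its integral with the midpoint rule. The function names and the cutoff at -12 and 12 are choices made for this sketch; a numerical check of this kind is evidence, not a proof.

```python
import math

def phi(x):
    """Standard normal density (assumed form): exp(-x^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def midpoint_integral(f, a, b, num_strips=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / num_strips
    return width * sum(f(a + (i + 0.5) * width) for i in range(num_strips))

if __name__ == "__main__":
    # The tails beyond |x| = 12 are far too small to affect the printed digits.
    print(midpoint_integral(phi, -12.0, 12.0))
```

The same midpoint_integral function can be used to approximate the probability of any interval [a, b] for this density.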


Find each of the following probabilities using the normal probability density function above.

  1. P([0, 1])

  2. P([-1, 0])

  3. P([-1, 1])

  4. P([-2, 2])

  5. P([-3, 3])

  6. P([-10, 10])

  7. P([0, +infinity))


In practice the probability density function depends on the skill of the person throwing the darts. The movie at the right shows a series of probability density functions for different skill levels.

The function drawn in black is the original normal probability density function. The functions drawn in red are probability density functions for more skillful dart-throwers. Notice that these functions are higher near the bull's eye and lower farther away from the bull's eye than the original normal probability density function because a more skillful dart-thrower is more likely to hit closer to the bull's eye. The blue curves are probability density functions for less skillful dart-throwers and, thus, are lower closer to the bull's eye and higher farther away from the bull's eye than the original normal probability density function.

Missing graph

The probability density functions in the graph above are all of the form

\[ \phi_\sigma(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/(2\sigma^2)} \]

These functions are all called normal probability density functions. This gives us a family of normal probabilities defined by

\[ P_\sigma([a, b]) = \int_a^b \phi_\sigma(x)\,dx \]
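
The effect of skill on the shape of the density can be seen numerically. The sketch below assumes the family has the usual form exp(-x^2 / (2 sigma^2)) / (sigma sqrt(2 pi)), centered at the bull's eye, with sigma = 1 giving the original normal probability density function. The sample values of sigma and the sample points 0 and 3 are arbitrary choices for illustration.

```python
import math

def phi_sigma(x, sigma):
    """Normal density with spread sigma > 0 (assumed form, centered at the bull's eye)."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

if __name__ == "__main__":
    # Smaller sigma models a more skillful thrower, larger sigma a less skillful one.
    for sigma in (0.5, 1.0, 2.0):
        near = phi_sigma(0.0, sigma)   # density at the bull's eye
        far = phi_sigma(3.0, sigma)    # density three units away
        print(f"sigma = {sigma}: phi(0) = {near:.4f}, phi(3) = {far:.6f}")
```

As sigma decreases, the density at the bull's eye rises and the density three units away falls, which matches the behavior of the curves for more skillful dart-throwers.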


  1. Compute P1([-1, 1]), P2([-1, 1]), P3([-1, 1]), and P4([-1, 1]).

  2. Prove that Psigma([a, b]) = P1([a / sigma, b / sigma])

  3. Find the median distance from the bull's eye for a dart thrower whose skill is given by the normal probability density function phi1(x).


The Relative Probability of Hitting a Point

Even though the probability of hitting any one particular point is zero, we do have an intuitive feeling that some points -- points near the bull's eye -- are more likely to be hit than other points -- points farther away from the bull's eye. In fact, you might have an intuitive expectation that the values of a probability density function, phi, determine the relative probabilities of hitting various points.

We might approximate the relative probability of hitting a point a, as compared with hitting a point b, by comparing the probabilities of hitting close to each of the two points, using the ratio


P([a - h, a + h])
-----------------
P([b - h, b + h])

If h is small then this should give us a good estimate of the relative probability of hitting a as compared to hitting b. Thus, the exact relative probability should be


          P([a - h, a + h])
  Lim    -------------------
h --> 0   P([b - h, b + h])

The key to computing this limit is the following lemma proved in Multivariable Calculus, Linear Algebra, and Differential Equations in a Real and Complex World.

The proof is also available as an Adobe Acrobat PDF file.

Lemma

If f(x) is continuous at the point a then

\[ \lim_{h \to 0} \frac{1}{2h} \int_{a-h}^{a+h} f(x)\,dx = f(a) \]


Prove that, if phi is continuous at a and at b and phi(b) is not zero, then


          P([a - h, a + h])     phi(a)
  Lim    ------------------- = --------
h --> 0   P([b - h, b + h])     phi(b)

and, thus, that the values of the probability density function phi determine the relative probabilities of hitting different points.
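
Before attempting the proof, it may help to watch the limit numerically. The sketch below is not a proof. It assumes the standard normal density exp(-x^2 / 2) / sqrt(2 pi), computes interval probabilities with the error function from Python's math module, and compares the ratio for shrinking h with the ratio of density values. The points a = 0 and b = 1 and the function names are choices made for this sketch.

```python
import math

def phi(x):
    """Standard normal density (assumed form): exp(-x^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def interval_probability(c, d):
    """P([c, d]) for the standard normal density, written with the error function."""
    return 0.5 * (math.erf(d / math.sqrt(2.0)) - math.erf(c / math.sqrt(2.0)))

def relative_probability(a, b, h):
    """Ratio P([a - h, a + h]) / P([b - h, b + h]) for a small h > 0."""
    return interval_probability(a - h, a + h) / interval_probability(b - h, b + h)

if __name__ == "__main__":
    a, b = 0.0, 1.0
    for h in (0.5, 0.1, 0.01, 0.001):
        print(f"h = {h}: ratio = {relative_probability(a, b, h):.6f}")
    print(f"phi(a) / phi(b) = {phi(a) / phi(b):.6f}")
```

As h shrinks, the printed ratios should settle toward the value on the last line.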


Copyright © 1997 by Frank Wattenberg, Department of Mathematics, Montana State University, Bozeman, MT 59717