Using simulations to estimate probabilities

Before completing this section of the lab, you should have worked through the hot hand example.

Let's take this lab one step further, and use our simulations to estimate probabilities. Recall that the relative frequency definition of probability states:

The probability of an event A is the long-run relative frequency (proportion of the time) that the event A occurs if we were to observe the random process an infinite number of times.

This means that if we use a computer to simulate the random process many, many times, we can estimate the probability of the event A by calculating the proportion of those simulations in which A occurred.
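As a small illustration of this idea, the sketch below estimates a probability whose true value is already known: the chance of a "hit" on a single shot when that chance is set to 0.45. The seed value and the names shots and p_hat are assumptions made for this example only.

```r
# Assumption: fix the seed so the example gives the same result each run
set.seed(1)

outcomes <- c("H", "M")

# Simulate 10,000 independent shots with P(hit) = 0.45
shots <- sample(outcomes, size = 10000, replace = TRUE, prob = c(0.45, 0.55))

# Proportion of simulated shots that were hits
p_hat <- mean(shots == "H")
p_hat
```

The estimate should land close to 0.45, and it tends to get closer as size increases.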

In the hot hand example, you used the sample command to create a simulated sequence of basketball shots. Assuming we have independent shots, where the probability of a "hit" is equal to 0.45, we could estimate the probability of a streak of length one using the commands:

outcomes <- c("H", "M")
sim_basket <- sample(outcomes, size = 10000, replace = TRUE, prob = c(0.45, 0.55))
sim_streak <- calc_streak(sim_basket)
sum( sim_streak == 1 )/length(sim_streak)

Describe to your neighbor what each line of code is doing. If you're not sure, break down pieces of the code to look at, e.g.,

head( sim_streak == 1 )
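If the comparison step is unclear, it may help to try it on a vector short enough to check by hand. The sketch below uses a made-up vector of streak lengths (toy_streak is a hypothetical name, not part of the lab data) to show that a comparison returns TRUE/FALSE values, and that sum counts the TRUE values.

```r
# Hypothetical streak lengths, chosen by hand for illustration
toy_streak <- c(0, 1, 0, 2, 1, 0)

# Comparison gives one TRUE/FALSE value per element
is_one <- toy_streak == 1
is_one

# sum() treats TRUE as 1 and FALSE as 0, so this counts the streaks of length 1
n_ones <- sum(is_one)

# Dividing by the number of streaks gives the proportion
prop_ones <- n_ones / length(toy_streak)
prop_ones

# mean() of a logical vector gives the same proportion in one step
mean(is_one)
```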

Exercises

  1. Use simulation to estimate the probability of a streak of length two.
  2. Use simulation to estimate the probability that a streak has length less than 2. Hint: Use the logical operator for "or": |

Simulation using a "for" loop 

In the previous simulation, the sample function repeated the random process a large number of times for us; we only had to set the "size" argument. Most of the time, there isn't a built-in R function that will repeat a process for you. Instead, you'll need to use a "for" loop. Here is a basic example:

for(i in 1:10){
  print(i)
}

A "for" loop starts with the keyword "for", followed by parentheses containing three things: a name for your index (e.g., i, though any valid variable name works), the word "in", and the sequence you'd like your index to work through (which can be any vector). The R commands the loop runs on each iteration are contained between the curly braces.
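To see that the index name and the sequence really can be anything, here is a short sketch with two loops. The names value, total, and outcome are chosen for this example only.

```r
# Loop over an arbitrary numeric vector, adding up its elements
total <- 0
for (value in c(2, 4, 6)) {
  total <- total + value
}
total

# The sequence does not have to be numeric
for (outcome in c("H", "M")) {
  print(outcome)
}
```

The first loop runs three times, once for each element of the vector, and the second runs twice.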

Instead of using the sample command, we could have simulated a sequence of 133 basketball shots using a for loop:

outcomes <- c("H", "M")
# Set up vector to keep track of the outcome on each shot
shots <- NULL
for(k in 1:133){
  shots[k] <- sample(outcomes, size = 1, prob = c(0.45, 0.55))
}

Let's break down what this for loop does by iterating manually:

shots <- NULL
shots
k <- 1
shots[k] <- sample(outcomes, size = 1, prob = c(0.45, 0.55))
shots
k <- 2
shots[k] <- sample(outcomes, size = 1, prob = c(0.45, 0.55))
shots
k <- 3
shots[k] <- sample(outcomes, size = 1, prob = c(0.45, 0.55))
shots
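The manual steps above show that shots grows by one element on each iteration. A common alternative, sketched here rather than required for this lab, is to create the full-length vector before the loop and fill it in. The seed value and the name n_shots are assumptions made for this example.

```r
# Assumption: fix the seed so the example gives the same result each run
set.seed(2)

outcomes <- c("H", "M")
n_shots <- 133

# Create an empty character vector with one slot per shot
shots <- character(n_shots)

for (k in seq_len(n_shots)) {
  # Fill slot k with the outcome of a single simulated shot
  shots[k] <- sample(outcomes, size = 1, prob = c(0.45, 0.55))
}

# Count the hits and misses in the simulated sequence
table(shots)
```

For 133 shots the difference is not noticeable, but filling a vector of known length can be faster when a loop runs many thousands of times.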

For more advanced uses of for loops in R, see this tutorial.

Exercises

  1. In probability, there is a problem called the "hat problem": Suppose n people go to a fancy restaurant. Each person is wearing a hat and checks it at the door on arrival. The hat-check attendant gets a little tipsy throughout the evening and returns a random hat to each person as they leave. If the patrons leave in a random order, what is the probability that no one gets their own hat back? We can simulate one evening with 20 people as follows:

n <- 20
Hats <- sample(1:n, n)
Heads <- sample(1:n, n)
sum( Hats == Heads )

The last line counts how many people got their own hat back, so the event "no one gets their own hat back" corresponds to a count of 0. Use a "for" loop and the above code to estimate the probability that no one gets their own hat back.
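Once you have a simulation estimate, you may want something to compare it against. The hat problem has a known exact answer from the inclusion-exclusion principle, a standard result that this lab does not derive: the probability that no one gets their own hat back is the sum of (-1)^k / k! for k from 0 to n. The sketch below computes it for n = 20; the names k and p_exact are chosen for this example.

```r
n <- 20
k <- 0:n

# Exact probability of no matches: sum over k of (-1)^k / k!
p_exact <- sum((-1)^k / factorial(k))
p_exact

# For large n this probability is very close to 1/e
exp(-1)
```

Both values are about 0.368, so a simulation estimate with many repetitions should land near that number.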