One-dimensional MCMC example in R
Before using the data collected by the McDonald’s Manager, let’s try some data where we know the answer, to help make sure our MCMC works.
This is the true parameter in our simulation:
lambda=4
Let’s generate some data and see if we can recover that lambda was 4 from our simulated data.
# let's assume we'll collect 1000 samples
n=1000
# generate data from a Poisson distribution
poisson.counts = rpois(n=n, lambda = lambda)
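As a quick sanity check, the sample mean of Poisson data is a natural estimate of lambda, so it should land near 4:
# the sample mean should be close to the true lambda of 4
mean(poisson.counts)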
We also need to calculate the prior probability of a parameter.
Let’s assume we don’t know anything about what the true lambda is; we just know it’s a positive number.
We can assign a prior distribution that gives equal probability to all positive values. (This flat prior over the positive reals is improper, but that’s fine for Metropolis sampling: only the ratio of prior values enters the acceptance step.)
# flat prior: any positive value gets prior 1, anything else gets 0
prior.prob = function(lambda)
{
  if(lambda > 0)
  {
    return(1)
  } else
  {
    return(0)
  }
}
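We also need a log-likelihood function for the simulated counts; it follows the same pattern as the one we’ll define later for the McDonald’s data, just pointed at poisson.counts:
# log likelihood of the simulated counts for a given lambda
calc.log.lik = function(lambda)
{
  liks = dpois(x = poisson.counts,
               lambda = lambda,
               log = TRUE)
  return(sum(liks))
}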
Great! Now let’s write our code for the MCMC.
# We'll run the MCMC for 40000 iterations
num.iters=40000
# with an initial value for lambda of 20 (pick whatever you want)
lambda.current=20
# we'll store the sampled lambdas in this vector
all.lambdas=c()
# loop for the number of iterations
for (i in 1:num.iters) {
  # propose a new lambda by adding normal noise to the current value
  proposal = lambda.current + rnorm(n=1, mean = 0, sd = 1)
  # calculate the log likelihoods for the current and proposed lambda
  lik.current = calc.log.lik(lambda.current)
  lik.proposal = calc.log.lik(proposal)
  # calculate the prior probabilities for the parameters
  prior.current = prior.prob(lambda.current)
  prior.proposal = prior.prob(proposal)
  # calculate the log acceptance ratio:
  # (log likelihood + log prior) of the proposal minus that of the current value
  p.accept = (lik.proposal + log(prior.proposal)) - (lik.current + log(prior.current))
  # accept if the log acceptance ratio beats the log of a uniform random number
  rnd.num = log(runif(n=1))
  # some control flow for the possible values of p.accept
  if(is.infinite(p.accept) || is.na(p.accept)) # e.g. a proposal outside the prior support gives log(0) = -Inf
  {
    # reject: keep the current lambda and record it
    all.lambdas = c(all.lambdas, lambda.current)
  } else if(p.accept > rnd.num)
  {
    # accept: the proposal becomes the current lambda, and record it
    lambda.current = proposal
    all.lambdas = c(all.lambdas, lambda.current)
  } else
  {
    # reject: keep the current lambda and record it
    all.lambdas = c(all.lambdas, lambda.current)
  }
  # report progress every 1000 iterations
  if(i %% 1000 == 0)
  {
    cat('iteration: ', i)
    cat('', sep = '\n')
  }
  # end of loop
}
## iteration: 1000
## iteration: 2000
## iteration: 3000
## ...
## iteration: 40000
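One side note on the R itself: growing all.lambdas with c() inside the loop copies the vector on every iteration, which gets slow for long chains. A common alternative is to preallocate and fill by index, sketched here:
# faster storage pattern: preallocate, then fill by index inside the loop
all.lambdas = numeric(num.iters)
for (i in 1:num.iters) {
  # ... same proposal and accept/reject steps as above ...
  all.lambdas[i] = lambda.current
}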
Get the 95% credible interval for the estimated value of lambda, using only the second half of the chain (the first half is discarded as burn-in):
quantile(x=tail(all.lambdas, n=num.iters/2), probs = c(0.025, 0.5, 0.975))
## 2.5% 50% 97.5%
## 4.028312 4.151373 4.280802
Let’s see how the lambdas changed from beginning to end:
par(mfrow=c(1,2))
plot(all.lambdas,
     x=1:length(all.lambdas),
     col = 'orange',
     xlab='iteration',
     ylab='lambda value',
     # start the y axis at 0 and leave some headroom above the largest value
     ylim = c(0, max(all.lambdas) * 1.1),
     type='l')
hist(tail(all.lambdas, n=num.iters/2),
     main='lambdas',
     xlab = 'value')

It turns out MCMC does work!
There are better R packages to run MCMC, though. Let’s use the fmcmc package to run MCMC on the data provided by the McDonald’s Manager.
Let’s start with a histogram of the McDonald’s counts:
hist(mcdonalds.counts, main = 'Cars in the drive-through from 4-5pm throughout 2020')

As before, we still have to define the likelihood function:
calc.log.lik = function(lambda)
{
  # log likelihood of each observed count under a Poisson with this lambda
  liks = dpois(x = mcdonalds.counts,
               lambda = lambda,
               log = TRUE)
  # sum the per-observation log likelihoods to get the total
  lik = sum(liks)
  return(lik)
}
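As a quick check, the Poisson log likelihood should be highest near the sample mean of the data (the maximum-likelihood estimate of lambda):
# the log likelihood at the sample mean should beat one far away from it
calc.log.lik(mean(mcdonalds.counts))
calc.log.lik(10)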
This is the code to run the MCMC function in fmcmc:
# load library
library(fmcmc)
# fmcmc command
ans <-
  MCMC(
    fun = calc.log.lik,
    initial = c(lambda = 10),
    nsteps = 1e5,
    kernel = kernel_normal_reflective(scale = .1, ub = 25, lb = 5),
    burnin = 20000
  )
# show answer
summary(ans)
##
## Iterations = 20001:1e+05
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 80000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## 1.651e+01 2.129e-01 7.527e-04 3.738e-03
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 16.09 16.37 16.51 16.65 16.93
The fmcmc package also allows for easy visualization of the trace plot and density plot for your estimated parameter:
plot(ans)

Since the lambda parameter is clearly larger than 15 (the entire 95% credible interval lies above 15),
it seems that the McDonald’s Manager has some hiring to do!
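We can also check this claim directly from the samples, since ans can be treated as a matrix of posterior draws:
# proportion of posterior samples above 15; a value of (nearly) 1 supports the claim
mean(ans[, 1] > 15)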
Example of MCMC to estimate two parameters
Two basketball players each chose their five favorite spots on the court and shot 30 shots from each spot.
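The data-generating code isn’t shown here, but a minimal sketch could look like the following (the names all.shots and shots.each match the likelihood code below, and the true probabilities of 0.5 and 0.85 are the ones revealed at the end of this example):
# one plausible way the data could have been simulated
set.seed(1)
shots.each = 30
# five spots per player; each entry is the number of makes out of 30 at one spot
all.shots = c(rbinom(n = 5, size = shots.each, prob = 0.5),
              rbinom(n = 5, size = shots.each, prob = 0.85))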
This is the distribution of how many shots they made from each spot:
hist(all.shots)

We want to determine each player’s probability of making a shot. We can assume each round of 30 shots follows a binomial distribution.
This is how to calculate the binomial likelihood. Because the counts are pooled, we don’t know which player took which round, so the function scores each round with whichever player’s probability explains it better (the larger of the two log likelihoods).
calc.log.lik = function(theta)
{
  all.log.liks = c()
  for (i in all.shots)
  {
    # log likelihood of this round under each player's probability
    lik.theta1 = dbinom(x = i,
                        size = shots.each,
                        prob = theta[1],
                        log = TRUE)
    lik.theta2 = dbinom(x = i,
                        size = shots.each,
                        prob = theta[2],
                        log = TRUE)
    # assign the round to whichever player explains it better
    if(lik.theta1 > lik.theta2)
    {
      lik.log = lik.theta1
    } else # also covers the (unlikely) case of an exact tie
    {
      lik.log = lik.theta2
    }
    all.log.liks = c(all.log.liks, lik.log)
  }
  log.lik.final = sum(all.log.liks)
  return(log.lik.final)
}
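As an aside, the same likelihood can be written without the explicit loop using pmax, which takes elementwise maxima:
# equivalent, vectorized version of the likelihood above
calc.log.lik.vec = function(theta)
{
  lik1 = dbinom(all.shots, size = shots.each, prob = theta[1], log = TRUE)
  lik2 = dbinom(all.shots, size = shots.each, prob = theta[2], log = TRUE)
  return(sum(pmax(lik1, lik2)))
}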
This is the MCMC code we need, using the fmcmc package (Flexible MCMC):
# load library
library(fmcmc)
# thetas initialization: random starting values in (0, 1)
theta = c(runif(1), runif(1))
# fmcmc command
ans <-
  MCMC(
    fun = calc.log.lik,
    initial = theta,
    nsteps = 1e5,
    kernel = kernel_unif_reflective(min. = -0.01, max. = 0.01, lb = 0, ub = 1),
    burnin = 40000
  )
# show answer
summary(ans)
##
## Iterations = 40001:1e+05
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 60000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## par1 0.5305 0.04275 0.0001745 0.002750
## par2 0.8885 0.02595 0.0001060 0.001018
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## par1 0.4465 0.5012 0.5313 0.5602 0.6117
## par2 0.8339 0.8716 0.8898 0.9068 0.9352
# make plots
plot(ans)

The true probabilities were 0.5 and 0.85. We can see MCMC did reasonably well! It helps that the players had different probabilities of making a shot. In cases where the shooters’ probabilities are closer, we would need to collect more shots from the players, and from more areas of the court, to help improve our MCMC estimation. Let’s show this in another example.
Simulate another example with players with close probabilities
Since we need more data to estimate the probabilities correctly when they are close to each other, this time the two basketball players each chose their five favorite spots on the court and shot 300 (instead of 30) shots from each spot. Yes, I know, unrealistic, but let’s do the example anyway.
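Again, the data-generating code isn’t shown; a sketch of what it might look like is below. The true probabilities for this second example aren’t stated in the text, so the two values here are placeholder assumptions chosen to be consistent with the posterior means reported further down (roughly 0.84 and 0.74):
# hypothetical data generation; the two prob values are assumptions, not given
set.seed(2)
shots.each = 300
all.shots = c(rbinom(n = 5, size = shots.each, prob = 0.84),
              rbinom(n = 5, size = shots.each, prob = 0.74))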
This is the distribution of how many shots they made from each spot:
hist(all.shots)

We can keep the same code as before, including the likelihood function (note that shots.each is now 300):
# load library
library(fmcmc)
# thetas initialization: random starting values in (0, 1)
theta = c(runif(1), runif(1))
# fmcmc command
ans <-
  MCMC(
    fun = calc.log.lik,
    initial = theta,
    nsteps = 1e5,
    kernel = kernel_unif_reflective(min. = -0.01, max. = 0.01, lb = 0, ub = 1),
    burnin = 40000
  )
# show answer
summary(ans)
##
## Iterations = 40001:1e+05
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 60000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## par1 0.8375 0.01023 4.176e-05 0.0001799
## par2 0.7370 0.01459 5.958e-05 0.0003611
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## par1 0.8180 0.8306 0.8372 0.8441 0.8583
## par2 0.7082 0.7272 0.7369 0.7469 0.7655
# make plots
plot(ans)

Clearly this is an unrealistic scenario, but the message gets across: MCMC works very well for estimating parameters (including in multiple-parameter problems), the quality of the data is always most important, and the trace and density plots should always be inspected to ensure a good fit. Other methods of MCMC quality control can be discussed later; this was only meant as a simple introduction to the method.
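As a small taste of those methods, fmcmc returns coda-compatible output, so standard diagnostics from the coda package apply directly:
# two common diagnostics from the coda package
library(coda)
effectiveSize(ans)   # effective number of independent samples per parameter
autocorr.diag(ans)   # autocorrelation of the chain at several lags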