HMMs Predict Ramiz’ Mood

For simplicity, we will designate food with numbers.

  • Pizza = 3
  • Burger = 2
  • Tacos = 1

We will designate the hidden states (moods) with letters

  • Calm = A
  • Angry = B

Load the environment

library(tidyverse)
library(HMM)
set.seed(123)

First, lets take a look at the food Ramiz ate last month

data$Visible
##  [1] 1 2 2 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 2 3 2 3 3 2 1 3 2 3

First lets try to generate what we think would be a good Hidden Markov Model for Ramiz’ mood and diet.

We can assume both of his moods are equally likely. This translates as equal priors.

priors = c(
  A=0.5,
  B=0.5
)

We can assume that if he is angry today, he is equally likely to be angry or calm tomorrow. Conversely, if he is calm today, he is equally likely to be angry or calm tomorrow. This would mean he has equal transition probabilities.

# transition probabilities
transition.probabilities = matrix(
  nrow = 2,
  ncol = 2,
  data=rep(0.5, 4)
)
colnames(transition.probabilities) = c('A', 'B')
rownames(transition.probabilities) = c('A', 'B')

Now we have to define the emission probabilities. These are the probabilities that Ramiz will eat a certain type of food for dinner given his mood.

From our time being Ramiz’ friend, we have observed that he is more likely to eat pizza than burgers, and more likely to eat bugers than tacos. We also think he is slightly more likely to eat tacos when he is angry.

# emission probabilities
emission.probabilities = data.frame(
  A = c(
    0.111,
    0.333,
    0.555
  ),
  B = c(
    0.166,
    0.333,
    0.500
  )
)

Finally, let’s save Ramiz’ previous 31 dinner in a vector

# visible/observed data
visible=data$Visible

As we can see, from data generated from the same Hidden Markov Model that is used as input to both implementations of the Viterbi algorithm, the R-Package and our implementation returns pretty similar results.

Finally, the inputs (priors, transitions, emissions) can be easily estimated from data

Example getting inputs from data

# get an example of training data from simulated data
training=data.frame(
  Hidden = simulated.hmm$states,
  Visible = simulated.hmm$observation
)
head(training)
##   Hidden Visible
## 1      B       1
## 2      A       3
## 3      A       2
## 4      A       3
## 5      B       2
## 6      B       3
# function to get probabilities from data
get.inputs.from.data = function(
  training = training
)
{
  # get priors 
  priors=table(training$Hidden)/dim(training)[1]
  priors
  
  # get transitions
  all.transitions=c()
  for (i in 1:(dim(training)[1]-1) ) 
  {
    # get both states
    first.state=training$Hidden[i]
    second.state=training$Hidden[i+1]
    
    # get pasted
    combination=paste(
      first.state,
      second.state,
      sep = '>'
    )
    all.transitions[i]=combination
  }
  transition.probabilities = table(all.transitions)/length(all.transitions)
  transition.probabilities
  
  # get transition probability matrix
  transition.probabilities  = matrix(
    data = transition.probabilities,
    nrow = 2,
    ncol = 2,
    byrow = TRUE
  )
  colnames(transition.probabilities)=c('A','B')
  rownames(transition.probabilities)=c('A','B')
  
  # emission probabilities
  emission.probabilities = table(training)/dim(training)[1]
  emission.probabilities
  
  # emission probabilities in correct format
  emission.probabilities = data.frame(
    A = emission.probabilities['A',],
    B = emission.probabilities['B',]
  )
  return.list=list(
    'emission.probabilities'=emission.probabilities,
    'transition.probabilities'=transition.probabilities,
    'priors'=priors
  )
  return(return.list)
}

# get inputs
inputs = get.inputs.from.data(training = training)

# generate some new data (test data)
simulated.hmm=simHMM(
  hmm = hmm,
  length = 10000
)

# run my viterbi
new.viterbi.tbl = My.Viterbi(
  emission.probabilities = inputs$emission.probabilities,
  transition.probabilities = inputs$transition.probabilities,
  priors = inputs$priors,
  visible = simulated.hmm$observation
)
new.viterbi.path = get.viterbi.path(viterbi.tbl = viterbi.tbl)

# package viterbi
true.viterbi = viterbi(
  hmm = hmm,
  observation = simulated.hmm$observation
)

# compare to truth
mine=sum(new.viterbi.path == simulated.hmm$states)
pack=sum(true.viterbi == simulated.hmm$states)
mine
## [1] 4969
pack
## [1] 5241

As expected, our implementation performed comparable to the R-Package, even though we estimated our probabilities from the data, and did not use the true HMM model as input (as we did for the R-Package). This is a good support that our estimations were good. Of course, the training data and the testing data were generated from the same HMM, which is an assumption that needs to also be true in real data analysis.