Hidden Markov Models (HMM)

HMM - Intuition & Components

  • an HMM is an extension of Markov Chains where the states are NOT fully observable. In other words, each state-value is decomposed into 2 parts:
    • 1 hidden-state 𝑌: each hidden-state can have 1 of 𝑛 possible hidden-state-values {𝑠1, ..., 𝑠𝑛}
    • 1 observable-state 𝑋: each observable-state can have 1 of 𝑚 possible observable-state-values {𝑥1, ..., 𝑥𝑚}
  • at current time 𝑡 the HMM is at some hidden-state-value 𝑠(𝑡). This 𝑠(𝑡) is 1 of the 𝑛 possible hidden-state-values {𝑠1, ..., 𝑠𝑛}
    • 𝑠(𝑡) = 𝑠𝑖 # where 𝑠𝑖∊{𝑠1, ..., 𝑠𝑛}
  • this is denoted as 𝑌(𝑡)=𝑠(𝑡)=𝑠𝑖 # these 2 ways to represent the state-value of 𝑌(𝑡) will be useful as we go on
  • as the HMM moves to time 𝑡+1, it RANDOMLY chooses the next hidden-state-value 𝑠(𝑡+1) denoted as:
    • 𝑌(𝑡+1)=𝑠(𝑡+1)=𝑠?
  • based on the chosen 𝑠(𝑡+1) the HMM also RANDOMLY chooses an observable-state-value 𝑥(𝑡+1) denoted as:
    • 𝑋(𝑡+1)=𝑥(𝑡+1)=𝑥?
  • naturally this forms a sequence of (𝑋,𝑌) pairs as time progresses:
    • { ..., (𝑋(𝑡-1),𝑌(𝑡-1)), (𝑋(𝑡),𝑌(𝑡)), ...}
  • the sequence of (𝑋,𝑌) pairs is the "recorded history" of state-values over time. However, the only values we observe are the 𝑋s, not the 𝑌s
  • this sequence can grow infinitely long
  • the 𝑌s and 𝑋s are called time-variables, where:
    • each 𝑌 holds 1 hidden-state-value 𝑠?
    • each 𝑋 holds 1 observable-state-value 𝑥?
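This generative story can be sketched in a few lines of Python. The 2-state model below (its state names, observation names, and probabilities) is a hypothetical example, not from this page:

```python
import random

random.seed(0)  # reproducible run

# hypothetical hidden-state-values {s1, s2} and observable-state-values {x1, x2, x3}
states = ["s1", "s2"]
obs_values = ["x1", "x2", "x3"]

# T[i][j] = P(Y(t+1)=sj | Y(t)=si); E[j][k] = P(X=xk | Y=sj)
T = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
E = {"s1": {"x1": 0.1, "x2": 0.4, "x3": 0.5},
     "s2": {"x1": 0.6, "x2": 0.3, "x3": 0.1}}

def sample_pairs(length, y="s1"):
    """Walk the HMM: at each time-step, RANDOMLY emit an observable-state-value
    from E, then RANDOMLY choose the next hidden-state-value from T."""
    pairs = []
    for _ in range(length):
        x = random.choices(obs_values, weights=[E[y][v] for v in obs_values])[0]
        pairs.append((x, y))  # the recorded history is a sequence of (X, Y) pairs
        y = random.choices(states, weights=[T[y][s] for s in states])[0]
    return pairs

pairs = sample_pairs(5)  # only the X of each pair would actually be observed
```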

HMM - Required Components (State Transition Matrix & Emission Matrix)


at each time-step the RANDOM choice of the next hidden-state-value 𝑠(𝑡+1) is characterized by an 𝑛×𝑛 transition matrix 𝑇 where:

  • each entry 𝑇[𝑖,𝑗]:
    • contains a value between 0 and 1 (inclusive)
    • denotes the probability of transitioning from state 𝑠𝑖 to 𝑠𝑗 # i.e. 𝐏(𝑌(𝑡+1)=𝑠(𝑡+1)=𝑠𝑗|𝑌(𝑡)=𝑠(𝑡)=𝑠𝑖)
  • each row of entries in 𝑇 must sum up to 1

at each time-step the RANDOM choice of observable-state-value 𝑥(𝑡+1) is characterized by an 𝑛×𝑚 emission matrix 𝐸 where:

  • each entry 𝐸[𝑗,𝑘]:
    • contains a value between 0 and 1 (inclusive)
    • denotes the probability of observing value 𝑥𝑘 when on state 𝑠𝑗 # i.e. 𝐏(𝑋(t+1)=𝑥(t+1)=𝑥𝑘|𝑌(t+1)=𝑠(𝑡+1)=𝑠𝑗)
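Both matrices must satisfy the row-stochastic constraints stated above. A minimal sanity-check sketch (the 2×2 and 2×3 numbers are made up for illustration):

```python
# hypothetical n=2, m=3 HMM
T = [[0.7, 0.3],            # T[i][j] = P(Y(t+1)=sj | Y(t)=si)
     [0.4, 0.6]]
E = [[0.1, 0.4, 0.5],       # E[j][k] = P(X(t+1)=xk | Y(t+1)=sj)
     [0.6, 0.3, 0.1]]

def is_row_stochastic(M):
    """Every entry in [0, 1] and every row summing to 1."""
    return all(
        all(0.0 <= p <= 1.0 for p in row) and abs(sum(row) - 1.0) < 1e-9
        for row in M
    )

ok = is_row_stochastic(T) and is_row_stochastic(E)
```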

so far we have described an HMM that models a Markov Process with the 1st-Order Markov Assumption. Just like Markov Chains, HMMs can be extended to model higher-order Markov Processes, where the next hidden-state depends on the previous 𝑘 hidden-states.

HMM - Optional Components (Starting State Probabilities & Ending State Probabilities)


if an application has a sequence with a definite START state 𝑌(1)=𝑌(𝑠𝑡𝑎𝑟𝑡), its probability distribution CANNOT be modeled with a transition matrix, because there is no previous state. We solve this by adding another component called Starting State Probabilities, which is characterized by a discrete probability distribution over 𝑛 discrete values (one for each hidden-state-value 𝑠𝑗 ∊ {𝑠1, ..., 𝑠𝑛}):

  • 𝐏(𝑌(1)=𝑠(1)=𝑠𝑗) equals some probability # for 𝑗 = 1 to 𝑛

if an application has a sequence with a definite END state 𝑌(𝑇)=𝑌(𝑒𝑛𝑑), we want to know the probability of a sequence ENDING at a particular state. This probability distribution CANNOT be modeled with a transition matrix, because the matrix assumes the sequence continues indefinitely. We need a separate component called Ending State Probabilities, which is characterized by a discrete probability distribution over 𝑛 discrete values (one for each hidden-state-value 𝑠𝑗 ∊ {𝑠1, ..., 𝑠𝑛}):

  • 𝐏(𝑌(𝑇)=𝑠(𝑇)=𝑠𝑗) equals some probability # for 𝑗 = 1 to 𝑛
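A sketch of the starting-state component: the 𝑛 starting probabilities form one discrete distribution, from which 𝑌(1) is drawn (the state names and probabilities below are hypothetical):

```python
import random

random.seed(1)  # reproducible run

# hypothetical starting-state probabilities: pi[sj] = P(Y(1)=sj), for j = 1 to n
pi = {"s1": 0.6, "s2": 0.4}
assert abs(sum(pi.values()) - 1.0) < 1e-9  # the n probabilities must sum to 1

# draw the definite START state Y(1) according to pi
y1 = random.choices(list(pi), weights=list(pi.values()))[0]
```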

HMM - Visual/Graphical Representations


HMMs can be graphically represented in 3 ways:

  1. finite graphical representation of the transition matrix 𝑇 (nodes as hidden-state-values & observable-state-values)
  2. finite graphical representation of a sequence of time-variables 𝑋's and 𝑌's (nodes as time-variables)
  3. infinitely growing graphical representation of a sequence of time-variables 𝑋's and 𝑌's (nodes as time-variables)

HMM - Visual/Graphical Representations - Examples

HMM - Training/Learning

  • given an observation sequence or set of observation sequences, learn the components of the 𝐻𝑀𝑀
  • training a 𝐻𝑀𝑀 means to learn the following probabilities:

    • state transition matrix/function: 𝐏𝑡𝑟𝑎𝑛𝑠(𝑌(𝑡)=𝑠?|𝑌(𝑡-1)=𝑠?)
    • emission matrix/function: 𝐏𝑒𝑚𝑖𝑠𝑠(𝑋=𝑥?|𝑌=𝑠?)
    • starting state probabilities: 𝐏𝑠𝑡𝑎𝑟𝑡(𝑌(1)=𝑠?) # optional
    • ending state probabilities: 𝐏𝑒𝑛𝑑(𝑌(𝑇)=𝑠?) # optional
  • methods in learning the probabilities: e.g. Maximum Likelihood Estimation (frequency counting) when the hidden-state sequences are labeled, or the Baum-Welch algorithm (a special case of Expectation-Maximization) when only the observations are available

HMM - Inference Problems

all inference problems can be solved via HMM - Inference By Enumeration & Variable Elimination

Likelihood Problem 1
  • given a trained 𝐻𝑀𝑀, an observation sequence {𝑥(1), ..., 𝑥(𝑇)} and the corresponding hidden state sequence {𝑠(1), ..., 𝑠(𝑇)}, determine likelihood of (conditional probability):
    • 𝐏(𝑥(1), ..., 𝑥(𝑇)|𝑠(1), ..., 𝑠(𝑇)) = 𝛱1≤𝑖≤𝑇[𝐏(𝑋=𝑥(𝑖)|𝑌=𝑠(𝑖))]
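Likelihood Problem 1 is just a product of emission probabilities. A minimal sketch with a hypothetical emission matrix:

```python
from math import prod  # Python 3.8+

# hypothetical emission matrix: E[s][x] = P(X=x | Y=s)
E = {"s1": {"x1": 0.1, "x2": 0.9},
     "s2": {"x1": 0.6, "x2": 0.4}}

def cond_likelihood(xs, ss):
    """P(x(1..T) | s(1..T)) = product over i of P(X=x(i) | Y=s(i)), O(T) time."""
    return prod(E[s][x] for x, s in zip(xs, ss))

p = cond_likelihood(["x2", "x1"], ["s1", "s2"])  # 0.9 * 0.6 = 0.54
```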
Likelihood Problem 2
  • given a trained 𝐻𝑀𝑀, an observation sequence {𝑥(1), ..., 𝑥(𝑇)} and the corresponding hidden state sequence {𝑠(1), ..., 𝑠(𝑇)}, determine likelihood of (joint probability):
    • 𝐏(𝑥(1), ..., 𝑥(𝑇),𝑠(1), ..., 𝑠(𝑇)) = 𝐏(𝑌=𝑠(1))𝐏(𝑋=𝑥(1)|𝑌=𝑠(1)) · 𝛱2≤𝑖≤𝑇[𝐏(𝑋=𝑥(𝑖)|𝑌=𝑠(𝑖))𝐏(𝑌=𝑠(𝑖)|𝑌=𝑠(𝑖-1))] · 𝐏(𝑌=𝑠(𝑇))
      • multiplying by 𝐏(𝑌=𝑠(𝑇)) is optional
      • time-complexity = 𝑂(𝑇) # however, each of the 𝑇 steps is a constant factor larger than in problem 1
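The joint likelihood of Problem 2 adds the starting probability and the transition factors. A sketch with hypothetical matrices (the optional ending-state factor is omitted):

```python
def joint_likelihood(xs, ss, pi, T, E):
    """P(x(1..T), s(1..T)) = P(Y=s(1)) P(X=x(1)|Y=s(1))
       * prod over i=2..T of P(X=x(i)|Y=s(i)) P(Y=s(i)|Y=s(i-1)), O(T) time."""
    p = pi[ss[0]] * E[ss[0]][xs[0]]
    for i in range(1, len(xs)):
        p *= T[ss[i - 1]][ss[i]] * E[ss[i]][xs[i]]
    return p

# hypothetical 2-state example
pi = {"s1": 0.5, "s2": 0.5}
T = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
E = {"s1": {"x1": 0.1, "x2": 0.9}, "s2": {"x1": 0.6, "x2": 0.4}}
p = joint_likelihood(["x2", "x1"], ["s1", "s2"], pi, T, E)  # 0.5*0.9*0.3*0.6
```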
Likelihood Problem 3
  • given a trained 𝐻𝑀𝑀 and an observation sequence {𝑥(1), ..., 𝑥(𝑇)}, determine likelihood of:
    • 𝐏(𝑥(1), ..., 𝑥(𝑇)) = 𝛴𝑠(1)∊𝑌 𝛴𝑠(2)∊𝑌 ... 𝛴𝑠(𝑇)∊𝑌 𝐏(𝑥(1), ..., 𝑥(𝑇),𝑠(1), ..., 𝑠(𝑇))
    • 𝐏(𝑥(1), ..., 𝑥(𝑇)) = 𝛴𝑠(1)∊𝑌 𝛴𝑠(2)∊𝑌 ... 𝛴𝑠(𝑇)∊𝑌 [𝐏(𝑌=𝑠(1))𝐏(𝑋=𝑥(1)|𝑌=𝑠(1)) · 𝛱2≤𝑖≤𝑇[𝐏(𝑋=𝑥(𝑖)|𝑌=𝑠(𝑖))𝐏(𝑌=𝑠(𝑖)|𝑌=𝑠(𝑖-1))] · 𝐏(𝑌=𝑠(𝑇))]
  • specialized methods in determining likelihood: e.g. the Forward Algorithm, which computes 𝐏(𝑥(1), ..., 𝑥(𝑇)) in 𝑂(𝑛²𝑇) time instead of summing over all 𝑛^𝑇 hidden sequences
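Problem 3's sum over all hidden sequences can be brute-forced by enumeration, though this costs 𝑂(𝑛^𝑇) terms. A sketch with hypothetical matrices:

```python
from itertools import product

def marginal_likelihood(xs, states, pi, T, E):
    """P(x(1..T)) = sum, over all n^T hidden sequences s(1..T), of the joint
    probability P(x(1..T), s(1..T))."""
    total = 0.0
    for ss in product(states, repeat=len(xs)):  # every hidden sequence s(1..T)
        p = pi[ss[0]] * E[ss[0]][xs[0]]
        for i in range(1, len(xs)):
            p *= T[ss[i - 1]][ss[i]] * E[ss[i]][xs[i]]
        total += p
    return total

# hypothetical 2-state example
pi = {"s1": 0.5, "s2": 0.5}
T = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
E = {"s1": {"x1": 0.1, "x2": 0.9}, "s2": {"x1": 0.6, "x2": 0.4}}
p = marginal_likelihood(["x1"], ["s1", "s2"], pi, T, E)  # 0.5*0.1 + 0.5*0.6 = 0.35
```

Summing this marginal over every possible observation sequence of a fixed length yields 1, which is a useful check on an implementation.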
Likelihood Problem 4
  • given a trained 𝐻𝑀𝑀, an observation sequence {𝑥(1), ..., 𝑥(𝑇)} and the corresponding hidden state sequence {𝑠(1), ..., 𝑠(𝑇)}, determine likelihood of (conditional probability):
    • 𝐏(𝑠(1), ..., 𝑠(𝑇)|𝑥(1), ..., 𝑥(𝑇)) = 𝐏(𝑥(1), ..., 𝑥(𝑇),𝑠(1), ..., 𝑠(𝑇)) / 𝐏(𝑥(1), ..., 𝑥(𝑇))
  • Likelihood Problem 4 can be decomposed into 2 problems:
    • Likelihood Problem 2
    • Likelihood Problem 3
Decoding Problem
  • given a trained 𝐻𝑀𝑀 and an observation sequence {𝑥(1), ..., 𝑥(𝑇)}, determine the most likely hidden-state sequence {𝑠(1), ..., 𝑠(𝑇)}
    • {𝑠(1), ..., 𝑠(𝑇)} = 𝑎𝑟𝑔𝑚𝑎𝑥𝑠(1)∊𝑌 ... 𝑎𝑟𝑔𝑚𝑎𝑥𝑠(𝑇)∊𝑌 𝐏(𝑠(1), ..., 𝑠(𝑇)|𝑥(1), ..., 𝑥(𝑇)) # Likelihood Problem 4

    • {𝑠(1), ..., 𝑠(𝑇)} = 𝑎𝑟𝑔𝑚𝑎𝑥𝑠(1)∊𝑌 ... 𝑎𝑟𝑔𝑚𝑎𝑥𝑠(𝑇)∊𝑌 𝐏(𝑠(1), ..., 𝑠(𝑇),𝑥(1), ..., 𝑥(𝑇)) / 𝐏(𝑥(1), ..., 𝑥(𝑇)) # Bayes' rule

    • {𝑠(1), ..., 𝑠(𝑇)} = 𝑎𝑟𝑔𝑚𝑎𝑥𝑠(1)∊𝑌 ... 𝑎𝑟𝑔𝑚𝑎𝑥𝑠(𝑇)∊𝑌 𝐏(𝑠(1), ..., 𝑠(𝑇),𝑥(1), ..., 𝑥(𝑇)) # 𝐏(𝑥(1), ..., 𝑥(𝑇)) is a constant

    • {𝑠(1), ..., 𝑠(𝑇)} = 𝑎𝑟𝑔𝑚𝑎𝑥𝑠(1)∊𝑌 ... 𝑎𝑟𝑔𝑚𝑎𝑥𝑠(𝑇)∊𝑌 [𝐏(𝑌=𝑠(1))𝐏(𝑋=𝑥(1)|𝑌=𝑠(1)) · 𝛱2≤𝑖≤𝑇[𝐏(𝑋=𝑥(𝑖)|𝑌=𝑠(𝑖))𝐏(𝑌=𝑠(𝑖)|𝑌=𝑠(𝑖-1))] · 𝐏(𝑌=𝑠(𝑇))] # multiplying by 𝐏(𝑌=𝑠(𝑇)) is optional

  • specialized methods in decoding hidden-state sequence:
    • the Viterbi Algorithm computes the sequence of hidden-state-values that best explains a given sequence of observations, via dynamic programming in 𝑂(𝑛²𝑇) time
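A minimal Viterbi sketch over a hypothetical 2-state model (the optional ending-state factor is omitted):

```python
def viterbi(xs, states, pi, T, E):
    """argmax over s(1..T) of P(s(1..T), x(1..T)), computed in O(n^2 * T) time
    by keeping, for each state, the best probability of any path ending there."""
    delta = {s: pi[s] * E[s][xs[0]] for s in states}  # best path prob ending in s
    back = []                                          # backpointers per time-step
    for x in xs[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: prev[r] * T[r][s])
            ptr[s] = best_prev
            delta[s] = prev[best_prev] * T[best_prev][s] * E[s][x]
        back.append(ptr)
    last = max(states, key=lambda s: delta[s])  # best final state
    path = [last]
    for ptr in reversed(back):                  # follow backpointers to t=1
        path.append(ptr[path[-1]])
    return list(reversed(path))

# hypothetical 2-state example
pi = {"s1": 0.5, "s2": 0.5}
T = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
E = {"s1": {"x1": 0.1, "x2": 0.9}, "s2": {"x1": 0.6, "x2": 0.4}}
best = viterbi(["x1", "x2"], ["s1", "s2"], pi, T, E)  # ["s2", "s1"]
```

In practice the products are usually accumulated in log-space to avoid floating-point underflow on long sequences.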

HMM - Subpages