Hidden Markov Models (HMM)

HMM - Intuition & Components

  • an HMM is an extension of Markov Chains where the states are NOT fully observable. In other words, each state-value is decomposed into 2 parts:
    • 1 hidden-state 𝑌: each hidden-state can have 1 of 𝑛 possible hidden-state-values {𝑠1, ..., 𝑠𝑛}
    • 1 observable-state 𝑋: each observable-state can have 1 of 𝑚 possible observable-state-values {𝑥1, ..., 𝑥𝑚}
  • at current time 𝑡 the HMM is at some hidden-state-value 𝑠(𝑡). This 𝑠(𝑡) is 1 of the 𝑛 possible hidden-state-values {𝑠1, ..., 𝑠𝑛}
    • 𝑠(𝑡) = 𝑠𝑖 # where 𝑠𝑖∊{𝑠1, ..., 𝑠𝑛}
  • this is denoted as 𝑌(𝑡)=𝑠(𝑡)=𝑠𝑖 # these 2 ways to represent the state-value of 𝑌(𝑡) will be useful as we go on
  • as the HMM moves to time 𝑡+1, it RANDOMLY chooses the next hidden-state-value 𝑠(𝑡+1) denoted as:
    • 𝑌(𝑡+1)=𝑠(𝑡+1)=𝑠?
  • based on the chosen 𝑠(𝑡+1) the HMM also RANDOMLY chooses an observable-state-value 𝑥(𝑡+1) denoted as:
    • 𝑋(𝑡+1)=𝑥(𝑡+1)=𝑥?
  • naturally this forms a sequence of (𝑋,𝑌) pairs as time progresses:
    • { ..., (𝑋(𝑡-1),𝑌(𝑡-1)), (𝑋(𝑡),𝑌(𝑡)), ...}
  • the sequence of (𝑋,𝑌) pairs is the "recorded history" of state-values over time. However, the only values we observe are the 𝑋s, not the 𝑌s
  • this sequence can grow infinitely long
  • the 𝑌s and 𝑋s are called time-variables, where:
    • each 𝑌 holds 1 hidden-state-value 𝑠?
    • each 𝑋 holds 1 observable-state-value 𝑥?
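This generative story can be sketched in a few lines of Python. The 2-state model below (its state names, observation names, and probabilities) is a hypothetical example, not from this page:

```python
import random

random.seed(0)  # reproducible run

# hypothetical hidden-state-values {s1, s2} and observable-state-values {x1, x2, x3}
states = ["s1", "s2"]
obs_values = ["x1", "x2", "x3"]

# T[i][j] = P(Y(t+1)=sj | Y(t)=si); E[j][k] = P(X=xk | Y=sj)
T = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
E = {"s1": {"x1": 0.1, "x2": 0.4, "x3": 0.5},
     "s2": {"x1": 0.6, "x2": 0.3, "x3": 0.1}}

def sample_pairs(length, y="s1"):
    """Walk the HMM: at each time-step, RANDOMLY emit an observable-state-value
    from E, then RANDOMLY choose the next hidden-state-value from T."""
    pairs = []
    for _ in range(length):
        x = random.choices(obs_values, weights=[E[y][v] for v in obs_values])[0]
        pairs.append((x, y))  # the recorded history is a sequence of (X, Y) pairs
        y = random.choices(states, weights=[T[y][s] for s in states])[0]
    return pairs

pairs = sample_pairs(5)  # only the X of each pair would actually be observed
```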

HMM - Required Components (State Transition Matrix & Emission Matrix)


at each time-step the RANDOM choice of the next hidden-state-value 𝑠(𝑡+1) is characterized by an 𝑛×𝑛 transition matrix 𝑇 where:

  • each entry 𝑇[𝑖,𝑗]:
    • contains a value between 0 and 1 (inclusive)
    • denotes the probability of transitioning from state 𝑠𝑖 to 𝑠𝑗 # i.e. 𝐏(𝑌(𝑡+1)=𝑠(𝑡+1)=𝑠𝑗|𝑌(𝑡)=𝑠(𝑡)=𝑠𝑖)
  • each row of entries in 𝑇 must sum up to 1

at each time-step the RANDOM choice of observable-state-value 𝑥(𝑡+1) is characterized by an 𝑛×𝑚 emission matrix 𝐸 where:

  • each entry 𝐸[𝑗,𝑘]:
    • contains a value between 0 and 1 (inclusive)
    • denotes the probability of observing value 𝑥𝑘 when on state 𝑠𝑗 # i.e. 𝐏(𝑋(t+1)=𝑥(t+1)=𝑥𝑘|𝑌(t+1)=𝑠(𝑡+1)=𝑠𝑗)
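Both matrices must satisfy the row-stochastic constraints stated above. A minimal sanity-check sketch (the 2×2 and 2×3 numbers are made up for illustration):

```python
# hypothetical n=2, m=3 HMM
T = [[0.7, 0.3],            # T[i][j] = P(Y(t+1)=sj | Y(t)=si)
     [0.4, 0.6]]
E = [[0.1, 0.4, 0.5],       # E[j][k] = P(X(t+1)=xk | Y(t+1)=sj)
     [0.6, 0.3, 0.1]]

def is_row_stochastic(M):
    """Every entry in [0, 1] and every row summing to 1."""
    return all(
        all(0.0 <= p <= 1.0 for p in row) and abs(sum(row) - 1.0) < 1e-9
        for row in M
    )

ok = is_row_stochastic(T) and is_row_stochastic(E)
```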

so far we have described an HMM that models a Markov Process with the 1st-Order Markov Assumption. Just like Markov Chains, HMMs can be extended to model higher-order Markov Processes, where the next hidden-state depends on the previous 𝑘 hidden-states.

HMM - Optional Components (Starting State Probabilities & Ending State Probabilities)


if an application has a sequence with a definite START state 𝑌(1)=𝑌(𝑠𝑡𝑎𝑟𝑡), its probability distribution CANNOT be modeled with a transition matrix, because there is no previous state. We solve this by adding another component called Starting State Probabilities, which is characterized by a discrete probability distribution over 𝑛 discrete values (one for each hidden-state-value 𝑠𝑗 ∊ {𝑠1, ..., 𝑠𝑛}):

  • 𝐏(𝑌(1)=𝑠(1)=𝑠𝑗) equals some probability # for 𝑗 = 1 to 𝑛

if an application has a sequence with a definite END state 𝑌(𝑇)=𝑌(𝑒𝑛𝑑), we want to know the probability of a sequence ENDING at a particular state. This probability distribution CANNOT be modeled with a transition matrix, because the matrix assumes the sequence continues indefinitely. We need a separate component called Ending State Probabilities, which is characterized by a discrete probability distribution over 𝑛 discrete values (one for each hidden-state-value 𝑠𝑗 ∊ {𝑠1, ..., 𝑠𝑛}):

  • 𝐏(𝑌(𝑇)=𝑠(𝑇)=𝑠𝑗) equals some probability # for 𝑗 = 1 to 𝑛
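A sketch of the starting-state component: the 𝑛 starting probabilities form one discrete distribution, from which 𝑌(1) is drawn (the state names and probabilities below are hypothetical):

```python
import random

random.seed(1)  # reproducible run

# hypothetical starting-state probabilities: pi[sj] = P(Y(1)=sj), for j = 1 to n
pi = {"s1": 0.6, "s2": 0.4}
assert abs(sum(pi.values()) - 1.0) < 1e-9  # the n probabilities must sum to 1

# draw the definite START state Y(1) according to pi
y1 = random.choices(list(pi), weights=list(pi.values()))[0]
```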

HMM - Visual/Graphical Representations


HMMs can be graphically represented in 3 ways:

  1. finite graphical representation of the transition matrix 𝑇 (nodes as hidden-state-values & observable-state-values)
  2. finite graphical representation of a sequence of time-variables 𝑋's and 𝑌's (nodes as time-variables)
  3. infinitely growing graphical representation of a sequence of time-variables 𝑋's and 𝑌's (nodes as time-variables)

HMM - Visual/Graphical Representations - Examples

HMM - Training/Learning

  • given an observation sequence or set of observation sequences, learn the components of the 𝐻𝑀𝑀
  • training a 𝐻𝑀𝑀 means to learn the following probabilities:

    • state transition matrix/function: 𝐏𝑡𝑟𝑎𝑛𝑠(𝑌(𝑡)=𝑠?|𝑌(𝑡-1)=𝑠?)
    • emission matrix/function: 𝐏𝑒𝑚𝑖𝑠𝑠(𝑋=𝑥?|𝑌=𝑠?)
    • starting state probabilities: 𝐏𝑠𝑡𝑎𝑟𝑡(𝑌(1)=𝑠?) # optional
    • ending state probabilities: 𝐏𝑒𝑛𝑑(𝑌(𝑇)=𝑠?) # optional
  • methods in learning the probabilities: e.g. Maximum Likelihood Estimation (frequency counting) when the hidden-state sequences are labeled, or the Baum-Welch algorithm (a special case of Expectation-Maximization) when only the observations are available

HMM - Inference Problems

all inference problems can be solved via HMM - Inference By Enumeration & Variable Elimination

Likelihood Problem 1
  • given a trained 𝐻𝑀𝑀, an observation sequence {𝑥(1), ..., 𝑥(𝑇)} and the corresponding hidden state sequence {𝑠(1), ..., 𝑠(𝑇)}, determine likelihood of (conditional probability):
    • 𝐏(𝑥(1), ..., 𝑥(𝑇)|𝑠(1), ..., 𝑠(𝑇)) = 𝛱1≤𝑖≤𝑇[𝐏(𝑋=𝑥(𝑖)|𝑌=𝑠(𝑖))]
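Likelihood Problem 1 is just a product of emission probabilities. A minimal sketch with a hypothetical emission matrix:

```python
from math import prod  # Python 3.8+

# hypothetical emission matrix: E[s][x] = P(X=x | Y=s)
E = {"s1": {"x1": 0.1, "x2": 0.9},
     "s2": {"x1": 0.6, "x2": 0.4}}

def cond_likelihood(xs, ss):
    """P(x(1..T) | s(1..T)) = product over i of P(X=x(i) | Y=s(i)), O(T) time."""
    return prod(E[s][x] for x, s in zip(xs, ss))

p = cond_likelihood(["x2", "x1"], ["s1", "s2"])  # 0.9 * 0.6 = 0.54
```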
Likelihood Problem 2
  • given a trained 𝐻𝑀𝑀, an observation sequence {𝑥(1), ..., 𝑥(𝑇)} and the corresponding hidden state sequence {𝑠(1), ..., 𝑠(𝑇)}, determine likelihood of (joint probability):
    • 𝐏(𝑥(1), ..., 𝑥(𝑇),𝑠(1), ..., 𝑠(𝑇)) = 𝐏(𝑌=𝑠(1))𝐏(𝑋=𝑥(1)|𝑌=𝑠(1)) · 𝛱2≤𝑖≤𝑇[𝐏(𝑋=𝑥(𝑖)|𝑌=𝑠(𝑖))𝐏(𝑌=𝑠(𝑖)|𝑌=𝑠(𝑖-1))] · 𝐏(𝑌=𝑠(𝑇))
      • multiplying by 𝐏(𝑌=𝑠(𝑇)) is optional
      • time-complexity = 𝑂(𝑇) # however, each of the 𝑇 steps is a constant factor larger than in problem 1
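The joint likelihood of Problem 2 adds the starting probability and the transition factors. A sketch with hypothetical matrices (the optional ending-state factor is omitted):

```python
def joint_likelihood(xs, ss, pi, T, E):
    """P(x(1..T), s(1..T)) = P(Y=s(1)) P(X=x(1)|Y=s(1))
       * prod over i=2..T of P(X=x(i)|Y=s(i)) P(Y=s(i)|Y=s(i-1)), O(T) time."""
    p = pi[ss[0]] * E[ss[0]][xs[0]]
    for i in range(1, len(xs)):
        p *= T[ss[i - 1]][ss[i]] * E[ss[i]][xs[i]]
    return p

# hypothetical 2-state example
pi = {"s1": 0.5, "s2": 0.5}
T = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
E = {"s1": {"x1": 0.1, "x2": 0.9}, "s2": {"x1": 0.6, "x2": 0.4}}
p = joint_likelihood(["x2", "x1"], ["s1", "s2"], pi, T, E)  # 0.5*0.9*0.3*0.6
```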
Likelihood Problem 3
  • given a trained 𝐻𝑀𝑀 and an observation sequence {𝑥(1), ..., 𝑥(𝑇)}, determine likelihood of:
    • 𝐏(𝑥(1), ..., 𝑥(𝑇)) = 𝛴𝑠(1)∊𝑌 𝛴𝑠(2)∊𝑌 ... 𝛴𝑠(𝑇)∊𝑌 𝐏(𝑥(1), ..., 𝑥(𝑇),𝑠(1), ..., 𝑠(𝑇))
    • 𝐏(𝑥(1), ..., 𝑥(𝑇)) = 𝛴𝑠(1)∊𝑌 𝛴𝑠(2)∊𝑌 ... 𝛴𝑠(𝑇)∊𝑌 [𝐏(𝑌=𝑠(1))𝐏(𝑋=𝑥(1)|𝑌=𝑠(1)) · 𝛱2≤𝑖≤𝑇[𝐏(𝑋=𝑥(𝑖)|𝑌=𝑠(𝑖))𝐏(𝑌=𝑠(𝑖)|𝑌=𝑠(𝑖-1))] · 𝐏(𝑌=𝑠(𝑇))]
  • specialized methods in determining likelihood: e.g. the Forward Algorithm, which computes 𝐏(𝑥(1), ..., 𝑥(𝑇)) in 𝑂(𝑛²𝑇) time instead of summing over all 𝑛^𝑇 hidden sequences
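Problem 3's sum over all hidden sequences can be brute-forced by enumeration, though this costs 𝑂(𝑛^𝑇) terms. A sketch with hypothetical matrices:

```python
from itertools import product

def marginal_likelihood(xs, states, pi, T, E):
    """P(x(1..T)) = sum, over all n^T hidden sequences s(1..T), of the joint
    probability P(x(1..T), s(1..T))."""
    total = 0.0
    for ss in product(states, repeat=len(xs)):  # every hidden sequence s(1..T)
        p = pi[ss[0]] * E[ss[0]][xs[0]]
        for i in range(1, len(xs)):
            p *= T[ss[i - 1]][ss[i]] * E[ss[i]][xs[i]]
        total += p
    return total

# hypothetical 2-state example
pi = {"s1": 0.5, "s2": 0.5}
T = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
E = {"s1": {"x1": 0.1, "x2": 0.9}, "s2": {"x1": 0.6, "x2": 0.4}}
p = marginal_likelihood(["x1"], ["s1", "s2"], pi, T, E)  # 0.5*0.1 + 0.5*0.6 = 0.35
```

Summing this marginal over every possible observation sequence of a fixed length yields 1, which is a useful check on an implementation.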
Likelihood Problem 4
  • given a trained 𝐻𝑀𝑀, an observation sequence {𝑥(1), ..., 𝑥(𝑇)} and the corresponding hidden state sequence {𝑠(1), ..., 𝑠(𝑇)}, determine likelihood of (conditional probability):
    • 𝐏(𝑠(1), ..., 𝑠(𝑇)|𝑥(1), ..., 𝑥(𝑇)) = 𝐏(𝑥(1), ..., 𝑥(𝑇),𝑠(1), ..., 𝑠(𝑇)) / 𝐏(𝑥(1), ..., 𝑥(𝑇))
  • Likelihood Problem 4 can be decomposed into 2 problems:
    • Likelihood Problem 2
    • Likelihood Problem 3
Decoding Problem
  • given a trained 𝐻𝑀𝑀 and an observation sequence {𝑥(1), ..., 𝑥(𝑇)}, determine the most likely hidden-state sequence {𝑠(1), ..., 𝑠(𝑇)}
    • {𝑠(1), ..., 𝑠(𝑇)} = 𝑎𝑟𝑔𝑚𝑎𝑥𝑠(1)∊𝑌 ... 𝑎𝑟𝑔𝑚𝑎𝑥𝑠(𝑇)∊𝑌 𝐏(𝑠(1), ..., 𝑠(𝑇)|𝑥(1), ..., 𝑥(𝑇)) # Likelihood Problem 4

    • {𝑠(1), ..., 𝑠(𝑇)} = 𝑎𝑟𝑔𝑚𝑎𝑥𝑠(1)∊𝑌 ... 𝑎𝑟𝑔𝑚𝑎𝑥𝑠(𝑇)∊𝑌 𝐏(𝑠(1), ..., 𝑠(𝑇),𝑥(1), ..., 𝑥(𝑇)) / 𝐏(𝑥(1), ..., 𝑥(𝑇)) # Bayes' rule

    • {𝑠(1), ..., 𝑠(𝑇)} = 𝑎𝑟𝑔𝑚𝑎𝑥𝑠(1)∊𝑌 ... 𝑎𝑟𝑔𝑚𝑎𝑥𝑠(𝑇)∊𝑌 𝐏(𝑠(1), ..., 𝑠(𝑇),𝑥(1), ..., 𝑥(𝑇)) # 𝐏(𝑥(1), ..., 𝑥(𝑇)) is a constant

    • {𝑠(1), ..., 𝑠(𝑇)} = 𝑎𝑟𝑔𝑚𝑎𝑥𝑠(1)∊𝑌 ... 𝑎𝑟𝑔𝑚𝑎𝑥𝑠(𝑇)∊𝑌 [𝐏(𝑌=𝑠(1))𝐏(𝑋=𝑥(1)|𝑌=𝑠(1)) · 𝛱2≤𝑖≤𝑇[𝐏(𝑋=𝑥(𝑖)|𝑌=𝑠(𝑖))𝐏(𝑌=𝑠(𝑖)|𝑌=𝑠(𝑖-1))] · 𝐏(𝑌=𝑠(𝑇))] # multiplying by 𝐏(𝑌=𝑠(𝑇)) is optional

  • specialized methods in decoding hidden-state sequence:
    • the Viterbi Algorithm computes the sequence of hidden-state-values that best explains a given sequence of observations, via dynamic programming in 𝑂(𝑛²𝑇) time
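A minimal Viterbi sketch over a hypothetical 2-state model (the optional ending-state factor is omitted):

```python
def viterbi(xs, states, pi, T, E):
    """argmax over s(1..T) of P(s(1..T), x(1..T)), computed in O(n^2 * T) time
    by keeping, for each state, the best probability of any path ending there."""
    delta = {s: pi[s] * E[s][xs[0]] for s in states}  # best path prob ending in s
    back = []                                          # backpointers per time-step
    for x in xs[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: prev[r] * T[r][s])
            ptr[s] = best_prev
            delta[s] = prev[best_prev] * T[best_prev][s] * E[s][x]
        back.append(ptr)
    last = max(states, key=lambda s: delta[s])  # best final state
    path = [last]
    for ptr in reversed(back):                  # follow backpointers to t=1
        path.append(ptr[path[-1]])
    return list(reversed(path))

# hypothetical 2-state example
pi = {"s1": 0.5, "s2": 0.5}
T = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
E = {"s1": {"x1": 0.1, "x2": 0.9}, "s2": {"x1": 0.6, "x2": 0.4}}
best = viterbi(["x1", "x2"], ["s1", "s2"], pi, T, E)  # ["s2", "s1"]
```

In practice the products are usually accumulated in log-space to avoid floating-point underflow on long sequences.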

HMM - Subpages