Maxent Model - Example

let's consider a discrete random variable 𝐶 with 2 outcomes: ℎ and 𝑡

below is the formula for univariate entropy; we want to maximize 𝐻_𝐏(𝐏) subject to the constraints of the model:

𝐻_𝐏(𝐏) = 𝛴_𝑥∊𝐶[ - 𝐏(𝐶=𝑥) 𝑙𝑛 𝐏(𝐶=𝑥) ]

below are 3 different models

Model With No Constraints	Model With 1 Constraint	Model With 2 Constraints
constraints: none; here 𝐏(𝐶) is allowed to be an un-normalized distribution, i.e. 𝐏(𝐶) does not have to be a probability distribution	constraint: 𝐏(𝐶=ℎ) + 𝐏(𝐶=𝑡) = 1; this constrains 𝐏(𝐶) to be a normalized distribution, i.e. 𝐏(𝐶) is a probability distribution	constraints: 𝐏(𝐶=ℎ) + 𝐏(𝐶=𝑡) = 1 and 𝐏(𝐶=ℎ) = 0.3
thus there is a 2D plane of possible candidates	thus there is a 1D line of possible candidates	thus there is a single point (0D) as the only possible candidate
𝐻_𝐏(𝐏) is maximized when 𝐏(𝐶=ℎ) = 1/𝑒 and 𝐏(𝐶=𝑡) = 1/𝑒; this is because each term -𝐏(𝐶=𝑥) 𝑙𝑛 𝐏(𝐶=𝑥) peaks at 𝐏(𝐶=𝑥) = 1/𝑒 (where the term itself also equals 1/𝑒)	𝐻_𝐏(𝐏) is maximized when 𝐏(𝐶=ℎ) = 1/2 and 𝐏(𝐶=𝑡) = 1/2	𝐻_𝐏(𝐏) is maximized when 𝐏(𝐶=ℎ) = 0.3 and 𝐏(𝐶=𝑡) = 0.7, which is the only candidate point
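
the 3 models above can be checked numerically with a small sketch. it uses a plain grid search rather than calculus, and the names `entropy`, `STEPS`, `best_p` and `best_h` are made up for this illustration:

```python
import math

def entropy(values):
    """Sum of -p ln p over the given values (0 ln 0 is treated as 0)."""
    return sum(-p * math.log(p) for p in values if p > 0)

STEPS = 1000  # grid resolution, an arbitrary choice for this sketch
grid = [i / STEPS for i in range(1, STEPS + 1)]

# No constraints: P(h) and P(t) vary independently, so it is enough to
# maximize a single term -p ln p and use that value for both outcomes.
best_p = max(grid, key=lambda p: entropy([p]))
h_none = entropy([best_p, best_p])

# 1 constraint (normalization): search along the line P(t) = 1 - P(h).
best_h = max(grid, key=lambda h: entropy([h, 1 - h]))
h_one = entropy([best_h, 1 - best_h])

# 2 constraints: only the point (0.3, 0.7) remains, so nothing to search.
h_two = entropy([0.3, 0.7])

print("no constraints:", best_p, "vs 1/e =", 1 / math.e)
print("1 constraint:  ", best_h)
print("max entropies: ", h_none, h_one, h_two)
```

on this grid the unconstrained maximizer lands next to 1/𝑒 and the normalized one at 1/2. the maximum entropies come out to roughly 0.736, 0.693 and 0.611, so each added constraint lowers the maximum.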

Why Find a Maximum Entropy Model?

maximizing entropy in effect helps us find an estimated distribution model 𝐏̂ that:

minimizes commitment, i.e. assumes nothing beyond the constraints (which is another way of saying it maximizes entropy)
resembles some reference for the true population distribution (in practice, the empirical distribution)

these are the two properties we want in the estimated distribution model 𝐏̂

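the goal is to maximize entropy 𝐻 subject to feature-based constraints: for each feature, the expected value under the model must equal its expected value under the empirical distribution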
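
written out, the constrained problem usually takes the following form. the notation is an assumption, not taken from these notes: 𝑓_𝑖 are the feature functions, \hat{P} is the estimated model and \tilde{P} is the empirical distribution.

```latex
\begin{aligned}
\max_{\hat{P}} \quad & H(\hat{P}) = -\sum_{x} \hat{P}(x)\,\ln \hat{P}(x) \\
\text{subject to} \quad & \sum_{x} \hat{P}(x)\, f_i(x) = \sum_{x} \tilde{P}(x)\, f_i(x), \qquad i = 1, \dots, k \\
& \sum_{x} \hat{P}(x) = 1
\end{aligned}
```

in the coin example, the single feature 𝑓_1(𝑥) = 1 if 𝑥 = ℎ (and 0 otherwise), with empirical expectation 0.3, gives the model with 2 constraints.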

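adding constraints/features shrinks the set of candidate distributions (plane, then line, then point in the example above), so the maximum achievable entropy can only stay the same or decrease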

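maximum entropy models are concave (often loosely called convex)

a function 𝐹 is concave when, for any points 𝑥_𝑖 and weights 𝑤_𝑖 ≥ 0 with 𝛴_𝑖 𝑤_𝑖 = 1:

𝐹(𝛴_𝑖 𝑤_𝑖 𝑥_𝑖) ≥ 𝛴_𝑖 𝑤_𝑖 𝐹(𝑥_𝑖)

concavity guarantees that any local maximum is a global maximum (and strict concavity, which entropy has, makes it unique), because any higher point is greedily reachable by moving uphill

the entropy 𝐻_𝐏(𝐏) = 𝛴_𝑥∊𝐶[ - 𝐏(𝐶=𝑥) 𝑙𝑛 𝐏(𝐶=𝑥) ] is concave in 𝐏: each term -𝐏(𝐶=𝑥) 𝑙𝑛 𝐏(𝐶=𝑥) is concave, and a sum of concave functions is concave; the constraints are linear, so the set of candidates is a convex set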
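
the defining inequality can be spot-checked numerically for the entropy of a 2-outcome variable. this is a sanity check on random samples, not a proof, and the helper names `random_distribution` and `concavity_gap` are invented for this sketch:

```python
import math
import random

def entropy(probs):
    """Entropy in nats; 0 ln 0 is treated as 0."""
    return sum(-p * math.log(p) for p in probs if p > 0)

def random_distribution(rng, n=2):
    """Draw n positive numbers and normalize them to sum to 1."""
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

def concavity_gap(a, b, w):
    """H(w*a + (1-w)*b) - (w*H(a) + (1-w)*H(b)).

    The gap is non-negative whenever H satisfies the inequality above.
    """
    mix = [w * pa + (1 - w) * pb for pa, pb in zip(a, b)]
    return entropy(mix) - (w * entropy(a) + (1 - w) * entropy(b))

rng = random.Random(0)  # fixed seed so the check is repeatable
gaps = []
for _ in range(1000):
    a = random_distribution(rng)
    b = random_distribution(rng)
    gaps.append(concavity_gap(a, b, rng.random()))

min_gap = min(gaps)
# A small tolerance absorbs floating-point rounding.
print("inequality holds on all samples:", min_gap >= -1e-12)
```

for example, mixing (0.3, 0.7) and (0.7, 0.3) with equal weights gives (0.5, 0.5), whose entropy is higher than the average entropy of the two endpoints.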

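the dual formulation, Maximum Likelihood Estimation (MLE) of an exponential model, is also concave (loosely, "convex"): its log-likelihood is concave in the feature weights, so any local maximum is a global one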
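
a minimal sketch of the dual view for the coin example, assuming a one-feature exponential model 𝐏(𝐶=ℎ) = 𝑒^𝜆 / (𝑒^𝜆 + 1) and an empirical heads frequency of 0.3. the weight name `lam`, the step size and the step count are arbitrary choices for this illustration:

```python
import math

EMPIRICAL_HEADS = 0.3  # assumed empirical frequency of heads

def model_heads(lam):
    """P(C=h) under the exponential model with feature f(h)=1, f(t)=0."""
    return 1.0 / (1.0 + math.exp(-lam))

def avg_log_likelihood(lam):
    """Average log-likelihood: 0.3*ln P(h) + 0.7*ln P(t), simplified."""
    return EMPIRICAL_HEADS * lam - math.log(1.0 + math.exp(lam))

def fit(lam=0.0, learning_rate=1.0, steps=200):
    """Plain gradient ascent on the average log-likelihood."""
    for _ in range(steps):
        # Gradient = empirical feature expectation - model feature expectation.
        gradient = EMPIRICAL_HEADS - model_heads(lam)
        lam += learning_rate * gradient
    return lam

lam_hat = fit()
p_heads = model_heads(lam_hat)
print("fitted weight:", lam_hat)
print("fitted P(C=h):", p_heads)
```

the gradient vanishes exactly when the model's expected feature value matches the empirical one, so the fitted model satisfies the maxent constraint 𝐏(𝐶=ℎ) = 0.3. because the objective is concave, starting from a different initial weight reaches the same answer.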