<div id="definition" class="section level1">
<h1>Definition</h1>
<p>Assume we observe <span class="math inline">\(X_1,\ldots,X_n\)</span> and that each <span class="math inline">\(X_i\)</span> is sampled from one of <span class="math inline">\(K\)</span> <strong>mixture components</strong>. In the second example above, the mixture components were <span class="math inline">\(\{\text{male,female}\}\)</span>. Associated with each random variable <span class="math inline">\(X_i\)</span> is a label <span class="math inline">\(Z_i \in \{1,\ldots,K\}\)</span> indicating which component <span class="math inline">\(X_i\)</span> came from. In our height example, <span class="math inline">\(Z_i\)</span> would be either <span class="math inline">\(1\)</span> or <span class="math inline">\(2\)</span> depending on whether <span class="math inline">\(X_i\)</span> was a male or female height. Often we don’t observe <span class="math inline">\(Z_i\)</span> (e.g. we might just obtain a list of heights with no gender information), so the <span class="math inline">\(Z_i\)</span>’s are sometimes called <strong>latent variables</strong>.</p>
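<p>This description suggests a simple two-step generative process: first draw the label <span class="math inline">\(Z_i\)</span> according to the mixture proportions, then draw <span class="math inline">\(X_i\)</span> from the component that <span class="math inline">\(Z_i\)</span> selects. A minimal R sketch of this process for a two-component Gaussian mixture (the proportions, means, and standard deviations below are made-up values for illustration, not estimates from data):</p>
<pre class="r"><code># Two-step generative process for a K = 2 Gaussian mixture
# (parameter values are illustrative only)
set.seed(1)
n      &lt;- 1000
pi.vec &lt;- c(0.5, 0.5)   # mixture proportions P(Z_i = k)
mu     &lt;- c(70, 64.5)   # component means
sigma  &lt;- c(3, 3)       # component standard deviations

Z &lt;- sample(1:2, size = n, replace = TRUE, prob = pi.vec)  # latent labels
X &lt;- rnorm(n, mean = mu[Z], sd = sigma[Z])                 # observed data
# In practice we would observe X but not Z</code></pre>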
<p>From the law of total probability, we know that the marginal probability of <span class="math inline">\(X_i\)</span> is: <span class="math display">\[P(X_i = x) = \sum_{k=1}^K P(X_i=x|Z_i=k)\underbrace{P(Z_i=k)}_{\pi_k} = \sum_{k=1}^K P(X_i=x|Z_i=k)\pi_k\]</span></p>
<p>Here, the <span class="math inline">\(\pi_k\)</span> are called <strong>mixture proportions</strong> or <strong>mixture weights</strong> and they represent the probability that <span class="math inline">\(X_i\)</span> belongs to the <span class="math inline">\(k\)</span>-th mixture component. The mixture proportions are nonnegative and they sum to one, <span class="math inline">\(\sum_{k=1}^K \pi_k = 1\)</span>. We call <span class="math inline">\(P(X_i|Z_i=k)\)</span> the <strong>mixture component</strong>, and it represents the distribution of <span class="math inline">\(X_i\)</span> assuming it came from component <span class="math inline">\(k\)</span>. The mixture components in our examples above were normal distributions.</p>
<p>For discrete random variables these mixture components can be any probability mass function <span class="math inline">\(p(\cdot \mid Z_i = k)\)</span>, and for continuous random variables they can be any probability density function <span class="math inline">\(f(\cdot \mid Z_i = k)\)</span>. The corresponding pmf and pdf for the mixture model are therefore:</p>
<p><span class="math display">\[p(x) = \sum_{k=1}^{K}\pi_k p(x \mid Z_{k})\]</span> <span class="math display">\[f_{x}(x) = \sum_{k=1}^{K}\pi_k f_{x \mid Z_{k}}(x \mid Z_{k}) \]</span></p>
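<p>In words, the mixture pdf is a pointwise weighted sum of the component pdfs. Continuing the sketch above (and reusing <code>pi.vec</code>, <code>mu</code>, and <code>sigma</code> from it), a minimal implementation for the Gaussian case might look like:</p>
<pre class="r"><code># Mixture density: weighted sum of the K component densities
mix.density &lt;- function(x, pi.vec, mu, sigma) {
  out &lt;- 0
  for (k in seq_along(pi.vec)) {
    out &lt;- out + pi.vec[k] * dnorm(x, mean = mu[k], sd = sigma[k])
  }
  out
}
x.grid &lt;- seq(55, 80, length.out = 200)
plot(x.grid, mix.density(x.grid, pi.vec, mu, sigma), type = &quot;l&quot;,
     xlab = &quot;x&quot;, ylab = &quot;density&quot;)</code></pre>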
<p>If we observe independent samples <span class="math inline">\(X_1,\ldots,X_n\)</span> from this mixture, with mixture proportion vector <span class="math inline">\(\pi=(\pi_1, \pi_2,\ldots,\pi_K)\)</span>, then the likelihood function is: <span class="math display">\[L(\pi) = \prod_{i=1}^n P(X_i|\pi) = \prod_{i=1}^n\sum_{k=1}^K P(X_i|Z_i=k)\pi_k\]</span></p>
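<p>In practice we work with the log-likelihood, which replaces the outer product with a sum; note that the sum over components stays inside the logarithm, which is part of what makes direct maximization awkward. A short sketch, reusing <code>mix.density</code> and the simulated <code>X</code> from above:</p>
<pre class="r"><code># Log-likelihood: sum over observations of the log mixture density
mix.loglik &lt;- function(x, pi.vec, mu, sigma) {
  sum(log(mix.density(x, pi.vec, mu, sigma)))
}
mix.loglik(X, pi.vec, mu, sigma)</code></pre>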
<p>Now assume we are in the Gaussian mixture model setting where the <span class="math inline">\(k\)</span>-th component is <span class="math inline">\(N(\mu_k, \sigma_k^2)\)</span> and the mixture proportions are <span class="math inline">\(\pi_k\)</span>. A natural next question is how to estimate the parameters <span class="math inline">\(\{\mu_k,\sigma_k,\pi_k\}\)</span> from our observations <span class="math inline">\(X_1,\ldots,X_n\)</span>. We illustrate one approach, the EM algorithm, in the <a href="intro_to_em.html">introduction to EM</a> vignette; a naive alternative is sketched below.</p>
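<p>Before turning to EM, note that nothing stops us from maximizing the log-likelihood numerically. This is a hedged sketch, not the approach taken in the EM vignette: it reuses <code>mix.loglik</code> and <code>X</code> from above and reparameterizes so that <code>optim()</code> can search over unconstrained values (the proportion is squashed into <span class="math inline">\((0,1)\)</span> and the standard deviations are kept positive). Direct optimization like this can be sensitive to starting values and local optima, which is one reason EM is popular:</p>
<pre class="r"><code># Naive direct maximization of the mixture log-likelihood for K = 2
neg.loglik &lt;- function(par) {
  p     &lt;- plogis(par[1])   # logistic transform keeps pi_1 in (0,1)
  mu    &lt;- par[2:3]
  sigma &lt;- exp(par[4:5])    # exponential keeps the sds positive
  -mix.loglik(X, c(p, 1 - p), mu, sigma)
}
fit &lt;- optim(c(0, 60, 75, 0, 0), neg.loglik)
c(plogis(fit$par[1]), fit$par[2:3], exp(fit$par[4:5]))  # back-transformed estimates</code></pre>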
<p><strong>Acknowledgement:</strong> The “Examples” section above was taken from lecture notes written by Ramesh Sridharan.</p>
<br> <br>