Latent Dirichlet allocation

Model overview

  • Notation

$\alpha$ is the parameter of the Dirichlet prior on the per-document topic distributions, $\beta$ is the parameter of the Dirichlet prior on the per-topic word distribution, $\theta_m$ is the topic distribution for document $m$, $\varphi_k$ is the word distribution for topic $k$, $z_{m,n}$ is the topic for the $n$-th word in document $m$, and $w_{m,n}$ is the specific word.

Generative process

  1. Choose $\theta_i \sim \operatorname{Dir}(\alpha)$, where $i \in \{1,\dots,M\}$ and $\operatorname{Dir}(\alpha)$ is a Dirichlet distribution with a symmetric parameter $\alpha$, which typically is sparse ($\alpha < 1$)

  2. Choose $\varphi_k \sim \operatorname{Dir}(\beta)$, where $k \in \{1,\dots,K\}$ and $\beta$ typically is sparse

  3. For each of the word positions $(i, j)$, where $i \in \{1,\dots,M\}$ and $j \in \{1,\dots,N_i\}$:

    (a) Choose a topic $z_{i,j} \sim \operatorname{Multinomial}(\theta_i)$.

    (b) Choose a word $w_{i,j} \sim \operatorname{Multinomial}(\varphi_{z_{i,j}})$.

We can then mathematically describe the random variables as follows:

$$
\begin{aligned}
\boldsymbol{\varphi}_{k=1\dots K} &\sim \operatorname{Dirichlet}_V(\boldsymbol{\beta}) \\
\boldsymbol{\theta}_{d=1\dots M} &\sim \operatorname{Dirichlet}_K(\boldsymbol{\alpha}) \\
z_{d=1\dots M,\, w=1\dots N_d} &\sim \operatorname{Categorical}_K(\boldsymbol{\theta}_d) \\
w_{d=1\dots M,\, w=1\dots N_d} &\sim \operatorname{Categorical}_V(\boldsymbol{\varphi}_{z_{dw}})
\end{aligned}
$$
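
As an illustration, here is a minimal numpy sketch of this generative process. The constants ($K$, $V$, $M$, the Poisson document length) are arbitrary choices for the example, not part of LDA itself:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, M = 5, 1000, 20          # topics, vocabulary size, documents
alpha, beta = 0.1, 0.01        # sparse symmetric Dirichlet parameters

# Step 2: draw a word distribution phi_k over the vocabulary for each topic
phi = rng.dirichlet(np.full(V, beta), size=K)          # shape (K, V)

docs = []
for i in range(M):
    # Step 1: draw the topic distribution theta_i for document i
    theta = rng.dirichlet(np.full(K, alpha))
    N_i = rng.poisson(100)                             # document length: a choice outside LDA
    # Step 3: for each word position, draw a topic, then a word from that topic
    z = rng.choice(K, size=N_i, p=theta)               # z_{i,j} ~ Categorical(theta_i)
    w = np.array([rng.choice(V, p=phi[k]) for k in z]) # w_{i,j} ~ Categorical(phi_{z_{i,j}})
    docs.append(w)
```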

Model inference

The unknowns in the model are $\theta$, $\varphi$, and $Z$. Below we solve for them with two methods: Variational Inference and collapsed Gibbs sampling.

Variational Inference

The total probability of the model is:

$$
P(\boldsymbol{W}, \boldsymbol{Z}, \boldsymbol{\theta}, \boldsymbol{\varphi}; \alpha, \beta) = \prod_{i=1}^{K} P(\varphi_i; \beta) \prod_{j=1}^{M} P(\theta_j; \alpha) \prod_{t=1}^{N_j} P(Z_{j,t} \mid \theta_j)\, P(W_{j,t} \mid \varphi_{Z_{j,t}})
$$

We now approximate the posterior $P(\boldsymbol{Z}, \boldsymbol{\theta}, \boldsymbol{\varphi} \mid \boldsymbol{W})$ with the following factorized variational distribution:

$$
q(\boldsymbol{Z}, \boldsymbol{\theta}, \boldsymbol{\varphi}) = \prod_{k=1}^{K} q(\varphi_k) \prod_{i=1}^{M} \Big\{ q(\theta_i) \prod_{j=1}^{N_i} q(z_{i,j}) \Big\}
$$

Using mean-field theory, we can derive iterative update equations; a code sketch follows the list below.
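
The updates are instances of the standard mean-field identity: holding all other factors fixed, the optimal choice of any single factor $q(u)$ is the exponentiated expectation of the log joint under the remaining factors,

$$
\ln q^*(u) = \mathbb{E}_{q(\text{all factors except } u)}\big[\ln P(\boldsymbol{W}, \boldsymbol{Z}, \boldsymbol{\theta}, \boldsymbol{\varphi})\big] + \text{const.}
$$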

  • Solving for $q(\theta_i)$

$$
\begin{aligned}
q(\theta_i) &= \operatorname{Dirichlet}(\alpha^*_1, \alpha^*_2, \dots, \alpha^*_K) \\
\text{where} \quad \alpha^*_k &= \alpha + \sum_{j} q(z_{i,j} = k)
\end{aligned}
$$
  • Solving for $q(\varphi_k)$

$$
\begin{aligned}
q(\varphi_k) &= \operatorname{Dirichlet}(\beta^*_1, \beta^*_2, \dots, \beta^*_V) \\
\text{where} \quad \beta^*_v &= \beta + \sum_{i,j} \mathbf{1}(w_{i,j} = v)\, q(z_{i,j} = k)
\end{aligned}
$$
  • Solving for $q(z_{i,j})$

$$
q(z_{i,j} = k) \propto \exp\!\Big(\psi(\alpha^*_k) - \psi\big(\textstyle\sum_k \alpha^*_k\big) + \psi(\beta^*_{k, w_{i,j}}) - \psi\big(\textstyle\sum_{v} \beta^*_{k, v}\big)\Big)
$$

where $\psi$ is the digamma function.
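
Putting the three updates together gives a simple coordinate-ascent loop. The sketch below assumes `docs` is a list of integer word-id arrays (as in the generative sketch above); the function name `vi_lda` and the update schedule are illustrative, not a reference implementation:

```python
import numpy as np
from scipy.special import digamma

def vi_lda(docs, K, V, alpha=0.1, beta=0.01, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    # q(z_{i,j}=k): one (N_i, K) responsibility matrix per document
    resp = [rng.dirichlet(np.ones(K), size=len(w)) for w in docs]
    for _ in range(iters):
        # alpha*_k = alpha + sum_j q(z_{i,j}=k), per document
        a_star = np.stack([alpha + r.sum(axis=0) for r in resp])           # (M, K)
        # beta*_v = beta + sum_{i,j} 1(w_{i,j}=v) q(z_{i,j}=k)
        b_star = np.full((K, V), beta)
        for w, r in zip(docs, resp):
            np.add.at(b_star.T, w, r)           # accumulate responsibilities per word id
        # q(z_{i,j}=k) ∝ exp(ψ(α*_k) − ψ(Σ_k α*_k) + ψ(β*_{k,w}) − ψ(Σ_v β*_{k,v}))
        e_theta = digamma(a_star) - digamma(a_star.sum(1, keepdims=True))  # (M, K)
        e_phi = digamma(b_star) - digamma(b_star.sum(1, keepdims=True))    # (K, V)
        for i, w in enumerate(docs):
            log_q = e_theta[i] + e_phi[:, w].T                             # (N_i, K)
            log_q -= log_q.max(axis=1, keepdims=True)                      # for stability
            q = np.exp(log_q)
            resp[i] = q / q.sum(axis=1, keepdims=True)
    return a_star, b_star, resp
```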

Collapsed Gibbs sampling

The following is the derivation of the equations for collapsed Gibbs sampling, in which the $\varphi$s and $\theta$s are integrated out.

$$
p(z_{m,n} = k \mid Z_{-(m,n)}, W; \alpha, \beta) \propto \big(\alpha_k + n_{m,(\cdot)}^{k, -(m,n)}\big)\, \frac{\beta_{w_{m,n}} + n_{(\cdot), w_{m,n}}^{k, -(m,n)}}{\sum_v \beta_v + n_{(\cdot), v}^{k, -(m,n)}}
$$

Here $n_{m,(\cdot)}^{k}$ denotes the number of words in document $m$ assigned to topic $k$, and $n_{(\cdot),v}^{k}$ denotes the number of times word $v$ is assigned to topic $k$. The superscript $-(m,n)$ means the word at position $(m,n)$ is excluded from the counts. A code sketch of the resulting sampler follows the derivation below.

  • Derivation details

Throughout, define $\Delta(\boldsymbol{\alpha}) = \frac{\prod_i \Gamma(\alpha_i)}{\Gamma(\sum_i \alpha_i)}$, the normalization constant of a Dirichlet distribution.

$$
p(z_{m,n} \mid z_{-(m,n)}, W) = \frac{p(W, z)}{p(W, z_{-(m,n)})} = \frac{p(W \mid z)\, p(z)}{p(W_{-(m,n)} \mid z_{-(m,n)})\, p(w_{m,n})\, p(z_{-(m,n)})} \propto \frac{p(W \mid z)}{p(W_{-(m,n)} \mid z_{-(m,n)})} \cdot \frac{p(z)}{p(z_{-(m,n)})}
$$

We now derive expressions for $p(W \mid z)$ and $p(z)$ separately.

$$
\begin{aligned}
p(W \mid z; \beta) & = \int p(W \mid z, \varphi)\, p(\varphi \mid \boldsymbol{\beta})\, d\varphi \\
& = \int \prod_{i=1}^{M} \prod_{j=1}^{N_i} p(w_{i,j} \mid z_{i,j}, \varphi) \prod_{k=1}^{K} p(\varphi_k \mid \boldsymbol{\beta})\, d\varphi \\
& = \int \prod_{i=1}^{M} \prod_{j=1}^{N_i} \varphi_{z_{i,j}, w_{i,j}} \prod_{k=1}^{K} \frac{1}{\Delta(\boldsymbol{\beta})} \prod_{v} \varphi_{k,v}^{\beta_v - 1}\, d\varphi \\
& = \int \prod_{k=1}^{K} \frac{1}{\Delta(\boldsymbol{\beta})} \prod_v \varphi_{k,v}^{\beta_v + n_{(\cdot),v}^{k} - 1}\, d\varphi \\
& = \prod_{k=1}^{K} \frac{\Delta(\boldsymbol{\beta} + n_{(\cdot)}^{k})}{\Delta(\boldsymbol{\beta})}, \quad \text{where} \quad n_{(\cdot)}^{k} = (n_{(\cdot),1}^{k}, \dots, n_{(\cdot),V}^{k})
\end{aligned}
$$

$$
\begin{aligned}
p(z \mid \alpha) & = \int p(z \mid \theta)\, p(\theta \mid \alpha)\, d\theta \\
& = \int \prod_{m=1}^{M} \prod_{j=1}^{N_m} p(z_{m,j} \mid \theta_m) \prod_{m=1}^{M} p(\theta_m \mid \alpha)\, d\theta \\
& = \int \prod_{m=1}^{M} \prod_{j=1}^{N_m} \theta_{m, z_{m,j}} \prod_{m=1}^{M} \frac{1}{\Delta(\boldsymbol{\alpha})} \prod_{k} \theta_{m,k}^{\alpha_k - 1}\, d\theta \\
& = \int \prod_{m=1}^{M} \frac{1}{\Delta(\boldsymbol{\alpha})} \prod_{k} \theta_{m,k}^{\alpha_k + n_{m,(\cdot)}^{k} - 1}\, d\theta \\
& = \prod_{m=1}^{M} \frac{\Delta(\boldsymbol{\alpha} + n_{m,(\cdot)})}{\Delta(\boldsymbol{\alpha})}, \quad \text{where} \quad n_{m,(\cdot)} = (n_{m,(\cdot)}^{1}, \dots, n_{m,(\cdot)}^{K})
\end{aligned}
$$

$$
p(Z, W \mid \alpha, \beta) = p(Z \mid \alpha)\, p(W \mid Z, \beta) = \prod_{k=1}^{K} \frac{\Delta(\boldsymbol{\beta} + n_{(\cdot)}^{k})}{\Delta(\boldsymbol{\beta})} \prod_{m=1}^{M} \frac{\Delta(\boldsymbol{\alpha} + n_{m,(\cdot)})}{\Delta(\boldsymbol{\alpha})}
$$

$$
\begin{aligned}
p(z_{m,n} = k \mid z_{-(m,n)}, W) & \propto \frac{p(W, Z \mid \alpha, \beta)}{p(W_{-(m,n)}, Z_{-(m,n)} \mid \alpha, \beta)} \\
& \propto \frac{\Delta(\boldsymbol{\beta} + n_{(\cdot)}^{k})}{\Delta(\boldsymbol{\beta} + n_{(\cdot)}^{k, -(m,n)})} \cdot \frac{\Delta(\boldsymbol{\alpha} + n_{m,(\cdot)})}{\Delta(\boldsymbol{\alpha} + n_{m,(\cdot)}^{-(m,n)})} \\
& \propto \frac{\prod_v \Gamma(\beta_v + n_{(\cdot),v}^{k})}{\prod_v \Gamma(\beta_v + n_{(\cdot),v}^{k, -(m,n)})} \cdot \frac{\Gamma\big(\sum_v (\beta_v + n_{(\cdot),v}^{k, -(m,n)})\big)}{\Gamma\big(\sum_v (\beta_v + n_{(\cdot),v}^{k})\big)} \cdot \frac{\prod_{k'} \Gamma(\alpha_{k'} + n_{m,(\cdot)}^{k'})}{\prod_{k'} \Gamma(\alpha_{k'} + n_{m,(\cdot)}^{k', -(m,n)})} \cdot \frac{\Gamma\big(\sum_{k'} (\alpha_{k'} + n_{m,(\cdot)}^{k', -(m,n)})\big)}{\Gamma\big(\sum_{k'} (\alpha_{k'} + n_{m,(\cdot)}^{k'})\big)} \\
& \propto \frac{(\beta_{w_{m,n}} + n_{(\cdot),w_{m,n}}^{k, -(m,n)})\, (\alpha_k + n_{m,(\cdot)}^{k, -(m,n)})}{\big(\sum_v \beta_v + n_{(\cdot),v}^{k, -(m,n)}\big)\, \big(\sum_{k'} \alpha_{k'} + n_{m,(\cdot)}^{k', -(m,n)}\big)} \\
& \propto (\alpha_k + n_{m,(\cdot)}^{k, -(m,n)})\, \frac{\beta_{w_{m,n}} + n_{(\cdot),w_{m,n}}^{k, -(m,n)}}{\sum_v \beta_v + n_{(\cdot),v}^{k, -(m,n)}}
\end{aligned}
$$
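
The last line is exactly the sampling formula stated at the top of this section. A minimal sketch of the resulting sampler for symmetric priors, assuming the same `docs` representation as above (names like `gibbs_lda`, `n_mk`, `n_kv` are illustrative):

```python
import numpy as np

def gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    z = [rng.integers(K, size=len(w)) for w in docs]   # random initial assignments
    n_mk = np.zeros((len(docs), K))                    # n_{m,(.)}^k: doc-topic counts
    n_kv = np.zeros((K, V))                            # n_{(.),v}^k: topic-word counts
    n_k = np.zeros(K)                                  # total words assigned to each topic
    for m, w in enumerate(docs):
        for j, v in enumerate(w):
            k = z[m][j]
            n_mk[m, k] += 1; n_kv[k, v] += 1; n_k[k] += 1
    for _ in range(iters):
        for m, w in enumerate(docs):
            for j, v in enumerate(w):
                k = z[m][j]
                # remove word (m,j) from the counts: the -(m,n) superscript
                n_mk[m, k] -= 1; n_kv[k, v] -= 1; n_k[k] -= 1
                # p(z=k | rest) ∝ (α + n_mk) (β + n_kv) / (Vβ + n_k)
                p = (alpha + n_mk[m]) * (beta + n_kv[:, v]) / (V * beta + n_k)
                k = rng.choice(K, p=p / p.sum())
                z[m][j] = k
                n_mk[m, k] += 1; n_kv[k, v] += 1; n_k[k] += 1
    return z, n_mk, n_kv
```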

  • Applied to new documents

Once the LDA model has been trained, we can compute the topic distribution of a new document; the procedure closely mirrors the training process.

For a new document, we treat the $\hat{\varphi}_{kt}$ part of the Gibbs sampling formula as fixed, supplied by the model learned from the training corpus (i.e., the per-topic word counts are held constant), so during sampling we only need to estimate the new document's topic distribution $\theta_{new}$, as in the sketch below.
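
A minimal sketch, reusing the count arrays produced by the training sampler above; `infer_new_doc` is an illustrative name, and the frozen `n_kv` plays the role of $\hat{\varphi}_{kt}$:

```python
import numpy as np

def infer_new_doc(w_new, n_kv, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Estimate theta_new for one unseen document, keeping the trained
    topic-word counts n_kv (the phi-hat part) frozen throughout."""
    rng = np.random.default_rng(seed)
    K, V = n_kv.shape
    n_k = n_kv.sum(axis=1)                               # words per topic (frozen)
    z = rng.integers(K, size=len(w_new))
    n_doc = np.bincount(z, minlength=K).astype(float)    # topic counts in the new document
    for _ in range(iters):
        for j, v in enumerate(w_new):
            n_doc[z[j]] -= 1
            # same conditional as in training, but the topic-word part stays fixed
            p = (alpha + n_doc) * (beta + n_kv[:, v]) / (V * beta + n_k)
            z[j] = rng.choice(K, p=p / p.sum())
            n_doc[z[j]] += 1
    return (alpha + n_doc) / (K * alpha + len(w_new))    # posterior-mean theta_new
```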
