Natural Language Processing: Probabilistic Language Modeling
Author: Regina Barzilay (MIT, EECS Department, November 15, 2004)
Translator: 52nlp (www.52nlp.cn, January 17, 2009)

2. Language Model Construction
a) The Language Modeling Problem
 i. Start with some vocabulary:
  V = {the, a, doctorate, candidate, Professors, grill, cook, ask, ...}
 ii. Get a training sample of V:
  Grill doctorate candidate.
  Cook Professors.
  Ask Professors.
  ...
 iii. Assumption: the training sample is drawn from some underlying distribution P
 iv. Goal: learn a probability distribution P′ "as close" to P as possible (a minimal estimation sketch follows the formulas below):
      Σ_{x∈V} P′(x) = 1, P′(x) ≥ 0
      P′(candidates) = 10^{-5}
      P′(ask candidates) = 10^{-8}
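
To make the goal concrete, here is a minimal sketch of estimating such a distribution P′ from the toy training sample above by maximum likelihood (relative frequencies). The corpus literal and variable names are illustrative assumptions, not part of the lecture.

from collections import Counter

# Toy training sample over the vocabulary V above (illustrative).
corpus = [
    "grill doctorate candidate",
    "cook professors",
    "ask professors",
]

# Maximum-likelihood estimate: P'(x) = count(x) / total tokens,
# which satisfies sum_x P'(x) = 1 and P'(x) >= 0.
tokens = [w for sentence in corpus for w in sentence.lower().split()]
counts = Counter(tokens)
total = sum(counts.values())
p_prime = {w: c / total for w, c in counts.items()}

print(p_prime["professors"])   # 2/7 ≈ 0.286
print(sum(p_prime.values()))   # 1.0

An estimate of this kind assigns probability zero to any word never seen in the sample, which is one reason smoothing is needed in practice.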
b) Deriving the Language Model
 i. Assign a probability to a sequence of words w_1 w_2 ... w_n
 ii. Apply the chain rule:
   1. P(w_1 w_2 ... w_n) = P(w_1|S) ∗ P(w_2|S, w_1) ∗ P(w_3|S, w_1, w_2) ∗ ... ∗ P(E|S, w_1, w_2, ..., w_n), where S and E mark the start and end of the sentence
   2. History-based model: we predict each following word from the preceding ones
   3. How much context do we need to take into account? (A factorization example is sketched below.)
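
As an illustration of the chain-rule factorization in 1. above (my own sketch, not the lecture's code), the helper below prints each conditional factor, with S and E as the boundary markers:

# Factor a sentence probability by the chain rule, conditioning each word
# on the full history back to the start marker S; E is the end marker.
def chain_rule_factors(words):
    history = ["S"]
    factors = []
    for w in list(words) + ["E"]:
        factors.append(f"P({w} | {', '.join(history)})")
        history.append(w)
    return factors

for factor in chain_rule_factors(["ask", "professors"]):
    print(factor)
# P(ask | S)
# P(professors | S, ask)
# P(E | S, ask, professors)

The last factor already conditions on the entire sentence, which is exactly the estimation difficulty that the Markov assumption addresses next.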
c) Markov Assumption
 i. For arbitrarily long contexts, P(w_i | w_1, ..., w_{i−1}) is difficult to estimate
 ii. Markov Assumption: w_i depends only on the n preceding words
 iii. Trigrams (second-order Markov model):
   1. P(w_i | START, w_1, w_2, ..., w_{i−1}) = P(w_i | w_{i−1}, w_{i−2})
   2. P(w_1 w_2 ... w_n) = P(w_1|S) ∗ P(w_2|S, w_1) ∗ P(w_3|w_1, w_2) ∗ ... ∗ P(E|w_{n−1}, w_n) (a counting sketch follows below)
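
Here is a minimal sketch of a trigram model estimated by maximum likelihood from the toy corpus above; the corpus, the padding symbols S/E, and the function names are assumptions for illustration, and no smoothing is applied.

from collections import defaultdict

corpus = [
    ["grill", "doctorate", "candidate"],
    ["cook", "professors"],
    ["ask", "professors"],
]

# Count trigrams with two start symbols S and one end symbol E, then
# estimate P(w | u, v) = count(u, v, w) / count(u, v).
tri = defaultdict(int)
bi = defaultdict(int)
for sent in corpus:
    padded = ["S", "S"] + sent + ["E"]
    for u, v, w in zip(padded, padded[1:], padded[2:]):
        tri[(u, v, w)] += 1
        bi[(u, v)] += 1

def p(w, u, v):
    return tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0

def sentence_prob(sent):
    prob, padded = 1.0, ["S", "S"] + sent + ["E"]
    for u, v, w in zip(padded, padded[1:], padded[2:]):
        prob *= p(w, u, v)
    return prob

print(sentence_prob(["ask", "professors"]))  # (1/3) * 1 * 1 ≈ 0.333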
d) A Computational Model of Language
 i. A useful conceptual and practical device: coin-flipping models
   1. A sentence is generated by a randomized algorithm
   - The generator can be in one of several "states"
   - Flip coins to choose the next state
   - Flip other coins to decide which letter or word to output
 ii. Shannon: "The states will correspond to the 'residue of influence' from preceding letters" (a sampling sketch follows below)
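
Shannon's picture maps directly onto the trigram counts from the previous sketch: the "state" is the pair of preceding words, and "flipping coins" means sampling the next word from that state's conditional distribution. The generator below is my own illustration under those assumptions, not the lecture's implementation.

import random

def generate(tri, max_len=20):
    """Sample a sentence from trigram counts `tri` (see the sketch above)."""
    u, v, out = "S", "S", []
    while len(out) < max_len:
        # Conditional distribution over next words given the state (u, v).
        candidates = {w: c for (a, b, w), c in tri.items() if (a, b) == (u, v)}
        if not candidates:
            break
        # "Flip coins": draw the next word in proportion to its count.
        w = random.choices(list(candidates), weights=list(candidates.values()))[0]
        if w == "E":
            break
        out.append(w)
        u, v = v, w
    return " ".join(out)

print(generate(tri))  # e.g. "cook professors" or "ask professors"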
e) Word-Based Approximations
 Note: the sentences below were generated at random from models trained on Shakespeare; cf. Jurafsky & Martin, Speech and Language Processing.
 i. Unigram approximation (the MIT slides label this a "first-order approximation", which is incorrect):
  1. To him swallowed confess hear both. which. OF save
  2. on trail for are ay device and rote life have
  3. Every enter now severally so, let
  4. Hill he late speaks; or! a more to leg less first you
  5. enter
 ii. Trigram approximation (again, the slides label this a "third-order approximation", which is incorrect):
  1. King Henry. What! I will go seek the traitor Gloucester.
   2. Exeunt some of the watch. A great banquet serv'd in;
  3. Will you tell me how I am?
  4. It cannot be but so.

To be continued: Part 3

Appendix: the course and the PDF slides can be downloaded from the MIT page:
   http://people.csail.mit.edu/regina/6881/

Note: this translation is published under the MIT OpenCourseWare Creative Commons terms. When reprinting, please credit the source 我爱自然语言处理 (www.52nlp.cn).

Permalink:
https://www.52nlp.cn/mit-nlp-third-lesson-probabilistic-language-modeling-second-part/
