m op n operations in Python: NLTK language model (ngram), computing the prob of a word in context

Published: 2023/12/8 python 30 豆豆

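I know this question is old, but it pops up every time I google NLTK's NgramModel class. NgramModel's prob implementation is a little unintuitive. The asker is confused. As far as I can tell, the existing answers aren't great. Since I don't use NgramModel often, I end up confused every time too. No more. Here is the implementation:

    def prob(self, word, context):
        """
        Evaluate the probability of this word in this context using Katz Backoff.

        :param word: the word to get the probability of
        :type word: str
        :param context: the context the word is in
        :type context: list(str)
        """
        context = tuple(context)
        # Seen n-gram (or a unigram model): answer directly from this model.
        if (context + (word,) in self._ngrams) or (self._n == 1):
            return self[context].prob(word)
        else:
            # Unseen n-gram: drop the oldest context word and back off.
            return self._alpha(context) * self._backoff.prob(word, context[1:])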

(Note: self[context].prob(word) is equivalent to self._model[context].prob(word).)
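
The control flow of prob can be sketched without NLTK. What follows is a minimal, self-contained toy and not NLTK's code: TinyBackoffModel and its attributes are hypothetical names, it skips the padding NLTK applies, it uses raw relative frequencies, and a fixed backoff_weight stands in for the computed _alpha, so scores for unseen n-grams are not properly normalized probabilities.

```python
from collections import Counter, defaultdict


class TinyBackoffModel:
    """Toy n-gram model that mirrors the branch structure of prob() (not NLTK code)."""

    def __init__(self, n, tokens, backoff_weight=0.4):
        self.n = n
        # Assumption: a constant replaces the computed Katz backoff weight.
        self.backoff_weight = backoff_weight
        self.ngrams = set()
        # Maps a context tuple of n-1 words to a Counter of following words.
        self.counts = defaultdict(Counter)
        for i in range(len(tokens) - n + 1):
            ngram = tuple(tokens[i:i + n])
            self.ngrams.add(ngram)
            self.counts[ngram[:-1]][ngram[-1]] += 1
        # Each model owns a lower-order model to fall back on.
        self.backoff = TinyBackoffModel(n - 1, tokens, backoff_weight) if n > 1 else None

    def prob(self, word, context):
        context = tuple(context)
        if context + (word,) in self.ngrams or self.n == 1:
            # Seen n-gram (or unigram model): relative frequency in this context.
            seen = self.counts.get(context, Counter())
            total = sum(seen.values())
            return seen[word] / total if total else 0.0
        # Unseen n-gram: shorten the context and ask the lower-order model.
        return self.backoff_weight * self.backoff.prob(word, context[1:])


tokens = "the rain in spain falls mainly in the plains".split()
lm = TinyBackoffModel(2, tokens)
print(lm.prob("rain", ["the"]))    # seen bigram, answered by the bigram table
print(lm.prob("spain", ["rain"]))  # unseen bigram, falls through to the unigram model
```

In this toy a seen bigram is answered directly from the bigram table, while an unseen one is scored by the unigram model after the first context word is dropped.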

Alright. Now at least we know what to look for. What does context need to be? Let's look at an excerpt from the constructor:

    for sent in train:
        for ngram in ingrams(chain(self._lpad, sent, self._rpad), n):
            self._ngrams.add(ngram)
            # The context is the first n-1 tokens, stored as a tuple.
            context = tuple(ngram[:-1])
            token = ngram[-1]
            cfd[context].inc(token)

    if not estimator_args and not estimator_kwargs:
        self._model = ConditionalProbDist(cfd, estimator, len(cfd))
    else:
        self._model = ConditionalProbDist(cfd, estimator, *estimator_args, **estimator_kwargs)

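Alright. The constructor builds a conditional probability distribution (self._model) from a conditional frequency distribution whose "context" is a tuple of unigrams. This tells us that context should not be a string, nor a list containing a single multi-word string. The context must be an iterable of unigrams. In fact, the requirement is a little stricter: these tuples or lists must have size n-1. Think of it this way. You told it to be a trigram model. You had better give it an appropriate trigram context.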
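
To make the shape requirement concrete, here is a small sketch in plain Python (no NLTK). build_context_table is a hypothetical helper that imitates how the excerpt keys its counts by context tuple; the loop then shows what tuple(context) turns each kind of argument into and whether that key exists in the table.

```python
from collections import Counter, defaultdict


def build_context_table(tokens, n):
    """Map each (n-1)-word context tuple to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        ngram = tuple(tokens[i:i + n])
        table[ngram[:-1]][ngram[-1]] += 1
    return table


tokens = "the rain in spain falls mainly in the plains".split()
table = build_context_table(tokens, 2)

candidates = {
    "bare string": "the",
    "multi-word string in a list": ["rain in"],
    "list of unigrams": ["the"],
}
for label, context in candidates.items():
    key = tuple(context)  # the same conversion prob() applies to its argument
    print(label, key, key in table)
```

A bare string is split into single characters and a multi-word string stays one opaque element, so neither can match a key made of whole words.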

Let's illustrate this with a simple example:

    >>> import nltk
    >>> obs = 'the rain in spain falls mainly in the plains'.split()
    >>> lm = nltk.NgramModel(2, obs, estimator=nltk.MLEProbDist)
    >>> lm.prob('rain', 'the') #wrong
    0.0
    >>> lm.prob('rain', ['the']) #right
    0.5
    >>> lm.prob('spain', 'rain in') #wrong
    0.0
    >>> lm.prob('spain', ['rain in']) #wrong
    '''long exception'''
    >>> lm.prob('spain', ['rain', 'in']) #right
    1.0

(By the way, using MLE as the estimator in NgramModel is a bad idea. Things will fall apart. I guarantee it.)

As for the original question, my best guess at what the OP wants is:

    print(lm.prob("word", "generates a".split()))
    print(lm.prob("b", "generates a".split()))

...but there are so many misunderstandings here that I can't tell what he was actually trying to do.
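
Whatever the OP intended, the calling convention itself can be wrapped in a small helper. context_for below is a hypothetical function, not part of NLTK: it splits raw text on whitespace and keeps the last n-1 words as a tuple, which is the shape described above for a model of order n.

```python
def context_for(text, n):
    """Split raw text into words and return the last n-1 of them as a context tuple."""
    words = text.split()
    needed = n - 1
    if len(words) < needed:
        raise ValueError(
            "an order-%d model needs %d context words, got %d" % (n, needed, len(words))
        )
    # Slice from an explicit start index so that needed == 0 yields an empty tuple.
    return tuple(words[len(words) - needed:])


print(context_for("the model generates a", 3))
print(context_for("generates a", 2))
```

Raising on a context that is too short is a design choice for this sketch; it surfaces a wrongly shaped argument immediately instead of letting it turn into a silent 0.0.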
