
NLP Basics 003: Part-of-Speech Tagging

Published 2025/4/5 · Programming Q&A · 豆豆

This article, collected and compiled by 生活随笔, introduces NLP Basics 003 (part-of-speech tagging).
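Goal: part-of-speech tagging. Given a sentence $s = w_1 w_2 \dots w_n$ (the words), find its tag sequence $z = (z_1 z_2 \dots z_n)$ (the part of speech of each word).

Objective: $\arg\max_z p(z \mid s)$. By the noisy channel model, and because $p(s)$ does not depend on $z$, this equals $\arg\max_z p(s \mid z)\,p(z)$, where $p(s \mid z)$ is the translation model and $p(z)$ is the language model. Assuming each word depends only on its own tag (conditional independence) and each tag depends only on the previous tag (Markov assumption):

$$
\begin{aligned}
z' &= \arg\max_z \; p(w_1 w_2 \dots w_n \mid z_1 z_2 \dots z_n)\, p(z_1 z_2 \dots z_n) \\
&= \arg\max_z \; p(w_1 \mid z_1)\, p(w_2 \mid z_2) \cdots p(w_n \mid z_n) \cdot p(z_1)\, p(z_2 \mid z_1)\, p(z_3 \mid z_1 z_2) \cdots && \text{(conditional independence)} \\
&= \arg\max_z \; \prod_{i=1}^{n} p(w_i \mid z_i) \cdot p(z_1) \prod_{j=2}^{n} p(z_j \mid z_{j-1}) && \text{(Markov assumption)} \\
&= \arg\max_z \; \sum_{i=1}^{n} \log p(w_i \mid z_i) + \log p(z_1) + \sum_{j=2}^{n} \log p(z_j \mid z_{j-1})
\end{aligned}
$$

The last step takes the logarithm, which is monotonic and therefore leaves the argmax unchanged, while turning the product of many small probabilities into a numerically safer sum.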
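To make the objective concrete, the sketch below scores every possible tag sequence with the final log-sum formula and takes the argmax directly. It is a brute-force illustration assuming a hypothetical two-tag toy model (all numbers are made up; `A` is emission and `B` is transition, following this article's naming), and its cost grows exponentially with sentence length, which is exactly what the Viterbi algorithm avoids.

```python
import itertools
import math

def score(words, tags, pi, A, B):
    """Log-score: sum_i log p(w_i|z_i) + log p(z_1) + sum_j log p(z_j|z_{j-1})."""
    s = math.log(pi[tags[0]])
    for i, (w, z) in enumerate(zip(words, tags)):
        s += math.log(A[z][w])                 # emission term
        if i > 0:
            s += math.log(B[tags[i - 1]][z])   # transition term
    return s

def brute_force_decode(words, pi, A, B):
    """Enumerate every tag sequence z and return the one with the highest score."""
    tagset = list(pi.keys())
    return max(itertools.product(tagset, repeat=len(words)),
               key=lambda z: score(words, z, pi, A, B))

# Hypothetical toy model: two tags, three words.
pi = {"DT": 0.8, "NN": 0.2}
A = {"DT": {"the": 0.9, "dog": 0.05, "runs": 0.05},
     "NN": {"the": 0.1, "dog": 0.6, "runs": 0.3}}
B = {"DT": {"DT": 0.1, "NN": 0.9},
     "NN": {"DT": 0.4, "NN": 0.6}}

print(brute_force_decode(["the", "dog"], pi, A, B))  # → ('DT', 'NN')
```

For this toy model the product for DT NN is $0.8 \times 0.9 \times 0.9 \times 0.6 = 0.3888$, the largest of the four candidates.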
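
Code to compute pi, A and B (a sample of traindata.txt is given at the end of the article). Here pi holds $p(z_1)$, A holds the emission probabilities $p(w_i \mid z_i)$, and B holds the transition probabilities $p(z_j \mid z_{j-1})$:

```python
import numpy as np

# Build the tag and word vocabularies.
# traindata.txt holds one "word/tag" pair per line.
tag2id, id2tag = {}, {}    # tag2id: {"VB": 0, "NNP": 1, ...}, id2tag: {0: "VB", 1: "NNP", ...}
word2id, id2word = {}, {}  # the same mapping for words

with open('traindata.txt') as f:
    # split on the LAST '/' so that words containing '/' keep their full form
    pairs = [line.rstrip().rsplit('/', 1) for line in f if line.strip()]

for word, tag in pairs:
    if word not in word2id:
        word2id[word] = len(word2id)
        id2word[len(id2word)] = word
    if tag not in tag2id:
        tag2id[tag] = len(tag2id)
        id2tag[len(id2tag)] = tag

M = len(word2id)  # M: vocabulary size (# of words in the dictionary)
N = len(tag2id)   # N: number of distinct tags (# of tags in the tag set)

# Build pi, A, B
pi = np.zeros(N)      # pi[i]: probability that tag i is the first tag of a sentence
A = np.zeros((N, M))  # A[i][j]: probability of word j given tag i
B = np.zeros((N, N))  # B[i][j]: probability of moving to tag j when the previous tag is i

prev_tag = ""
for word, tag in pairs:
    word_id, tag_id = word2id[word], tag2id[tag]
    if prev_tag == "":   # start of a sentence
        pi[tag_id] += 1
    else:                # inside a sentence: count the transition prev_tag -> tag
        B[tag2id[prev_tag]][tag_id] += 1
    A[tag_id][word_id] += 1
    prev_tag = "" if word == "." else tag   # "." ends the sentence

# Normalize counts into probabilities
pi = pi / sum(pi)
for i in range(N):
    if A[i].sum() > 0:
        A[i] /= A[i].sum()
    # guard: a tag that is never followed by another tag (e.g. ".", after which
    # prev_tag is reset) would otherwise give 0/0 = NaN
    if B[i].sum() > 0:
        B[i] /= B[i].sum()
# All model parameters pi, A, B are now computed.
```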

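For experimenting without a file, the same counting can be wrapped in a function over in-memory sentences. This is a minimal sketch: the name `estimate_hmm` and the toy corpus are hypothetical, sentences are assumed to be already split (rather than split at "." while reading), and `A`/`B` keep this article's convention (A emission, B transition).

```python
import numpy as np

def estimate_hmm(sentences):
    """sentences: list of sentences, each a list of (word, tag) pairs.
    Returns tag2id, word2id, pi, A (emission, N x M), B (transition, N x N)."""
    tag2id, word2id = {}, {}
    for sent in sentences:
        for word, tag in sent:
            tag2id.setdefault(tag, len(tag2id))
            word2id.setdefault(word, len(word2id))
    N, M = len(tag2id), len(word2id)
    pi, A, B = np.zeros(N), np.zeros((N, M)), np.zeros((N, N))
    for sent in sentences:
        prev = None
        for word, tag in sent:
            t, w = tag2id[tag], word2id[word]
            A[t, w] += 1            # emission count
            if prev is None:
                pi[t] += 1          # first tag of the sentence
            else:
                B[prev, t] += 1     # transition prev -> t
            prev = t
    # Normalize counts; rows with no counts are divided by 1 and stay all-zero instead of NaN.
    pi /= pi.sum()
    A /= np.maximum(A.sum(axis=1, keepdims=True), 1)
    B /= np.maximum(B.sum(axis=1, keepdims=True), 1)
    return tag2id, word2id, pi, A, B

# Hypothetical toy corpus of two tagged sentences.
corpus = [[("the", "DT"), ("dog", "NN"), ("runs", "VBZ"), (".", ".")],
          [("the", "DT"), ("cat", "NN"), (".", ".")]]
tag2id, word2id, pi, A, B = estimate_hmm(corpus)
print(pi)  # every sentence starts with DT
```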
With pi, A and B known, we need to find the optimal tag sequence $z'$.
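Enumerating all $N^n$ tag sequences is exponential, so the Viterbi algorithm computes $z'$ with a dynamic program instead. As a worked equation, the recurrence that the `viterbi` function fills in can be written as follows; this restatement indexes positions from 1 (the code counts from 0) and keeps this article's naming, with A for emission and B for transition.

$$
\begin{aligned}
\text{dp}[1][j] &= \log \pi_j + \log A[j][w_1] \\
\text{dp}[i][j] &= \max_{k} \big( \text{dp}[i-1][k] + \log B[k][j] \big) + \log A[j][w_i], \qquad i = 2, \dots, n \\
\text{ptr}[i][j] &= \arg\max_{k} \big( \text{dp}[i-1][k] + \log B[k][j] \big)
\end{aligned}
$$

The last tag is $z'_n = \arg\max_j \text{dp}[n][j]$, and the rest follow from the back-pointers, $z'_i = \text{ptr}[i+1][z'_{i+1}]$ for $i = n-1, \dots, 1$. Each position takes a max over $N$ predecessors for each of $N$ tags, so the total cost is $O(nN^2)$ rather than $O(N^n)$.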

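The Viterbi algorithm solves this as a dynamic-programming search for the optimal path (one tag per word). The final code is as follows: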

```python
import numpy as np

def log(v):
    # avoid log(0) = -inf by using a small epsilon for zero probabilities
    if v == 0:
        return np.log(v + 0.000001)
    return np.log(v)

def viterbi(x, pi, A, B):
    """
    x: user input string/sentence, e.g. "I like playing soccer"
    pi: initial probability of each tag
    A: given a tag, the probability of each word (emission)
    B: transition probabilities between tags
    Returns the best tag sequence as a space-separated string.
    """
    x = [word2id[word] for word in x.split(" ")]  # x: [4521, 412, 542, ...]; unknown words raise KeyError
    T = len(x)
    dp = np.zeros((T, N))              # dp[i][j]: best log-score of w1..wi, assuming wi has tag j
    ptr = np.zeros((T, N), dtype=int)  # ptr[i][j]: tag of w(i-1) on that best path

    for j in range(N):  # base case of the DP
        dp[0][j] = log(pi[j]) + log(A[j][x[0]])

    for i in range(1, T):       # each word
        for j in range(N):      # each tag
            # TODO: the loop over k below can be vectorized into one line for better speed
            dp[i][j] = -np.inf
            for k in range(N):  # every tag k that can move to j
                score = dp[i-1][k] + log(B[k][j]) + log(A[j][x[i]])
                if score > dp[i][j]:
                    dp[i][j] = score
                    ptr[i][j] = k

    # decoding: recover the best tag sequence
    best_seq = [0] * T  # best_seq = [1, 5, 2, 23, 4, ...]
    # step 1: find the tag of the last word
    best_seq[T-1] = np.argmax(dp[T-1])
    # step 2: walk back from the end to obtain the tag of each earlier word
    for i in range(T-2, -1, -1):  # T-2, ..., 1, 0
        best_seq[i] = ptr[i+1][best_seq[i+1]]

    # best_seq now holds the tag ids for x
    return " ".join(id2tag[t] for t in best_seq)
```
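The TODO in the code above suggests vectorizing the inner loop over `k`. A minimal NumPy sketch of that idea follows, assuming the same parameter shapes (`pi`: N, `A`: N×M emission, `B`: N×N transition) but taking a list of word ids instead of a string. `safe_log`, `viterbi_vectorized` and `brute_force` are hypothetical names; `safe_log` mirrors the article's `log()` by replacing zero probabilities with 1e-6, and the exhaustive search is only a cross-check on a random toy model.

```python
import itertools
import numpy as np

EPS = 1e-6  # same floor the article's log() helper applies to zero probabilities

def safe_log(p):
    # Elementwise version of the article's log(): zeros are replaced by EPS before taking the log.
    p = np.asarray(p, dtype=float)
    return np.log(np.where(p == 0, EPS, p))

def viterbi_vectorized(word_ids, pi, A, B):
    """word_ids: list of word indices; pi: (N,) initial; A: (N, M) emission; B: (N, N) transition.
    Returns the best tag-id sequence as a list of ints."""
    T, N = len(word_ids), len(pi)
    logA, logB = safe_log(A), safe_log(B)
    dp = np.zeros((T, N))
    ptr = np.zeros((T, N), dtype=int)
    dp[0] = safe_log(pi) + logA[:, word_ids[0]]           # base case
    for i in range(1, T):
        scores = dp[i - 1][:, None] + logB                 # scores[k, j] = dp[i-1][k] + log B[k][j]
        ptr[i] = scores.argmax(axis=0)                     # best previous tag k for every tag j
        dp[i] = scores.max(axis=0) + logA[:, word_ids[i]]
    best = [int(dp[T - 1].argmax())]                       # tag of the last word
    for i in range(T - 1, 0, -1):                          # follow back-pointers to the start
        best.append(int(ptr[i][best[-1]]))
    return best[::-1]

def brute_force(word_ids, pi, A, B):
    """Exhaustive argmax over all N**T tag sequences, for cross-checking only."""
    lp, lA, lB = safe_log(pi), safe_log(A), safe_log(B)
    def score(z):
        return (lp[z[0]] + sum(lA[z[i], w] for i, w in enumerate(word_ids))
                + sum(lB[z[i - 1], z[i]] for i in range(1, len(z))))
    return list(max(itertools.product(range(len(pi)), repeat=len(word_ids)), key=score))

# Cross-check on a random 3-tag, 5-word model (hypothetical parameters).
rng = np.random.default_rng(0)
pi = rng.random(3); pi /= pi.sum()
A = rng.random((3, 5)); A /= A.sum(axis=1, keepdims=True)
B = rng.random((3, 3)); B /= B.sum(axis=1, keepdims=True)
sentence = [0, 3, 1, 4, 2]
assert viterbi_vectorized(sentence, pi, A, B) == brute_force(sentence, pi, A, B)
```

With the article's globals, tag names could be recovered as `[id2tag[t] for t in viterbi_vectorized([word2id[w] for w in x.split(" ")], pi, A, B)]`. Ties go to the smaller tag id because `argmax` returns the first maximum, which matches the loop version's strict `>` comparison.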

Finally, to verify: feed in a sentence and the model outputs the corresponding part-of-speech tags:

```python
x = "Social Security number , passport number and details about the services provided for the payment"
print(viterbi(x, pi, A, B))
```

Output:

```
NNP NNP NN , NN NN CC NNS IN DT NNS VBN IN DT NN
```

Part of the training corpus traindata.txt is shown below (the loading code reads it as one `word/tag` pair per line):

Newsweek/NNP ,/, trying/VBG to/TO keep/VB pace/NN with/IN rival/JJ Time/NNP magazine/NN ,/, announced/VBD new/JJ advertising/NN rates/NNS for/IN 1990/CD and/CC said/VBD it/PRP will/MD introduce/VB a/DT new/JJ incentive/NN plan/NN for/IN advertisers/NNS ./. The/DT new/JJ ad/NN plan/NN from/IN Newsweek/NNP ,/, a/DT unit/NN of/IN the/DT Washington/NNP Post/NNP Co./NNP ,/, is/VBZ the/DT second/JJ incentive/NN plan/NN the/DT magazine/NN has/VBZ offered/VBN advertisers/NNS in/IN three/CD years/NNS ./. Plans/NNS that/WDT give/VBP advertisers/NNS discounts/NNS for/IN maintaining/VBG or/CC increasing/VBG ad/NN spending/NN have/VBP become/VBN permanent/JJ fixtures/NNS at/IN the/DT news/NN weeklies/NNS and/CC underscore/VBP the/DT fierce/JJ competition/NN between/IN Newsweek/NNP ,/, Time/NNP Warner/NNP Inc./NNP 's/POS Time/NNP magazine/NN ,/, and/CC Mortimer/NNP B./NNP Zuckerman/NNP 's/POS U.S./NNP News/NNP &/CC World/NNP Report/NNP ./. Alan/NNP Spoon/NNP ,/, recently/RB named/VBN Newsweek/NNP president/NN ,/, said/VBD Newsweek/NNP 's/POS ad/NN rates/NNS would/MD increase/VB 5/CD %/NN in/IN January/NNP ./. A/DT full/JJ ,/, four-color/JJ page/NN in/IN Newsweek/NNP will/MD cost/VB $/$ 100,980/CD ./. In/IN mid-October/NNP ,/, Time/NNP magazine/NN lowered/VBD its/PRP$ guaranteed/VBN circulation/NN rate/NN base/NN for/IN 1990/CD while/IN not/RB increasing/VBG ad/NN page/NN rates/NNS ;/: with/IN a/DT lower/JJR circulation/NN base/NN ,/, Time/NNP 's/POS ad/NN rate/NN will/MD be/VB effectively/RB 7.5/CD %/NN higher/JJR per/IN subscriber/NN ;/: a/DT full/JJ page/NN in/IN Time/NNP costs/VBZ about/IN $/$ 120,000/CD ./. U.S./NNP News/NNP has/VBZ yet/RB to/TO announce/VB its/PRP$ 1990/CD ad/NN rates/NNS ./.
Newsweek/NNP said/VBD it/PRP will/MD introduce/VB the/DT Circulation/NNP Credit/NNP Plan/NNP ,/, which/WDT awards/VBZ space/NN credits/NNS to/TO advertisers/NNS on/IN ``/`` renewal/NN advertising/NN ./. ''/'' The/DT magazine/NN will/MD reward/VB with/IN ``/`` page/NN bonuses/NNS ''/'' advertisers/NNS who/WP in/IN 1990/CD meet/VBP or/CC exceed/VBP their/PRP$ 1989/CD spending/NN ,/, as/RB long/RB as/IN they/PRP spent/VBD $/$ 325,000/CD in/IN 1989/CD and/CC $/$ 340,000/CD in/IN 1990/CD ./.
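The sample above lists the pairs separated by spaces, while the loading code reads one `word/tag` pair per line. A small sketch for converting between the two layouts, assuming whitespace-separated tokens and ending a sentence at the word "." as the article's code does; `parse_tagged_text` and `to_one_pair_per_line` are hypothetical helper names.

```python
def parse_tagged_text(text):
    """Split space-separated "word/tag" tokens into sentences of (word, tag) pairs.
    A sentence ends at the word "." (the same rule the article's loading code uses)."""
    sentences, current = [], []
    for token in text.split():
        word, tag = token.rsplit("/", 1)   # split on the last "/" only, e.g. "U.S./NNP"
        current.append((word, tag))
        if word == ".":
            sentences.append(current)
            current = []
    if current:                            # trailing tokens without a final "."
        sentences.append(current)
    return sentences

def to_one_pair_per_line(text):
    """Rewrite the text as one "word/tag" pair per line, the layout traindata.txt is read in."""
    return "\n".join(f"{w}/{t}" for sent in parse_tagged_text(text) for w, t in sent) + "\n"

sample = "U.S./NNP News/NNP has/VBZ yet/RB to/TO announce/VB its/PRP$ 1990/CD ad/NN rates/NNS ./."
print(to_one_pair_per_line(sample))
```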

