POS tagger in Python without NLTK
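I want to build a part-of-speech tagger for determiners and prepositions in Sorani Kurdish. I am using the code below to put a tag after each preposition or determiner in a Kurdish text.
import os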
SOR = open("SOR-1.txt", "r+", encoding = 'utf-8')
old_text = SOR.read()
punkt = [".", "!", ",", ":", ";"]
text = ""
for i in old_text:
    if i in punkt:
        text += " " + i
    else:
        text += i
d = {"DET":["????" , "????" , "???" , "???" , "?????" , "?????", "????" ], "PREP":["??","??","?????","??","????","?????","??????","?????","??????","??????","?????","?????","??","??","???","????","?????","???","??","??","???????","??????","???????","???????","????","???????","?????","?????","????","??????","??????","?????","???????","?????","?????","???","????????","?????","?????","???","?????","???","???","???","???","" ], "punkt":[".", ",", "!"]}
text = text.split()
for w in text:
    for pos in d:
        if w in d[pos]:
            SOR.write(w + "/" + pos + " ")
SOR.close()
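What I want is to add the POS tag after each word in the text that appears in the dictionary I defined, but instead I get a separate list of words and POS tags at the end of the file.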
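One likely cause: the file is opened in "r+" mode, so after read() the file position sits at the end and every write() call appends there. In addition, the loop writes only the words that match the dictionary, so the untagged words never reach the output. A minimal sketch of one possible fix is to build the whole tagged text in memory and then rewrite the file. The names LEXICON, WORD_TO_TAG, tag_text and tag_file are hypothetical, and the ASCII words are stand-ins for the Sorani word lists, which are not reproduced here.

```python
import re

# Hypothetical stand-in lexicon; replace the ASCII words with the real
# Sorani determiners and prepositions from the question.
LEXICON = {
    "DET": ["the", "this"],
    "PREP": ["in", "on"],
    "punkt": [".", ",", "!"],
}

# Invert the lexicon into word -> tag for direct lookup.
# Empty strings are skipped so they can never match a token.
WORD_TO_TAG = {
    word: tag
    for tag, words in LEXICON.items()
    for word in words
    if word
}


def tag_text(text):
    """Return text with '/TAG' appended to every token found in the lexicon."""
    # Put spaces around punctuation so it becomes its own token.
    spaced = re.sub(r"([.!,:;])", r" \1 ", text)
    tagged = []
    for token in spaced.split():
        tag = WORD_TO_TAG.get(token)
        # Keep untagged tokens too, so the original text is preserved.
        tagged.append(f"{token}/{tag}" if tag else token)
    return " ".join(tagged)


def tag_file(path):
    """Read the file, tag its content in memory, then overwrite the file."""
    with open(path, "r", encoding="utf-8") as f:
        original = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(tag_text(original))
```

For example, tag_text("the cat sat on this mat.") returns "the/DET cat sat on/PREP this/DET mat ./punkt". Like the original code, this sketch collapses all whitespace (including line breaks) into single spaces. Writing to a separate output file instead of overwriting the input would be a safer choice while experimenting.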