How do I split phrases with Python?
[2020-09-04]
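You can first split on "/" and mark the special characters, storing the result in a temporary variable such as a tuple, list, or dict. Then iterate over that variable and split out the parentheses, tagging the content inside them with a special marker; all that matters is being able to tell parenthesized content apart from the rest. Store that result in a second variable. Finally, iterate over the second variable to generate the expanded phrases.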
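The three steps above can be sketched in a few lines of Python. This is an alternative under stated assumptions, not the author's own code: it assumes alternatives are words joined by "/" with no spaces around the slash, that parentheses mark an optional part and are never nested, and it uses the standard-library itertools.product for the combination step instead of a hand-written loop. The names tokenize and expand_pattern are hypothetical.

```python
import itertools
import re

# Assumption: "(...)" marks an optional group and groups are never nested.
OPTIONAL = re.compile(r"\(([^()]*)\)")


def tokenize(pattern):
    """Steps 1 and 2: turn a pattern into a list of slots.

    Each slot is a list of alternatives; an optional group gets an
    extra empty alternative "" meaning "leave this part out".
    """
    slots = []
    pos = 0
    for m in OPTIONAL.finditer(pattern):
        # Words before the group: split on spaces, then on "/".
        for word in pattern[pos:m.start()].split():
            slots.append(word.split("/"))
        # Optional group: nothing, or one of its "/"-separated choices.
        slots.append([""] + m.group(1).split("/"))
        pos = m.end()
    # Words after the last group.
    for word in pattern[pos:].split():
        slots.append(word.split("/"))
    return slots


def expand_pattern(pattern):
    """Step 3: pick one alternative from every slot and join them."""
    results = []
    for combo in itertools.product(*tokenize(pattern)):
        # Skipping empty choices avoids double spaces.
        results.append(" ".join(part for part in combo if part))
    return results


print(expand_pattern("dance on/upon a rope/nothing"))
# ['dance on a rope', 'dance on a nothing', 'dance upon a rope', 'dance upon a nothing']
```

Because itertools.product changes the last slot fastest, the variants can come out in a different order from the author's result.txt, although the set of phrases is the same.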
[2020-09-05]
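Sorry, I haven't been in good shape lately and have been busy. I wrote a rough version today and it should work. One open issue is the order in which the variants are generated; I will look at it again when I have time. If it needs changing, the place to do it is the bottom_fuc function and the code that calls it. Another approach would be to build a separate list for each pattern and join them at the end, but I think that would take up more memory, so I didn't use it. The code is below.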
import logging
import re

f = open("./phasesplit", encoding="utf-8")
line_true = f.readline()
list_all = []
list_size = 0
i = 0

# Combine every string in org_str_list with every string in inner_list
# (a Cartesian product), joining each pair with a space.
# inner_list: the alternatives to append
# org_str_list: the strings assembled so far
def bottom_fuc(inner_list, org_str_list):
    inner_new_str_list = []
    for s in inner_list:
        for s1 in org_str_list:
            inner_new_str_list.append(str(s1) + " " + str(s))
    return inner_new_str_list
# Main loop: one input line (one pattern) per iteration
while line_true:
    # Split at the first semicolon: the pattern before it is expanded,
    # the text after it (semi_str) is copied unchanged to every variant
    line, _, semi_str = line_true.partition(";")
    line = line.strip()
    semi_str = semi_str.strip()
    # The capturing group makes re.split keep the "(...)" parts
    list_for_loop = re.split(r"(\(.+?\))", line)
    list_for_loop_new = []
    # Split everything outside parentheses into single words
    for lp in list_for_loop:
        if "(" not in lp and ")" not in lp:
            list_for_loop_new.extend(lp.split())
        else:
            list_for_loop_new.append(lp)
    list_str = []
    # Turn each piece into a fixed word (str) or a list of alternatives
    for s in list_for_loop_new:
        str_tmp = s
        # Optional part: drop the parentheses and prepend " /", so that
        # "(with sb)" becomes " /with sb" and splits into [" ", "with sb"]
        if "(" in str_tmp or ")" in str_tmp:
            str_tmp = " /" + str_tmp.strip("(").strip(")")
        # Alternatives are separated by "/"
        if "/" in str_tmp:
            list_str.append(str_tmp.split("/"))
        else:
            list_str.append(str_tmp)
    new_str_list = []
    # Assemble the variants piece by piece
    for l_str in list_str:
        if isinstance(l_str, str):
            # A fixed word is appended to every variant built so far
            if len(new_str_list) == 0:
                new_str_list.append(l_str)
            else:
                for ind in range(len(new_str_list)):
                    new_str_list[ind] = new_str_list[ind] + " " + l_str
        elif isinstance(l_str, list):
            # A list of alternatives multiplies the number of variants
            if len(new_str_list) == 0:
                new_str_list.append("")
            new_str_list = bottom_fuc(l_str, new_str_list)
        else:
            logging.error("Unexpected type: %s %r", type(l_str), l_str)
            raise SystemExit(1)
    # Collapse repeated spaces and re-attach the text after the semicolon
    for ind in range(len(new_str_list)):
        ns = re.sub(" {2,}", " ", new_str_list[ind].strip())
        if len(semi_str) > 0:
            ns = ns + ";" + semi_str
        new_str_list[ind] = ns
    # The original pattern goes first, as a header for its variants
    if len(semi_str) > 0:
        new_str_list.insert(0, line + ";" + semi_str)
    else:
        new_str_list.insert(0, line)
    i += 1
    # Read the next line
    line_true = f.readline()
    # Add this pattern's block to the overall list
    list_all.append(new_str_list)
list_size = i
f.close()
# Write the results to result.txt, one phrase per line
with open("result.txt", "w", encoding="utf-8") as nf:
    nf.write("#############################################\n")
    nf.write("#section:{}\n".format(list_size))
    nf.write("#############################################\n")
    for la in list_all:
        for nl in la:
            nf.write(nl + "\n")
        nf.write("\n")
        nf.write("#############################################\n")
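The author notes that the generation order could be changed in bottom_fuc and the code that calls it. The order is decided by which loop is the outer one: in bottom_fuc the new alternatives (inner_list) drive the outer loop, so an earlier position changes fastest, as in "quarrel about", "quarrel with sb about", "quarrel for". A sketch comparing that order with the loops swapped; combine_new_outer mirrors bottom_fuc, while combine_prefix_outer and expand are hypothetical helpers for illustration only.

```python
def combine_new_outer(inner_list, org_str_list):
    """Same loop order as bottom_fuc: new alternatives in the outer loop."""
    result = []
    for s in inner_list:
        for prefix in org_str_list:
            result.append(prefix + " " + s)
    return result


def combine_prefix_outer(inner_list, org_str_list):
    """Hypothetical variant: strings built so far in the outer loop."""
    result = []
    for prefix in org_str_list:
        for s in inner_list:
            result.append(prefix + " " + s)
    return result


def expand(slots, combine):
    """Fold a list of slots (each a list of alternatives) with `combine`."""
    strings = [""]
    for slot in slots:
        strings = combine(slot, strings)
    # Collapse the extra spaces introduced by empty alternatives.
    return [" ".join(s.split()) for s in strings]


slots = [["quarrel"], ["", "with sb"], ["about", "for"]]
print(expand(slots, combine_new_outer))
# ['quarrel about', 'quarrel with sb about', 'quarrel for', 'quarrel with sb for']
print(expand(slots, combine_prefix_outer))
# ['quarrel about', 'quarrel for', 'quarrel with sb about', 'quarrel with sb for']
```

The swapped version produces the same order as itertools.product, where the last position changes fastest.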
Input file (phasesplit)
quarrel (with sb) about/for/over ; 2313
dabble at/in/with
(sb/sth) damn and blast (sb/sth)
dance on/upon a rope/nothing
dance on (the) air
dead/flat/stark calm
do/go/make the/one's round
do (sb/sth) grace
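Before anything is expanded, the main loop breaks each input line up in two stages: it separates the pattern from the text after the semicolon, then uses re.split with a capturing group so that the parenthesized parts stay in the result. A standalone sketch of those stages on lines from the input file above; parse_line is a hypothetical helper, and str.partition is used so that a second semicolon in a line does not cause an unpacking error.

```python
import re


def parse_line(raw):
    """Split 'pattern ; suffix' at the first semicolon; suffix may be ''."""
    pattern, _, suffix = raw.partition(";")
    return pattern.strip(), suffix.strip()


pattern, suffix = parse_line("quarrel (with sb) about/for/over ; 2313\n")
print(pattern, "|", suffix)
# quarrel (with sb) about/for/over | 2313

# The capturing group makes re.split keep the "(...)" pieces.
print(re.split(r"(\(.+?\))", pattern))
# ['quarrel ', '(with sb)', ' about/for/over']

# A match at either end of the string produces empty strings there.
print(re.split(r"(\(.+?\))", "(sb/sth) damn and blast (sb/sth)"))
# ['', '(sb/sth)', ' damn and blast ', '(sb/sth)', '']

# An optional part becomes " /..." so that splitting on "/" adds a blank choice.
print((" /" + "with sb").split("/"))
# [' ', 'with sb']
```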
Output file (result.txt)
#############################################
#section:8
#############################################
quarrel (with sb) about/for/over;2313
quarrel about;2313
quarrel with sb about;2313
quarrel for;2313
quarrel with sb for;2313
quarrel over;2313
quarrel with sb over;2313
#############################################
dabble at/in/with
dabble at
dabble in
dabble with
#############################################
(sb/sth) damn and blast (sb/sth)
damn and blast
sb damn and blast
sth damn and blast
damn and blast sb
sb damn and blast sb
sth damn and blast sb
damn and blast sth
sb damn and blast sth
sth damn and blast sth
#############################################
dance on/upon a rope/nothing
dance on a rope
dance upon a rope
dance on a nothing
dance upon a nothing
#############################################
dance on (the) air
dance on air
dance on the air
#############################################
dead/flat/stark calm
dead calm
flat calm
stark calm
#############################################
do/go/make the/one's round
do the round
go the round
make the round
do one's round
go one's round
make one's round
#############################################
do (sb/sth) grace
do grace
do sb grace
do sth grace
#############################################
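A quick consistency check on the output above: the number of variants for a pattern should equal the product of the number of choices at each position, where an optional group counts as one extra choice ("leave it out"). "quarrel (with sb) about/for/over" gives 2 × 3 = 6 lines and "(sb/sth) damn and blast (sb/sth)" gives 3 × 3 = 9, as in the file. A sketch of that check, assuming the same notation with unnested parentheses; count_variants is a hypothetical name, and math.prod needs Python 3.8 or later.

```python
import math
import re

# Assumption: "(...)" marks an optional group and groups are not nested.
GROUP = re.compile(r"\(([^()]*)\)")


def count_variants(pattern):
    """Expected number of expanded phrases for one pattern."""
    counts = []
    # An optional group offers its "/"-separated choices plus "leave it out".
    for inner in GROUP.findall(pattern):
        counts.append(len(inner.split("/")) + 1)
    # Every other word offers as many choices as it has "/"-separated parts.
    for word in GROUP.sub(" ", pattern).split():
        counts.append(len(word.split("/")))
    return math.prod(counts)


for p in ["quarrel (with sb) about/for/over", "(sb/sth) damn and blast (sb/sth)"]:
    print(p, "->", count_variants(p))
# quarrel (with sb) about/for/over -> 6
# (sb/sth) damn and blast (sb/sth) -> 9
```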