
Extracting Chinese, Japanese and Korean text from a string in Python

Published 2025/4/5 in python by 豆豆

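This article shows how to pull Chinese, Japanese (hiragana and katakana) and Korean text, plus CJK punctuation, out of a mixed-language string in Python by matching Unicode code-point ranges with regular expressions.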

```python
import re  # used by every extraction step below

# Python 3 str objects are Unicode, so the Python 2 reload(sys) /
# setdefaultencoding trick is unnecessary (the imp module was removed in 3.12).
s = """
en: Regular expression is a powerful tool for manipulating text.
zh: 漢語是世界上最優美的語言，正則表達式是一個很有用的工具
jp: 正規表現は非常に役に立つツールテキストを操作することです。
jp-char: あアいイうウえエおオ
kr:?? ???? ?? ??? ?? ???? ???? ????.
"""

print("Original UTF-8 string")
print("--------")
print(repr(s))
print("--------\n")
```

Output:

```
Original UTF-8 string
--------
'\nen: Regular expression is a powerful tool for manipulating text.\nzh: 漢語是世界上最優美的語言，正則表達式是一個很有用的工具\njp: 正規表現は非常に役に立つツールテキストを操作することです。\njp-char: あアいイうウえエおオ\nkr:?? ???? ?? ??? ?? ???? ???? ????.\n'
--------
```
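Every pattern below relies on knowing which Unicode block a character lives in. As a minimal sketch, the standard-library `unicodedata` module can print each character's code point and name, which is a quick way to check a range before writing a regex. The helper name `describe` and the sample characters are hypothetical.

```python
import unicodedata


def describe(text):
    """Return (char, code point, Unicode name) for each non-ASCII character."""
    rows = []
    for ch in text:
        if ord(ch) > 0x7F:  # skip plain ASCII
            rows.append((ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return rows


# Hypothetical sample: a Han character, hiragana, katakana, the prolonged
# sound mark, a full-width comma and a Hangul syllable.
for row in describe("漢あアー，한"):
    print(*row)
```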

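Non-ANSI characters

```python
# Non-ANSI characters
re_words = re.compile(r"[\x80-\xff]+")
m1 = re_words.findall(s)
print("Non-ANSI characters")
print("--------")
print(m1)
print("--------\n")
```

Output:

```
Non-ANSI characters
--------
[]
--------
```

The result is empty because in Python 3 `s` is a `str` of code points, so `[\x80-\xff]` only matches U+0080 to U+00FF (Latin-1 Supplement), and the sample contains none of those. In Python 2 the same pattern applied to a UTF-8 byte string matched the encoded bytes of every non-ASCII character.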
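To actually find non-ASCII runs in Python 3, one option is to negate the ASCII range instead. A minimal sketch, with a hypothetical `sample` string; the second half reproduces the Python 2 byte-level behaviour by encoding to UTF-8 first and using a bytes pattern.

```python
import re

# Runs of characters outside 7-bit ASCII (code points above U+007F).
NON_ASCII = re.compile(r"[^\x00-\x7f]+")

# Hypothetical sample text for illustration.
sample = "en: text\nzh: 漢語，工具\njp-char: あアいイ\n"
runs = NON_ASCII.findall(sample)
print(runs)

# Python 2 style: apply a bytes pattern to the UTF-8 encoding. Every byte of a
# multi-byte UTF-8 sequence is >= 0x80, so the byte runs line up with the
# character runs above and decode back to the same strings.
byte_runs = re.findall(rb"[\x80-\xff]+", sample.encode("utf-8"))
print([b.decode("utf-8") for b in byte_runs])
```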

Chinese

```python
# Chinese: CJK Unified Ideographs
re_words = re.compile("[\u4e00-\u9fa5]+")
m1 = re_words.findall(s)
print("Unicode Chinese")
print(m1)
print("--------")
```

Output:

```
Unicode Chinese
['漢語是世界上最優美的語言', '正則表達式是一個很有用的工具', '正規表現', '非常', '役', '立', '操作']
--------
```

Japanese kanji are encoded in the same block, so the kanji from the `jp:` line are returned as well.
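The range `\u4e00-\u9fa5` stops where the CJK Unified Ideographs block ended in early Unicode versions; later versions assigned more ideographs up to U+9FFF, and Extension A sits at U+3400 to U+4DBF. A sketch of a wider pattern, assuming BMP-only text (Extension B and later, at U+20000 and above, are deliberately left out):

```python
import re

# CJK Unified Ideographs (U+4E00-U+9FFF) plus Extension A (U+3400-U+4DBF).
# Supplementary-plane extensions (U+20000 and up) are omitted in this sketch.
HAN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]+")

print(HAN.findall("jp: 正規表現は非常に役に立つ"))
```

Like the original pattern, this still cannot tell Chinese hanzi from Japanese kanji; code-point ranges only identify the script, not the language.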

Korean

```python
# Korean: Hangul syllables
re_words = re.compile("[\uac00-\ud7ff]+")
m1 = re_words.findall(s)
print("Unicode Korean")
print(m1)
print("--------\n")
```

Output:

```
Unicode Korean
['??', '????', '??', '???', '??', '????', '????', '????']
--------
```
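Precomposed Hangul syllables occupy U+AC00 to U+D7A3, but standalone letters such as ㅋ are encoded separately in Hangul Compatibility Jamo (U+3130 to U+318F), and conjoining jamo live at U+1100 to U+11FF. A minimal sketch that covers all three; the Korean sample sentence is a hypothetical stand-in, not the article's original text.

```python
import re

# Hangul Jamo (U+1100-U+11FF), Hangul Compatibility Jamo (U+3130-U+318F)
# and Hangul Syllables (U+AC00-U+D7A3).
HANGUL = re.compile(r"[\u1100-\u11ff\u3130-\u318f\uac00-\ud7a3]+")

# Hypothetical sample: "Regular expressions are a very useful tool." plus ㅋㅋ.
sample = "kr: 정규 표현식은 매우 유용한 도구입니다. ㅋㅋ"
print(HANGUL.findall(sample))
```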

Japanese katakana

```python
# Japanese katakana
re_words = re.compile("[\u30a0-\u30ff]+")
m1 = re_words.findall(s)
print("Unicode Japanese katakana")
print("--------")
print(m1)
print("--------\n")
```

Output:

```
Unicode Japanese katakana
--------
['ツールテキスト', 'ア', 'イ', 'ウ', 'エ', 'オ']
--------
```
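Text scraped from older Japanese systems often contains half-width katakana (U+FF65 to U+FF9F), which `[\u30a0-\u30ff]` misses. Two possible approaches are sketched below under that assumption: widen the character class, or fold half-width forms to full-width with NFKC normalization first. The sample string is hypothetical.

```python
import re
import unicodedata

# Katakana (U+30A0-U+30FF), Katakana Phonetic Extensions (U+31F0-U+31FF)
# and half-width katakana (U+FF65-U+FF9F).
KATAKANA = re.compile(r"[\u30a0-\u30ff\u31f0-\u31ff\uff65-\uff9f]+")

# Hypothetical sample: half-width ﾃｷｽﾄ, a hiragana particle, full-width テキスト.
sample = "\uff83\uff77\uff7d\uff84 と テキスト"
print(KATAKANA.findall(sample))

# Alternative: NFKC maps half-width katakana into the full-width block,
# after which the article's original range is enough.
print(re.findall(r"[\u30a0-\u30ff]+", unicodedata.normalize("NFKC", sample)))
```

NFKC also rewrites other compatibility characters (for example the full-width comma becomes an ASCII comma), so normalizing is best done only when that side effect is acceptable.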

Japanese hiragana

```python
# Japanese hiragana
re_words = re.compile("[\u3040-\u309f]+")
m1 = re_words.findall(s)
print("Unicode Japanese hiragana")
print("--------")
print(m1)
print("--------\n")
```

Output:

```
Unicode Japanese hiragana
--------
['は', 'に', 'に', 'つ', 'を', 'することです', 'あ', 'い', 'う', 'え', 'お']
--------
```
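Extracting hiragana and katakana separately splits Japanese words apart. Because the hiragana and katakana blocks are adjacent (U+3040 to U+30FF), one class can cover both, and adding the CJK ideograph range keeps mixed kanji/kana phrases in one run. The sketch also includes a hypothetical `looks_japanese` helper using a common heuristic, not a reliable language detector: text containing any kana is treated as Japanese, which separates the two sample sentences that the Chinese pattern alone could not.

```python
import re

# Hiragana + katakana (adjacent blocks U+3040-U+30FF) plus CJK ideographs,
# so mixed kanji/kana phrases stay in a single run.
JAPANESE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]+")
KANA = re.compile(r"[\u3040-\u30ff]")


def looks_japanese(text):
    """Heuristic only: treat text containing any kana as Japanese."""
    return KANA.search(text) is not None


jp = "正規表現は非常に役に立つツールテキストを操作することです。"
zh = "正則表達式是一個很有用的工具"
print(JAPANESE.findall(jp))
print(looks_japanese(jp), looks_japanese(zh))
```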

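Punctuation

```python
# CJK punctuation and full-width forms
re_words = re.compile("[\u3000-\u303f\ufb00-\ufffd]+")
m1 = re_words.findall(s)
print("Unicode punctuation")
print("--------")
print(m1)
print("--------\n")
```

Output:

```
Unicode punctuation
--------
['，', '。']
--------
```

Note that `\ufb00-\ufffd` is broader than punctuation: it also covers presentation forms and the full-width Latin letters and digits.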
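If only true punctuation is wanted, one alternative is to test each character's Unicode general category instead of a code-point range: categories starting with `P` (Pc, Pd, Ps, Pe, Pi, Pf, Po) are punctuation. A minimal sketch; the helper name `punctuation_runs` and the sample string are hypothetical, and the last line shows the article's range on the same input for comparison.

```python
import re
import unicodedata


def punctuation_runs(text):
    """Return runs of consecutive characters whose general category is P*."""
    runs, current = [], ""
    for ch in text:
        if unicodedata.category(ch).startswith("P"):
            current += ch
        elif current:
            runs.append(current)
            current = ""
    if current:
        runs.append(current)
    return runs


# Hypothetical sample: full-width comma, ideographic full stop,
# a full-width letter b (U+FF42) and corner brackets.
sample = "漢語，工具。\uff42「引用」"
print(punctuation_runs(sample))
print(re.findall("[\u3000-\u303f\ufb00-\ufffd]+", sample))
```

The category test drops the full-width letter that the range-based pattern swallows, but it also matches ASCII punctuation such as `.` and `:`, so on the article's `s` it would return those too.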

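Summary

Each script is extracted by matching a Unicode code-point range with `findall`:

- Chinese (CJK Unified Ideographs): `[\u4e00-\u9fa5]`, which also picks up kanji in Japanese text
- Korean (Hangul syllables): `[\uac00-\ud7ff]`
- Japanese katakana: `[\u30a0-\u30ff]`
- Japanese hiragana: `[\u3040-\u309f]`
- CJK punctuation and full-width forms: `[\u3000-\u303f\ufb00-\ufffd]`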
