
Python nlpaug package - PyPI - Python中文网

Published: 2025/3/8 | python | 29 | 豆豆

NLPAUG

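This Python library helps you augment NLP data for your machine learning projects. See the introduction "Data Augmentation in NLP" for background. An Augmenter is the basic element of augmentation, while a Flow is a pipeline that chains multiple augmenters together.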

Starter guides

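Augmenter

| Target | Augmenter | Action | Description |
| --- | --- | --- | --- |
| Character | RandomAug | insert | Insert a character randomly |
| Character | RandomAug | substitute | Substitute a character randomly |
| Character | RandomAug | swap | Swap characters randomly |
| Character | RandomAug | delete | Delete a character randomly |
| Character | OcrAug | substitute | Simulate OCR engine errors |
| Character | KeyboardAug | substitute | Simulate keyboard-distance typing errors |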
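To make the four random character actions and the two simulated-error augmenters concrete, here is a minimal pure-Python sketch. The function names, the partial QWERTY neighbour map, and the OCR confusion map are hypothetical illustrations, not nlpaug's API or its actual lookup tables; for real work, use the augmenters provided by the library.

```python
import random
import string

# Hypothetical helpers illustrating the character-level actions in the table.
# They are NOT nlpaug's API, and the lookup tables below are small assumptions.

# Partial QWERTY neighbour map: each key maps to physically adjacent letter keys.
KEYBOARD_NEIGHBOURS = {
    "a": "qwsz",
    "e": "wrds",
    "o": "iplk",
    "s": "awedxz",
    "t": "rfgy",
}

# A few common OCR glyph confusions (digit <-> lookalike letter).
OCR_CONFUSIONS = {"0": "o", "o": "0", "1": "l", "l": "1", "5": "s", "s": "5"}


def random_char_aug(word: str, action: str, rng: random.Random) -> str:
    """Apply one random character-level edit (insert/substitute/swap/delete)."""
    if len(word) < 2:
        return word  # too short to edit meaningfully
    i = rng.randrange(len(word))
    if action == "insert":
        return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]
    if action == "substitute":
        # The random letter may occasionally equal the original one.
        return word[:i] + rng.choice(string.ascii_lowercase) + word[i + 1:]
    if action == "swap":
        i = rng.randrange(len(word) - 1)  # swap position i with i + 1
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if action == "delete":
        return word[:i] + word[i + 1:]
    raise ValueError(f"unknown action: {action}")


def table_substitute(word: str, table: dict, rng: random.Random) -> str:
    """Replace one character found in `table` (simulated keyboard or OCR error)."""
    positions = [i for i, ch in enumerate(word) if ch in table]
    if not positions:
        return word  # nothing in this word can be confused
    i = rng.choice(positions)
    return word[:i] + rng.choice(table[word[i]]) + word[i + 1:]


rng = random.Random(42)
for action in ("insert", "substitute", "swap", "delete"):
    print(action, random_char_aug("augment", action, rng))
print("keyboard", table_substitute("test", KEYBOARD_NEIGHBOURS, rng))
print("ocr", table_substitute("l0ss", OCR_CONFUSIONS, rng))
```

Passing a seeded `random.Random` keeps the output reproducible, which is convenient when you need the same augmented data across runs.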

| Target | Augmenter | Action | Description |
| --- | --- | --- | --- |
| Word | RandomWordAug | swap | Swap words randomly |
| Word | RandomWordAug | delete | Delete words randomly |
| Word | SpellingAug | substitute | Substitute words according to a spelling-mistake dictionary |
| Word | WordNetAug | substitute | Substitute words with WordNet synonyms |
| Word | WordEmbsAug | insert | Insert words drawn randomly from a word2vec, GloVe, or fastText vocabulary |
| Word | WordEmbsAug | substitute | Substitute words based on word2vec, GloVe, or fastText embeddings |
| Word | TfIdfAug | insert | Insert words drawn randomly from a trained TF-IDF model |
| Word | TfIdfAug | substitute | Substitute words based on TF-IDF scores |
| Word | BertAug | insert | Insert words predicted by the BERT language model from the surrounding words |
| Word | BertAug | substitute | Substitute words predicted by the BERT language model from the surrounding words |
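The word-level rows fall into two families: random edits that need no external resources, and substitutions driven by a model or dictionary. The sketch below shows one of each under stated assumptions: `random_word_aug` swaps or deletes words, and `embedding_substitute` replaces a word with its nearest neighbour by cosine similarity, which is the general idea behind embedding-based substitution. The helper names and the toy 3-dimensional vectors are hypothetical; nlpaug's WordEmbsAug loads real word2vec, GloVe, or fastText files and may pick candidates differently.

```python
import math
import random

# Hypothetical sketch of two word-level actions from the table; not nlpaug's API.


def random_word_aug(words, action, rng):
    """Swap two adjacent words or delete one word at random."""
    words = list(words)  # do not mutate the caller's list
    if len(words) < 2:
        return words
    if action == "swap":
        i = rng.randrange(len(words) - 1)
        words[i], words[i + 1] = words[i + 1], words[i]
    elif action == "delete":
        del words[rng.randrange(len(words))]
    else:
        raise ValueError(f"unknown action: {action}")
    return words


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_neighbour(word, vectors):
    """Return the most similar *other* word in `vectors` by cosine similarity."""
    target = vectors[word]
    candidates = [w for w in vectors if w != word]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


def embedding_substitute(words, vectors, rng):
    """Replace one in-vocabulary word with its nearest embedding neighbour."""
    positions = [i for i, w in enumerate(words) if w in vectors]
    out = list(words)
    if positions:
        i = rng.choice(positions)
        out[i] = nearest_neighbour(words[i], vectors)
    return out


# Toy vectors (assumption); real word2vec/GloVe/fastText vectors have 50-300 dims.
TOY_VECTORS = {
    "quick": [0.9, 0.1, 0.0],
    "fast": [0.85, 0.15, 0.05],
    "dog": [0.0, 0.9, 0.2],
    "puppy": [0.05, 0.85, 0.25],
}

rng = random.Random(7)
sentence = ["the", "quick", "brown", "dog"]
print(random_word_aug(sentence, "swap", rng))
print(random_word_aug(sentence, "delete", rng))
print(embedding_substitute(sentence, TOY_VECTORS, rng))
```

Out-of-vocabulary words such as "the" and "brown" are left alone, mirroring the practical limitation that embedding-based substitution only works for words the model knows.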

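| Target | Augmenter | Action | Description |
| --- | --- | --- | --- |
| Spectrogram | FrequencyMaskingAug | substitute | Set a block of values to zero along the frequency dimension |
| Spectrogram | TimeMaskingAug | substitute | Set a block of values to zero along the time dimension |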
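Frequency and time masking, as described in the two rows above, zero a contiguous band of rows or columns of the spectrogram (the technique is commonly associated with SpecAugment). A minimal sketch, assuming the spectrogram is a list of rows indexed `spec[frequency][time]`; the function names and the `max_width` parameter are hypothetical, not nlpaug's API.

```python
import random

# Sketch of masking on a spectrogram stored as a list of rows, where
# spec[f][t] is the value at frequency bin f and time frame t (layout assumed).


def frequency_mask(spec, max_width, rng):
    """Zero a random block of consecutive frequency bins (whole rows)."""
    n_freq = len(spec)
    width = rng.randint(1, min(max_width, n_freq))
    start = rng.randint(0, n_freq - width)
    return [
        [0.0] * len(row) if start <= f < start + width else list(row)
        for f, row in enumerate(spec)
    ]


def time_mask(spec, max_width, rng):
    """Zero a random block of consecutive time frames (the same columns in every row)."""
    n_time = len(spec[0])
    width = rng.randint(1, min(max_width, n_time))
    start = rng.randint(0, n_time - width)
    return [
        [0.0 if start <= t < start + width else v for t, v in enumerate(row)]
        for row in spec
    ]


rng = random.Random(3)
spec = [[1.0] * 6 for _ in range(4)]  # 4 frequency bins x 6 time frames
for row in time_mask(frequency_mask(spec, 2, rng), 3, rng):
    print(row)
```

Both functions return new lists and leave the input spectrogram unchanged, so they can be chained as in the usage lines.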

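| Target | Augmenter | Action | Description |
| --- | --- | --- | --- |
| Audio | NoiseAug | substitute | Inject noise |
| Audio | PitchAug | substitute | Adjust the audio's pitch |
| Audio | ShiftAug | substitute | Shift the audio forward or backward along the time dimension |
| Audio | SpeedAug | substitute | Adjust the audio's speed |
| Audio | CropAug | delete | Delete a segment of the audio |
| Audio | LoudnessAug | substitute | Adjust the audio's volume |
| Audio | MaskAug | substitute | Mask a segment of the audio |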
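Several of the audio augmenters reduce to simple operations on the raw waveform. The sketch below, assuming audio is a plain list of float samples, shows noise injection, loudness scaling, shifting, cropping, masking, and a naive speed change; all names and parameters are hypothetical and not nlpaug's implementation. Pitch shifting is omitted because changing pitch without changing duration needs more machinery (for example a phase vocoder combined with resampling).

```python
import random

# Hypothetical waveform-level sketches of several audio augmenters in the table.
# Audio is assumed to be a plain list of float samples.


def add_noise(samples, noise_factor, rng):
    """NoiseAug-style: add uniform noise in [-noise_factor, noise_factor]."""
    return [s + noise_factor * rng.uniform(-1.0, 1.0) for s in samples]


def change_loudness(samples, gain):
    """LoudnessAug-style: scale every sample by a constant gain."""
    return [s * gain for s in samples]


def shift(samples, n):
    """ShiftAug-style: move audio later (n > 0) or earlier (n < 0), padding with silence."""
    n = max(-len(samples), min(n, len(samples)))  # clamp to the signal length
    if n >= 0:
        return [0.0] * n + samples[: len(samples) - n]
    return samples[-n:] + [0.0] * (-n)


def crop(samples, start, length):
    """CropAug-style: delete `length` samples starting at `start`."""
    return samples[:start] + samples[start + length:]


def mask(samples, start, length):
    """MaskAug-style: silence a segment instead of deleting it."""
    end = min(start + length, len(samples))
    return samples[:start] + [0.0] * max(end - start, 0) + samples[end:]


def change_speed(samples, rate):
    """SpeedAug-style: resample by linear interpolation; rate > 1 plays faster (shorter)."""
    n_out = int(len(samples) / rate)
    out = []
    for i in range(n_out):
        pos = i * rate
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out


x = [1.0, 2.0, 3.0, 4.0]
print(shift(x, 1))           # [0.0, 1.0, 2.0, 3.0]
print(crop(x, 1, 2))         # [1.0, 4.0]
print(change_speed(x, 2.0))  # [1.0, 3.0]
```

Because `change_speed` resamples naively, it alters pitch along with tempo, much like playing a recording faster.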

Flow

| Pipeline | Description |
| --- | --- |
| Sequential | Apply a list of augmentation functions sequentially |
| Sometimes | Apply some of the augmentation functions randomly |
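Both pipelines can be sketched in a few lines if each augmenter is modelled as a function from text to text. The class names below mirror the table, but the constructor signatures (including the probability `p` and `seed` for Sometimes) are assumptions for illustration, not nlpaug's API.

```python
import random
from typing import Callable, Optional, Sequence

# An augmenter is modelled here as any function that maps text to text.
Augment = Callable[[str], str]


class Sequential:
    """Apply every augmenter in order, feeding each output into the next."""

    def __init__(self, augmenters: Sequence[Augment]):
        self.augmenters = list(augmenters)

    def augment(self, text: str) -> str:
        for aug in self.augmenters:
            text = aug(text)
        return text


class Sometimes:
    """Apply each augmenter independently with probability `p` (assumed parameter)."""

    def __init__(self, augmenters: Sequence[Augment], p: float = 0.5,
                 seed: Optional[int] = None):
        self.augmenters = list(augmenters)
        self.p = p
        self.rng = random.Random(seed)

    def augment(self, text: str) -> str:
        for aug in self.augmenters:
            if self.rng.random() < self.p:
                text = aug(text)
        return text


flow = Sequential([str.upper, lambda s: s.replace(" ", "_")])
print(flow.augment("hello world"))  # HELLO_WORLD

# A pipeline's bound `augment` method is itself a text-to-text function,
# so pipelines can be nested.
nested = Sequential([Sometimes([str.upper], p=0.5, seed=0).augment, str.strip])
print(nested.augment("  maybe shouted  "))
```

Modelling augmenters as plain callables keeps the composition logic independent of what each augmenter does internally.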

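Installation

The library supports Python 3.5+ on Linux and Windows.

To install the library:

pip install nlpaug

Or install the latest version (including beta features) directly from GitHub:

pip install git+https://github.com/makcedward/nlpaug.git

If you use BertAug, also install the following dependencies:

pip install pytorch_pretrained_bert torch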

If you use WordEmbsAug (word2vec, GloVe, or fastText), download the pre-trained models first:

from nlpaug.util.file.download import DownloadUtil

DownloadUtil.download_word2vec(dest_dir='.')  # Download word2vec model
DownloadUtil.download_glove(model_name='glove.6B', dest_dir='.')  # Download GloVe model
DownloadUtil.download_fasttext(model_name='wiki-news-300d-1M', dest_dir='.')  # Download fastText model

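Recent changes

Beta (Aug 16, 2019):

- Added new augmenters (CropAug, LoudnessAug, MaskAug)
- Deprecated QWERTYAug; it will be replaced by KeyboardAug
- Removed StopWordsAug; it will be replaced by RandomWordAug
- Code refactoring
- Added model download functions for word2vec, GloVe, and fastText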

0.0.6 (Jul 29, 2019)

See the changelog for more details.

Test

Word2vec, GloVe, and fastText models are used for word insertion and substitution, and those model files are required to run the test cases. Add a ".env" file in the root directory with the following content:

MODEL_DIR={MODEL FILE PATH}

The model folder structure should be:

-- root directory
    - glove.6B.50d.txt
    - GoogleNews-vectors-negative300.bin
    - wiki-news-300d-1M.vec
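Before running the tests, it can help to confirm that the `.env` file and the model folder match the layout above. The following is a hypothetical helper (not part of nlpaug's test suite) that parses simple `KEY=VALUE` lines and lists any expected model files missing from `MODEL_DIR`; it assumes the three file names shown above are the complete set.

```python
from pathlib import Path

# File names taken from the folder structure listed above (assumed complete).
EXPECTED_MODEL_FILES = (
    "glove.6B.50d.txt",
    "GoogleNews-vectors-negative300.bin",
    "wiki-news-300d-1M.vec",
)


def read_env(path):
    """Parse simple KEY=VALUE lines from a .env file, skipping blanks and comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_model_files(env_path=".env"):
    """Return the expected model files that are absent from MODEL_DIR."""
    model_dir = Path(read_env(env_path)["MODEL_DIR"])
    return [name for name in EXPECTED_MODEL_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__" and Path(".env").exists():
    missing = missing_model_files()
    print("All model files found." if not missing else f"Missing: {missing}")
```

Running it from the repository root prints either a confirmation or the names of the files still to be downloaded.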

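Research reference

Some of the augmenters above are inspired by research papers. However, for various reasons, they do not always follow the original implementations. If you need the original implementation, please refer to the original source code.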

Data source

Curated data from the Internet is used to build the augmenters and test cases.

See data source for details.
