日韩性视频-久久久蜜桃-www中文字幕-在线中文字幕av-亚洲欧美一区二区三区四区-撸久久-香蕉视频一区-久久无码精品丰满人妻-国产高潮av-激情福利社-日韩av网址大全-国产精品久久999-日本五十路在线-性欧美在线-久久99精品波多结衣一区-男女午夜免费视频-黑人极品ⅴideos精品欧美棵-人人妻人人澡人人爽精品欧美一区-日韩一区在线看-欧美a级在线免费观看

歡迎訪問 生活随笔!

生活随笔

當(dāng)前位置: 首頁 > 编程语言 > php >内容正文

php

基于PHP实现高性能敏感词过滤算法

發(fā)布時間:2023/12/20 php 27 豆豆
生活随笔 收集整理的這篇文章主要介紹了 基于PHP实现高性能敏感词过滤算法 小編覺得挺不錯的,現(xiàn)在分享給大家,幫大家做個參考.
  • 公司新項(xiàng)目素材編輯功能需要提供敏感詞過濾功能,于是上網(wǎng)查了下,很多都是基于trie算法的,但基于PHP寫的卻少有,或者部分存在bug。所以,自己在別人的基礎(chǔ)上進(jìn)行了完善。

敏感詞過濾算法實(shí)現(xiàn)

class TreeMap {public $data; // 節(jié)點(diǎn)字符public $children = []; // 存放子節(jié)點(diǎn)引用(因?yàn)橛腥我鈧€子節(jié)點(diǎn),所以靠數(shù)組來存儲)public $isEndingChar = false; // 是否是字符串結(jié)束字符public function __construct($data){$this->data = $data;} }class TrieTree {/*** 敏感詞數(shù)組* * @var array* @author qpf*/public $trieTreeMap = array();public function __construct(){$this->trieTreeMap = new TreeMap('/');}/*** 獲取敏感詞Map* * @return array* @author qpf*/public function getTreeMap(){return $this->trieTreeMap;}/*** 添加敏感詞* * @param array $txtWords* @author qpf*/public function addWords(array $wordsList){foreach ($wordsList as $words) {$trieTreeMap = $this->trieTreeMap;$len = mb_strlen($words);for ($i = 0; $i < $len; $i++) {$word = mb_substr($words, $i, 1);if(!isset($trieTreeMap->children[$word])){$newNode = new TreeMap($word);$trieTreeMap->children[$word] = $newNode;}$trieTreeMap = $trieTreeMap->children[$word];}$trieTreeMap->isEndingChar = true;}}/*** 查找對應(yīng)敏感詞* * @param string $txt* @return array* @author qpf*/public function search($txt){$wordsList = array();$txtLength = mb_strlen($txt);for ($i = 0; $i < $txtLength; $i++) {$wordLength = $this->checkWord($txt, $i, $txtLength);if($wordLength > 0) {echo $wordLength;$words = mb_substr($txt, $i, $wordLength);$wordsList[] = $words;$i += $wordLength - 1;}}return $wordsList;}/*** 敏感詞檢測* * @param $txt* @param $beginIndex* @param $length* @return int*/private function checkWord($txt, $beginIndex, $length){$flag = false;$wordLength = 0;$trieTree = $this->trieTreeMap; //獲取敏感詞樹for ($i = $beginIndex; $i < $length; $i++) {$word = mb_substr($txt, $i, 1); //檢驗(yàn)單個字if (!isset($trieTree->children[$word])) { //如果樹中不存在,結(jié)束 break;}//如果存在$wordLength++; $trieTree = $trieTree->children[$word];if ($trieTree->isEndingChar === true) { $flag = true;break;}}if($beginIndex > 0) {$flag || $wordLength = 0; //如果$flag == false 賦值$wordLenth為0}return $wordLength;}}$data = ['白粉', '白粉人', '白粉人嫩','不該大']; $wordObj = new TrieTree(); $wordObj->addWords($data);$txt = "白粉啊,白粉人,我不該大啊"; $words = $wordObj->search($txt); var_dump($words);die;

總結(jié)

以上是生活随笔為你收集整理的基于PHP实现高性能敏感词过滤算法的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網(wǎng)站內(nèi)容還不錯,歡迎將生活随笔推薦給好友。