

Published: 2025/3/19


Computer Systems & Applications, 2014, Vol. 23, No. 8

Microblog Hot Topics Discovery Method Based on Probabilistic Topic Model

MI Wen-Li¹, SUN Yue-Xin²

¹(College of Information Engineering, Longdong University, Qingyang 745000, China)

²(College of Computer Science & Engineering, Northwest Normal University, Lanzhou 730070, China)

Abstract: Microblog posts are short, spread in real time, have a complex structure, and contain many variant word forms. As a result, the traditional vector space model (VSM) text representation and latent semantic analysis (LSA) cannot model them well. This paper proposes a two-stage clustering algorithm based on probabilistic latent semantic analysis (pLSA) and K-means clustering (Kmeans). It also defines a measure of microblog popularity and uses it to analyze and rank topics, which supports hot topic discovery. Experiments show that the method clusters topics effectively and detects hot topics.

Key words: probabilistic latent semantic analysis; topic detection; microblog; Kmeans
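To illustrate the pipeline the abstract outlines, the sketch below fits pLSA with EM on a toy term-count matrix and then runs K-means on the resulting per-post topic distributions P(z|d). How the paper couples its two stages is not given in this excerpt, so clustering on P(z|d) is an assumption. The function names (`plsa`, `kmeans`), the toy counts, the farthest-point initialization, and the iteration limits are also assumptions. The popularity measure is not sketched because its definition does not appear here.

```python
import random

def normalize(row):
    """Scale a non-negative list so it sums to 1 (uniform if it sums to 0)."""
    total = sum(row)
    if total <= 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA with EM. counts[d][w] = occurrences of word w in post d.
    Returns (p_z_d, p_w_z): P(topic | post) and P(word | topic)."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]
    for _ in range(n_iter):
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    acc_z_d[d][z] += n * post[z]
                    acc_w_z[z][w] += n * post[z]
        # M-step: renormalize expected counts into distributions.
        p_z_d = [normalize(row) for row in acc_z_d]
        p_w_z = [normalize(row) for row in acc_w_z]
    return p_z_d, p_w_z

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=20):
    """Plain K-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy corpus (hypothetical): rows are posts, columns are vocabulary terms.
# Posts 0-2 and posts 3-5 use disjoint vocabularies.
counts = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [1, 2, 3, 0, 0, 0],
    [0, 0, 0, 3, 1, 2],
    [0, 0, 0, 1, 3, 2],
    [0, 0, 0, 2, 2, 3],
]
p_z_d, p_w_z = plsa(counts, n_topics=2)   # stage 1: topic distributions
labels = kmeans(p_z_d, k=2)               # stage 2: cluster posts by topic mix
print(labels)
```

Clustering on the low-dimensional P(z|d) vectors rather than on raw term counts is one way to sidestep the sparsity of short posts, which is the weakness of VSM that the abstract points to.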

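In recent years, microblogging (microblogs) has flourished on the Internet and attracted growing attention. Microblogs emerged from traditional social networks and, once they had their own independent service platforms, gradually evolved into a new form of information publishing.

However, microblog data is produced mainly by ordinary users, and whether in wording, form, or ...

... search logs from Twitter and from traditional search engines, and made a thorough, comprehensive comparison between microblog search and traditional Web search, finding that Twitter users tend to search for time-related information such as breaking news and current trends; Neil[6] regards Twitter as a reflection of society as a whole, from which ...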
