Learning Word2vec Representations on CSIC2010 and Visualizing Them

Published: 2024/1/1

    sudo apt-get install liblapack-dev
    sudo apt-get install gfortran
    sudo apt-get install python-pandas
    sudo pip install --upgrade gensim
    sudo pip install jieba
    sudo pip install theano    # version 0.7

Generating word2vec word vectors from the given words
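
The training script reads its input through gensim's LineSentence, which expects one sentence per line with tokens separated by whitespace. The post does not show how wPred_word.txt was built, so the following is only a minimal sketch of one possible preprocessing step. The tokenizer, the `tokenize` and `to_line` helpers, and the sample requests are all hypothetical and are not the author's method.

```python
import re

# Hypothetical tokenizer (an assumption, not the author's preprocessing):
# runs of word characters stay together, and every other non-space
# character becomes its own token, so symbols such as '@' or '=' survive.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(request):
    """Split a raw request string into a list of tokens."""
    return TOKEN_PATTERN.findall(request)


def to_line(request):
    """Format one request as a single whitespace-separated line."""
    return " ".join(tokenize(request))


# Made-up example strings, not taken from the CSIC 2010 data set.
requests = [
    "id=1&name=alice",
    "id=1' OR '1'='1",
]

lines = [to_line(r) for r in requests]
for line in lines:
    print(line)
```

Writing each element of `lines` to a text file, one per line, would give a file in the layout that LineSentence reads.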

    # -*- coding: utf-8 -*-
    """
    Created on Thu Jun 15 16:24:01 2017

    @author: Jiabao Wang
    @description: Generate word2vec model based on given words
    """
    # Note: Python 2 syntax (print statements, str.decode).

    import gensim.models.word2vec as w2v


    def train_model(input_file_name, model_file_name):
        # Train the model and generate the word vectors
        sentences = w2v.LineSentence(input_file_name)
        model = w2v.Word2Vec(sentences, size=20, window=5, min_count=5, workers=4)
        model.save(model_file_name)


    input_file_name = 'wPred_word.txt'    # Input Words
    model_file_name = 'wPred_model.txt'   # Output Model
    train_model(input_file_name, model_file_name)

    # Compute and evaluate similarity and probability
    model = w2v.Word2Vec.load(model_file_name)
    print model.similarity('eval', '@')
    for k in model.similar_by_word('eval'):
        print str(k[1]) + "\t# " + k[0].decode('utf-8')
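
The two queries at the end of the script, `similarity` and `similar_by_word`, both rank words by the cosine similarity of their vectors (in gensim 4 and later the same methods are reached through `model.wv`). To make that concrete without a trained model, here is a small sketch in plain Python. The three-dimensional vectors are invented for illustration, and `most_similar` is a hypothetical helper rather than gensim's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=10):
    """Rank every other word by cosine similarity to `word`."""
    query = vectors[word]
    scored = [(other, cosine_similarity(query, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy vectors, made up for this example (the real model uses size=20).
vectors = {
    "eval": [1.0, 0.0, 1.0],
    "@": [1.0, 0.0, 0.9],
    "select": [0.0, 1.0, 0.0],
}

for other, score in most_similar("eval", vectors):
    print("%s\t%.4f" % (other, score))
```

A score near 1 means the two vectors point in almost the same direction; a score near 0 means they are close to orthogonal.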

Visualizing the word vectors

    # -*- coding: utf-8 -*-
    """
    # This is the visualization for the embedding word vectors:
    # Input: the words for visualization, the words labels, and the word2vec model
    # Output: the visualization of the given words

    Created on Thu Jun 22 01:55:37 2017

    @author: Jiabao Wang
    """

    import numpy as np
    from gensim.models.word2vec import Word2Vec
    import matplotlib.pyplot as plt
    #import sklearn.manifold.TSNE as tsne

    modelpath = 'pub_data/wPred_model.txt'      # word-vector model
    model = Word2Vec.load(modelpath)
    sentenceFilePath = 'pub_data/wordList.txt'  # words to visualize
    labelFilePath = 'pub_data/wordName.txt'     # display names for those words

    # Look up the vector of every word to visualize
    visualizeVecs = []
    with open(sentenceFilePath, 'r') as f:
        for line in f:
            word = line.strip()
            vec = model[word]
            visualizeVecs.append(vec)

    # Read the label shown for each word
    visualizeWords = []
    with open(labelFilePath, 'r') as f:
        for line in f:
            word = line.strip()
            visualizeWords.append(word)

    visualizeVecs = np.array(visualizeVecs).astype(np.float64)

    #Y = tsne(visualizeVecs, 2, 200, 20.0);
    ## Plot.scatter(Y[:,0], Y[:,1], 20,labels);
    ## ChineseFont1 = FontProperties('SimHei')
    #for i in xrange(len(visualizeWords)):
    #    # if i<len(visualizeWords)/2:
    #    #     color='green'
    #    # else:
    #    #     color='red'
    #    color = 'red'
    #    plt.text(Y[i, 0], Y[i, 1], visualizeWords[i],bbox=dict(facecolor=color, alpha=0.1))
    #plt.xlim((np.min(Y[:, 0]), np.max(Y[:, 0])))
    #plt.ylim((np.min(Y[:, 1]), np.max(Y[:, 1])))
    #plt.show()

    # vis_norm = np.sqrt(np.sum(temp**2, axis=1, keepdims=True))
    # temp = temp / vis_norm

    # Center the vectors and project them onto the first two principal directions
    temp = (visualizeVecs - np.mean(visualizeVecs, axis=0))
    covariance = 1.0 / visualizeVecs.shape[0] * temp.T.dot(temp)
    U, S, V = np.linalg.svd(covariance)
    coord = temp.dot(U[:, 0:2])

    for i in xrange(len(visualizeWords)):
        print i
        print coord[i, 0]
        print coord[i, 1]
        color = 'red'
        plt.text(coord[i, 0], coord[i, 1], visualizeWords[i],
                 bbox=dict(facecolor=color, alpha=0.1), fontsize=12)  # fontproperties = ChineseFont1

    plt.xlim((np.min(coord[:, 0]) - 5, np.max(coord[:, 0]) + 5))
    plt.ylim((np.min(coord[:, 1]) - 5, np.max(coord[:, 1]) + 5))
    plt.savefig('pub_data/distrubution.png', format='png', dpi=1000, bbox_inches='tight')
    plt.show()
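
The last part of the script is principal component analysis (PCA) written out by hand. The steps can be summarized as follows. The symbols are my own notation, not the post's: $X$ is the $N \times d$ matrix whose rows $x_i$ are the selected word vectors (`visualizeVecs`), $\tilde{X}$ is `temp`, $C$ is `covariance`, and $Y$ is `coord`.

```latex
\begin{align}
\bar{x} &= \frac{1}{N}\sum_{i=1}^{N} x_i,
  & \tilde{X} &= X - \mathbf{1}\,\bar{x}^{\top} \\
C &= \frac{1}{N}\,\tilde{X}^{\top}\tilde{X}
   = U\,\operatorname{diag}(s_1,\dots,s_d)\,V^{\top},
  & s_1 &\ge s_2 \ge \dots \ge s_d \ge 0 \\
Y &= \tilde{X}\,\begin{bmatrix} u_1 & u_2 \end{bmatrix} \in \mathbb{R}^{N \times 2},
  & \text{retained variance} &= \frac{s_1 + s_2}{\sum_{k=1}^{d} s_k}
\end{align}
```

Because $C$ is symmetric and positive semidefinite, its singular values $s_k$ are also its eigenvalues, and the columns $u_1, u_2$ of $U$ are the two directions of largest variance. The retained-variance ratio is not computed in the script; it is included here only as a way to judge how much of the 20-dimensional structure the 2-D plot keeps. Note also that `np.linalg.svd` returns the transposed right factor as its third value, so the variable the script names `V` holds $V^{\top}$; the script never uses it.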

The resulting visualization is shown below:


The words in the middle of the figure are SQL attack keywords; they cluster more tightly than the other words.
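
The observation above is a visual one. One possible way to check it numerically, sketched here under my own assumptions, is to compare the mean pairwise distance within each group of projected points: a smaller value means a tighter cluster. The coordinates below are invented for illustration and are not the post's results, and `mean_pairwise_distance` is a hypothetical helper.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_pairwise_distance(points):
    """Average distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        raise ValueError("need at least two points")
    return sum(euclidean(p, q) for p, q in pairs) / len(pairs)


# Made-up 2-D coordinates standing in for rows of `coord`.
sql_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
other_points = [(-10.0, 0.0), (10.0, 0.0), (0.0, 10.0), (0.0, -10.0)]

sql_spread = mean_pairwise_distance(sql_points)
other_spread = mean_pairwise_distance(other_points)
print("SQL keywords: %.3f" % sql_spread)
print("other words:  %.3f" % other_spread)
```

With real data, the two lists would be filled from the rows of `coord` that belong to each group of words.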
