Pre-Deep-Learning Era: The FFM Model, Its Principles and a Python Implementation
Based on the comparison of collaborative filtering, logistic regression, and FM in the previous post, one conclusion can be drawn: the key to the iteration of mainstream models lies in strengthening their expressive power. The main thread of that strengthening, put more plainly:
- Model iteration is about finding more effective information.
本文想要回顧的FFM(Field-aware Factorization Macheines)模型可以看作是FM模型的增強(qiáng)版,其正是沿用在FM模型的特征組合思想,并將其發(fā)揚(yáng)光大,曾在多項(xiàng)CTR預(yù)估賽中奪魁,并且被Criteo、美團(tuán)等公司深度應(yīng)用在推薦系統(tǒng)與CTR預(yù)估等領(lǐng)域。
相較于FM模型,FFM模型在FM隱向量特征交叉組合的基礎(chǔ)上,進(jìn)一步引入了特征感知(field-aware)這一概念,使得模型表達(dá)能力在理論上有了一個(gè)較大的提升。
The FM model's expression:
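In standard notation, with $\mathbf{v}_i$ the $k$-dimensional latent vector of feature $i$:

$$\hat{y}_{\mathrm{FM}}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j$$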
The FFM expression is as follows:
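In the same notation, with $f_j$ denoting the field that feature $j$ belongs to:

$$\hat{y}_{\mathrm{FFM}}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf{v}_{i,f_j}, \mathbf{v}_{j,f_i} \rangle\, x_i x_j$$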
As can be seen from the expressions, FFM differs from FM in the second-order interaction term: the latent vector used for feature $i$ changes from a single $\mathbf{v}_i$ shared across all interactions to a field-specific $\mathbf{v}_{i,f_j}$ that depends on the field $f_j$ of the partner feature $j$.
Why does FFM do this?
What are feature fields, and how does FFM perform field-level feature interactions?
Referring to the original paper: "features" can be grouped into "fields". Turning that sentence around gives a definition of a field: a field is a group of features, i.e., a feature field is made up of features of the same type.
Borrowing the example from the paper may answer this question more intuitively:
Table 1: an artificial CTR data set, where + (-) represents the number of clicked (unclicked) impressions.
Table 2: an example of a click event.
In the two tables above, Publisher (P), Advertiser (A), and Gender (G) are three fields. The features of the Publisher field are ESPN, Vogue, and NBC; in the author's terms, the Publisher field is grouped from the features <ESPN, Vogue, NBC>. The same holds for the Advertiser and Gender fields.
那么對(duì)于Table 2的數(shù)據(jù),FFM的二階特征組合結(jié)果為:
而對(duì)于FM二階特征組合結(jié)果而言:
As can be seen, in FFM, when feature ESPN forms the interactions (ESPN, NIKE) and (ESPN, Male), it uses two different latent vectors, $\mathbf{v}_{\mathrm{ESPN},A}$ and $\mathbf{v}_{\mathrm{ESPN},G}$, to learn the parameters. Likewise, when feature NIKE interacts with ESPN and with Male, it also uses two different latent vectors, $\mathbf{v}_{\mathrm{NIKE},P}$ and $\mathbf{v}_{\mathrm{NIKE},G}$, to compute the interaction weights. In FM, by contrast, feature ESPN uses the same latent vector $\mathbf{v}_{\mathrm{ESPN}}$ when interacting with NIKE and with Male, and NIKE and Male likewise each have only a single latent vector. This answers what feature fields are and how FFM performs field-level feature interactions; the small encoding sketch below makes the grouping concrete.
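As a concrete illustration, the Table 2 record could be encoded as follows. The indices and the dict-based format are my own choices, made to match the `feat_fild_dic` and per-sample dicts used by the implementation later in this post:

```python
# fields:   0 = Publisher, 1 = Advertiser, 2 = Gender      (illustrative indices)
# features: 0 = ESPN, 1 = Vogue, 2 = NBC, 3 = NIKE, 4 = Male (illustrative indices)

# feature index -> field index: each feature belongs to exactly one field
feat_field = {0: 0, 1: 0, 2: 0,   # ESPN, Vogue, NBC -> Publisher
              3: 1,               # NIKE             -> Advertiser
              4: 2}               # Male             -> Gender

# the Table 2 click event: only the non-zero (one-hot) features are stored
sample = {0: 1.0, 3: 1.0, 4: 1.0}   # ESPN=1, NIKE=1, Male=1
label = 1                           # clicked
```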
A problem
The above explains FFM's advantage over FM: when a feature is combined with features from different fields, it uses a latent vector tied to each of those fields to learn the interaction weight. This, however, also brings about the issue that keeps FFM from being deployed at large scale, namely an explosion in the number of parameters, explained as follows:
Suppose a dataset has shape (m, n). If FM sets the latent-vector dimension to k, its parameter count is nk. For FFM, if the n features belong to f fields, then with the same latent dimension k the parameter count is on the order of nfk (strictly n(f-1)k, since a feature never needs a latent vector for its own field).
The gap between nk and nfk is only a linear factor f, but the number of features n in internet-scale data is easily in the millions. Even though f is several orders of magnitude smaller than n, this is still enough to blow FFM's parameter count up to a frightening level.
Table 3: two CTR datasets, Criteo and Avazu, from Kaggle competitions.
Taking the Criteo dataset in Table 3 as an example, with k = 10 the FM parameter count is on the order of $10^7$, while FFM's is on the order of $10^8$. That is only one order of magnitude more, but it moves the parameter count from the tens of millions to the hundreds of millions, which is quite frightening.
關(guān)于FFM的工程代碼其實(shí)論文作者已經(jīng)在github上給出C++版本,python版,或者Amazon AI的馬超開源的XLearn。
本文引用Python implementation of FFM model (ctr, cvr)一文,來分析一下FFM的代碼實(shí)現(xiàn)。
```python
import math
import random

import numpy as np


class ffm(object):
    def __init__(self, feature_num, fild_num, feature_dim_num, feat_fild_dic,
                 learning_rate, regular_para, stop_threshold):
        # n features, m fields, latent vectors of dimension k
        self.n = feature_num          # number of features
        self.m = fild_num             # number of fields
        self.k = feature_dim_num      # latent vector length
        self.dic = feat_fild_dic      # mapping: feature index -> field index
        # hyperparameters: learning rate eta, regularization coefficient lamda
        self.eta = learning_rate
        self.lamda = regular_para
        self.threshold = stop_threshold
        # initial weights, shape (n, m, k)
        self.w = np.random.rand(self.n, self.m, self.k) / math.sqrt(self.k)
        # accumulated squared gradients for AdaGrad
        self.G = np.ones(shape=(feature_num, fild_num, feature_dim_num), dtype=np.float64)

    def train(self, tr_l, val_l, train_y, val_y, max_echo):
        # training loop: SGD on the training set, logloss on the validation set
        # tr_l, val_l, train_y, val_y, max_echo:
        # training set, validation set, training labels, validation labels, max iterations
        minloss = 0
        for i in range(max_echo):
            Logloss = 0
            order = list(range(len(train_y)))
            random.shuffle(order)  # shuffle the sample order
            for each_data_index in order:
                # take one record
                tr_each_data = tr_l[each_data_index]
                # phi() is the model formula (only the second-order FFM term is modeled)
                phi = self.phi(tr_each_data)
                # y_i is the actual label (+1 / -1)
                y_i = float(train_y[each_data_index])
                # gradient of the logistic loss with respect to phi
                g_phi = -y_i / (1 + math.exp(y_i * phi))
                # update the model parameters with (AdaGrad) gradient descent
                self.sgd_para(tr_each_data, g_phi)
            # evaluate on the validation set
            for each_vadata_index, each_va_y in enumerate(val_y):
                val_each_data = val_l[each_vadata_index]
                phi_v = self.phi(val_each_data)
                y_vai = float(each_va_y)
                # map the raw score to a probability before computing logloss
                p_v = 1.0 / (1.0 + math.exp(-phi_v))
                y01 = 1.0 if y_vai > 0 else 0.0
                Logloss += -(y01 * math.log(p_v) + (1 - y01) * math.log(1 - p_v))
            Logloss = Logloss / len(val_y)
            print("The %d iteration, LOGLOSS on the validation set: %f" % (i, Logloss))
            if minloss == 0:
                # minloss stores the smallest LOGLOSS seen so far
                minloss = Logloss
            if Logloss <= self.threshold:
                # stop once the loss falls below the threshold (optional, remove as needed)
                print('Less than the threshold!')
                break
            if minloss < Logloss:
                # if this iteration does not reduce LOGLOSS, stop (early stopping)
                print('early stopping')
                break
            minloss = min(minloss, Logloss)

    def phi(self, tmp_dict):
        # compute the second-order FFM term for one sample
        # tmp_dict maps the indices of the non-zero features to their values
        # the sample is normalized here to prevent numerical overflow
        sum_v = sum(tmp_dict.values())
        phi_tmp = 0
        key_list = list(tmp_dict.keys())
        for i in range(len(key_list)):
            # feat_index1: feature index; fild_index1: its field; value1: its normalized value
            feat_index1 = key_list[i]
            fild_index1 = self.dic[feat_index1]
            # dividing by sum_v normalizes this sample (all feature values into [0, 1]);
            # each feature is assumed to have been scaled to [0, 1] beforehand as well
            value1 = tmp_dict[feat_index1] / sum_v
            # pairwise inner product of every two non-zero features
            for j in range(i + 1, len(key_list)):
                feat_index2 = key_list[j]
                fild_index2 = self.dic[feat_index2]
                value2 = tmp_dict[feat_index2] / sum_v
                w1 = self.w[feat_index1, fild_index2]
                w2 = self.w[feat_index2, fild_index1]
                # sum over all feature pairs to get the final value
                phi_tmp += np.dot(w1, w2) * value1 * value2
        return phi_tmp

    def sgd_para(self, tmp_dict, g_phi):
        # gradient computation and parameter update; the learning rate is adapted with AdaGrad
        sum_v = sum(tmp_dict.values())
        key_list = list(tmp_dict.keys())
        for i in range(len(key_list)):
            feat_index1 = key_list[i]
            fild_index1 = self.dic[feat_index1]
            value1 = tmp_dict[feat_index1] / sum_v
            for j in range(i + 1, len(key_list)):
                feat_index2 = key_list[j]
                fild_index2 = self.dic[feat_index2]
                value2 = tmp_dict[feat_index2] / sum_v
                w1 = self.w[feat_index1, fild_index2]
                w2 = self.w[feat_index2, fild_index1]
                # gradients (with L2 regularization) and AdaGrad accumulators
                g_feati_fildj = g_phi * value1 * value2 * w2 + self.lamda * w1
                g_featj_fildi = g_phi * value1 * value2 * w1 + self.lamda * w2
                self.G[feat_index1, fild_index2] += g_feati_fildj ** 2
                self.G[feat_index2, fild_index1] += g_featj_fildi ** 2
                # np.sqrt() works element-wise on the whole vector (math.sqrt() only takes a scalar)
                self.w[feat_index1, fild_index2] -= self.eta / np.sqrt(self.G[feat_index1, fild_index2]) * g_feati_fildj
                self.w[feat_index2, fild_index1] -= self.eta / np.sqrt(self.G[feat_index2, fild_index1]) * g_featj_fildi
```
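A minimal usage sketch of the class above. The toy data and hyperparameters are invented for illustration; each sample is a dict mapping non-zero feature indices to values, matching the `tmp_dict` argument of `phi()`:

```python
# feature index -> field index: 5 features across 3 fields (toy example)
feat_field = {0: 0, 1: 0, 2: 0, 3: 1, 4: 2}

# samples hold only their non-zero features; labels are in {+1, -1},
# as assumed by the gradient g_phi in train()
train_x = [{0: 1.0, 3: 1.0, 4: 1.0},
           {1: 1.0, 3: 1.0, 4: 1.0}]
train_y = [1, -1]
val_x, val_y = train_x, train_y   # toy validation set (reuses the training data)

model = ffm(feature_num=5, fild_num=3, feature_dim_num=4,
            feat_fild_dic=feat_field, learning_rate=0.1,
            regular_para=0.001, stop_threshold=0.01)
model.train(train_x, val_x, train_y, val_y, max_echo=5)
```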
References
- Original FFM paper: Field-aware Factorization Machines for CTR Prediction
- https://programmersought.com/article/15904469481/
- Meituan tech blog: Deep Understanding of FFM Principles and Practice
Recommended reading
秋雨淅淅l: Pre-Deep-Learning Era: The Causes and Effects of the Factorization Machine (FM) Model (zhuanlan.zhihu.com)