當(dāng)前位置：首頁 > 编程语言 > python >内容正文

python

python 主语_前深度学习时代--FFM模型的原理与Python实现

發(fā)布時(shí)間：2024/10/5 python 27 豆豆

生活随笔收集整理的這篇文章主要介紹了 python 主语_前深度学习时代--FFM模型的原理与Python实现小編覺得挺不錯(cuò)的,現(xiàn)在分享給大家,幫大家做個(gè)參考.

基于上一篇分析中協(xié)同過濾、邏輯回歸及FM的比較，可以得出這樣一個(gè)結(jié)論：

主流模型迭代的關(guān)鍵在于增強(qiáng)模型表達(dá)能力，而增強(qiáng)方式的主要脈絡(luò)為：

引入其它可用特征信息（CF->LR）。

將現(xiàn)有特征進(jìn)行組合（LR->POLY2->FM）。

更通俗的表達(dá)：

模型迭代在于找到更多的有效信息。

本文想要回顧的FFM（Field-aware Factorization Macheines）模型可以看作是FM模型的增強(qiáng)版，其正是沿用在FM模型的特征組合思想，并將其發(fā)揚(yáng)光大，曾在多項(xiàng)CTR預(yù)估賽中奪魁，并且被Criteo、美團(tuán)等公司深度應(yīng)用在推薦系統(tǒng)與CTR預(yù)估等領(lǐng)域。

相較于FM模型，FFM模型在FM隱向量特征交叉組合的基礎(chǔ)上，進(jìn)一步引入了特征感知（field-aware）這一概念，使得模型表達(dá)能力在理論上有了一個(gè)較大的提升。

FM模型表達(dá)式：

FFM的表達(dá)式如下所示：

從表達(dá)式中可以看出，FFM與FM的不同之處在于二階特征組合部分的隱向量由

變成了。而這便意味著FFM模型中每一維特征對(duì)應(yīng)的不是唯一一個(gè)隱向量，而是一組特征隱向量，這也具體的引出了FFM與FM特征組合不同之處與FFM模型的提升之處。

FFM這么做的原因是什么？

FM模型中每一特征共用同一個(gè)特征隱向量，意味著每一特征與不同域特征進(jìn)行組合時(shí)使用的是同一隱特征向量來學(xué)習(xí)隱向量參數(shù)，這樣不夠細(xì)致，存在明顯的信息浪費(fèi)。

什么是特征域Fileds，FFM怎么去做特征域級(jí)的特征組合？

參考論文原文，“features” can be grouped into “fields” 。把這句話的主語和賓語置換一下就可以得到Fileds的定義： “fields” is grouped by "features"，也即特征域由某類特征組成。

借用論文的示例，或許可以更直觀的回答這個(gè)問題：

Table 1: An artificial CTR data set, where + (-) representsthe number of clicked (unclicked) impressions.

Table 2:a example of click event

在上面兩個(gè)表中，Publisher(P)、Advertiser(A)和Gander(G)是三個(gè)Fileds特征域。其中Publisher特征域的中的feature有ESPN、Vogue和NBC，套用作者定義那么有：fields of Publisher is grouped by features<ESPN,Vogue,NBC>。Advertiser與Gander特征域同理。

那么對(duì)于Table 2的數(shù)據(jù)，FFM的二階特征組合結(jié)果為：

而對(duì)于FM二階特征組合結(jié)果而言：

可以看到在FFM模型中，Feature ESPN在與NIKE及Male形成特征組合ESPN,NIKE)、(ESPN,Male)時(shí)，使用了不同的潛向量

及來學(xué)習(xí)參數(shù)。同理，Feature NIKE在與ESPN及Male特征組合時(shí)，也分別使用了兩個(gè)不同的潛向量及來計(jì)算組合權(quán)重。

而對(duì)于FM而言，Feature ESPN在與NIKE及Male特征組合時(shí)，使用的是同一個(gè)潛向量

，同時(shí)NIKE與Male特征也只存在唯一一個(gè)隱向量。

至此，什么是特征域Fileds，FFM怎么去做特征域級(jí)的特征組合得到解釋。

一個(gè)問題

以上解釋了FFM模型相較于FM模型的優(yōu)勢(shì)在于，每個(gè)特征在與不同F(xiàn)ield下的Feature進(jìn)行組合時(shí)，會(huì)使用與之特征域?qū)?yīng)的隱向量來學(xué)習(xí)組合權(quán)重。而這也帶來了一個(gè)FFM不被大規(guī)模應(yīng)用的問題--參數(shù)量暴增，解釋如下：

假設(shè)一份數(shù)據(jù)集dataset的維度為(m,n)，FM在進(jìn)行模型訓(xùn)練時(shí)將隱向量特征維度設(shè)置為k維，那么FM的特征量便為nk。而對(duì)于FFM而言，假設(shè)dataset的n維特征對(duì)應(yīng)著f個(gè)特征域，那么在隱向量維度同維k的情況下，FFM的特征量級(jí)為nfk（實(shí)際數(shù)量為n(f-1)k，由于特征不需要自我交叉，因此為f-1）。

nk與nfk之間的差別雖然只是線性級(jí)，但是由于互聯(lián)網(wǎng)數(shù)據(jù)特征量n動(dòng)輒百萬級(jí)，雖然f的值會(huì)比n小若干個(gè)數(shù)量級(jí)，但這也足以使得FFM模型的參數(shù)量暴增到一個(gè)恐怖的級(jí)別。

Table 3:two CTR datasets Criteo and Avazu from Kaggle competitions

以Table3數(shù)據(jù)集Criteo為例，當(dāng)維度k取為10時(shí)，FM的參數(shù)量為

，而FFM的量級(jí)為。雖然只增加了一個(gè)數(shù)量級(jí)別，但參數(shù)量從千萬級(jí)變?yōu)榱藘|級(jí)別，可謂非常恐怖了。

python實(shí)現(xiàn)

關(guān)于FFM的工程代碼其實(shí)論文作者已經(jīng)在github上給出C++版本，python版，或者Amazon AI的馬超開源的XLearn。

本文引用Python implementation of FFM model (ctr, cvr)一文，來分析一下FFM的代碼實(shí)現(xiàn)。

import numpy as np import math import random class ffm(object):def __init__(self, feature_num, fild_num, feature_dim_num, feat_fild_dic, learning_rate, regular_para, stop_threshold):#n features, m domains, each feature dimension kself.n = feature_num #特征數(shù)量self.m = fild_num #特征域數(shù)self.k = feature_dim_num #隱向量特征長(zhǎng)度self.dic = feat_fild_dic #特征對(duì)應(yīng)的域#Set hyperparameter, learning rate eta, regularization coefficient lamdaself.eta = learning_rateself.lamda = regular_paraself.threshold = stop_thresholdself.w = np.random.rand(self.n, self.m , self.k) / math.sqrt(self.k)#權(quán)重初試值self.G = np.ones(shape = (feature_num, fild_num, feature_dim_num), dtype = np.float64)def train(self, tr_l, val_l, train_y, val_y, max_echo):#這一部分計(jì)算模型的訓(xùn)練損失# tr_l, val_l, train_y, val_y, max_echo are# Training set, validation set, training set label, validation set label, maximum number of iterationsminloss = 0for i in range(max_echo):# Iterative training, max_echo is the maximum number of iterationsL_val = 0Logloss = 0order = range(len(train_y))# mess up the orderrandom.shuffle(order)for each_data_index in order:# Remove a recordtr_each_data = tr_l[each_data_index]# phi() is the model formulaphi = self.phi(tr_each_data)# y_i is the actual tag valuey_i = float(train_y[each_data_index])# Calculate the gradient belowg_phi = -y_i / (1 + math.exp(y_i * phi))# Begin to update the model parameters using the gradient descent methodself.sgd_para(tr_each_data, g_phi)# Next, check on the verification set, the basic process is the same as before.for each_vadata_index, each_va_y in enumerate(val_y):val_each_data = val_l[each_vadata_index]phi_v = self.phi(val_each_data)y_vai = float(each_va_y)Logloss += -(y_vai * math.log(phi_v) + (1 - y_vai) * math.log(1 - phi_v))Logloss = Logloss / len(val_y)# L_val += math.log(1+math.exp(-y_vai * phi_v))print("The %d iteration, LOGLOSS on the validation set: %f" % (i, Logloss))if minloss == 0:# minloss stores the smallest LOGLOSSminloss = Loglossif Logloss <= self.threshold:# It can also be considered that setting the threshold allows the program to jump, and personal needs can be removed.print('Less than the threshold!')breakif minloss < Logloss:# If the next iteration does not reduce LOGLOSS, break out (early stopping)print('early stopping')breakdef phi(self, tmp_dict):#這一部分計(jì)算這FFM二階部分的對(duì)應(yīng)的值#Samples are normalized here to prevent calculation overflowsum_v = sum(tmp_dict.values())#First find the index of the non-zero feature in each piece of data and put it in a listphi_tmp = 0key_list = tmp_dict.keys()for i in range(len(key_list)):#feat_index is the index of the feature, fild_index1 is the index of the domain, and value1 is the value corresponding to the featurefeat_index1 = key_list[i]fild_index1 = self.dic[feat_index1]#The purpose of dividing here by sum_v is to normalize this one (return all feature values ??to between 0 and 1)#Of course, each feature has been normalized before (0-1)value1 = tmp_dict[feat_index1] / sum_v#Two non-zero features pairwise inner productfor j in range(i+1, len(key_list)):feat_index2 = key_list[j]fild_index2 = self.dic[feat_index2]value2 = tmp_dict[feat_index2] / sum_vw1 = self.w[feat_index1, fild_index2]w2 = self.w[feat_index2, fild_index1]#The final value is obtained by summing up multiple characteristic combinationsphi_tmp += np.dot(w1, w2) * value1 * value2return phi_tmpdef sgd_para(self, tmp_dict, g_phi):#這一部分梯度計(jì)算，參數(shù)的更新#學(xué)習(xí)率是用的AdaGrad算法sum_v = sum(tmp_dict.values())key_list = tmp_dict.keys()for i in range(len(key_list)):feat_index1 = key_list[i]fild_index1 = self.dic[feat_index1]value1 = tmp_dict[feat_index1] / sum_vfor j in range(i + 1, len(key_list)):feat_index2 = key_list[j]fild_index2 = self.dic[feat_index2]value2 = tmp_dict[feat_index2] / sum_vw1 = self.w[feat_index1, fild_index2]w2 = self.w[feat_index2, fild_index1]# Update g and Gg_feati_fildj = g_phi * value1 * value2 * w2 + self.lamda * w1g_featj_fildi = g_phi * value1 * value2 * w1 + self.lamda * w2self.G[feat_index1, fild_index2] += g_feati_fildj ** 2self.G[feat_index2, fild_index1] += g_featj_fildi ** 2# math.sqrt() can only accept one element, while np.sqrt() can root the entire vectorself.w[feat_index1, fild_index2] -= self.eta / np.sqrt(self.G[feat_index1, fild_index2]) * g_feati_fildjself.w[feat_index2, fild_index1] -= self.eta / np.sqrt(self.G[feat_index2, fild_index1]) * g_featj_fildi

參考資料

FFM原文
https://programmersought.com/article/15904469481/
美團(tuán)深入FFM原理與實(shí)踐

總結(jié)

以上是生活随笔為你收集整理的python 主语_前深度学习时代--FFM模型的原理与Python实现的全部?jī)?nèi)容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網(wǎng)站內(nèi)容還不錯(cuò)，歡迎將生活随笔推薦給好友。

上一篇： qt能使用logback_SpringB
下一篇： websocket python爬虫_p