Deep Factorization Machines
Learning effective feature combinations is critical to the success of the click-through rate prediction task. Factorization machines model feature interactions in a linear paradigm (e.g., bilinear interactions). This is often insufficient for real-world data, where the inherent feature-crossing structures are usually complex and nonlinear. What is worse, factorization machines are typically limited to second-order feature interactions in practice. Modeling higher-order feature combinations with factorization machines is theoretically possible, but it is usually not adopted due to numerical instability and high computational complexity.
An effective solution is to use deep neural networks. Deep neural networks are powerful at feature representation learning and have the potential to learn sophisticated feature interactions. As such, it is natural to integrate deep neural networks into factorization machines. Adding nonlinear transformation layers to factorization machines enables them to model both low-order and high-order feature combinations. Moreover, the nonlinear inherent structures of the inputs can also be captured with deep neural networks. In this section, we introduce a representative model that combines FM and deep neural networks, named DeepFM [Guo et al., 2017].
- Model Architectures
DeepFM consists of an FM component and a deep component, integrated in a parallel structure. The FM component is the same as the two-way factorization machine and is used to model low-order feature interactions. The deep component is a multilayer perceptron that captures high-order feature interactions and nonlinearities. The two components share the same inputs/embeddings, and their outputs are summed to produce the final prediction. It is worth pointing out that the spirit of DeepFM resembles that of the Wide & Deep architecture, which can capture both memorization and generalization. The advantage of DeepFM over the Wide & Deep model is that it reduces the effort of hand-crafted feature engineering by identifying feature combinations automatically.
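Denoting the outputs of the two components by $\hat{y}^{(FM)}$ and $\hat{y}^{(DNN)}$ (notation chosen here for illustration), the summation of the outputs can be written as

$$\hat{y} = \sigma\big(\hat{y}^{(FM)} + \hat{y}^{(DNN)}\big),$$

where $\sigma$ is the sigmoid function that maps the score to a click probability.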
It is worth noting that DeepFM is not the only way of combining deep neural networks with FM. One can also add nonlinear layers on top of the feature interactions [He & Chua, 2017].
from d2l import mxnet as d2l
from mxnet import init, gluon, np, npx
from mxnet.gluon import nn
import os
import sys
npx.set_np()
- Implementation of DeepFM
The implementation of DeepFM is similar to that of FM. We keep the FM part unchanged and use an MLP with relu as the activation function. Dropout is also used to regularize the model. The number of neurons of the MLP can be adjusted with the mlp_dims hyperparameter.
class DeepFM(nn.Block):
    def __init__(self, field_dims, num_factors, mlp_dims, drop_rate=0.1):
        super(DeepFM, self).__init__()
        num_inputs = int(sum(field_dims))
        self.embedding = nn.Embedding(num_inputs, num_factors)
        self.fc = nn.Embedding(num_inputs, 1)
        self.linear_layer = nn.Dense(1, use_bias=True)
        input_dim = self.embed_output_dim = len(field_dims) * num_factors
        self.mlp = nn.Sequential()
        for dim in mlp_dims:
            self.mlp.add(nn.Dense(dim, 'relu', True, in_units=input_dim))
            self.mlp.add(nn.Dropout(rate=drop_rate))
            input_dim = dim
        self.mlp.add(nn.Dense(in_units=input_dim, units=1))

    def forward(self, x):
        embed_x = self.embedding(x)
        square_of_sum = np.sum(embed_x, axis=1) ** 2
        sum_of_square = np.sum(embed_x ** 2, axis=1)
        inputs = np.reshape(embed_x, (-1, self.embed_output_dim))
        x = self.linear_layer(self.fc(x).sum(1)) \
            + 0.5 * (square_of_sum - sum_of_square).sum(1, keepdims=True) \
            + self.mlp(inputs)
        x = npx.sigmoid(x)
        return x
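The forward pass above relies on the standard FM identity: the sum of pairwise dot products of the embedding vectors equals half of the square of their sum minus the sum of their squares. A minimal NumPy sketch (with made-up shapes, independent of MXNet) confirms the equivalence against a naive pairwise loop:

```python
import numpy as np

# embed_x has shape (batch, num_fields, num_factors), as in DeepFM.forward
rng = np.random.default_rng(0)
embed_x = rng.normal(size=(2, 4, 3))

# Fast form used in the model: 0.5 * ((sum v_i)^2 - sum v_i^2),
# summed over the factor dimension.
square_of_sum = embed_x.sum(axis=1) ** 2
sum_of_square = (embed_x ** 2).sum(axis=1)
fast = 0.5 * (square_of_sum - sum_of_square).sum(axis=1)

# Naive form: explicit sum over all pairs i < j of <v_i, v_j>.
slow = np.zeros(2)
for b in range(2):
    for i in range(4):
        for j in range(i + 1, 4):
            slow[b] += embed_x[b, i] @ embed_x[b, j]

print(np.allclose(fast, slow))  # True
```

This trick reduces the cost of the second-order term from quadratic to linear in the number of fields, which is why both FM and the FM component of DeepFM compute it this way.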
- Training and Evaluating the Model
The data loading process is the same as that of FM. We set the MLP component of DeepFM to a three-layer dense network with the pyramid structure (30-20-10). All other hyperparameters remain the same as for FM.
batch_size = 2048
data_dir = d2l.download_extract('ctr')
train_data = d2l.CTRDataset(os.path.join(data_dir, 'train.csv'))
test_data = d2l.CTRDataset(os.path.join(data_dir, 'test.csv'),
                           feat_mapper=train_data.feat_mapper,
                           defaults=train_data.defaults)
field_dims = train_data.field_dims
num_workers = 0 if sys.platform.startswith('win') else 4
train_iter = gluon.data.DataLoader(train_data, shuffle=True,
                                   last_batch='rollover',
                                   batch_size=batch_size,
                                   num_workers=num_workers)
test_iter = gluon.data.DataLoader(test_data, shuffle=False,
                                  last_batch='rollover',
                                  batch_size=batch_size,
                                  num_workers=num_workers)
ctx = d2l.try_all_gpus()
net = DeepFM(field_dims, num_factors=10, mlp_dims=[30, 20, 10])
net.initialize(init.Xavier(), ctx=ctx)
lr, num_epochs, optimizer = 0.01, 30, 'adam'
trainer = gluon.Trainer(net.collect_params(), optimizer,
{'learning_rate': lr})
loss = gluon.loss.SigmoidBinaryCrossEntropyLoss()
d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, ctx)
loss 0.510, train acc 0.845, test acc 0.860
123302.7 examples/sec on [gpu(0), gpu(1)]
Compared with FM, DeepFM converges faster and achieves better performance.
- Summary
· Integrating neural networks into FM enables it to model complex and high-order feature interactions.
· DeepFM outperforms the original FM on the advertising dataset.