An MXNet-Based Code Framework for Regression Problems in Kaggle Competitions
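I. Overview
Section 3.16 of the book, extended slightly, can serve as a framework for Kaggle competitions. The competition used here is House Prices: Advanced Regression Techniques, which is a regression problem.
II. The General Deep Learning Workflow
Following Li Hang's summary of the machine learning workflow in Statistical Learning Methods, the workflow is divided into data, model, strategy, algorithm, training, and prediction.
1. Data
1.1 Read data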
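    # imports used throughout this post
    # (d2lzh is the helper package that accompanies d2l-zh)
    import d2lzh as d2l
    import pandas as pd
    from mxnet import autograd, gluon, nd
    from mxnet.gluon import data as gdata, loss as gloss, nn

    # read data
    train_data = pd.read_csv('./d2l-zh-1.1/data/kaggle_house_pred_train.csv')
    test_data = pd.read_csv('./d2l-zh-1.1/data/kaggle_house_pred_test.csv')
    # print(train_data.shape)
    # print(train_data.iloc[0:4, [0, 1, 2, -1, -2, -3]])
1.2 Preprocess data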
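To make the preprocessing step concrete, here is a minimal pure-Python sketch of standardizing one numeric column and then filling its missing values with 0. The column values and the helper name `standardize_and_fill` are made up for illustration; the code in this section does the same thing with pandas on every numeric column.

```python
import math
import statistics

def standardize_and_fill(column):
    """Standardize a column to zero mean and unit std, then replace NaN with 0.

    Mirrors (x - x.mean()) / x.std() followed by fillna(0). Like pandas,
    the mean and the sample standard deviation ignore missing values.
    """
    present = [x for x in column if not math.isnan(x)]
    mean = statistics.mean(present)
    std = statistics.stdev(present)  # sample std (n - 1), the pandas default
    return [0.0 if math.isnan(x) else (x - mean) / std for x in column]

# Hypothetical lot areas with one missing entry.
lot_area = [8450.0, 9600.0, float('nan'), 11250.0]
scaled = standardize_and_fill(lot_area)
print(scaled)
```

Because the standardized column has mean 0, filling a gap with 0 is equivalent to filling it with the column mean before standardization.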
    # standardize the numeric features
    all_features = pd.concat((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:]))
    numeric_features = all_features.dtypes[all_features.dtypes != 'object'].index
    all_features[numeric_features] = all_features[numeric_features].apply(
        lambda x: (x - x.mean()) / x.std())
    # after standardization every feature has mean 0,
    # so missing values can simply be replaced with 0
    all_features[numeric_features] = all_features[numeric_features].fillna(0)

    # convert discrete values to dummy variables
    all_features = pd.get_dummies(all_features, dummy_na=True)

    # split back into train and test data
    n_train = train_data.shape[0]
    train_features = nd.array(all_features[:n_train].values)
    test_features = nd.array(all_features[n_train:].values)
    train_labels = nd.array(train_data['SalePrice'].values).reshape((-1, 1))
1.3 get_k_fold_data
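The fold logic is easier to follow on plain indices. The sketch below reproduces only the slicing arithmetic of the function in this section, applied to sample indices instead of NDArrays (the helper name `k_fold_indices` is hypothetical). It also shows one consequence of `fold_size = n // k`: when n is not divisible by k, the last n % k samples end up in no fold.

```python
def k_fold_indices(k, i, n):
    """Return (train_idx, valid_idx) for fold i of k over n samples."""
    assert k > 1
    fold_size = n // k
    train_idx, valid_idx = [], []
    for j in range(k):
        part = list(range(j * fold_size, (j + 1) * fold_size))
        if j == i:
            valid_idx = part        # fold i is held out for validation
        else:
            train_idx.extend(part)  # all other folds form the training set
    return train_idx, valid_idx

train_idx, valid_idx = k_fold_indices(k=3, i=1, n=10)
print(valid_idx)  # [3, 4, 5]
print(train_idx)  # [0, 1, 2, 6, 7, 8]; index 9 belongs to no fold
```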
    # k-fold validation
    def get_k_fold_data(k, i, X, y):
        assert k > 1
        fold_size = X.shape[0] // k
        X_train, y_train, X_valid, y_valid = None, None, None, None
        for j in range(k):
            idx = slice(j * fold_size, (j + 1) * fold_size)
            X_part, y_part = X[idx, :], y[idx]
            if j == i:
                X_valid, y_valid = X_part, y_part
            elif X_train is None:
                X_train, y_train = X_part, y_part
            else:
                X_train = nd.concat(X_train, X_part, dim=0)
                y_train = nd.concat(y_train, y_part, dim=0)
        return X_train, y_train, X_valid, y_valid
2. Model
    def get_net():
        net = nn.Sequential()
        net.add(nn.Dense(256, activation='relu'),
                nn.Dropout(0.5),
                nn.Dense(1))
        net.initialize()
        return net
3. Strategy
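For reference, Gluon's `L2Loss` with its default weight is half the squared error per sample, and the competition metric (the RMSE between log prices) can be written in terms of it. Here $\hat{y}_i$ is the prediction and $y_i$ the label for sample $i$ of $n$:

```latex
\ell(\hat{y}_i, y_i) = \frac{1}{2}\left(\hat{y}_i - y_i\right)^2,
\qquad
\text{log-RMSE}
= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\log \hat{y}_i - \log y_i\right)^2}
= \sqrt{\frac{2}{n}\sum_{i=1}^{n} \ell\left(\log \hat{y}_i, \log y_i\right)}
```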
    loss = gloss.L2Loss()
4. Algorithm
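The optimizer used in this post is Adam with weight decay (see the `gluon.Trainer` call in the validation code). As a rough illustration only, here is a textbook Adam update for a single scalar parameter in pure Python, with weight decay added to the gradient as an L2 penalty. The function name `adam_step` and the toy objective are assumptions, and this is not MXNet's exact implementation.

```python
import math

def adam_step(w, grad, state, lr=0.01, beta1=0.9, beta2=0.999,
              eps=1e-8, wd=0.0):
    """One Adam update for a scalar parameter w; state holds (t, m, v)."""
    t, m, v = state
    t += 1
    grad = grad + wd * w                      # L2-style weight decay
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, (t, m, v)

# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, state = 0.0, (0, 0.0, 0.0)
for _ in range(2000):
    w, state = adam_step(w, 2 * (w - 3), state, lr=0.01)
print(w)  # close to the minimizer w = 3
```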
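    # the optimization algorithm is Adam; the trainer is created inside k_fold (section 6):
    # trainer = gluon.Trainer(net.collect_params(), 'adam',
    #                         {'learning_rate': learning_rate, 'wd': weight_decay})
5. Training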
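The `train` function in this section calls `log_rmse`, which is not defined in this post. A plausible reading is the competition metric: the RMSE between the logarithm of the predicted price and the logarithm of the true price. The pure-Python sketch below works on plain lists rather than on a network and NDArrays, so its signature differs from the `log_rmse(net, features, labels)` call used here. Clipping predictions to at least 1 before taking the log is an assumption that keeps the logarithm well defined.

```python
import math

def log_rmse_from_values(preds, labels):
    """RMSE between log(prediction) and log(label) over plain lists."""
    clipped = [max(p, 1.0) for p in preds]  # assumed clipping: avoids log(x <= 0)
    squared = [(math.log(p) - math.log(y)) ** 2
               for p, y in zip(clipped, labels)]
    return math.sqrt(sum(squared) / len(squared))

labels = [200000.0, 150000.0]
perfect = log_rmse_from_values(labels, labels)
clipped_case = log_rmse_from_values([-5.0, 150000.0], labels)
print(perfect)       # 0.0 for perfect predictions
print(clipped_case)  # the negative prediction is treated as 1
```

Working in log space means a 10% error on a cheap house counts the same as a 10% error on an expensive one.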
    # training
    def train(net, train_iter, train_features, train_labels, test_features, test_labels,
              loss, num_epochs, trainer, batch_size):
        train_ls, test_ls = [], []
        for epoch in range(num_epochs):
            for X, y in train_iter:
                with autograd.record():
                    l = loss(net(X), y)
                l.backward()
                trainer.step(batch_size)
            train_ls.append(log_rmse(net, train_features, train_labels))
            if test_labels is not None:
                test_ls.append(log_rmse(net, test_features, test_labels))
        return train_ls, test_ls
6. Validation
    def k_fold(k, X_train, y_train, num_epochs, learning_rate, weight_decay, batch_size):
        train_l_sum, valid_l_sum = 0.0, 0.0
        for i in range(k):
            # data
            data = get_k_fold_data(k, i, X_train, y_train)
            train_features, train_labels, _, _ = data
            train_iter = gdata.DataLoader(
                gdata.ArrayDataset(train_features, train_labels), batch_size, shuffle=True)
            # model
            net = get_net()
            # strategy
            loss = gloss.L2Loss()
            # algorithm
            trainer = gluon.Trainer(net.collect_params(), 'adam',
                                    {'learning_rate': learning_rate, 'wd': weight_decay})
            # training
            train_ls, valid_ls = train(net, train_iter, *data, loss, num_epochs,
                                       trainer, batch_size)
            train_l_sum += train_ls[-1]
            valid_l_sum += valid_ls[-1]
            if i == 0:
                d2l.semilogy(range(1, num_epochs + 1), train_ls, 'epochs', 'rmse',
                             range(1, num_epochs + 1), valid_ls, ['train', 'valid'])
            print('fold %d, train rmse %f, valid rmse %f'
                  % (i, train_ls[-1], valid_ls[-1]))
        return train_l_sum / k, valid_l_sum / k

    # model selection
    k, num_epochs, lr, weight_decay, batch_size = 5, 500, 0.01, 512, 64
    train_l, valid_l = k_fold(k, train_features, train_labels, num_epochs, lr,
                              weight_decay, batch_size)
    print('%d-fold validation: avg train rmse %f, avg valid rmse %f'
          % (k, train_l, valid_l))
7. Prediction
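`train_and_pred` is not defined in this post. Whatever it does internally, a submission for this competition ends up as a CSV file with an `Id` column and a `SalePrice` column. Here is a minimal sketch of that last step using only the standard library; the helper name `write_submission` and the sample values are hypothetical.

```python
import csv
import io

def write_submission(ids, preds, out):
    """Write house ids and predicted prices as a two-column CSV."""
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['Id', 'SalePrice'])
    for house_id, price in zip(ids, preds):
        writer.writerow([house_id, price])

# An in-memory buffer stands in for open('submission.csv', 'w', newline='').
buffer = io.StringIO()
write_submission([1461, 1462], [169000.5, 187500.0], buffer)
print(buffer.getvalue())
```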
    train_and_pred(train_features, test_features, train_labels, test_data,
                   num_epochs, lr, weight_decay, batch_size)