
3. Your First Machine Learning Model

Published: 2023/12/10


Selecting Data for Modeling

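Your dataset has too many variables to wrap your head around. How can you pare this overwhelming amount of data down to something you can understand?
We'll start by picking a few variables using our intuition. Later courses will show you statistical techniques for prioritizing variables automatically.
To choose variables/columns, we need to see a list of all the columns in the dataset. That is done with the columns property of the DataFrame (the last line of the code below).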

[1]

import pandas as pd

melbourne_file_path = '../input/melbourne-housing-snapshot/melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path)
melbourne_data.columns

Index(['Suburb', 'Address', 'Rooms', 'Type', 'Price', 'Method', 'SellerG',
       'Date', 'Distance', 'Postcode', 'Bedroom2', 'Bathroom', 'Car',
       'Landsize', 'BuildingArea', 'YearBuilt', 'CouncilArea', 'Lattitude',
       'Longtitude', 'Regionname', 'Propertycount'],
      dtype='object')

[2]

# The Melbourne data has some missing values (some houses for which some variables weren't recorded.)
# We'll learn to handle missing values in a later tutorial.
# Your Iowa data doesn't have missing values in the columns you use.
# So we will take the simplest option for now, and drop houses from our data.
# Don't worry about this much for now, though the code is:

# dropna drops missing values (think of na as "not available")
melbourne_data = melbourne_data.dropna(axis=0)
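To make the effect of dropna(axis=0) concrete, here is a minimal sketch on a small made-up table. The DataFrame `toy` and its values are hypothetical stand-ins, not rows from the Melbourne file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the housing table (not the real dataset).
toy = pd.DataFrame({
    "Rooms": [2, 3, 4, 3],
    "BuildingArea": [79.0, np.nan, 142.0, 210.0],
    "YearBuilt": [1900.0, 1910.0, np.nan, 2014.0],
    "Price": [1035000.0, 1465000.0, 1600000.0, 1876000.0],
})

# isna().sum() counts the missing entries in each column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# axis=0 drops rows (houses) that contain at least one missing value.
complete_rows = toy.dropna(axis=0)
print(complete_rows)
print(f"Kept {len(complete_rows)} of {len(toy)} rows")
```

Dropping rows is the simplest option, but every house with even one missing field is discarded, so it is worth checking how many rows remain afterwards.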

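There are many ways to select a subset of your data. The pandas course covers these in more depth, but for now we will focus on two approaches.

1. Dot notation, which we use to select the "prediction target"

2. Selecting with a column list, which we use to select the "features"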

Selecting The Prediction Target

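You can pull out a variable with dot notation. This single column is stored in a Series, which is broadly like a DataFrame with only one column of data.
We'll use dot notation to select the column we want to predict, which is called the prediction target. By convention, the prediction target is called y. So the code we need to save the house prices in the Melbourne data is: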

[3]

y = melbourne_data.Price
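As a quick check of what dot notation returns, the sketch below compares it with bracket notation on a small made-up table (`toy` is hypothetical, not the Melbourne data). Both forms should give the same Series.

```python
import pandas as pd

# Hypothetical two-column table used only for illustration.
toy = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Price": [1035000.0, 1465000.0, 1600000.0],
})

y_dot = toy.Price          # dot notation
y_bracket = toy["Price"]   # equivalent bracket notation

print(type(y_dot).__name__)
print(y_dot.equals(y_bracket))
```

Dot notation only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method; bracket notation works for any column name.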

Choosing "Features"

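The columns that are fed into our model (and later used to make predictions) are called "features." In our case, those are the columns used to determine the home price. Sometimes you will use all columns except the target as features. Other times you'll be better off with fewer features.
For now, we'll build a model with only a few features. Later on you'll see how to iterate and compare models built with different features.
We select multiple features by providing a list of column names inside brackets. Each item in that list should be a string (with quotes).
Here is an example:

[4]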

melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']

By convention, this data is called X.

[5]

X = melbourne_data[melbourne_features]
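The sketch below shows, on a made-up table, how selecting with a list differs from selecting a single column. The names `toy`, `feature_names`, and `X_toy` are hypothetical and only illustrate the pattern.

```python
import pandas as pd

# Hypothetical toy table used only for illustration.
toy = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Bathroom": [1.0, 2.0, 1.0],
    "Price": [1035000.0, 1465000.0, 1600000.0],
})

feature_names = ["Rooms", "Bathroom"]

X_toy = toy[feature_names]   # list of names -> DataFrame with those columns
single = toy["Rooms"]        # one name       -> Series
as_frame = toy[["Rooms"]]    # one-item list  -> one-column DataFrame

print(type(X_toy).__name__, X_toy.shape)
print(type(single).__name__, single.shape)
print(type(as_frame).__name__, as_frame.shape)
```

scikit-learn estimators expect a two-dimensional feature table, which is why X is built with a list of names even when only one feature is used.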

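Let's quickly review the data we'll be using to predict house prices, using the describe method and the head method, which shows the first few rows.

[6]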

X.describe()

             Rooms     Bathroom      Landsize    Lattitude   Longtitude
count  6196.000000  6196.000000   6196.000000  6196.000000  6196.000000
mean      2.931407     1.576340    471.006940   -37.807904   144.990201
std       0.971079     0.711362    897.449881     0.075850     0.099165
min       1.000000     1.000000      0.000000   -38.164920   144.542370
25%       2.000000     1.000000    152.000000   -37.855438   144.926198
50%       3.000000     1.000000    373.000000   -37.802250   144.995800
75%       4.000000     2.000000    628.000000   -37.758200   145.052700
max       8.000000     8.000000  37000.000000   -37.457090   145.526350

[7]

X.head()

   Rooms  Bathroom  Landsize  Lattitude  Longtitude
1      2       1.0     156.0   -37.8079    144.9934
2      3       2.0     134.0   -37.8093    144.9944
4      4       1.0     120.0   -37.8072    144.9941
6      3       2.0     245.0   -37.8024    144.9993
7      2       1.0     256.0   -37.8060    144.9954

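Visually checking your data with these commands is an important part of a data scientist's job. You'll frequently find surprises in the dataset that deserve further inspection.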
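One kind of surprise visible in the describe output is a Landsize minimum of 0 next to a maximum of 37000. The sketch below shows one way to pull out such rows for a closer look. It runs on a made-up table (`toy` is hypothetical), so the values only mimic that pattern.

```python
import pandas as pd

# Hypothetical rows that mimic the pattern in the summary (a zero and a very large Landsize).
toy = pd.DataFrame({
    "Rooms": [2, 3, 4, 3, 2],
    "Landsize": [156.0, 0.0, 120.0, 37000.0, 256.0],
})

summary = toy.describe()
print(summary.loc[["min", "max"]])

# Rows whose land size is recorded as zero.
zero_land = toy[toy.Landsize == 0]
print(zero_land)

# The single row with the largest land size.
largest = toy.nlargest(1, "Landsize")
print(largest)
```

Whether such values are data-entry errors or genuine properties cannot be decided from the summary alone; the point is only to locate the rows that deserve a second look.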

Building Your Model

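You will use the scikit-learn library to create your models. In code, this library is written as sklearn, as you will see in the sample code. Scikit-learn is the most popular library for modeling the types of data typically stored in DataFrames.

The steps to building and using a model are:
1. Define: What type of model will it be? A decision tree? Some other type of model? Some other parameters of the model type are specified too.
2. Fit: Capture patterns from the provided data. This is the heart of modeling.
3. Predict: Just what it sounds like.
4. Evaluate: Determine how accurate the model's predictions are.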
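The four steps can be sketched end to end on a small made-up table. The data and names (`toy`, `X_toy`, `y_toy`) are hypothetical, and mean_absolute_error is just one possible metric for the evaluate step, chosen here as an assumption rather than something this lesson prescribes.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data; every row has a distinct combination of feature values.
toy = pd.DataFrame({
    "Rooms": [2, 3, 4, 3, 2, 5],
    "Bathroom": [1.0, 2.0, 1.0, 1.0, 2.0, 3.0],
    "Price": [500000.0, 750000.0, 900000.0, 700000.0, 560000.0, 1200000.0],
})
X_toy = toy[["Rooms", "Bathroom"]]
y_toy = toy.Price

# 1. Define: choose the model type and its parameters.
model = DecisionTreeRegressor(random_state=1)

# 2. Fit: capture patterns from the provided data.
model.fit(X_toy, y_toy)

# 3. Predict: here on the same rows the model was fitted to.
predictions = model.predict(X_toy)

# 4. Evaluate: average absolute difference between predictions and actual prices.
mae = mean_absolute_error(y_toy, predictions)
print(predictions)
print(mae)
```

Because the model is scored on the same rows it was fitted to, this error is an optimistic figure; it says little about how the model would do on houses it has not seen.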

Here is an example of defining a decision tree model with scikit-learn and fitting it with the features and target variable.

[8]

from sklearn.tree import DecisionTreeRegressor

# Define model. Specify a number for random_state to ensure same results each run
melbourne_model = DecisionTreeRegressor(random_state=1)

# Fit model
melbourne_model.fit(X, y)

DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
                      max_leaf_nodes=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      presort=False, random_state=1, splitter='best')

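Many machine learning models allow some randomness in model training. Specifying a number for random_state ensures you get the same results in each run. This is considered good practice. You can use any number; model quality does not meaningfully depend on the exact value you choose.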
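The sketch below illustrates the reproducibility point on synthetic data. Everything here is an assumption made for the demonstration: the random arrays, the helper `fit_and_predict`, and the use of max_features=2, which makes the tree consider a random subset of columns at each split so that the seed actually matters.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data generated only for this demonstration.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo[:, 0] * 3.0 + rng.normal(size=200)
X_new = rng.normal(size=(10, 5))  # points the model was not fitted on


def fit_and_predict(seed):
    """Fit a tree with the given random_state and predict on X_new."""
    # max_features=2: each split looks at a random pair of columns.
    model = DecisionTreeRegressor(max_features=2, random_state=seed)
    model.fit(X_demo, y_demo)
    return model.predict(X_new)


first_run = fit_and_predict(seed=1)
second_run = fit_and_predict(seed=1)
other_seed = fit_and_predict(seed=2)

# Same seed, same data: the predictions are identical.
same_seed_match = np.array_equal(first_run, second_run)
print(same_seed_match)

# A different seed may or may not change the predictions.
print(np.array_equal(first_run, other_seed))
```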

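We now have a fitted model that we can use to make predictions.

In practice, you'll want to make predictions for new houses coming on the market rather than for houses we already have prices for. But we'll make predictions for the first few rows of the training data to see how the predict function works.

[9]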

print("Making predictions for the following 5 houses:")
print(X.head())
print("The predictions are")
print(melbourne_model.predict(X.head()))

Making predictions for the following 5 houses:
   Rooms  Bathroom  Landsize  Lattitude  Longtitude
1      2       1.0     156.0   -37.8079    144.9934
2      3       2.0     134.0   -37.8093    144.9944
4      4       1.0     120.0   -37.8072    144.9941
6      3       2.0     245.0   -37.8024    144.9993
7      2       1.0     256.0   -37.8060    144.9954
The predictions are
[1035000. 1465000. 1600000. 1876000. 1636000.]
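Predicting for a house that is not in the training data follows the same pattern: build a one-row DataFrame with the same feature columns, in the same order, and pass it to predict. The sketch below uses made-up training rows and a made-up new house; `train`, `feature_names`, and `new_house` are hypothetical.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

feature_names = ["Rooms", "Bathroom", "Landsize"]

# Hypothetical training rows, not taken from the Melbourne file.
train = pd.DataFrame({
    "Rooms": [2, 3, 4, 3, 2],
    "Bathroom": [1.0, 2.0, 1.0, 2.0, 1.0],
    "Landsize": [156.0, 134.0, 120.0, 245.0, 256.0],
    "Price": [1035000.0, 1465000.0, 1600000.0, 1876000.0, 1636000.0],
})

model = DecisionTreeRegressor(random_state=1)
model.fit(train[feature_names], train.Price)

# A made-up house with no known price; columns match the training features.
new_house = pd.DataFrame([{"Rooms": 3, "Bathroom": 1.0, "Landsize": 200.0}])
new_house = new_house[feature_names]

predicted_price = model.predict(new_house)
print(predicted_price)
```

A fully grown decision tree predicts by routing the new row to a leaf, so with this setup the result is the price of one of the training houses rather than an interpolated value.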

Your Turn

Try it out yourself in the model-building exercise.
