svm rbf face recognition yale_Hands-on Lab: Face Recognition with Machine Learning
Principle of the experiment:
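A support vector machine (SVM) is a classification algorithm. It improves the generalization ability of a learning machine by seeking structural risk minimization, that is, by minimizing both the empirical risk and the confidence interval, so that sound statistical regularities can be learned even when the sample size is small. Informally, an SVM is a binary classification model whose basic form is the linear classifier with the largest margin in feature space. Its learning strategy is therefore margin maximization, which can ultimately be formulated as a convex quadratic programming problem.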
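For reference, the standard textbook way of writing "maximize the margin" as a convex quadratic program is shown below, for training points x_i with labels y_i in {-1, +1}. The margin width is 2/||w||, so maximizing it is equivalent to minimizing (1/2)||w||^2. The second line is the soft-margin variant with slack variables xi_i and penalty C, which corresponds to the slack variables and the C parameter discussed further on. The symbols are conventional notation chosen here, not taken from the original text.

```latex
\begin{aligned}
&\min_{w,\,b}\ \tfrac{1}{2}\|w\|^2
  &&\text{s.t. } y_i\,(w^\top x_i + b) \ge 1,\quad i = 1,\dots,n\\
&\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i
  &&\text{s.t. } y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\quad i = 1,\dots,n
\end{aligned}
```

Both objectives are quadratic in w and all constraints are linear, which is what makes the problem a convex QP.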
Details:
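1. Find a separating hyperplane in n-dimensional space that divides the points into classes. The figure below shows an example of linear classification.
2. In general, the distance of a point from the hyperplane indicates how confident or reliable its classification is, and an SVM maximizes this margin. The points lying on the dashed lines (the margin boundaries) are called support vectors.
3. In practice we often meet samples that are not linearly separable. The usual approach is then to map the sample features into a higher-dimensional space (see the figure below).
4. Mapping non-separable data into a higher-dimensional space can make the dimensionality frighteningly large (examples with 19 dimensions, or even infinitely many), which makes computation expensive. The value of a kernel function is that, although it also corresponds to a mapping from low to high dimensions, it does all of its computation in the original low-dimensional space while the classification effect is realized in the high-dimensional space. This avoids computing directly in the high-dimensional space.
5. Use slack variables to handle noise in the data (points that violate the margin).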
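The kernel trick described above can be checked numerically. For 2-D inputs, the homogeneous degree-2 polynomial kernel (u·v)^2 equals an ordinary inner product after the explicit map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), yet the kernel side never builds phi. This is a standard textbook illustration chosen for this sketch, not the example from the original figure; the names `phi` and `poly2_kernel` are illustrative. The RBF kernel used later corresponds to an infinite-dimensional map, so for it only the kernel-side computation is possible.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(u, v):
    """Homogeneous degree-2 polynomial kernel, computed in the original 2-D space."""
    return float(np.dot(u, v)) ** 2

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

explicit = float(np.dot(phi(u), phi(v)))  # inner product in the 3-D feature space
via_kernel = poly2_kernel(u, v)           # same value without constructing phi(u), phi(v)
print(explicit, via_kernel)               # equal up to floating-point rounding
```

The saving grows with the degree and the input dimension: the explicit feature space grows combinatorially, while the kernel always costs one dot product plus a power.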
The structure of the SVM class in sklearn and the meaning of its parameters are as follows.
sklearn.svm.SVC:
sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=None, random_state=None)
Parameters:
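C: penalty parameter C of C-SVC; default 1.0.
A larger C penalizes the slack variables more heavily and pushes them toward 0, i.e. misclassification is punished more and the model tries to classify every training sample correctly. Accuracy on the training set is then high, but generalization is weak. A smaller C lowers the penalty on misclassification and tolerates errors by treating such points as noise, which usually generalizes better.
kernel: kernel function; default 'rbf'. One of 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed':
  0 - linear: u'v
  1 - polynomial: (gamma*u'*v + coef0)^degree
  2 - RBF: exp(-gamma*|u-v|^2)
  3 - sigmoid: tanh(gamma*u'*v + coef0)
degree: degree of the 'poly' kernel; default 3. Ignored by all other kernels.
gamma: kernel coefficient for 'rbf', 'poly' and 'sigmoid'. The default 'auto' uses 1/n_features (scikit-learn 0.22 and later default to 'scale', which uses 1/(n_features * X.var())).
coef0: independent term of the kernel function; only used by 'poly' and 'sigmoid'.
probability: whether to enable probability estimates; default False.
shrinking: whether to use the shrinking heuristic; default True.
tol: tolerance for the stopping criterion; default 1e-3.
cache_size: size of the kernel cache in MB; default 200.
class_weight: per-class weights passed as a dict; the parameter C of class i is set to class_weight[i]*C (the C of C-SVC). The string 'balanced' instead derives the weights from the class frequencies.
verbose: enable verbose output; default False.
max_iter: hard limit on the number of solver iterations; -1 means no limit.
decision_function_shape: 'ovo', 'ovr' or None; default None in the signature above (from scikit-learn 0.19 on the default is 'ovr').
random_state: seed of the pseudo-random number generator used when shuffling the data; an int.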
The parameters most commonly tuned are C, kernel, degree, gamma and coef0.
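To make the kernel formulas listed above concrete, the sketch below evaluates them directly with NumPy on random vectors and compares the results with scikit-learn's helpers in `sklearn.metrics.pairwise`. The values of gamma, coef0 and degree are arbitrary choices for the demonstration. These helpers default to coef0=1 while `SVC` defaults to coef0=0.0, so coef0 is passed explicitly.

```python
import numpy as np
from sklearn.metrics.pairwise import (linear_kernel, polynomial_kernel,
                                      rbf_kernel, sigmoid_kernel)

rng = np.random.RandomState(0)
U = rng.randn(4, 3)  # 4 vectors u with 3 features each
V = rng.randn(5, 3)  # 5 vectors v with 3 features each
gamma, coef0, degree = 0.5, 1.0, 3  # arbitrary demo values

# The four formulas above, written out with NumPy for every (u, v) pair
dot = U @ V.T                                              # u'v
sq_dist = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)   # |u - v|^2
manual = {
    "linear": dot,
    "poly": (gamma * dot + coef0) ** degree,
    "rbf": np.exp(-gamma * sq_dist),
    "sigmoid": np.tanh(gamma * dot + coef0),
}

# scikit-learn's pairwise kernel functions with the same parameters
library = {
    "linear": linear_kernel(U, V),
    "poly": polynomial_kernel(U, V, degree=degree, gamma=gamma, coef0=coef0),
    "rbf": rbf_kernel(U, V, gamma=gamma),
    "sigmoid": sigmoid_kernel(U, V, gamma=gamma, coef0=coef0),
}

for name in manual:
    print(name, np.allclose(manual[name], library[name]))
```

Each result is a 4 × 5 Gram matrix; an SVC with kernel='rbf' works with exactly this kind of matrix between training samples instead of with explicit high-dimensional features.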
System environment
Linux Ubuntu 16.04
Python 3.6
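Task
Use the SVM algorithm to perform face recognition on the fetch_lfw_people data and visualize the prediction results.
Steps
1. Create the directory and download the data needed for the experiment.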
mkdir -p /home/zhangyu/scikit_learn_data/lfw_home
cd /home/zhangyu/scikit_learn_data/lfw_home
wget http://192.168.1.100:60000/allfiles/ma_learn/lfwfunneled.tgz
wget http://192.168.1.100:60000/allfiles/ma_learn/pairsDevTest.txt
wget http://192.168.1.100:60000/allfiles/ma_learn/pairsDevTrain.txt
wget http://192.168.1.100:60000/allfiles/ma_learn/pairs.txt
tar xzvf lfwfunneled.tgz
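Before running the script it can help to confirm that the files sit where `fetch_lfw_people` will look for them. In the scikit-learn versions I am aware of, the loader checks `<data_home>/lfw_home` for the three pairs files as well as the extracted image folder, which is presumably why the lab downloads them. The sketch below assumes scikit-learn's default data directory `~/scikit_learn_data` (that is `/home/zhangyu/scikit_learn_data` for the lab user) and assumes the archive extracts to a folder named `lfw_funneled`; `missing_lfw_items` is a hypothetical helper, not part of scikit-learn.

```python
import os

# Assumed layout, mirroring the wget/tar commands above
REQUIRED_FILES = ["pairsDevTrain.txt", "pairsDevTest.txt", "pairs.txt"]
REQUIRED_DIR = "lfw_funneled"  # assumption: folder name produced by tar xzvf

def missing_lfw_items(data_home):
    """Return the expected LFW items that are absent under data_home/lfw_home."""
    lfw_home = os.path.join(data_home, "lfw_home")
    missing = [f for f in REQUIRED_FILES
               if not os.path.isfile(os.path.join(lfw_home, f))]
    if not os.path.isdir(os.path.join(lfw_home, REQUIRED_DIR)):
        missing.append(REQUIRED_DIR)
    return missing

if __name__ == "__main__":
    home = os.path.expanduser("~/scikit_learn_data")
    print(missing_lfw_items(home) or "LFW data is in place")
```

If nothing is reported missing, passing `data_home=...` and `download_if_missing=False` to `fetch_lfw_people` (both existing parameters) should make it load from disk and raise an error instead of trying to download.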
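2. Create a new Python project named python15.
In the python15 project, create a new Python file named SVM.
3. The complete code for face recognition on the fetch_lfw_people data with the SVM algorithm, including visualization of the predictions, is as follows: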
from __future__ import print_function
from time import time
import logging
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC
# Display progress logs on stdout
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
###############################################################################
# Download the data, if not already on disk, and load it as numpy arrays
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
# introspect the images arrays to find the shapes (for plotting)
n_samples, h, w = lfw_people.images.shape
# for machine learning we use the flattened pixel data directly (relative
# pixel position information is ignored by this model)
X = lfw_people.data
n_features = X.shape[1]
# the label to predict is the id of the person
y = lfw_people.target
target_names = lfw_people.target_names
n_classes = target_names.shape[0]
print("Total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_classes: %d" % n_classes)
###############################################################################
# Split into a training set (75%) and a test set (25%);
# random_state fixes the shuffle so that runs are reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
###############################################################################
# Compute a PCA (eigenfaces) on the face dataset (treated as unlabeled
# dataset): unsupervised feature extraction / dimensionality reduction
n_components = 150
print("Extracting the top %d eigenfaces from %d faces"
      % (n_components, X_train.shape[0]))
t0 = time()
pca = PCA(svd_solver='randomized', n_components=n_components, whiten=True).fit(X_train)
print("done in %0.3fs" % (time() - t0))
eigenfaces = pca.components_.reshape((n_components, h, w))
print("Projecting the input data on the eigenfaces orthonormal basis")
t0 = time()
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
print("done in %0.3fs" % (time() - t0))
###############################################################################
# Train a SVM classification model, choosing C and gamma by grid search
print("Fitting the classifier to the training set")
t0 = time()
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1]}
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid)
clf = clf.fit(X_train_pca, y_train)
print("done in %0.3fs" % (time() - t0))
print("Best estimator found by grid search:")
print(clf.best_estimator_)
###############################################################################
# Quantitative evaluation of the model quality on the test set
print("Predicting people's names on the test set")
t0 = time()
y_pred = clf.predict(X_test_pca)
print("done in %0.3fs" % (time() - t0))
print(classification_report(y_test, y_pred, target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))
###############################################################################
# Qualitative evaluation of the predictions using matplotlib
def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
    """Helper function to plot a gallery of portraits"""
    plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i], size=12)
        plt.xticks(())
        plt.yticks(())
# plot the result of the prediction on a portion of the test set
def title(y_pred, y_test, target_names, i):
    # keep only the last name (the text after the last space)
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return 'predicted: %s\ntrue:      %s' % (pred_name, true_name)
prediction_titles = [title(y_pred, y_test, target_names, i)
                     for i in range(y_pred.shape[0])]
plot_gallery(X_test, prediction_titles, h, w)
# plot the gallery of the most significant eigenfaces
eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces, eigenface_titles, h, w)
plt.show()
4. Now walk through the complete code piece by piece. First, import the packages used in the experiment:
from __future__ import print_function
from time import time
import logging
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC
5. Load the data
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
# introspect the images arrays to find the shapes (for plotting)
n_samples, h, w = lfw_people.images.shape
# for machine learning we use the flattened pixel data directly (relative
# pixel position information is ignored by this model)
X = lfw_people.data
n_features = X.shape[1]
# the label to predict is the id of the person
y = lfw_people.target
target_names = lfw_people.target_names
n_classes = target_names.shape[0]
print("Total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_classes: %d" % n_classes)
[Figure: run output of this step]
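`min_faces_per_person=70` keeps only the people who have at least 70 images, and each remaining image is labelled with an integer id that indexes into `target_names`. The sketch below illustrates that filtering and labelling on a toy list of names. `keep_frequent_people` is a hypothetical helper written for illustration, not scikit-learn's internal code; here the kept names are sorted and each label is the position of the name in that sorted list.

```python
import numpy as np

def keep_frequent_people(names, min_faces):
    """Keep only images whose person appears at least `min_faces` times,
    and map each kept name to an integer id (its index in the sorted name list)."""
    names = np.asarray(names)
    unique, counts = np.unique(names, return_counts=True)  # unique is sorted
    kept_names = unique[counts >= min_faces]
    mask = np.isin(names, kept_names)                   # which images survive
    target = np.searchsorted(kept_names, names[mask])   # id = index into kept_names
    return mask, target, kept_names

names = ["Bush", "Powell", "Bush", "Blair", "Powell", "Bush"]
mask, target, target_names = keep_frequent_people(names, min_faces=2)
print(target_names, target)
```

Blair appears only once, so with `min_faces=2` that image is dropped, exactly as people with fewer than 70 photos are dropped in the lab.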
6. Feature extraction
# Split into a training set (75%) and a test set (25%);
# random_state fixes the shuffle so that runs are reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
###############################################################################
# Compute a PCA (eigenfaces) on the face dataset (treated as unlabeled
# dataset): unsupervised feature extraction / dimensionality reduction
n_components = 150
print("Extracting the top %d eigenfaces from %d faces"
      % (n_components, X_train.shape[0]))
t0 = time()
pca = PCA(svd_solver='randomized', n_components=n_components, whiten=True).fit(X_train)
print("done in %0.3fs" % (time() - t0))
eigenfaces = pca.components_.reshape((n_components, h, w))
print("Projecting the input data on the eigenfaces orthonormal basis")
t0 = time()
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
print("done in %0.3fs" % (time() - t0))
[Figure: run output of this step]
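The `PCA(..., whiten=True)` call computes eigenfaces: the principal directions of the centred training images. The NumPy sketch below reproduces this on random stand-in data with an exact SVD (`svd_solver="full"`); the lab uses the faster randomized solver, which approximates the same directions. It also shows what `whiten=True` adds: each projected coordinate is divided by the square root of its explained variance, so every component ends up with unit variance on the training data. The data sizes are arbitrary assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(40, 12)  # stand-in for flattened face images: 40 faces, 12 "pixels"
k = 5                 # number of eigenfaces to keep

# Eigenfaces by hand: centre the data, take the top-k right singular vectors
mean = X.mean(axis=0)
Xc = X - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:k]                                  # each row is one eigenface
explained_variance = S[:k] ** 2 / (X.shape[0] - 1)
projected = Xc @ components.T                        # coordinates in the eigenface basis
whitened = projected / np.sqrt(explained_variance)   # the extra step done by whiten=True

pca = PCA(n_components=k, svd_solver="full", whiten=True).fit(X)
# Singular vectors are only defined up to sign, so compare absolute values
print(np.allclose(np.abs(pca.transform(X)), np.abs(whitened)))
```

In the lab, each row of `pca.components_` has h*w entries, which is why it can be reshaped to (n_components, h, w) and shown as an image.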
7. Build the SVM classification model
print("Fitting the classifier to the training set")
t0 = time()
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1]}
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid)
clf = clf.fit(X_train_pca, y_train)
print("done in %0.3fs" % (time() - t0))
print("Best estimator found by grid search:")
print(clf.best_estimator_)
[Figure: run output of this step]
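Grid search fits one SVC per (C, gamma) pair, 5 × 6 = 30 candidates, scores each by cross-validation and keeps the best. The sketch below runs the same grid on a small synthetic, imbalanced 3-class dataset from `make_classification` (an assumption standing in for the PCA features). It also computes the per-class weights behind `class_weight='balanced'` with `compute_class_weight`. Here `cv=5` is set explicitly; the lab code relies on GridSearchCV's default, which depends on the scikit-learn version.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

# Synthetic, imbalanced 3-class problem (stand-in for X_train_pca, y_train)
X, y = make_classification(n_samples=200, n_features=20, n_informative=8,
                           n_classes=3, weights=[0.6, 0.3, 0.1], random_state=0)

param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1]}
search = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid, cv=5)
search.fit(X, y)

n_candidates = len(search.cv_results_['params'])  # 5 values of C x 6 values of gamma
print(n_candidates, search.best_params_)

# class_weight='balanced': weight_c = n_samples / (n_classes * count_c),
# and the effective penalty for class c becomes weight_c * C
classes = np.unique(y)
weights = compute_class_weight('balanced', classes=classes, y=y)
print(dict(zip(classes, weights)))
```

Rare classes get larger weights, so misclassifying one of the few images of a less frequent person costs as much in total as misclassifying the many images of a frequent one.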
8. Model evaluation
print("Predicting people's names on the test set")
t0 = time()
y_pred = clf.predict(X_test_pca)
print("done in %0.3fs" % (time() - t0))
print(classification_report(y_test, y_pred, target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))
[Figure: run output of this step]
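`classification_report` prints per-class precision, recall and F1, and the confusion matrix has one row per true person and one column per predicted person. The toy example below, with made-up labels for three people, shows how precision and recall can be read off the confusion matrix and checks them against scikit-learn's `precision_score` and `recall_score`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels for three people (ids 0, 1, 2), purely illustrative
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 1, 0, 1, 1, 2, 0, 2, 2])

cm = confusion_matrix(y_true, y_pred, labels=range(3))
# Rows are true classes, columns are predicted classes
precision = np.diag(cm) / cm.sum(axis=0)  # correct / all predicted as that class
recall = np.diag(cm) / cm.sum(axis=1)     # correct / all truly in that class
print(cm)
print(precision, recall)
```

A large off-diagonal entry in the lab's matrix points to a specific pair of people the model confuses, which the per-class averages in the report do not show.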
9. Visualize the prediction results
def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
    """Helper function to plot a gallery of portraits"""
    plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i], size=12)
        plt.xticks(())
        plt.yticks(())
# plot the result of the prediction on a portion of the test set
def title(y_pred, y_test, target_names, i):
    # keep only the last name (the text after the last space)
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return 'predicted: %s\ntrue:      %s' % (pred_name, true_name)
prediction_titles = [title(y_pred, y_test, target_names, i)
                     for i in range(y_pred.shape[0])]
plot_gallery(X_test, prediction_titles, h, w)
# plot the gallery of the most significant eigenfaces
eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces, eigenface_titles, h, w)
plt.show()
[Figure: run output, predicted and true names for a sample of test faces]
[Figure: eigenfaces]
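On a machine without a display (for example when the Ubuntu lab machine is reached over SSH, an assumption about the environment), `plt.show()` may not open any window. One option is matplotlib's non-interactive Agg backend together with `savefig`. The sketch below is a hypothetical file-writing variant of `plot_gallery` fed with random pixels; the 50 × 37 image size and the file name `gallery.png` are illustrative assumptions. It also shows what the `rsplit(' ', 1)[-1]` trick in `title()` does to a full name.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available, so render to files
import matplotlib.pyplot as plt
import numpy as np

def last_name(full_name):
    """Same trick as title(): keep only the text after the last space."""
    return full_name.rsplit(' ', 1)[-1]

def save_gallery(images, titles, h, w, path, n_row=1, n_col=3):
    """Hypothetical variant of plot_gallery that writes the figure to `path`."""
    fig = plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))
    for i in range(n_row * n_col):
        ax = fig.add_subplot(n_row, n_col, i + 1)
        ax.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        ax.set_title(titles[i], size=12)
        ax.set_xticks(())
        ax.set_yticks(())
    fig.savefig(path)
    plt.close(fig)  # free the figure, since nothing is shown interactively

h, w = 50, 37
fake_faces = np.random.RandomState(0).rand(3, h * w)  # random pixels standing in for faces
names = ["George W Bush", "Colin Powell", "Tony Blair"]
save_gallery(fake_faces, [last_name(n) for n in names], h, w, "gallery.png")
print([last_name(n) for n in names])
```

The resulting PNG can then be copied off the machine and viewed locally.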
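Summary
That is the complete content of "svm rbf face recognition yale_Hands-on Lab: Face Recognition with Machine Learning", collected and compiled by 生活随笔. We hope this article helps you solve the problems you have run into.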