Support Vector Machine: A Python Implementation
This article, collected and edited by 生活随笔, walks through a Python implementation of a support vector machine. It is shared here as a reference.
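The training and test data below are flattened dumps of tab-separated files in which each line holds one record: x1, x2, label. A minimal, hypothetical sketch of regrouping such a flattened run of numbers back into triples (the variable names are illustrative, not part of the original code):

```python
# A short excerpt of the flattened dump, as a string
raw = "-0.214824 0.662756 -1.000000 0.406933 0.648055 -1.000000"
nums = [float(t) for t in raw.split()]
# regroup into (x1, x2, label) triples
rows = [nums[i:i + 3] for i in range(0, len(nums), 3)]
# rows[0] -> [-0.214824, 0.662756, -1.0]
```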
Training data
-0.214824 0.662756 -1.000000 -0.061569 -0.091875 1.000000 0.406933 0.648055 -1.000000 0.223650 0.130142 1.000000 0.231317 0.766906 -1.000000 -0.748800 -0.531637 -1.000000 -0.557789 0.375797 -1.000000 0.207123 -0.019463 1.000000 0.286462 0.719470 -1.000000 0.195300 -0.179039 1.000000 -0.152696 -0.153030 1.000000 0.384471 0.653336 -1.000000 -0.117280 -0.153217 1.000000 -0.238076 0.000583 1.000000 -0.413576 0.145681 1.000000 0.490767 -0.680029 -1.000000 0.199894 -0.199381 1.000000 -0.356048 0.537960 -1.000000 -0.392868 -0.125261 1.000000 0.353588 -0.070617 1.000000 0.020984 0.925720 -1.000000 -0.475167 -0.346247 -1.000000 0.074952 0.042783 1.000000 0.394164 -0.058217 1.000000 0.663418 0.436525 -1.000000 0.402158 0.577744 -1.000000 -0.449349 -0.038074 1.000000 0.619080 -0.088188 -1.000000 0.268066 -0.071621 1.000000 -0.015165 0.359326 1.000000 0.539368 -0.374972 -1.000000 -0.319153 0.629673 -1.000000 0.694424 0.641180 -1.000000 0.079522 0.193198 1.000000 0.253289 -0.285861 1.000000 -0.035558 -0.010086 1.000000 -0.403483 0.474466 -1.000000 -0.034312 0.995685 -1.000000 -0.590657 0.438051 -1.000000 -0.098871 -0.023953 1.000000 -0.250001 0.141621 1.000000 -0.012998 0.525985 -1.000000 0.153738 0.491531 -1.000000 0.388215 -0.656567 -1.000000 0.049008 0.013499 1.000000 0.068286 0.392741 1.000000 0.747800 -0.066630 -1.000000 0.004621 -0.042932 1.000000 -0.701600 0.190983 -1.000000 0.055413 -0.024380 1.000000 0.035398 -0.333682 1.000000 0.211795 0.024689 1.000000 -0.045677 0.172907 1.000000 0.595222 0.209570 -1.000000 0.229465 0.250409 1.000000 -0.089293 0.068198 1.000000 0.384300 -0.176570 1.000000 0.834912 -0.110321 -1.000000 -0.307768 0.503038 -1.000000 -0.777063 -0.348066 -1.000000 0.017390 0.152441 1.000000 -0.293382 -0.139778 1.000000 -0.203272 0.286855 1.000000 0.957812 -0.152444 -1.000000 0.004609 -0.070617 1.000000 -0.755431 0.096711 -1.000000 -0.526487 0.547282 -1.000000 -0.246873 0.833713 -1.000000 0.185639 -0.066162 1.000000 0.851934 0.456603 -1.000000 -0.827912 
0.117122 -1.000000 0.233512 -0.106274 1.000000 0.583671 -0.709033 -1.000000 -0.487023 0.625140 -1.000000 -0.448939 0.176725 1.000000 0.155907 -0.166371 1.000000 0.334204 0.381237 -1.000000 0.081536 -0.106212 1.000000 0.227222 0.527437 -1.000000 0.759290 0.330720 -1.000000 0.204177 -0.023516 1.000000 0.577939 0.403784 -1.000000 -0.568534 0.442948 -1.000000 -0.011520 0.021165 1.000000 0.875720 0.422476 -1.000000 0.297885 -0.632874 -1.000000 -0.015821 0.031226 1.000000 0.541359 -0.205969 -1.000000 -0.689946 -0.508674 -1.000000 -0.343049 0.841653 -1.000000 0.523902 -0.436156 -1.000000 0.249281 -0.711840 -1.000000 0.193449 0.574598 -1.000000 -0.257542 -0.753885 -1.000000 -0.021605 0.158080 1.000000 0.601559 -0.727041 -1.000000 -0.791603 0.095651 -1.000000 -0.908298 -0.053376 -1.000000 0.122020 0.850966 -1.000000 -0.725568 -0.292022 -1.000000
Test data
3.542485 1.977398 -1 3.018896 2.556416 -1 7.551510 -1.580030 1 2.114999 -0.004466 -1 8.127113 1.274372 1 7.108772 -0.986906 1 8.610639 2.046708 1 2.326297 0.265213 -1 3.634009 1.730537 -1 0.341367 -0.894998 -1 3.125951 0.293251 -1 2.123252 -0.783563 -1 0.887835 -2.797792 -1 7.139979 -2.329896 1 1.696414 -1.212496 -1 8.117032 0.623493 1 8.497162 -0.266649 1 4.658191 3.507396 -1 8.197181 1.545132 1 1.208047 0.213100 -1 1.928486 -0.321870 -1 2.175808 -0.014527 -1 7.886608 0.461755 1 3.223038 -0.552392 -1 3.628502 2.190585 -1 7.407860 -0.121961 1 7.286357 0.251077 1 2.301095 -0.533988 -1 -0.232542 -0.547690 -1 3.457096 -0.082216 -1 3.023938 -0.057392 -1 8.015003 0.885325 1 8.991748 0.923154 1 7.916831 -1.781735 1 7.616862 -0.217958 1 2.450939 0.744967 -1 7.270337 -2.507834 1 1.749721 -0.961902 -1 1.803111 -0.176349 -1 8.804461 3.044301 1 1.231257 -0.568573 -1 2.074915 1.410550 -1 -0.743036 -1.736103 -1 3.536555 3.964960 -1 8.410143 0.025606 1 7.382988 -0.478764 1 6.960661 -0.245353 1 8.234460 0.701868 1 8.168618 -0.903835 1 1.534187 -0.622492 -1 9.229518 2.066088 1 7.886242 0.191813 1 2.893743 -1.643468 -1 1.870457 -1.040420 -1 5.286862 -2.358286 1 6.080573 0.418886 1 2.544314 1.714165 -1 6.016004 -3.753712 1 0.926310 -0.564359 -1 0.870296 -0.109952 -1 2.369345 1.375695 -1 1.363782 -0.254082 -1 7.279460 -0.189572 1 1.896005 0.515080 -1 8.102154 -0.603875 1 2.529893 0.662657 -1 1.963874 -0.365233 -1 8.132048 0.785914 1 8.245938 0.372366 1 6.543888 0.433164 1 -0.236713 -5.766721 -1 8.112593 0.295839 1 9.803425 1.495167 1 1.497407 -0.552916 -1 1.336267 -1.632889 -1 9.205805 -0.586480 1 1.966279 -1.840439 -1 8.398012 1.584918 1 7.239953 -1.764292 1 7.556201 0.241185 1 9.015509 0.345019 1 8.266085 -0.230977 1 8.545620 2.788799 1 9.295969 1.346332 1 2.404234 0.570278 -1 2.037772 0.021919 -1 1.727631 -0.453143 -1 1.979395 -0.050773 -1 8.092288 -1.372433 1 1.667645 0.239204 -1 9.854303 1.365116 1 7.921057 -1.327587 1 8.500757 1.492372 1 1.339746 -0.291183 -1 3.107511 0.758367 
-1 2.609525 0.902979 -1 3.263585 1.367898 -1 2.912122 -0.202359 -1 1.731786 0.589096 -1 2.387003 1.573131 -1
(Figure: scatter plot of the raw test data)

```python
# -*- coding: utf-8 -*-
"""
Created on Tue Sep 4 16:58:16 2018
SVM implemented via SMO (Sequential Minimal Optimization)
@author: weixw
"""
import numpy as np

# Kernel transformation: maps one feature space to another (low-dimensional to
# high-dimensional). A problem that is non-linear in the low-dimensional space
# becomes linear in the high-dimensional space.
# Linear kernel: data matrix (100*2) times the transpose of one data row (2*1) => (100*1)
# RBF kernel: k(x, y) = exp(-||x - y||**2 / (2*sigma**2))
# 1. subtract the given row from every data row
# 2. square the difference
def kernelTrans(dataMat, rowDataMat, kTup):
    m, n = np.shape(dataMat)
    # initialize the kernel column, m*1
    K = np.mat(np.zeros((m, 1)))
    if kTup[0] == 'lin':     # linear kernel
        K = dataMat * rowDataMat.T
    elif kTup[0] == 'rbf':   # RBF (non-linear) kernel
        for j in range(m):
            # xj - x
            deltaRow = dataMat[j, :] - rowDataMat
            K[j] = deltaRow * deltaRow.T  # 1*n n*1 => 1*1
        K = np.exp(K / (-2 * kTup[1]**2))
    else:
        raise NameError('Houston We Have a Problem -- That Kernel is not recognized')
    return K


# Data structure used as a cache to speed up the computation
class optStruct:
    def __init__(self, dataSet, labelSet, C, toler, kTup):
        self.dataMat = np.mat(dataSet)      # raw data as an m*n matrix
        self.labelMat = np.mat(labelSet).T  # labels as an m*1 matrix
        self.C = C          # penalty parameter: larger C tolerates less noise;
                            # all Lagrange multipliers live in a box of side C
        self.toler = toler  # tolerance
        self.m = np.shape(self.dataMat)[0]           # number of data rows
        self.alphas = np.mat(np.zeros((self.m, 1)))  # alpha coefficients, m*1
        self.b = 0          # bias
        self.eCache = np.mat(np.zeros((self.m, 2)))  # cached prediction error per row
        self.K = np.mat(np.zeros((self.m, self.m)))  # kernel matrix, m*m
        for i in range(self.m):
            self.K[:, i] = kernelTrans(self.dataMat, self.dataMat[i, :], kTup)


# Prediction error of row k: 1*m m*1 => 1*1
# oS: the cache structure
# k:  row index
def calEk(oS, k):
    # f(x) = w*x + b
    fXk = float(np.multiply(oS.alphas, oS.labelMat).T * oS.K[:, k] + oS.b)
    Ek = fXk - float(oS.labelMat[k])
    return Ek


# Update the cache whenever an alpha changes
def updateEk(oS, k):
    Ek = calEk(oS, k)
    oS.eCache[k] = [1, Ek]


# The first time, pick j at random via selectJrand(); afterwards pick the j whose
# prediction error differs most from Ei (the largest step)
def selectJ(i, oS, Ei):
    maxK = -1      # index with the largest error difference
    maxDeltaE = 0  # largest error difference
    Ej = 0         # prediction error at j
    # mark row i as valid (1 instead of the initial 0)
    oS.eCache[i] = [1, Ei]
    # indices of valid cache entries (convert column 0 of the matrix to an array first)
    validEcacheList = np.nonzero(oS.eCache[:, 0].A)[0]
    if len(validEcacheList) > 1:
        for k in validEcacheList:
            if k == i:
                continue
            Ek = calEk(oS, k)
            deltaE = abs(Ei - Ek)
            if deltaE > maxDeltaE:
                maxK = k
                maxDeltaE = deltaE
                Ej = Ek
        return maxK, Ej
    else:
        # randomly pick a j different from i
        j = selectJrand(i, oS.m)
        Ej = calEk(oS, j)
        return j, Ej


# Randomly pick an index different from i
def selectJrand(i, m):
    j = i
    while j == i:
        j = int(np.random.uniform(0, m))
    return j


# Clip alpha into the range [L, H]
def clipAlpha(aj, L, H):
    if aj > H:
        aj = H
    if aj < L:
        aj = L
    return aj


# Load features and labels from a tab-separated file
def loadDataSet(fileName):
    dataSet = []; labelSet = []
    fr = open(fileName)
    for line in fr.readlines():
        lineArr = line.strip().split('\t')
        dataSet.append([float(lineArr[0]), float(lineArr[1])])
        labelSet.append(float(lineArr[2]))
    return dataSet, labelSet


# Compute the weight vector w
def calWs(alphas, dataSet, labelSet):
    dataMat = np.mat(dataSet)
    labelMat = np.mat(labelSet).T  # 1*100 => 100*1
    m, n = np.shape(dataMat)
    w = np.zeros((n, 1))
    for i in range(m):
        w += np.multiply(alphas[i] * labelMat[i], dataMat[i, :].T)
    return w


# Optimize one alpha pair and b for row i, updating the cache on any change
def innerL(i, oS):
    Ei = calEk(oS, i)
    # pick the first alpha: one that violates the KKT conditions
    # (positive or negative margin)
    if ((oS.labelMat[i] * Ei < -oS.toler) and (oS.alphas[i] < oS.C)) or \
       ((oS.labelMat[i] * Ei > oS.toler) and (oS.alphas[i] > 0)):
        # random j the first time, afterwards by largest error difference
        j, Ej = selectJ(i, oS, Ei)
        # copy, so the old values live in separate memory
        alphaIold = oS.alphas[i].copy()
        alphaJold = oS.alphas[j].copy()
        # from  a1*y1 + a2*y2 = const  and  0 <= a1, a2 <= C  derive L and H
        if oS.labelMat[i] != oS.labelMat[j]:
            L = max(0, oS.alphas[j] - oS.alphas[i])
            H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i])
        else:
            L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C)
            H = min(oS.C, oS.alphas[j] + oS.alphas[i])
        if L == H:
            print("L == H")
            return 0
        # kernel denominator: K11 + K22 - 2*K12
        eta = oS.K[i, i] + oS.K[j, j] - 2.0 * oS.K[i, j]
        if eta <= 0:
            print("eta <= 0")
            return 0
        # new alpha j, clipped into [L, H]
        oS.alphas[j] += oS.labelMat[j] * (Ei - Ej) / eta
        oS.alphas[j] = clipAlpha(oS.alphas[j], L, H)
        updateEk(oS, j)  # alpha changed, refresh the cache
        # if alpha j barely moved, discard the step and pick another pair
        if abs(oS.alphas[j] - alphaJold) < 0.00001:
            print("j not moving enough, abandon it.")
            return 0
        # solve for the other alpha of the pair:
        #   ai_new*yi + aj_new*yj = ai_old*yi + aj_old*yj
        #   => ai_new = ai_old + yi*yj*(aj_old - aj_new)
        oS.alphas[i] += oS.labelMat[j] * oS.labelMat[i] * (alphaJold - oS.alphas[j])
        updateEk(oS, i)  # alpha changed, refresh the cache
        # compute b1, b2 from  y(x) = w*x + b  with  w = sum_i ai*yi*xi:
        #   b1_new = b1_old - Ei - yi*(ai_new - ai_old)*Kii - yj*(aj_new - aj_old)*Kij
        #   b2_new = b2_old - Ej - yi*(ai_new - ai_old)*Kij - yj*(aj_new - aj_old)*Kjj
        bi = oS.b - Ei - oS.labelMat[i]*(oS.alphas[i] - alphaIold)*oS.K[i, i] \
                       - oS.labelMat[j]*(oS.alphas[j] - alphaJold)*oS.K[i, j]
        bj = oS.b - Ej - oS.labelMat[i]*(oS.alphas[i] - alphaIold)*oS.K[i, j] \
                       - oS.labelMat[j]*(oS.alphas[j] - alphaJold)*oS.K[j, j]
        # prefer the b whose alpha lies strictly inside (0, C)
        if (0 < oS.alphas[i]) and (oS.alphas[i] < oS.C):
            oS.b = bi
        elif (0 < oS.alphas[j]) and (oS.alphas[j] < oS.C):
            oS.b = bj
        else:
            oS.b = (bi + bj) / 2.0
        return 1
    else:
        return 0


# Full SMO algorithm (linear and non-linear kernels); returns b and the alphas.
# dataSet  raw feature data
# labelSet labels
# C        penalty parameter of the convex QP
# toler    tolerance
# maxIter  maximum number of outer-loop passes
# kTup     kernel specification
# Logic: the first pass sweeps the whole data set; after that, while alpha pairs
# keep changing, only the non-bound (support vector) rows are swept, until either
# no alpha pair changes or maxIter passes are reached.
# Compared with the simplified SMO this is faster because:
# 1. it does not sweep the full data set on every pass: after a full sweep it
#    iterates only over support vectors until no alpha improves, then returns to
#    a full sweep; if the full sweep changes nothing, the loop ends;
# 2. the outer loop can terminate before maxIter passes.
def smoP(dataSet, labelSet, C, toler, maxIter, kTup=('lin', 0)):
    oS = optStruct(dataSet, labelSet, C, toler, kTup)
    iterNum = 0
    entireSet = True       # full-sweep flag
    alphaPairsChanged = 0  # whether any alpha pair was optimized
    # outer loop; stops at maxIter, or when a full sweep changes nothing
    while (iterNum < maxIter) and ((alphaPairsChanged > 0) or entireSet):
        alphaPairsChanged = 0
        if entireSet:
            # full sweep over every row; count the changed alpha pairs
            for i in range(oS.m):
                alphaPairsChanged += innerL(i, oS)
                print("fullSet, iter: %d i:%d, pairs changed %d" % (iterNum, i, alphaPairsChanged))
            iterNum += 1
        else:
            # sweep only the rows with alpha in (0, C), i.e. the support vectors
            nonBounds = np.nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0]
            for i in nonBounds:
                alphaPairsChanged += innerL(i, oS)
                print("non-bound, iter: %d i:%d, pairs changed %d" % (iterNum, i, alphaPairsChanged))
            iterNum += 1
        # full sweep -> support-vector sweep
        if entireSet:
            entireSet = False
        # support-vector sweep -> full sweep
        elif alphaPairsChanged == 0:
            entireSet = True
        print("iteration number: %d" % iterNum)
        print("entireSet :%s" % entireSet)
        print("alphaPairsChanged :%d" % alphaPairsChanged)
    return oS.b, oS.alphas


# Plot the data, the separating line and the support vectors
def drawDataMap(dataArr, labelArr, b, alphas):
    import matplotlib.pyplot as plt
    # indices with alpha > 0: only those rows contribute to the classifier
    svInd = np.nonzero(alphas.A > 0)[0]
    # split the data points by class
    classified_pts = {'+1': [], '-1': []}
    for label, point in zip(labelArr, dataArr):
        if label == 1.0:
            classified_pts['+1'].append(point)
        else:
            classified_pts['-1'].append(point)
    fig = plt.figure()
    ax = fig.add_subplot(111)
    # plot the data points
    for label, pts in classified_pts.items():
        pts = np.array(pts)
        ax.scatter(pts[:, 0], pts[:, 1], label=label)
    # plot the separating line
    w = calWs(alphas, dataArr, labelArr)
    # max(x, key=lambda a: b) maps each x through the lambda and compares the results
    x1, _ = max(dataArr, key=lambda x: x[0])
    x2, _ = min(dataArr, key=lambda x: x[0])
    a1, a2 = w
    y1, y2 = (-b - a1*x1)/a2, (-b - a1*x2)/a2
    # .A converts a matrix to an array
    ax.plot([x1, x2], [y1.A[0][0], y2.A[0][0]])
    # mark the support vectors
    for i in svInd:
        x, y = dataArr[i]
        ax.scatter([x], [y], s=150, c='none', alpha=0.7, linewidth=1.5, edgecolor='#AB3319')
    plt.show()
    # only rows with alpha > 0 are support vectors; filter out the rest
    sVs = np.mat(dataArr)[svInd]  # get matrix of only support vectors
    print("there are %d Support Vectors.\n" % np.shape(sVs)[0])


# Error rate on the training data
def getTrainingDataResult(dataSet, labelSet, b, alphas, k1=1.3):
    datMat = np.mat(dataSet)
    labelMat = np.mat(labelSet).T  # 100*1
    # indices with alpha > 0: only those rows contribute to the classifier
    svInd = np.nonzero(alphas.A > 0)[0]
    sVs = datMat[svInd]
    labelSV = labelMat[svInd]
    m, n = np.shape(datMat)
    errorCount = 0
    for i in range(m):
        kernelEval = kernelTrans(sVs, datMat[i, :], ('rbf', k1))
        # y(x) = w*x + b  with  w = sum_i ai*yi*xi
        predict = kernelEval.T * np.multiply(labelSV, alphas[svInd]) + b
        if np.sign(predict) != np.sign(labelSet[i]):
            errorCount += 1
    print("the training error rate is: %f" % (float(errorCount)/m))


# Error rate on the test data
def getTestDataResult(dataSet, labelSet, b, alphas, k1=1.3):
    datMat = np.mat(dataSet)
    labelMat = np.mat(labelSet).T  # 100*1
    svInd = np.nonzero(alphas.A > 0)[0]
    sVs = datMat[svInd]
    labelSV = labelMat[svInd]
    m, n = np.shape(datMat)
    errorCount = 0
    for i in range(m):
        kernelEval = kernelTrans(sVs, datMat[i, :], ('rbf', k1))
        # y(x) = w*x + b  with  w = sum_i ai*yi*xi
        predict = kernelEval.T * np.multiply(labelSV, alphas[svInd]) + b
        if np.sign(predict) != np.sign(labelSet[i]):
            errorCount += 1
    print("the test error rate is: %f" % (float(errorCount)/m))
```
SMO algorithm implementation (mySVMMLiA.py)

```python
# -*- coding: utf-8 -*-
"""
Created on Wed Sep 5 15:22:26 2018

@author: weixw
"""
import mySVMMLiA as sm

# compute b and the alphas from the training data
dataArr, labelArr = sm.loadDataSet('trainingData.txt')
b, alphas = sm.smoP(dataArr, labelArr, 200, 0.0001, 10000, ('rbf', 0.10))
sm.drawDataMap(dataArr, labelArr, b, alphas)
sm.getTrainingDataResult(dataArr, labelArr, b, alphas, 0.10)
dataArr1, labelArr1 = sm.loadDataSet('testData.txt')
# evaluate on the test data
sm.getTestDataResult(dataArr1, labelArr1, b, alphas, 0.10)
```
Test code
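As a sanity check on the RBF branch of kernelTrans, here is a minimal NumPy sketch of the same formula k(x, y) = exp(-||x - y||**2 / (2*sigma**2)) using plain ndarrays instead of np.mat (rbf_column and sigma are illustrative names, not part of the original code):

```python
import numpy as np

def rbf_column(X, x_row, sigma):
    # squared Euclidean distance from every row of X to x_row,
    # then k(x, y) = exp(-||x - y||**2 / (2 * sigma**2))
    d2 = np.sum((X - x_row) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
k = rbf_column(X, X[0], sigma=1.0)
# k[0] is 1.0, since the distance of a point to itself is 0
```

Like kernelTrans, this returns one column of the kernel matrix; stacking the columns for every row of X reproduces the m*m matrix cached in optStruct.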
Summary
The above is the complete content of this Python SVM implementation as collected by 生活随笔; hopefully the article helps you solve the problems you ran into.