

Setting Up MATLAB to Call Weka (Repost), with an Example


This article is reposted from:
http://blog.sina.com.cn/s/blog_890c6aa30101av9x.html

Checking the Java version from the MATLAB command line:

version -java
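
The exact string depends on the MATLAB release and its bundled JVM; the output below is illustrative only:

    >> version -java

    ans =

        'Java 1.8.0_152-b16 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode'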

Configuring MATLAB to call a Java library

  • Write the Java code.
  • Package it into a Java library, i.e., a .jar file.
  • Copy the .jar file into one of the directories MATLAB uses for storing libraries, and add the
    corresponding path to the MATLAB configuration file,
    $MATLABINSTALLDIR\$MatlabVersion\toolbox\local\classpath.txt (a minimal end-to-end sketch
    follows this list).
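
As a minimal sketch of that workflow, assuming a hypothetical mylib.jar containing a class com.example.Hello with a no-argument greet() method (the jar, class, and method names are illustrative, not from the original post):

    % Option A: add the jar to the static classpath by appending its absolute
    % path as a new line in classpath.txt (e.g. C:\libs\mylib.jar), then restart MATLAB.
    % Option B: add it to the dynamic classpath for the current session only:
    javaaddpath('C:\libs\mylib.jar');

    obj = javaObject('com.example.Hello');   % construct the Java object
    msg = char(obj.greet());                 % convert the returned java.lang.String to char
    disp(msg);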

Configuring MATLAB to call Weka

  • Download Weka.
  • Install Weka.
  • Add the absolute path of the bin folder of your JRE (jre6 or a later release) to the Path
    system environment variable, e.g.:
    C:\Program Files\Java\jre1.8.0_77\bin;
  • Locate the MATLAB configuration file classpath.txt:
    which classpath.txt % this command prints the location of classpath.txt
  • Edit the configuration file classpath.txt:
    edit classpath.txt
    Append the absolute path of weka.jar under the Weka installation directory to classpath.txt, e.g.:
    C:\Program Files\Weka-3-8\weka.jar
  • Restart MATLAB.
  • Run the following command (an alternative check is sketched after this list):
    attributes = javaObject('weka.core.FastVector');
    % If MATLAB reports no error, the configuration succeeded.
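
One caveat, hedged: weka.core.FastVector is deprecated and reportedly absent from recent Weka releases, so the check above may fail on Weka 3.8 even when the classpath is correct. Instantiating a classifier class is a safer smoke test:

    cls = javaObject('weka.classifiers.trees.J48');   % a standard Weka C4.5 decision-tree learner
    disp(char(cls.getClass().getName()));             % prints 'weka.classifiers.trees.J48' on success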

  • When calling Weka classes from MATLAB, "heap space" overflow errors are common, so allocate a
    larger Java heap. In the releases current at the time of the original post this was done via
    Matlab -> File -> Preferences -> General -> Java Heap Memory, where you set a suitably large
    value. (The current limit can also be queried programmatically, as sketched below.)
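
The snippet below inspects the running JVM's heap through the standard java.lang.Runtime API, which MATLAB exposes directly; it is a quick way to confirm that a new heap setting took effect:

    rt = java.lang.Runtime.getRuntime();
    fprintf('Max heap:  %.0f MB\n', rt.maxMemory()/2^20);                     % the -Xmx ceiling
    fprintf('Used heap: %.0f MB\n', (rt.totalMemory()-rt.freeMemory())/2^20); % currently in use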

  • Example of calling Weka from MATLAB
    The code below comes from:
    http://cn.mathworks.com/matlabcentral/fileexchange/37311-smoteboost
    http://www.mathworks.com/matlabcentral/fileexchange/37315-rusboost
    Note that it depends on a helper file, ARFFheader.txt, discussed after the listing.

    % Demo script: 80-20 train/test split, then SMOTEBoost with a decision tree.
    clc; clear all; close all;

    file = 'data.csv'; % Dataset

    % Reading training file
    data = dlmread(file);
    label = data(:,end);

    % Extracting positive data points
    idx = (label==1);
    pos_data = data(idx,:);
    row_pos = size(pos_data,1);

    % Extracting negative data points
    neg_data = data(~idx,:);
    row_neg = size(neg_data,1);

    % Random permutation of positive and negative data points
    p = randperm(row_pos);
    n = randperm(row_neg);

    % 80-20 split for training and test
    tstpf = p(1:round(row_pos/5));
    tstnf = n(1:round(row_neg/5));
    trpf = setdiff(p, tstpf);
    trnf = setdiff(n, tstnf);

    train_data = [pos_data(trpf,:);neg_data(trnf,:)];
    test_data = [pos_data(tstpf,:);neg_data(tstnf,:)];

    % Decision Tree
    prediction = SMOTEBoost(train_data,test_data,'tree',false);
    disp (' Label Probability');
    disp ('-----------------------------');
    disp (prediction);

    % --- SMOTEBoost.m (each function below goes in its own file) ---
    function prediction = SMOTEBoost (TRAIN,TEST,WeakLearn,ClassDist)
    % This function implements the SMOTEBoost Algorithm. For more details on the
    % theoretical description of the algorithm please refer to the following
    % paper:
    % N.V. Chawla, A. Lazarevic, L.O. Hall, K. Bowyer, "SMOTEBoost: Improving
    % Prediction of Minority Class in Boosting", Journal of Knowledge Discovery
    % in Databases: PKDD, 2003.
    % Input: TRAIN = Training data as matrix
    %        TEST = Test data as matrix
    %        WeakLearn = String to choose algorithm. Choices are
    %                    'svm','tree','knn' and 'logistic'.
    %        ClassDist = true or false. true indicates that the class
    %                    distribution is maintained while doing weighted
    %                    resampling and before SMOTE is called at each
    %                    iteration. false indicates that the class distribution
    %                    is not maintained while resampling.
    % Output: prediction = size(TEST,1) x 2 matrix. Col 1 is class labels for
    %                      all instances. Col 2 is probability of the instances
    %                      being classified as positive class.

    javaaddpath('weka.jar');

    %% Training SMOTEBoost
    % Total number of instances in the training set
    m = size(TRAIN,1);
    POS_DATA = TRAIN(TRAIN(:,end)==1,:);
    NEG_DATA = TRAIN(TRAIN(:,end)==0,:);
    pos_size = size(POS_DATA,1);
    neg_size = size(NEG_DATA,1);

    % Reorganize TRAIN by putting all the positive and negative examples
    % together, respectively.
    TRAIN = [POS_DATA;NEG_DATA];

    % Converting training set into Weka compatible format
    CSVtoARFF (TRAIN, 'train', 'train');
    train_reader = javaObject('java.io.FileReader', 'train.arff');
    train = javaObject('weka.core.Instances', train_reader);
    train.setClassIndex(train.numAttributes() - 1);

    % Total number of iterations of the boosting method
    T = 10;

    % W stores the weights of the instances in each row for every iteration of
    % boosting. Weights for all the instances are initialized by 1/m for the
    % first iteration.
    W = zeros(1,m);
    for i = 1:m
        W(1,i) = 1/m;
    end

    % L stores pseudo loss values, H stores hypotheses, B stores (1/beta)
    % values that are used as the weight of the hypothesis while forming the
    % final hypothesis. All of the following are of length <= T and store
    % values for every iteration of the boosting process.
    L = [];
    H = {};
    B = [];

    % Loop counter
    t = 1;

    % Keeps count of the number of times the same boosting iteration has been
    % repeated
    count = 0;

    % Boosting T iterations
    while t <= T

        % LOG MESSAGE
        disp (['Boosting iteration #' int2str(t)]);

        if ClassDist == true
            % Resampling POS_DATA with weights of positive examples
            POS_WT = zeros(1,pos_size);
            sum_POS_WT = sum(W(t,1:pos_size));
            for i = 1:pos_size
                POS_WT(i) = W(t,i)/sum_POS_WT;
            end
            RESAM_POS = POS_DATA(randsample(1:pos_size,pos_size,true,POS_WT),:);

            % Resampling NEG_DATA with weights of negative examples
            NEG_WT = zeros(1,neg_size);
            sum_NEG_WT = sum(W(t,pos_size+1:m));
            for i = 1:neg_size
                NEG_WT(i) = W(t,pos_size+i)/sum_NEG_WT;
            end
            RESAM_NEG = NEG_DATA(randsample(1:neg_size,neg_size,true,NEG_WT),:);

            % Resampled TRAIN is stored in RESAMPLED
            RESAMPLED = [RESAM_POS;RESAM_NEG];

            % Calculating the percentage of boosting the positive class. 'pert'
            % is used as a parameter of SMOTE
            pert = ((neg_size-pos_size)/pos_size)*100;
        else
            % Indices of resampled train
            RND_IDX = randsample(1:m,m,true,W(t,:));

            % Resampled TRAIN is stored in RESAMPLED
            RESAMPLED = TRAIN(RND_IDX,:);

            % Calculating the percentage of boosting the positive class. 'pert'
            % is used as a parameter of SMOTE
            pos_size = sum(RESAMPLED(:,end)==1);
            neg_size = sum(RESAMPLED(:,end)==0);
            pert = ((neg_size-pos_size)/pos_size)*100;
        end

        % Converting resampled training set into Weka compatible format
        CSVtoARFF (RESAMPLED,'resampled','resampled');
        reader = javaObject('java.io.FileReader','resampled.arff');
        resampled = javaObject('weka.core.Instances',reader);
        resampled.setClassIndex(resampled.numAttributes()-1);

        % New SMOTE boosted data gets stored in S
        smote = javaObject('weka.filters.supervised.instance.SMOTE');
        pert = ((neg_size-pos_size)/pos_size)*100;
        smote.setPercentage(pert);
        smote.setInputFormat(resampled);
        S = weka.filters.Filter.useFilter(resampled, smote);

        % Training a weak learner. 'pred' is the weak hypothesis. However, the
        % hypothesis function is encoded in 'model'.
        switch WeakLearn
            case 'svm'
                model = javaObject('weka.classifiers.functions.SMO');
            case 'tree'
                model = javaObject('weka.classifiers.trees.J48');
            case 'knn'
                model = javaObject('weka.classifiers.lazy.IBk');
                model.setKNN(5);
            case 'logistic'
                model = javaObject('weka.classifiers.functions.Logistic');
        end
        model.buildClassifier(S);
        pred = zeros(m,1);
        for i = 0 : m - 1
            pred(i+1) = model.classifyInstance(train.instance(i));
        end

        % Computing the pseudo loss of hypothesis 'model'
        loss = 0;
        for i = 1:m
            if TRAIN(i,end)==pred(i)
                continue;
            else
                loss = loss + W(t,i);
            end
        end

        % If count exceeds a pre-defined threshold (5 in the current
        % implementation), the loop is broken and rolled back to the state
        % where loss > 0.5 was not encountered.
        if count > 5
            L = L(1:t-1);
            H = H(1:t-1);
            B = B(1:t-1);
            disp ('  Too many iterations have loss > 0.5');
            disp ('  Aborting boosting...');
            break;
        end

        % If the loss is greater than 1/2, it means that an inverted
        % hypothesis would perform better. In such cases, do not take that
        % hypothesis into consideration and repeat the same iteration. 'count'
        % keeps count of the number of times the same boosting iteration has
        % been repeated
        if loss > 0.5
            count = count + 1;
            continue;
        else
            count = 1;
        end

        L(t) = loss;          % Pseudo-loss at each iteration
        H{t} = model;         % Hypothesis function
        beta = loss/(1-loss); % Setting weight update parameter 'beta'
        B(t) = log(1/beta);   % Weight of the hypothesis

        % At the final iteration there is no need to update the weights any
        % further
        if t==T
            break;
        end

        % Updating weights
        for i = 1:m
            if TRAIN(i,end)==pred(i)
                W(t+1,i) = W(t,i)*beta;
            else
                W(t+1,i) = W(t,i);
            end
        end

        % Normalizing the weights for the next iteration
        sum_W = sum(W(t+1,:));
        for i = 1:m
            W(t+1,i) = W(t+1,i)/sum_W;
        end

        % Incrementing loop counter
        t = t + 1;
    end

    % The final hypothesis is calculated and tested on the test set
    % simultaneously.

    %% Testing SMOTEBoost
    n = size(TEST,1); % Total number of instances in the test set

    CSVtoARFF(TEST,'test','test');
    test = 'test.arff';
    test_reader = javaObject('java.io.FileReader', test);
    test = javaObject('weka.core.Instances', test_reader);
    test.setClassIndex(test.numAttributes() - 1);

    % Normalizing B
    sum_B = sum(B);
    for i = 1:size(B,2)
        B(i) = B(i)/sum_B;
    end

    prediction = zeros(n,2);

    for i = 1:n
        % Calculating the total weight of the class labels from all the models
        % produced during boosting
        wt_zero = 0;
        wt_one = 0;
        for j = 1:size(H,2)
            p = H{j}.classifyInstance(test.instance(i-1));
            if p==1
                wt_one = wt_one + B(j);
            else
                wt_zero = wt_zero + B(j);
            end
        end

        if (wt_one > wt_zero)
            prediction(i,:) = [1 wt_one];
        else
            prediction(i,:) = [0 wt_one];
        end
    end

    % --- CSVtoARFF.m ---
    function r = CSVtoARFF (data, relation, type)
    % CSV to ARFF file converter

    % load the csv data
    [rows cols] = size(data);

    % open the arff file for writing
    farff = fopen(strcat(type,'.arff'), 'w');

    % print the relation part of the header
    fprintf(farff, '@relation %s', relation);

    % Reading from the ARFF header (the first line of ARFFheader.txt is skipped)
    fid = fopen('ARFFheader.txt','r');
    tline = fgets(fid);
    while ischar(tline)
        tline = fgets(fid);
        fprintf(farff,'%s',tline);
    end
    fclose(fid);

    % Converting the data
    for i = 1 : rows
        % print the attribute values for the data point
        for j = 1 : cols - 1
            if data(i,j) ~= -1 % check if it is a missing value
                fprintf(farff, '%d,', data(i,j));
            else
                fprintf(farff, '?,');
            end
        end
        % print the label for the data point
        fprintf(farff, '%d\n', data(i,end));
    end

    % close the file
    fclose(farff);

    r = 0;

    % --- ClassifierTrain.m ---
    function model = ClassifierTrain(data,type)
    % Training the classifier that would do the sample selection

    javaaddpath('weka.jar');

    CSVtoARFF(data,'train','train');
    train_file = 'train.arff';
    reader = javaObject('java.io.FileReader', train_file);
    train = javaObject('weka.core.Instances', reader);
    train.setClassIndex(train.numAttributes() - 1);
    % options = javaObject('java.lang.String');

    switch type
        case 'svm'
            model = javaObject('weka.classifiers.functions.SMO');
            kernel = javaObject('weka.classifiers.functions.supportVector.RBFKernel');
            model.setKernel(kernel);
        case 'tree'
            model = javaObject('weka.classifiers.trees.J48');
            % options = weka.core.Utils.splitOptions('-C 0.2');
            % model.setOptions(options);
        case 'knn'
            model = javaObject('weka.classifiers.lazy.IBk');
            model.setKNN(5);
        case 'logistic'
            model = javaObject('weka.classifiers.functions.Logistic');
    end

    model.buildClassifier(train);

    % --- ClassifierPredict.m ---
    function prediction = ClassifierPredict(data,model)
    % Predicting the labels of the test instances
    % Input: data = test data
    %        model = the trained model
    % Output: prediction = prediction labels

    javaaddpath('weka.jar');

    CSVtoARFF(data,'test','test');
    test_file = 'test.arff';
    reader = javaObject('java.io.FileReader', test_file);
    test = javaObject('weka.core.Instances', reader);
    test.setClassIndex(test.numAttributes() - 1);

    prediction = [];
    for i = 0 : size(data,1) - 1
        p = model.classifyInstance(test.instance(i));
        prediction = [prediction; p];
    end
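
The listing depends on a helper file, ARFFheader.txt, which ships with the File Exchange package but is not reproduced above: CSVtoARFF discards its first line and appends the rest after the @relation line it prints itself. As a hedged illustration only (the attribute names and count are assumptions, not part of the original code), a matching header for a dataset with two numeric features and a {0,1} class label could be generated like this:

    % Write an illustrative ARFFheader.txt; adapt the @attribute lines to your data.
    fid = fopen('ARFFheader.txt','w');
    fprintf(fid, '@relation placeholder\n');        % line 1 is discarded by CSVtoARFF
    fprintf(fid, '\n');                             % terminates the '@relation <name>' line CSVtoARFF prints
    fprintf(fid, '@attribute feature1 numeric\n');  % assumed feature column 1
    fprintf(fid, '@attribute feature2 numeric\n');  % assumed feature column 2
    fprintf(fid, '@attribute class {0,1}\n');       % the class label must be the last attribute
    fprintf(fid, '@data\n');
    fclose(fid);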
