Evaluating a Classification Model with an Independent Test Set in Weka - Li Xiangdong's blog (ScienceNet)

Published: 2025/3/15

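For the past couple of days I have still been struggling with the accuracy of my classification model. When I classify texts picked at random from the web, the results are consistently disappointing, nowhere near as good as the figures I get from cross-validation.

So I decided to evaluate with an independent test set (1,402 instances). The training set has 9,804 instances and 9,302 features, and no feature selection was applied. Accuracy comes out at about 78%, and the "History" and "Art" classes are somewhat confused with each other. The results are as follows: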

-------------------------------------------------------------------------

weka.filters.unsupervised.attribute.StringToWordVector in:9804

Number of instances: 9804

Number of attributes: 9302

loading test data in:test_segmented......

weka.filters.unsupervised.attribute.StringToWordVector in:1402

weka.filters.unsupervised.attribute.ReplaceMissingValues in:9804

weka.filters.unsupervised.attribute.Normalize in:9804

evaluating.........

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
0.91      0.008     0.901       0.91     0.905       0.993      C11-Space
0.455     0.001     0.938       0.455    0.612       0.928      C15-Energy
0.464     0         1           0.464    0.634       0.974      C16-Electronics
0.556     0.001     0.938       0.556    0.698       0.989      C17-Communication
0.98      0.031     0.705       0.98     0.82        0.985      C19-Computer
0.588     0.003     0.833       0.588    0.69        0.96       C23-Mine
0.78      0.001     0.979       0.78     0.868       0.996      C29-Transport
0.81      0.035     0.638       0.81     0.714       0.974      C3-Art
0.95      0.006     0.922       0.95     0.936       0.994      C31-Enviornment
0.92      0.009     0.885       0.92     0.902       0.99       C32-Agriculture
0.96      0.034     0.686       0.96     0.8         0.979      C34-Economy
0.692     0.004     0.878       0.692    0.774       0.989      C35-Law
0.472     0         1           0.472    0.641       0.98       C36-Medical
0.526     0.002     0.952       0.526    0.678       0.992      C37-Military
0.91      0.048     0.591       0.91     0.717       0.965      C38-Politics
0.97      0.021     0.782       0.97     0.866       0.989      C39-Sports
0.235     0         1           0.235    0.381       0.852      C4-Literature
0.639     0.004     0.886       0.639    0.743       0.974      C5-Education
0.489     0.002     0.88        0.489    0.629       0.891      C6-Philosophy
0.75      0.026     0.688       0.75     0.718       0.963      C7-History

Correctly Classified Instances        1095               78.1027 %
Incorrectly Classified Instances       307               21.8973 %
Kappa statistic                          0.7661
Mean absolute error                      0.0904
Root mean squared error                  0.2092
Relative absolute error                 97.1367 %
Root relative squared error             94.8845 %
Total Number of Instances             1402

=== Confusion Matrix ===

  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t   <-- classified as
 91  0  0  0  9  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  a = C11-Space
  0 15  0  0  4  4  0  0  2  1  3  0  0  0  2  2  0  0  0  0 |  b = C15-Energy
  0  0 13  1  9  0  0  0  0  0  2  0  0  0  0  3  0  0  0  0 |  c = C16-Electronics
  1  0  0 15  7  0  0  0  0  0  1  0  0  1  1  1  0  0  0  0 |  d = C17-Communication
  2  0  0  0 98  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  e = C19-Computer
  0  0  0  0  7 20  0  0  2  0  2  0  0  0  2  1  0  0  0  0 |  f = C23-Mine
  0  1  0  0  1  0 46  0  0  0  5  2  0  0  3  1  0  0  0  0 |  g = C29-Transport
  0  0  0  0  0  0  0 81  0  0  1  0  0  0  0  0  0  0  0 18 |  h = C3-Art
  0  0  0  0  1  0  0  0 95  4  0  0  0  0  0  0  0  0  0  0 |  i = C31-Enviornment
  0  0  0  0  0  0  0  0  0 92  7  0  0  0  0  0  0  0  0  1 |  j = C32-Agriculture
  0  0  0  0  0  0  0  0  0  1 96  0  0  0  2  0  0  0  0  1 |  k = C34-Economy
  0  0  0  0  0  0  1  0  0  1  5 36  0  1  8  0  0  0  0  0 |  l = C35-Law
  0  0  0  0  0  0  0  2  0  4  8  1 25  0  7  4  0  2  0  0 |  m = C36-Medical
  4  0  0  0  0  0  0  0  1  0  1  1  0 40 24  3  0  1  0  1 |  n = C37-Military
  0  0  0  0  0  0  0  0  0  0  3  0  0  0 91  0  0  0  0  6 |  o = C38-Politics
  0  0  0  0  0  0  0  1  1  0  0  0  0  0  0 97  0  0  0  1 |  p = C39-Sports
  0  0  0  0  1  0  0 13  0  0  1  0  0  0  3  2  8  0  2  4 |  q = C4-Literature
  0  0  0  0  0  0  0  3  1  1  1  1  0  0  6  9  0 39  0  0 |  r = C5-Education
  3  0  0  0  2  0  0  8  1  0  1  0  0  0  4  0  0  2 22  2 |  s = C6-Philosophy
  0  0  0  0  0  0  0 19  0  0  3  0  0  0  1  1  0  0  1 75 |  t = C7-History

-------------------------------------------------------------------------
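As a sanity check on the output above, the headline figures can be recomputed from the confusion matrix alone. The sketch below is plain Java without Weka; the class name `ConfusionMetrics` and its helper methods are made up for illustration, and only the matrix values come from the run above. It derives accuracy, Cohen's kappa, and the precision, recall and F-measure of C3-Art, and it also pulls out the two cells behind the "History" and "Art" confusion.

```java
import java.util.Locale;

/** Recomputes the headline numbers from the confusion matrix printed above. */
final class ConfusionMetrics {

    // Rows are actual classes, columns are predicted classes, in the order a..t.
    static final int[][] MATRIX = {
        {91, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},   // a C11-Space
        {0, 15, 0, 0, 4, 4, 0, 0, 2, 1, 3, 0, 0, 0, 2, 2, 0, 0, 0, 0},   // b C15-Energy
        {0, 0, 13, 1, 9, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0, 0, 0, 0},   // c C16-Electronics
        {1, 0, 0, 15, 7, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0},   // d C17-Communication
        {2, 0, 0, 0, 98, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},   // e C19-Computer
        {0, 0, 0, 0, 7, 20, 0, 0, 2, 0, 2, 0, 0, 0, 2, 1, 0, 0, 0, 0},   // f C23-Mine
        {0, 1, 0, 0, 1, 0, 46, 0, 0, 0, 5, 2, 0, 0, 3, 1, 0, 0, 0, 0},   // g C29-Transport
        {0, 0, 0, 0, 0, 0, 0, 81, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 18},  // h C3-Art
        {0, 0, 0, 0, 1, 0, 0, 0, 95, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},   // i C31-Enviornment
        {0, 0, 0, 0, 0, 0, 0, 0, 0, 92, 7, 0, 0, 0, 0, 0, 0, 0, 0, 1},   // j C32-Agriculture
        {0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 96, 0, 0, 0, 2, 0, 0, 0, 0, 1},   // k C34-Economy
        {0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 5, 36, 0, 1, 8, 0, 0, 0, 0, 0},   // l C35-Law
        {0, 0, 0, 0, 0, 0, 0, 2, 0, 4, 8, 1, 25, 0, 7, 4, 0, 2, 0, 0},   // m C36-Medical
        {4, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 40, 24, 3, 0, 1, 0, 1},  // n C37-Military
        {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 91, 0, 0, 0, 0, 6},   // o C38-Politics
        {0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 97, 0, 0, 0, 1},   // p C39-Sports
        {0, 0, 0, 0, 1, 0, 0, 13, 0, 0, 1, 0, 0, 0, 3, 2, 8, 0, 2, 4},   // q C4-Literature
        {0, 0, 0, 0, 0, 0, 0, 3, 1, 1, 1, 1, 0, 0, 6, 9, 0, 39, 0, 0},   // r C5-Education
        {3, 0, 0, 0, 2, 0, 0, 8, 1, 0, 1, 0, 0, 0, 4, 0, 0, 2, 22, 2},   // s C6-Philosophy
        {0, 0, 0, 0, 0, 0, 0, 19, 0, 0, 3, 0, 0, 0, 1, 1, 0, 0, 1, 75},  // t C7-History
    };

    static final int ART = 7;      // row/column h = C3-Art
    static final int HISTORY = 19; // row/column t = C7-History

    /** Number of test instances whose actual class is i. */
    static int rowSum(int[][] m, int i) {
        int s = 0;
        for (int v : m[i]) s += v;
        return s;
    }

    /** Number of test instances predicted as class j. */
    static int colSum(int[][] m, int j) {
        int s = 0;
        for (int[] row : m) s += row[j];
        return s;
    }

    static int total(int[][] m) {
        int s = 0;
        for (int i = 0; i < m.length; i++) s += rowSum(m, i);
        return s;
    }

    /** Fraction of instances that lie on the diagonal. */
    static double accuracy(int[][] m) {
        int diag = 0;
        for (int i = 0; i < m.length; i++) diag += m[i][i];
        return (double) diag / total(m);
    }

    /** Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement). */
    static double kappa(int[][] m) {
        double n = total(m);
        double chance = 0.0;
        for (int i = 0; i < m.length; i++) {
            chance += (rowSum(m, i) / n) * (colSum(m, i) / n);
        }
        return (accuracy(m) - chance) / (1.0 - chance);
    }

    static double precision(int[][] m, int c) { return (double) m[c][c] / colSum(m, c); }

    static double recall(int[][] m, int c) { return (double) m[c][c] / rowSum(m, c); }

    static double fMeasure(int[][] m, int c) {
        double p = precision(m, c);
        double r = recall(m, c);
        return 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "accuracy = %.4f %%%n", 100 * accuracy(MATRIX));
        System.out.printf(Locale.ROOT, "kappa    = %.4f%n", kappa(MATRIX));
        System.out.printf(Locale.ROOT, "C3-Art: precision %.3f, recall %.3f, F %.3f%n",
                precision(MATRIX, ART), recall(MATRIX, ART), fMeasure(MATRIX, ART));
        System.out.printf(Locale.ROOT, "Art predicted as History: %d, History predicted as Art: %d%n",
                MATRIX[ART][HISTORY], MATRIX[HISTORY][ART]);
    }
}
```

The values agree with the Weka report (78.1027 %, kappa 0.7661, and 0.638 / 0.81 / 0.714 for C3-Art). The last line shows where the confusion sits: 18 Art texts were labelled History and 19 History texts were labelled Art.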

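Main code from the source file:

import java.io.File;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.TextDirectoryLoader;
import weka.core.stemmers.NullStemmer;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

// The class name is illustrative; the original post shows only the method body.
public class IndependentTestSetEvaluation {
    public static void main(String[] args) throws Exception {
        // Load the training documents from the directory tree.
        String traindatadir = "train_segmented";
        TextDirectoryLoader loader = new TextDirectoryLoader();
        loader.setDirectory(new File(traindatadir));
        Instances dataRaw = loader.getDataSet();

        // Build the word-vector filter on the training set, without stemming.
        StringToWordVector filter = new StringToWordVector();
        filter.setStemmer(new NullStemmer());
        filter.setInputFormat(dataRaw);
        System.out.println("\n\nfiltering data in:" + traindatadir + "......\n\n");
        Instances dataFiltered = Filter.useFilter(dataRaw, filter);
        System.out.println("Number of instances: " + dataFiltered.numInstances());
        System.out.println("Number of attributes: " + dataFiltered.numAttributes());

        // Load the independent test documents.
        String testdatadir = "test_segmented";
        System.out.println("\n\nloading test data in:" + testdatadir + "......\n\n");
        loader.setDirectory(new File(testdatadir));
        Instances testRaw = loader.getDataSet();

        // The filter has just processed the training set, so it filters testRaw
        // using the structure (word attributes) derived from the training set.
        Instances testFiltered = Filter.useFilter(testRaw, filter);

        // Train on the training set only.
        SMO classifier = new SMO();
        classifier.buildClassifier(dataFiltered);

        // Evaluate on the independent test set.
        System.out.println("evaluating.........");
        Evaluation eval = new Evaluation(dataFiltered);
        eval.evaluateModel(classifier, testFiltered);
        System.out.println(eval.toClassDetailsString());
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toMatrixString());
    }
}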
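The key step in the code above is that the test set goes through the same filter that was initialised on the training set, so both sets end up with identical attributes. The toy sketch below illustrates that idea in plain Java. `SharedVocabulary` is a hypothetical stand-in written for this note, not Weka's StringToWordVector, and it uses raw word counts and whitespace tokenisation as simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Toy word-vector builder: the vocabulary is fixed by the training documents. */
final class SharedVocabulary {

    private final List<String> words;

    private SharedVocabulary(List<String> words) {
        this.words = words;
    }

    /** Collects the sorted set of distinct tokens seen in the training documents. */
    static SharedVocabulary fit(List<String> trainingDocs) {
        TreeSet<String> seen = new TreeSet<>();
        for (String doc : trainingDocs) {
            for (String token : doc.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    seen.add(token);
                }
            }
        }
        return new SharedVocabulary(new ArrayList<>(seen));
    }

    /** Maps any document onto the training vocabulary; unseen words are dropped. */
    int[] transform(String doc) {
        int[] counts = new int[words.size()];
        for (String token : doc.trim().split("\\s+")) {
            int index = words.indexOf(token);
            if (index >= 0) {
                counts[index]++;
            }
        }
        return counts;
    }

    int size() {
        return words.size();
    }

    public static void main(String[] args) {
        SharedVocabulary vocab = fit(Arrays.asList("rocket orbit launch", "painting museum"));
        // "telescope" never occurred in training, so it contributes nothing.
        int[] testVector = vocab.transform("orbit orbit telescope");
        System.out.println(vocab.size() + " attributes, test vector " + Arrays.toString(testVector));
    }
}
```

Had the test set been given its own freshly built vocabulary instead, its columns would no longer line up with the ones the classifier was trained on.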

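What I would like to know now is whether the filter that has just processed the training set can be saved, so that next time it can be used to filter and classify a single text.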
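One plausible route, not verified against this project: Weka filters implement java.io.Serializable, so the initialised filter can in principle be written to disk with ordinary Java serialization (Weka also ships weka.core.SerializationHelper, whose write and read methods wrap this). The sketch below shows only the mechanism, using plain Java so that it runs without Weka. `FittedFilter` is a hypothetical stand-in for the trained StringToWordVector, and `FilterPersistence` and its methods are likewise illustrative names.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Saves and reloads a fitted, serializable object so it can be reused later. */
final class FilterPersistence {

    /** Hypothetical stand-in for a fitted filter: it only remembers the training vocabulary. */
    static final class FittedFilter implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<String> vocabulary;

        FittedFilter(List<String> vocabulary) {
            this.vocabulary = new ArrayList<>(vocabulary);
        }
    }

    /** Writes the fitted object to the given file. */
    static void save(FittedFilter filter, Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(filter);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a previously saved object back from the given file. */
    static FittedFilter load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (FittedFilter) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Saves to a temporary file, loads it again, and cleans up. */
    static FittedFilter roundTrip(FittedFilter filter) {
        try {
            Path file = Files.createTempFile("fitted-filter", ".bin");
            try {
                save(filter, file);
                return load(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FittedFilter restored = roundTrip(new FittedFilter(Arrays.asList("orbit", "rocket")));
        System.out.println("restored vocabulary: " + restored.vocabulary);
    }
}
```

In a later session the reloaded filter would take the place of the freshly built one, so a new text is mapped onto the same attributes the classifier saw during training. The trained classifier would need to be saved the same way.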

To reprint this article, please contact the original author for permission and state that it comes from Li Xiangdong's ScienceNet blog.

Link: http://blog.sciencenet.cn/blog-713110-574111.html
