Dataset: fetch_20newsgroups (20 newsgroups text) dataset: introduction, installation, and usage guide

Published: 2025/3/21

Contents

Introduction to the fetch_20newsgroups (20 newsgroups text) dataset

1. Dataset information

2. The 20 category labels

3. First articles in the dataset

Installing the fetch_20newsgroups (20 newsgroups text) dataset

Using the fetch_20newsgroups (20 newsgroups text) dataset




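Introduction to the fetch_20newsgroups (20 newsgroups text) dataset

The 20 newsgroups dataset contains more than 18,000 newsgroup posts covering 20 topics, hence the name 20 newsgroups text dataset. It is split into two parts, a training set and a test set, and the posts are spread roughly evenly across the 20 newsgroups. It is commonly used for text classification and is one of the standard international benchmark datasets for research in text classification, text mining, and information retrieval. Some newsgroups cover closely related topics (e.g. comp.sys.ibm.pc.hardware / comp.sys.mac.hardware), while others are completely unrelated (e.g. misc.forsale / soc.religion.christian).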


1. Dataset information

Dataset shape: (18846,)

    =================   ==========
    Classes                     20
    Samples total            18846
    Dimensionality               1
    Features                  text
    =================   ==========


2. The 20 category labels

['alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc', 'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x', 'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball', 'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med', 'sci.space', 'soc.religion.christian', 'talk.politics.guns', 'talk.politics.mideast', 'talk.politics.misc', 'talk.religion.misc']
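The category names are hierarchical, which is one way to see which newsgroups are closely related and which are not. The following is a minimal sketch that groups the 20 labels listed above by their top-level prefix; the helper name `group_by_top_level` is made up for this illustration and is not part of scikit-learn.

```python
from collections import defaultdict

# The 20 category names exactly as listed above.
CATEGORIES = [
    'alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc',
    'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x',
    'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball',
    'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med',
    'sci.space', 'soc.religion.christian', 'talk.politics.guns',
    'talk.politics.mideast', 'talk.politics.misc', 'talk.religion.misc',
]


def group_by_top_level(names):
    """Group newsgroup names by the part before the first dot (hypothetical helper)."""
    groups = defaultdict(list)
    for name in names:
        top_level = name.split(".", 1)[0]
        groups[top_level].append(name)
    return dict(groups)


groups = group_by_top_level(CATEGORIES)
for top_level, members in sorted(groups.items()):
    print(f"{top_level}: {len(members)} group(s)")
```

Groups that share a prefix (for example the five `comp.*` groups) tend to be harder to tell apart than groups from different branches.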



3. First articles in the dataset

["From: Mamatha Devineni Ratnam <mr47+@andrew.cmu.edu>\nSubject: Pens fans reactions\nOrganization: Post Office, Carnegie Mellon, Pittsburgh, PA\nLines: 12\nNNTP-Posting-Host: po4.andrew.cmu.edu\n\n\n\nI am sure some bashers of Pens fans are pretty confused about the lack\nof any kind of posts about the recent Pens massacre of the Devils. Actually,\nI am  bit puzzled too and a bit relieved. However, I am going to put an end\nto non-PIttsburghers' relief with a bit of praise for the Pens. Man, they\nare killing those Devils worse than I thought. Jagr just showed you why\nhe is much better than his regular season stats. He is also a lot\nfo fun to watch in the playoffs. Bowman should let JAgr have a lot of\nfun in the next couple of games since the Pens are going to beat the pulp out of Jersey anyway. I was very disappointed not to see the Islanders lose the final\nregular season game.          PENS RULE!!!\n\n", 'From: mblawson@midway.ecn.uoknor.edu (Matthew B Lawson)\nSubject: Which high-performance VLB video card?\nSummary: Seek recommendations for VLB video card\nNntp-Posting-Host: midway.ecn.uoknor.edu\nOrganization: Engineering Computer Network, University of Oklahoma, Norman, OK, USA\nKeywords: orchid, stealth, vlb\nLines: 21\n\n  My brother is in the market for a high-performance video card that supports\nVESA local bus with 1-2MB RAM.  Does anyone have suggestions/ideas on:\n\n  - Diamond Stealth Pro Local Bus\n\n  - Orchid Farenheit 1280\n\n  - ATI Graphics Ultra Pro\n\n  - Any other high-performance VLB card\n\n\nPlease post or email.  Thank you!\n\n  - Matt\n\n-- \n    |  Matthew B. Lawson <------------> (mblawson@essex.ecn.uoknor.edu)  |   \n  --+-- "Now I, Nebuchadnezzar, praise and exalt and glorify the King  --+-- \n    |   of heaven, because everything he does is right and all his ways  |   \n    |   are just." - Nebuchadnezzar, king of Babylon, 562 B.C.           |   \n']
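Each raw post is a block of headers, a blank line, and then the body. As a rough illustration of that layout, the sketch below splits a shortened copy of the second post with Python's standard `email` module. The helper `split_post` is hypothetical, and this is only an illustration of the structure, not how scikit-learn strips headers internally.

```python
from email import message_from_string

# A shortened copy of the second post shown above (body trimmed for brevity).
RAW_POST = (
    "From: mblawson@midway.ecn.uoknor.edu (Matthew B Lawson)\n"
    "Subject: Which high-performance VLB video card?\n"
    "Keywords: orchid, stealth, vlb\n"
    "Lines: 21\n"
    "\n"
    "  My brother is in the market for a high-performance video card that supports\n"
    "VESA local bus with 1-2MB RAM.\n"
)


def split_post(raw):
    """Return (headers, body) for one raw newsgroup post (hypothetical helper)."""
    message = message_from_string(raw)
    headers = dict(message.items())   # header name -> header value
    body = message.get_payload()      # everything after the first blank line
    return headers, body


headers, body = split_post(RAW_POST)
print(headers["Subject"])
print(body.strip().splitlines()[0])
```

Headers such as `Organization` or `Subject` can leak the topic to a classifier, which is why it is often worth separating them from the body before training.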


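Installing the fetch_20newsgroups (20 newsgroups text) dataset

from sklearn.datasets import fetch_20newsgroups

# Parameters and their default values:
# fetch_20newsgroups(
#     data_home=None,            # path where the files are downloaded
#     subset='train',            # which part of the dataset to load: train/test
#     categories=None,           # list of categories to load; defaults to all 20
#     shuffle=True,              # shuffle the dataset randomly
#     random_state=42,           # seed for the random number generator
#     remove=(),                 # ('headers','footers','quotes'): strip parts of the text
#     download_if_missing=True   # download the data if it has not been downloaded yet
# )

# Load the training and test sets together.
news = fetch_20newsgroups(subset='all')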
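To show how these parameters combine in practice, here is a minimal sketch of a small wrapper that loads one split restricted to a few categories and with headers, footers, and quoted replies stripped. The wrapper name `load_subset` is an assumption made for this example; only `fetch_20newsgroups` and its parameters come from scikit-learn.

```python
from sklearn.datasets import fetch_20newsgroups


def load_subset(categories=None, subset="train"):
    """Load one split with headers, footers and quoted replies removed.

    Hypothetical convenience wrapper; the first call downloads the data
    unless it is already cached locally.
    """
    return fetch_20newsgroups(
        subset=subset,                 # 'train' or 'test'
        categories=categories,         # None keeps all 20 categories
        shuffle=True,
        random_state=42,
        remove=("headers", "footers", "quotes"),
    )
```

Calling `load_subset(["sci.space", "rec.autos"])` should return an object whose `data` attribute holds the post texts, `target` the integer labels, and `target_names` the category names.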



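Using the fetch_20newsgroups (20 newsgroups text) dataset

ML (LoR): multi-class classification on the fetch_20newsgroups dataset using a pipeline (TfidfVectorizer text feature extraction) with the SVC algorithm (GSCV)
ML (NB): classification, prediction, and evaluation on the fetch_20newsgroups dataset (20 newsgroups text) using the naive Bayes (NB) algorithm (CountVectorizer, stop words not removed)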
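The two approaches named above can be sketched roughly as follows. This is not the code from the linked articles: the function names, the parameter grid, and the number of folds are assumptions chosen only to keep the example short.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def build_nb_pipeline():
    """Word counts (stop words kept) followed by multinomial naive Bayes."""
    return Pipeline([
        ("vect", CountVectorizer()),   # stop_words=None by default
        ("clf", MultinomialNB()),
    ])


def build_svc_search(cv=3):
    """TF-IDF features + SVC, tuned by grid search over an illustrative grid."""
    pipeline = Pipeline([
        ("vect", TfidfVectorizer()),
        ("clf", SVC()),
    ])
    # Assumed grid for illustration; real grids depend on the task.
    param_grid = {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}
    return GridSearchCV(pipeline, param_grid, cv=cv)
```

Either object can be fitted on `news.data` and `news.target` after a train/test split, for example `build_nb_pipeline().fit(X_train, y_train)`. The grid search over SVC is considerably slower on the full dataset, so starting with a few categories is a reasonable choice.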

