

Analyzing Novels with Python: Sentiment Analysis of the Harry Potter Series

Published: 2024/7/19


Preparing the data

The raw data gives each novel as a single txt file. We want to analyze by chapter (the first element of a list holds chapter 1's content, the second holds chapter 2's, and so on), so we need regular expressions to organize the data.

For example, let's look at the chapter layout of 01-Harry Potter and the Sorcerer's Stone.txt. Opening the txt and searching around shows that every chapter heading follows a regular pattern:

[Chapter][space][integer][newline \n][English title, possibly containing spaces][newline \n]

As a regex warm-up, we design a pattern that extracts these chapter headings.

import re
import nltk

# read the whole novel as one string
raw_text = open("data/01-Harry Potter and the Sorcerer's Stone.txt").read()

pattern = r'Chapter \d+\n[a-zA-Z ]+\n'
re.findall(pattern, raw_text)

['Chapter 1\nThe Boy Who Lived\n',
 'Chapter 2\nThe Vanishing Glass\n',
 'Chapter 3\nThe Letters From No One\n',
 'Chapter 4\nThe Keeper Of The Keys\n',
 'Chapter 5\nDiagon Alley\n',
 'Chapter 7\nThe Sorting Hat\n',
 'Chapter 8\nThe Potions Master\n',
 'Chapter 9\nThe Midnight Duel\n',
 'Chapter 10\nHalloween\n',
 'Chapter 11\nQuidditch\n',
 'Chapter 12\nThe Mirror Of Erised\n',
 'Chapter 13\nNicholas Flamel\n',
 'Chapter 14\nNorbert the Norwegian Ridgeback\n',
 'Chapter 15\nThe Forbidden Forest\n',
 'Chapter 16\nThrough the Trapdoor\n',
 'Chapter 17\nThe Man With Two Faces\n']
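A quick aside on the two regex operations this post relies on: re.findall returns the matched headings themselves, while re.split (used next) returns the text between them. Note also that `[a-zA-Z ]+` only matches titles built from letters and spaces, which is presumably why Chapter 6 (whose title contains hyphens) is missing from the output above. A minimal sketch on a toy string of my own:

```python
import re

# toy text (mine, for illustration); real chapter headings follow the same shape
text = ("Chapter 1\nThe Boy Who Lived\nFirst chapter text.\n"
        "Chapter 2\nThe Vanishing Glass\nSecond chapter text.\n")
pattern = r'Chapter \d+\n[a-zA-Z ]+\n'

headings = re.findall(pattern, text)                   # the headings themselves
contents = [c for c in re.split(pattern, text) if c]   # the text between headings

print(headings)  # ['Chapter 1\nThe Boy Who Lived\n', 'Chapter 2\nThe Vanishing Glass\n']
print(contents)  # ['First chapter text.\n', 'Second chapter text.\n']
```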

With the basic regex working, we want to be more precise. I prepared a test text that mimics the chapter headings in the actual novels, just shorter and easier to follow. Our data contains 5 chapters, so the resulting list should have length 5, with the first element holding chapter 1's content, the second holding chapter 2's, and so on.

import re

test = """Chapter 1\nThe Boy Who Lived\nMr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you’d expect to be involved in anything strange or mysterious, because they just didn’t hold with such nonsense.\nMr. Dursley was the director of a firm called Grunnings,
Chapter 2\nThe Vanishing Glass\nFor a second, Mr. Dursley didn’t realize what he had seen — then he jerked his head around to look again. There was a tabby cat standing on the corner of Privet Drive, but there wasn’t a map in sight. What could he have been thinking of? It must have been a trick of the light. Mr. Dursley blinked and stared at the cat.
Chapter 3\nThe Letters From No One\nThe traffic moved on and a few minutes later, Mr. Dursley arrived in the Grunnings parking lot, his mind back on drills.\nMr. Dursley always sat with his back to the window in his office on the ninth floor. If he hadn’t, he might have found it harder to concentrate on drills that morning.
Chapter 4\nThe Keeper Of The Keys\nHe didn’t know why, but they made him uneasy. This bunch were whispering excitedly, too, and he couldn’t see a single collecting tin.
Chapter 5\nDiagon Alley\nIt was a few seconds before Mr. Dursley realized that the man was wearing a violet cloak. """

# get the list of chapter contents (first element = chapter 1, second = chapter 2, ...);
# filtering out empty strings keeps the list length equal to the expected chapter count
chapter_contents = [c for c in re.split(r'Chapter \d+\n[a-zA-Z ]+\n', test) if c]
chapter_contents

['Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you’d expect to be involved in anything strange or mysterious, because they just didn’t hold with such nonsense.\nMr. Dursley was the director of a firm called Grunnings,\n',
 'For a second, Mr. Dursley didn’t realize what he had seen — then he jerked his head around to look again. There was a tabby cat standing on the corner of Privet Drive, but there wasn’t a map in sight. What could he have been thinking of? It must have been a trick of the light. Mr. Dursley blinked and stared at the cat.\n',
 'The traffic moved on and a few minutes later, Mr. Dursley arrived in the Grunnings parking lot, his mind back on drills.\nMr. Dursley always sat with his back to the window in his office on the ninth floor. If he hadn’t, he might have found it harder to concentrate on drills that morning.\n',
 'He didn’t know why, but they made him uneasy. This bunch were whispering excitedly, too, and he couldn’t see a single collecting tin.\n',
 'It was a few seconds before Mr. Dursley realized that the man was wearing a violet cloak. ']

We now have the chapter-content list for Harry Potter, which means we can start the real text analysis.

Data analysis: comparing chapter counts

import os
import re
import matplotlib.pyplot as plt

colors = ['#78C850', '#A8A878', '#F08030', '#C03028', '#6890F0', '#A890F0', '#A040A0']
harry_potters = ["Harry Potter and the Sorcerer's Stone.txt",
                 "Harry Potter and the Chamber of Secrets.txt",
                 "Harry Potter and the Prisoner of Azkaban.txt",
                 "Harry Potter and the Goblet of Fire.txt",
                 "Harry Potter and the Order of the Phoenix.txt",
                 "Harry Potter and the Half-Blood Prince.txt",
                 "Harry Potter and the Deathly Hallows.txt"]

# x axis: novel titles (strip the shared prefix and the ".txt" suffix)
harry_potter_names = [n.replace('Harry Potter and the ', '')[:-4]
                      for n in harry_potters]

# y axis: chapter counts
chapter_nums = []
for harry_potter in harry_potters:
    file = "data/" + harry_potter
    raw_text = open(file).read()
    pattern = r'Chapter \d+\n[a-zA-Z ]+\n'
    chapter_contents = [c for c in re.split(pattern, raw_text) if c]
    chapter_nums.append(len(chapter_contents))

# canvas size
plt.figure(figsize=(20, 10))
# chart title, font size, bold
plt.title('Chapter Number of Harry Potter', fontsize=25, weight='bold')
# colored bar chart
plt.bar(harry_potter_names, chapter_nums, color=colors)
# tick label font size and rotation
plt.xticks(rotation=25, fontsize=16, weight='bold')
plt.yticks(fontsize=16, weight='bold')
# axis labels
plt.xlabel('Harry Potter Series', fontsize=20, weight='bold')
plt.ylabel('Chapter Number', rotation=25, fontsize=20, weight='bold')
plt.show()

The chart shows that the last four books of the series have more chapters (not a deep insight, but good practice).

Lexical richness

Here we measure richness as total tokens divided by distinct tokens. A 100-word sentence with no repeated words scores 100/100 = 1, while a sentence of the same length built from only 20 distinct words scores 100/20 = 5. By this metric, a lower score means a richer vocabulary.
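The computation on a toy sentence (example text mine):

```python
# eight tokens, six distinct words -> richness of 8/6 ≈ 1.33
tokens = "the cat sat on the mat the end".split()
richness = len(tokens) / len(set(tokens))
print(richness)
```

The full script below additionally lowercases and stems each token before counting, so inflected forms of the same word are not counted as distinct vocabulary.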

import os
import re
import matplotlib.pyplot as plt
from nltk import word_tokenize
from nltk.stem.snowball import SnowballStemmer

plt.style.use('fivethirtyeight')
colors = ['#78C850', '#A8A878', '#F08030', '#C03028', '#6890F0', '#A890F0', '#A040A0']
harry_potters = ["Harry Potter and the Sorcerer's Stone.txt",
                 "Harry Potter and the Chamber of Secrets.txt",
                 "Harry Potter and the Prisoner of Azkaban.txt",
                 "Harry Potter and the Goblet of Fire.txt",
                 "Harry Potter and the Order of the Phoenix.txt",
                 "Harry Potter and the Half-Blood Prince.txt",
                 "Harry Potter and the Deathly Hallows.txt"]

# x axis: novel titles
harry_potter_names = [n.replace('Harry Potter and the ', '')[:-4]
                      for n in harry_potters]

# lexical richness: tokens per distinct stem
richness_of_words = []
stemmer = SnowballStemmer("english")
for harry_potter in harry_potters:
    file = "data/" + harry_potter
    raw_text = open(file).read()
    words = word_tokenize(raw_text)
    words = [stemmer.stem(w.lower()) for w in words]
    wordset = set(words)
    richness = len(words) / len(wordset)
    richness_of_words.append(richness)

# canvas size
plt.figure(figsize=(20, 10))
# chart title, font size, bold
plt.title('The Richness of Word in Harry Potter', fontsize=25, weight='bold')
# colored bar chart
plt.bar(harry_potter_names, richness_of_words, color=colors)
# tick label font size and rotation
plt.xticks(rotation=25, fontsize=16, weight='bold')
plt.yticks(fontsize=16, weight='bold')
# axis labels
plt.xlabel('Harry Potter Series', fontsize=20, weight='bold')
plt.ylabel('Richness of Words', rotation=25, fontsize=20, weight='bold')
plt.show()
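To see why the script stems each token before counting distinct words, here is a small sketch (word list mine; requires nltk): inflected forms collapse to a shared stem, so they count as one vocabulary item.

```python
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("english")
# "running" and "runs" both reduce to the stem "run"
print([stemmer.stem(w) for w in ["running", "runs", "run"]])  # ['run', 'run', 'run']
```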

Sentiment analysis

To trace the emotional arc of the Harry Potter series we use VADER, via the ready-made vaderSentiment library. Its polarity_scores function returns:

neg: negative score

neu: neutral score

pos: positive score

compound: overall sentiment score

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
test = 'i am so sorry'
analyzer.polarity_scores(test)

{'neg': 0.443, 'neu': 0.557, 'pos': 0.0, 'compound': -0.1513}
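The loop below scores each chapter as the mean compound score over its sentences. With hypothetical per-sentence scores (the numbers are made up, not real VADER output), the aggregation looks like this:

```python
# hypothetical per-sentence VADER compound scores for one chapter
sentence_compounds = [0.6369, -0.5106, 0.0, 0.34]
chapter_score = sum(sentence_compounds) / len(sentence_compounds)
print(chapter_score)
```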

import os
import re
import matplotlib.pyplot as plt
from nltk.tokenize import sent_tokenize
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

harry_potters = ["Harry Potter and the Sorcerer's Stone.txt",
                 "Harry Potter and the Chamber of Secrets.txt",
                 "Harry Potter and the Prisoner of Azkaban.txt",
                 "Harry Potter and the Goblet of Fire.txt",
                 "Harry Potter and the Order of the Phoenix.txt",
                 "Harry Potter and the Half-Blood Prince.txt",
                 "Harry Potter and the Deathly Hallows.txt"]

# x axis: chapter index across the whole series
chapter_indexes = []
# y axis: per-chapter sentiment score
compounds = []
analyzer = SentimentIntensityAnalyzer()
chapter_index = 1
for harry_potter in harry_potters:
    file = "data/" + harry_potter
    raw_text = open(file).read()
    pattern = r'Chapter \d+\n[a-zA-Z ]+\n'
    chapters = [c for c in re.split(pattern, raw_text) if c]
    # score each chapter: mean compound over its sentences
    for chapter in chapters:
        compound = 0
        sentences = sent_tokenize(chapter)
        for sentence in sentences:
            score = analyzer.polarity_scores(sentence)
            compound += score['compound']
        compounds.append(compound / len(sentences))
        chapter_indexes.append(chapter_index)
        chapter_index += 1

# canvas size
plt.figure(figsize=(20, 10))
# chart title, font size, bold
plt.title('Average Sentiment of the Harry Potter', fontsize=25, weight='bold')
# line chart
plt.plot(chapter_indexes, compounds, color='#A040A0')
# tick label font size and rotation
plt.xticks(rotation=25, fontsize=16, weight='bold')
plt.yticks(fontsize=16, weight='bold')
# axis labels
plt.xlabel('Chapter', fontsize=20, weight='bold')
plt.ylabel('Average Sentiment', rotation=25, fontsize=20, weight='bold')
plt.show()

The curve is jagged. To smooth out the fluctuations, we define a moving-average function.

import numpy as np
import os
import re
import matplotlib.pyplot as plt
from nltk.tokenize import sent_tokenize
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# curve-smoothing function (moving average)
def movingaverage(value_series, window_size):
    window = np.ones(int(window_size)) / float(window_size)
    return np.convolve(value_series, window, 'same')

harry_potters = ["Harry Potter and the Sorcerer's Stone.txt",
                 "Harry Potter and the Chamber of Secrets.txt",
                 "Harry Potter and the Prisoner of Azkaban.txt",
                 "Harry Potter and the Goblet of Fire.txt",
                 "Harry Potter and the Order of the Phoenix.txt",
                 "Harry Potter and the Half-Blood Prince.txt",
                 "Harry Potter and the Deathly Hallows.txt"]

# x axis: chapter index across the whole series
chapter_indexes = []
# y axis: per-chapter sentiment score
compounds = []
analyzer = SentimentIntensityAnalyzer()
chapter_index = 1
for harry_potter in harry_potters:
    file = "data/" + harry_potter
    raw_text = open(file).read()
    pattern = r'Chapter \d+\n[a-zA-Z ]+\n'
    chapters = [c for c in re.split(pattern, raw_text) if c]
    # score each chapter: mean compound over its sentences
    for chapter in chapters:
        compound = 0
        sentences = sent_tokenize(chapter)
        for sentence in sentences:
            score = analyzer.polarity_scores(sentence)
            compound += score['compound']
        compounds.append(compound / len(sentences))
        chapter_indexes.append(chapter_index)
        chapter_index += 1

# canvas size
plt.figure(figsize=(20, 10))
# chart title, font size, bold
plt.title('Average Sentiment of the Harry Potter', fontsize=25, weight='bold')
# raw per-chapter scores
plt.plot(chapter_indexes, compounds, color='red')
# smoothed trend line
plt.plot(movingaverage(compounds, 10), color='black', linestyle=':')
# tick label font size and rotation
plt.xticks(rotation=25, fontsize=16, weight='bold')
plt.yticks(fontsize=16, weight='bold')
# axis labels
plt.xlabel('Chapter', fontsize=20, weight='bold')
plt.ylabel('Average Sentiment', rotation=25, fontsize=20, weight='bold')
plt.show()
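A note on the smoothing function: np.convolve with mode='same' keeps the output the same length as the input but implicitly pads with zeros, so the first and last few points are biased toward zero. A toy check (values mine):

```python
import numpy as np

# same smoothing function as above
def movingaverage(value_series, window_size):
    window = np.ones(int(window_size)) / float(window_size)
    return np.convolve(value_series, window, 'same')

print(movingaverage([1, 2, 3, 4, 5], 3))
```

The interior points are true 3-point averages, while the endpoints average in a padded zero, which is why the last value dips back down.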


