Sentiment Analysis of a Tweet with Naive Bayes
Millions of tweets are posted every second, and they help us understand how the public is responding to a particular event. To extract the sentiment of tweets, we can use the Naive Bayes classification algorithm, which is a direct application of Bayes' rule.
Bayes Rule
Bayes' rule describes the probability of an event based on prior knowledge of the occurrence of another event related to it.
The probability of occurrence of event A given that event B has already occurred is

P(A | B) = P(A ∩ B) / P(B)
And the probability of occurrence of event B given that event A has already occurred is

P(B | A) = P(A ∩ B) / P(A)
Using both these equations, we can rewrite them collectively as Bayes' rule:

P(A | B) = P(B | A) · P(A) / P(B)
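As a quick numerical check with made-up event probabilities, Bayes' rule can be verified in a few lines of Python:

```python
# Toy joint distribution over two events A and B (made-up numbers).
p_a_and_b = 0.12   # P(A and B)
p_a = 0.30         # P(A)
p_b = 0.40         # P(B)

# Conditional probabilities from the two definitions above.
p_a_given_b = p_a_and_b / p_b          # P(A|B) = 0.3
p_b_given_a = p_a_and_b / p_a          # P(B|A) = 0.4

# Bayes' rule recovers P(A|B) from P(B|A), P(A), and P(B).
bayes = p_b_given_a * p_a / p_b
print(p_a_given_b, bayes)  # both 0.3
```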
Let's take a look at the tweets and how we are going to extract features from them.
We will use two corpora of tweets: positive tweets and negative tweets.
Positive tweets: ‘I am happy because I am learning NLP,’ ‘I am happy, not sad.’
Negative tweets: ‘I am sad, I am not learning NLP,’ ‘I am sad, not happy.’
Preprocessing
We need to preprocess our data so that we can save a lot of memory and reduce the computational load.
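A minimal preprocessing sketch (the exact steps are an assumption; typical choices for tweets are lowercasing, removing handles, URLs, and punctuation, and dropping common stop words):

```python
import re

# Tiny illustrative stop-word list (a real pipeline would use a larger one).
STOP_WORDS = {"i", "am", "a", "the", "is", "are", "because"}

def preprocess(tweet):
    """Lowercase, strip handles/URLs/punctuation, tokenize, drop stop words."""
    tweet = tweet.lower()
    tweet = re.sub(r"@\w+|https?://\S+", "", tweet)   # remove handles and URLs
    tweet = re.sub(r"[^a-z\s]", " ", tweet)           # keep letters only
    return [w for w in tweet.split() if w not in STOP_WORDS]

print(preprocess("I am happy because I am learning NLP @user http://t.co/x"))
# -> ['happy', 'learning', 'nlp']
```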
Probabilistic approach:
In order to get probability statistics for the words, we will create a dictionary of these words and count the occurrences of each word in the positive and negative tweets.
Let's see how these word counts help in finding the probability of a word for both classes. Here the word ‘i’ occurred three times, and the total number of words in the positive corpus is 13. Therefore, the probability of occurrence of the word ‘i’ given that the tweet is positive is

P(i | pos) = freq(i, pos) / N_pos = 3 / 13
Here freq denotes the frequency of occurrence of a word in a class, with class ∈ {pos, neg}. Doing this for all the words in our vocabulary, we will get a table like this:
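Counting word occurrences per class can be sketched like this, using the two toy corpora above (tokenized by whitespace for simplicity):

```python
from collections import Counter

pos_tweets = ["i am happy because i am learning nlp", "i am happy not sad"]
neg_tweets = ["i am sad i am not learning nlp", "i am sad not happy"]

# Frequency of each word in each class.
pos_freq = Counter(w for t in pos_tweets for w in t.split())
neg_freq = Counter(w for t in neg_tweets for w in t.split())

n_pos = sum(pos_freq.values())   # total words in the positive corpus: 13
n_neg = sum(neg_freq.values())   # total words in the negative corpus: 13

# Unsmoothed conditional probability P(word | class).
print(pos_freq["i"], n_pos, pos_freq["i"] / n_pos)  # 3 13 ≈ 0.23
```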
In Naive Bayes, we measure how each word contributes to the sentiment by the ratio of its probability of occurrence in the positive class to that in the negative class. For example, we can see that the word ‘sad’ occurs with higher probability in the negative class than in the positive class. So, for every word we compute the ratio

ratio(w) = P(w | pos) / P(w | neg)
This ratio is known as the likelihood ratio, and its value lies in (0, ∞). A value tending to zero indicates that the word is far more likely to occur in a negative tweet than in a positive one, while a value tending to infinity indicates that it is far more likely to occur in a positive tweet than in a negative one. In other words, a high ratio implies positivity, and a ratio of 1 means the word is neutral.
Laplace Smoothing
Some words might occur in only one particular class. A word that never occurs in, say, the negative class will have probability 0 there, which makes the ratio undefined. We will use the Laplace smoothing technique to handle this situation. Let's see how the equation changes when Laplace smoothing is applied:
P(w | class) = (freq(w, class) + 1) / (N_class + V)

Adding ‘1’ in the numerator makes the probability non-zero, and adding V (the vocabulary size) in the denominator keeps the probabilities summing to 1. More generally, this added constant is called the alpha factor and lies in (0, 1]; specifically, when we set the alpha factor to 1, the smoothing is termed Laplace smoothing.
Here in our example, the number of unique words is eight, which gives V = 8.
After Laplace smoothing, the table of probabilities will look like this:
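A sketch of the smoothed probabilities and likelihood ratios, continuing the toy counts from above (V = 8 unique words, 13 words per corpus):

```python
from collections import Counter

# Word counts from the two toy corpora.
pos_freq = Counter({"i": 3, "am": 3, "happy": 2, "because": 1,
                    "learning": 1, "nlp": 1, "not": 1, "sad": 1})
neg_freq = Counter({"i": 3, "am": 3, "sad": 2, "not": 2,
                    "learning": 1, "nlp": 1, "happy": 1})

vocab = set(pos_freq) | set(neg_freq)
V = len(vocab)                      # 8 unique words
n_pos = sum(pos_freq.values())      # 13
n_neg = sum(neg_freq.values())      # 13

def p_word(word, freq, n_class):
    """Laplace-smoothed P(word | class) = (freq + 1) / (N_class + V)."""
    return (freq[word] + 1) / (n_class + V)

# 'because' never occurs in the negative corpus, yet its ratio stays defined.
for w in sorted(vocab):
    ratio = p_word(w, pos_freq, n_pos) / p_word(w, neg_freq, n_neg)
    print(f"{w:10s} ratio = {ratio:.2f}")
```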
Naive Bayes:
To estimate the sentiment of a tweet, we take the product of the probability ratios of each word occurring in the tweet. Note that words not present in our vocabulary do not contribute and are treated as neutral. The Naive Bayes equation for our application is

∏_{i=1..m} P(w_i | pos) / P(w_i | neg)
where m = number of words in the tweet and w = set of words in the tweet. Since the data can be imbalanced and can bias the results toward a particular class, we multiply the above equation by a prior factor, the ratio of the probability of positive tweets to the probability of negative tweets: P(pos) / P(neg).
This gives the complete equation of Naive Bayes:

P(pos) / P(neg) · ∏_{i=1..m} P(w_i | pos) / P(w_i | neg)

Since we are taking the product of many ratios, we can end up with a number too large or too small to store on our device, so here comes the concept of log-likelihood: we take the log of the Naive Bayes equation.
After taking the log of the likelihood equation, the scale changes from (0, ∞) to (−∞, ∞):

log(P(pos) / P(neg)) + Σ_{i=1..m} log( P(w_i | pos) / P(w_i | neg) )

A score greater than zero indicates a positive tweet; a score less than zero indicates a negative one.
Let's see an example. Tweet: ‘I am happy because I am learning.’
This is the overall log-likelihood for our tweet. Since the overall log-likelihood of the tweet is greater than zero, the tweet is classified as positive.
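Putting the pieces together, a minimal log-likelihood scorer over the toy corpora (unseen words are skipped, i.e. treated as neutral; whitespace tokenization is used for simplicity):

```python
import math
from collections import Counter

pos_tweets = ["i am happy because i am learning nlp", "i am happy not sad"]
neg_tweets = ["i am sad i am not learning nlp", "i am sad not happy"]

pos_freq = Counter(w for t in pos_tweets for w in t.split())
neg_freq = Counter(w for t in neg_tweets for w in t.split())
vocab = set(pos_freq) | set(neg_freq)
V, n_pos, n_neg = len(vocab), sum(pos_freq.values()), sum(neg_freq.values())

def log_likelihood(tweet):
    """Log prior plus the sum of log ratios of smoothed word probabilities."""
    score = math.log(len(pos_tweets) / len(neg_tweets))  # log P(pos)/P(neg) = 0 here
    for w in tweet.lower().split():
        if w not in vocab:
            continue  # unseen words are neutral
        p_w_pos = (pos_freq[w] + 1) / (n_pos + V)  # Laplace smoothing
        p_w_neg = (neg_freq[w] + 1) / (n_neg + V)
        score += math.log(p_w_pos / p_w_neg)
    return score

score = log_likelihood("i am happy because i am learning")
print(score, "-> positive" if score > 0 else "-> negative")  # ≈ 1.10 -> positive
```

Only ‘happy’ and ‘because’ contribute here: every other word occurs equally often in both corpora, so its log ratio is zero.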
Drawbacks:

The main drawback of Naive Bayes is its independence assumption: it treats the words in a tweet as independent of each other, ignoring word order and context, which rarely holds in natural language.
Conclusion:
Naive Bayes is a straightforward and powerful algorithm; knowing the data, one can preprocess it accordingly. The algorithm is also used in many other areas, such as spam classification and loan approval.
Original article: https://towardsdatascience.com/sentiment-analysis-of-a-tweet-with-naive-bayes-ff9bdb2949c7