Setting n-grams for sentiment analysis with Python and TextBlob

Question

I want to do sentiment analysis of some sentences with Python and the TextBlob library. I know how to use it, but is there any way to set n-grams for it? Basically, I do not want to analyze word by word; I want to analyze 2 or 3 words at a time, because phrases can carry much more meaning and sentiment.

For example, this is what I have done (it works):

from textblob import TextBlob

my_string = "This product is very good, you should try it"

my_string = TextBlob(my_string)

sentiment = my_string.sentiment.polarity
subjectivity = my_string.sentiment.subjectivity

print(sentiment)
print(subjectivity)

But how can I apply, for example, n-grams with n = 2, n = 3, etc.? Is it possible to do that with TextBlob or the VaderSentiment library?

What do you want to set? my_string.ngrams(n=3) will give you the 3-grams.
– jeremy_rutman, Commented Dec 1, 2019 at 12:03
Basically, I do not want to analyze sentiment one word at a time; I want to analyze sentiment over 2 words, 3 words, etc.
– taga, Commented Dec 1, 2019 at 12:06
You could make use of spaCy's noun-chunking feature, which forms more valuable phrases with less noise than the n-gram method.
– Haridas N, Commented Dec 3, 2019 at 10:48
Can you show me how to do that? Or better, show me how to do it with n-grams and with spaCy.
– taga, Commented Dec 3, 2019 at 10:51

Brent Rohner · Accepted Answer · 2019-12-05 14:30:50Z

Here is a solution that builds the n-grams by hand, without an n-gram library, and then scores each one with TextBlob.

from textblob import TextBlob

def find_ngrams(n, input_sequence):
    # Split sentence into tokens.
    tokens = input_sequence.split()
    ngrams = []
    for i in range(len(tokens) - n + 1):
        # Take n consecutive tokens in array.
        ngram = tokens[i:i+n]
        # Concatenate array items into string.
        ngram = ' '.join(ngram)
        ngrams.append(ngram)

    return ngrams

if __name__ == '__main__':
    my_string = "This product is very good, you should try it"

    ngrams = find_ngrams(3, my_string)
    for ngram in ngrams:
        blob = TextBlob(ngram)
        print('Ngram: {}'.format(ngram))
        print('Polarity: {}'.format(blob.sentiment.polarity))
        print('Subjectivity: {}'.format(blob.sentiment.subjectivity))

To change the n-gram length, change the first argument (n) passed to find_ngrams().

Schnipp · Accepted Answer · 2019-12-08 07:32:35Z

There is no parameter in TextBlob that makes it use n-grams rather than words/unigrams as features for sentiment analysis.

TextBlob uses a polarity lexicon to calculate the overall sentiment of a text. This lexicon contains unigrams, which means it can only give you the sentiment of a word, not of an n-gram with n > 1.

I guess you could work around that by feeding bigrams or trigrams into the sentiment classifier, just as you would feed in a sentence, and then building a dictionary of your n-grams with their accumulated sentiment values. But I'm not sure that this is a good idea. I'm assuming you want bigrams in order to handle problems like negation ("not bad"), and the lexicon approach won't be able to use "not" to flip the sentiment value of "bad".
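A minimal sketch of the dictionary workaround described above, assuming you sum a score over every occurrence of each n-gram. The scorer is passed in as a function so the example runs without any library. The names accumulate_ngram_scores, toy_scorer and TOY_LEXICON are hypothetical and made up for illustration, and the toy lexicon is not TextBlob's.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def accumulate_ngram_scores(sentences, n, scorer):
    """Sum scorer(ngram) over every occurrence of each n-gram."""
    totals = defaultdict(float)
    for sentence in sentences:
        for gram in ngrams(sentence.lower().split(), n):
            totals[gram] += scorer(gram)
    return dict(totals)


# Hypothetical stand-in for a unigram polarity lexicon (NOT TextBlob's data).
TOY_LEXICON = {"good": 0.5, "bad": -0.5}


def toy_scorer(phrase):
    """Score a phrase by adding up the lexicon values of its words."""
    return sum(TOY_LEXICON.get(word, 0.0) for word in phrase.split())


scores = accumulate_ngram_scores(
    ["very good product", "not bad at all", "very good value"], 2, toy_scorer
)
print(scores["very good"])  # 1.0 (seen twice, 0.5 each time)
print(scores["not bad"])    # -0.5 (a plain unigram lookup ignores "not")
```

To score with TextBlob instead of the toy lexicon, you could pass lambda phrase: TextBlob(phrase).sentiment.polarity as the scorer.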

TextBlob also offers a Naive Bayes classifier (NaiveBayesAnalyzer) as an alternative to the lexicon approach. It is trained on a movie review corpus provided by NLTK, but as far as I can make out from peeking at the source code, the default training features are words/unigrams. You might be able to implement your own feature extractor there to extract n-grams instead of words, then retrain it and use it on your data.

Regardless of all that, I would suggest that you use a combination of unigrams and higher-order n-grams (n > 1) as features, because dropping unigrams entirely is likely to hurt your performance. Bigrams are much more sparsely distributed, so you'll struggle with data sparsity problems when training.
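As a rough sketch of what a combined feature extractor could look like, the function below maps a text to presence features for both its unigrams and its bigrams. The name unigram_bigram_features and the "contains(...)" key format are my own choices for illustration. TextBlob's textblob.classifiers.NaiveBayesClassifier accepts a feature_extractor argument that takes a document, so a function of this shape could be passed there along with your own labeled training data (which is an assumption here, since none is given in the question).

```python
def unigram_bigram_features(document):
    """Map a text to presence features for its unigrams and bigrams."""
    tokens = document.lower().split()
    features = {}
    # Unigram features: one entry per distinct word.
    for token in tokens:
        features["contains({})".format(token)] = True
    # Bigram features: one entry per distinct pair of adjacent words.
    for first, second in zip(tokens, tokens[1:]):
        features["contains({} {})".format(first, second)] = True
    return features


features = unigram_bigram_features("not bad at all")
print(sorted(features))
```

Keeping the unigram features alongside the bigrams means a phrase like "not bad" gets its own feature while the individual words still contribute when a bigram was never seen in training.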

Forcetti · Accepted Answer · 2023-12-28 23:45:16Z

As jeremy_rutman pointed out in his comment, you could use TextBlob's built-in ngrams() method:

from textblob import TextBlob

def get_ngrams(text, min_n=1, max_n=3):
    n_gram_dict = {}
    for n in range(min_n, max_n+1):
        n_gram_dict[f"{n}-grams"] = [" ".join(x) for x in TextBlob(text).ngrams(n)]
    return n_gram_dict


text = "This product is very good, you should try it."
sentiment_threshold = 0.7
n_grams = get_ngrams(text, 2, 3)
for key in n_grams:
    print(f"{key}:")
    for n_gram in n_grams[key]:
        # Print only the n-grams whose polarity reaches the threshold.
        if TextBlob(n_gram).sentiment.polarity >= sentiment_threshold:
            print(f"- {n_gram}")

This example simply prints the bigrams and trigrams whose sentiment polarity is equal to or higher than a given threshold.
