I am working on an application that requires me to extract keywords (and eventually generate a tag cloud of these words) from a stream of conversations. I am considering the following steps:
- Tokenize each raw conversation (output stored as a list of lists of strings)
- Remove stop words
- Stem the remaining words (Porter stemming algorithm)
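The three preprocessing steps above might look roughly like this with nltk. This is a sketch under a few assumptions: the conversations arrive as plain strings, `RegexpTokenizer` is used instead of `word_tokenize` so no tokenizer models have to be downloaded, and the small `STOP_WORDS` set is a hypothetical stand-in for `nltk.corpus.stopwords.words("english")`.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Assumption: a small hard-coded stop-word list stands in for
# nltk.corpus.stopwords.words("english"), which needs a one-off
# nltk.download("stopwords") before it can be used.
STOP_WORDS = {
    "a", "an", "and", "are", "but", "do", "for", "i", "in", "is", "it",
    "of", "on", "or", "so", "that", "the", "this", "to", "was", "we", "you",
}

# RegexpTokenizer needs no downloaded data (unlike word_tokenize, which
# relies on the Punkt models); r"\w+" keeps runs of word characters and
# drops punctuation.
tokenizer = RegexpTokenizer(r"\w+")
stemmer = PorterStemmer()


def preprocess(conversations):
    """Tokenize, drop stop words and stem each conversation.

    Returns a list of lists of stemmed tokens, one inner list per conversation.
    """
    processed = []
    for conversation in conversations:
        tokens = tokenizer.tokenize(conversation.lower())
        kept = [t for t in tokens if t not in STOP_WORDS]
        processed.append([stemmer.stem(t) for t in kept])
    return processed


# Hypothetical sample conversations, for illustration only.
conversations = [
    "The servers are crashing again and the logs are full.",
    "Did you restart the server? Crashing servers worry me.",
]
print(preprocess(conversations))
```

Stemming after stop-word removal keeps the stop-word check working on whole words; stemming first could turn a stop word into a form the list no longer matches.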
Up to this point, nltk provides all the tools I need. After this, however, I need to somehow "rank" these words and pick out the most important ones. Can anyone suggest which tools from nltk might be used for this?
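Two candidate rankings from nltk, sketched under the assumption that each conversation has already been reduced to a list of stemmed tokens (the `conversations` data below is made up for illustration): `FreqDist` ranks terms by raw count across the whole stream, while `TextCollection.tf_idf` weighs a term's frequency in one conversation against how many conversations contain it. Summing TF-IDF over all conversations is one possible aggregation choice, not a standard recipe.

```python
from nltk import FreqDist
from nltk.text import TextCollection

# Hypothetical input: the output of the tokenize / stop-word / stem steps,
# one list of stemmed tokens per conversation.
conversations = [
    ["server", "crash", "log", "server"],
    ["server", "restart", "crash"],
    ["lunch", "pizza", "server"],
]


def top_by_frequency(docs, n):
    """Rank terms by raw count over the whole stream using FreqDist."""
    fdist = FreqDist(token for doc in docs for token in doc)
    return fdist.most_common(n)


def top_by_tf_idf(docs, n):
    """Rank terms by TF-IDF, summed over all conversations.

    TextCollection computes idf as log(N / number of docs containing term),
    so a term found in every conversation scores 0.
    """
    collection = TextCollection(docs)
    scores = {}
    for doc in docs:
        for term in set(doc):
            scores[term] = scores.get(term, 0.0) + collection.tf_idf(term, doc)
    # Highest combined score first; keep the top n.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


print(top_by_frequency(conversations, 2))
print(top_by_tf_idf(conversations, 3))
```

Raw frequency suits a tag cloud of what the stream talks about most, whereas TF-IDF pushes down words that appear everywhere (here `server` scores 0 because it occurs in every conversation). For multi-word keywords, `nltk.collocations.BigramCollocationFinder` may also be worth a look.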
Thanks,
Nihit