I am learning Python for my job so that I can manipulate statistical data. I already know C# and JavaScript and can solve this problem in those languages, but I'm having difficulty translating the solution to Python.

THE ISSUE: Count all unique four-letter words in a .txt file. Any word with an apostrophe in it should be ignored. Ignore the case of the word (e.g. Tool and tool should only be counted as one word). Print out (so that the user can see) the number of unique four-letter words.

Divide up the four letter words based upon the last two letters of the word (the word ending). Count up how many words you have for each of these endings.

Print out a list of word endings and the number of words you found for each ending.

I have solved the grouping part in JavaScript below:

var listOfWords = ['card','alma','soon','bard','moon','dare'];
var groupings = {};

for(var i = 0; i < listOfWords.length; i++)
{
    var ending = listOfWords[i].substring(2,4)
    if(groupings[ending] === undefined)
    {
        groupings[ending] = {}
        groupings[ending].words = []
        groupings[ending].count = 0
    }
    groupings[ending].words.push(listOfWords[i])
    groupings[ending].count++
};

console.debug(groupings);

Here is what I have so far in Python:

import re
text = open("words.txt")
regex = re.compile(r'\b\w{4}\b')
allFours = []
groupings = []

for line in text:
    four_letter_words = regex.findall(line)
    for word in four_letter_words:        
        allFours.append(word)

mylist = list(dict.fromkeys(allFours))
uniqueWordCount = len(mylist)
print(uniqueWordCount)
for i = 0; i < mylist.length; i++:
    var ending = mylist[i]

I hope I have explained everything clearly; if you have any questions, just ask. All help is greatly appreciated, thank you.

  • Other than that's not valid Python (Python does not have a var keyword; its for loop syntax is different), what actually is your question? Commented Nov 21, 2019 at 10:20
  • Well, there are a number of questions here: how can I select a single item from a list and then do the Python equivalent of .substring? You can see I have "ending = mylist[i]"; how do I then take a substring of the selected item? Commented Nov 21, 2019 at 10:22
  • I need to do this *** var ending = listOfWords[i].substring(2,4) *** in Python. Commented Nov 21, 2019 at 10:24

1 Answer

THE ISSUE: Count all unique four-letter words in a .txt file. Any word with an apostrophe in it should be ignored. Ignore the case of the word (e.g. Tool and tool should only be counted as one word). Print out (so that the user can see) the number of unique four-letter words.

Divide up the four letter words based upon the last two letters of the word (the word ending). Count up how many words you have for each of these endings.

  • unique -> set
  • 4-letter -> just check len(word) == 4; that is simpler than a regex and avoids the regex overhead
  • ignore words with apostrophes -> "'" not in word
  • ignore case -> convert all to lower, easy
  • divide the set based on last 2 letters -> make a dict

result = set()
with open("words.txt") as fd:
    for line in fd:
        # lower-case the line, split on whitespace, keep 4-character words without apostrophes
        matching_words = {word for word in line.lower().split() if len(word) == 4 and "'" not in word}
        result.update(matching_words)
print(result)
print(len(result))  # number of unique four-letter words

line.lower() converts the whole line to lowercase, then .split() with default arguments splits it on whitespace.
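
One caveat with splitting on whitespace is that punctuation stays attached to the words, so "tool," has five characters and "moon." is skipped. Below is a minimal sketch of one way around that, assuming a word is a run of letters and apostrophes. The helper name four_letter_words and the sample sentence are made up for illustration and are not part of the original task.

```python
import re

# Assumption: a "word" is a run of ASCII letters and apostrophes.
TOKEN = re.compile(r"[a-z']+")

def four_letter_words(text):
    """Return the unique lower-cased four-letter words that contain no apostrophe."""
    tokens = TOKEN.findall(text.lower())
    return {t for t in tokens if len(t) == 4 and "'" not in t}

sample = "Tool, tool! They're soon at the moon. It's card-bard."

# Whitespace splitting keeps punctuation attached to the words.
naive = {w for w in sample.lower().split() if len(w) == 4 and "'" not in w}

print(sorted(naive))                      # ['soon']
print(sorted(four_letter_words(sample)))  # ['bard', 'card', 'moon', 'soon', 'tool']
```

Whether hyphenated words such as "card-bard" should count as two words depends on the data, so the token pattern may need adjusting.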

result_dict = {}
for word in result:
    # collections.defaultdict(list) would be cleaner here; dict.get() with a default avoids the import
    result_dict[word[2:]] = result_dict.get(word[2:], []) + [word]
print(result_dict)
print({key: len(value) for key, value in result_dict.items()})
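
For reference, word[2:] (or word[-2:]) is the slice counterpart of the JavaScript .substring(2,4) on a four-letter word. Here is a sketch of the same grouping with collections.defaultdict, using the sample words from the question. The name group_by_ending is a hypothetical helper, not something from the original code.

```python
from collections import defaultdict

def group_by_ending(words):
    """Group words by their last two letters (hypothetical helper)."""
    groups = defaultdict(list)       # missing keys start out as an empty list
    for word in sorted(words):       # sorted so the output order is stable
        groups[word[-2:]].append(word)
    return dict(groups)

words = {"card", "alma", "soon", "bard", "moon", "dare"}
groups = group_by_ending(words)

for ending, members in sorted(groups.items()):
    print(ending, len(members), members)
# ma 1 ['alma']
# on 2 ['moon', 'soon']
# rd 2 ['bard', 'card']
# re 1 ['dare']
```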

1 Comment

Exactly what I was looking for, thank you for your help!
