
I have a string that I want to split into a list of known menu items, some of which contain spaces. For example, I want to split "Starter Main Course Dessert" into ['Starter', 'Main Course', 'Dessert'].

I can't simply use split(), because it would break Main Course into two separate items. How can I do this splitting? Is a regex needed?

  • You would have to know either the words (or partial words) or the layout in order to do this. Commented Feb 12, 2017 at 17:25
  • What matches Main Course but not Starter Main or Course Dessert (from Starter Main Course Dessert)? This is impossible, AFAIK. Commented Feb 12, 2017 at 17:26
  • Yes, I know the words I want to split into, but I'm not sure how to extract them from the original string. Commented Feb 12, 2017 at 17:30
  • Maybe what you need is 2-grams (bigrams). In Python you can use nltk for that. This may be helpful. And this and this too. Commented Feb 12, 2017 at 17:31
  • So you know all the specific words that you want to keep together, right? Commented Feb 12, 2017 at 17:48

2 Answers


If you have a list of acceptable words, you could use a regex union:

import re

acceptable_words = ['Starter', 'Main Course', 'Dessert', 'Coffee', 'Aperitif']
# Join the words into a single alternation group
pattern = re.compile("(" + "|".join(acceptable_words) + ")", re.IGNORECASE)
# "(Starter|Main Course|Dessert|Coffee|Aperitif)"

menu = "Starter Main Course NotInTheList dessert"
print(pattern.findall(menu))
# ['Starter', 'Main Course', 'dessert']
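One caveat with the union above: Python's re module tries the alternatives from left to right and keeps the first one that matches, so if a shorter item such as 'Main' (a hypothetical extra entry) came before 'Main Course' in the list, 'Main Course' would never be matched as a whole. A minimal sketch that sorts longer items first, escapes each item with re.escape, and adds \b word boundaries, assuming the items are plain text rather than regex fragments (build_menu_pattern is a hypothetical helper name):

```python
import re

def build_menu_pattern(items):
    # Hypothetical helper, not part of the original answer.
    # Longest items first, so 'Main Course' is tried before a shorter 'Main'.
    ordered = sorted(items, key=len, reverse=True)
    # re.escape guards against items that contain regex metacharacters.
    alternatives = "|".join(re.escape(item) for item in ordered)
    # \b stops 'Main' from matching inside a longer word such as 'Mainland'.
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

pattern = build_menu_pattern(['Main', 'Starter', 'Main Course', 'Dessert'])
print(pattern.findall("Starter Main Course NotInTheList dessert"))
# ['Starter', 'Main Course', 'dessert']
```

Because every item goes through re.escape, this version can't be mixed with regex fragments such as r'\w+'; for that, keep the plain join.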

If you only want to specify which special substrings should be kept together, and let every other word match on its own, you could use:

acceptable_words = ['Main Course', r'\w+']
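Put together, a sketch of that variant using the example string from the question. The order matters here: 'Main Course' has to come before r'\w+', otherwise \w+ would grab 'Main' on its own first.

```python
import re

# Multi-word items first; r'\w+' then picks up any remaining single word
acceptable_words = ['Main Course', r'\w+']
pattern = re.compile("(" + "|".join(acceptable_words) + ")")

print(pattern.findall("Starter Main Course Dessert"))
# ['Starter', 'Main Course', 'Dessert']
```

Swapping the two entries gives ['Starter', 'Main', 'Course', 'Dessert'] instead.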

I think it's more practical to list only the 'special' two-word tokens.

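special_words = ['Main Course', 'Something Special']
sentence = 'Starter Main Course Dessert Something Special Date'

words = sentence.split(' ')
for i in range(len(words) - 1):
    # words[i] is None if it was the second half of a pair merged on the previous step
    if words[i] is None:
        continue
    pair = words[i] + ' ' + words[i + 1]
    if pair in special_words:
        words[i] = pair        # keep the two-word token together
        words[i + 1] = None    # mark the second word for removal

words = [w for w in words if w is not None]
print(words)
# ['Starter', 'Main Course', 'Dessert', 'Something Special', 'Date']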
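The loop above only checks adjacent pairs. If some items span three or more words, one way to generalize it is a greedy longest-match scan: at each position, try the longest known phrase first and fall back to a single word. A minimal sketch; merge_phrases and the 'Cheese And Biscuits' item are hypothetical, for illustration only:

```python
def merge_phrases(sentence, phrases):
    # Hypothetical helper: greedy longest-match merge of known multi-word phrases.
    phrase_set = {tuple(p.split()) for p in phrases}
    max_len = max((len(p) for p in phrase_set), default=1)
    words = sentence.split()
    result = []
    i = 0
    while i < len(words):
        # Try the longest candidate starting at position i, down to a single word
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if n == 1 or candidate in phrase_set:
                result.append(" ".join(candidate))
                i += n
                break
    return result

print(merge_phrases("Starter Main Course Dessert Cheese And Biscuits",
                    ["Main Course", "Cheese And Biscuits"]))
# ['Starter', 'Main Course', 'Dessert', 'Cheese And Biscuits']
```

Using a set of word tuples keeps each lookup cheap, and because a single word always matches, the inner loop is guaranteed to advance i.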
