0

Example:

I have a sentence, 'Face book is a social networking company', which I want to clean by concatenating 'Face' and 'book' into 'Facebook'. I would like to check for this and fix it in numerous sentences. Any suggestions on how I can do this?

I thought of something along the lines of this: first tokenizing the sentence, then looping over every word and checking whether the token (word) after 'face' is 'book', and if so deleting the two elements and adding 'Facebook'.

  • sentence.replace("Face book", "Facebook") Commented Jun 15, 2018 at 22:18

3 Answers

1

Wouldn't a simple regex-based approach be sufficient?

>>> import re
>>> s='Face book is a social networking company'
>>> re.sub(r'[Ff]ace [Bb]ook', 'Facebook', s)
'Facebook is a social networking company'

2 Comments

You might want to generalize your regex a bit so that it has a real advantage over a substring approach, e.g. allow it to process Face book, face book and Face Book. :)
If you're just using fixed strings like this, there's no reason to use regex; just use string operations.
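
Following up on the two comments above, here is a minimal sketch of a more general pattern. It assumes you want to ignore case, tolerate more than one space between the two words, and leave longer words such as 'bookkeeping' alone. The names FACE_BOOK and fix_facebook are made up for this example and are not part of the original answer.

```python
import re

# Hypothetical pattern: "face", one or more whitespace characters, "book",
# matched case-insensitively and only at whole-word boundaries.
FACE_BOOK = re.compile(r"\bface\s+book\b", flags=re.IGNORECASE)


def fix_facebook(sentence):
    """Return the sentence with every split 'Face book' joined into 'Facebook'."""
    return FACE_BOOK.sub("Facebook", sentence)


sentences = [
    "Face book is a social networking company",
    "I use face  Book every day",
    "Let's face bookkeeping together",  # no whole-word match, left unchanged
]
cleaned = [fix_facebook(s) for s in sentences]
for line in cleaned:
    print(line)
```

If the strings really are fixed (same case, exactly one space), plain str.replace stays simpler; the regex only pays off once you need this kind of flexibility.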
1

The most straightforward way of doing this in Python, for me, would be using a tuple. Just pack all your strings into a tuple and loop through it, applying the str.replace(old, new) method to each one. str.replace(old, new) returns a copy of the string str in which every occurrence of the substring old is replaced by the substring new. Example below:

Code:

string1 = "Face book is a social networking company1"
string2 = "Face book is a social networking company2"
string3 = "Face book is a social networking company3"
old = "Face book"
new = "Facebook"

superdupletuple = (string1, string2, string3)

for i in superdupletuple:
    print(i.replace(old, new))

Output:

Facebook is a social networking company1
Facebook is a social networking company2
Facebook is a social networking company3
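
If you need to keep the cleaned sentences rather than just print them, or you have more than one phrase to fix, the same str.replace idea could be sketched as below. The replacements dictionary, the clean function and the 'You Tube' entry are assumptions added purely for illustration.

```python
# Hypothetical mapping of split spellings to their joined forms.
replacements = {
    "Face book": "Facebook",
    "You Tube": "YouTube",
}


def clean(sentence, mapping):
    """Apply every old -> new substring replacement in mapping to sentence."""
    for old, new in mapping.items():
        sentence = sentence.replace(old, new)
    return sentence


sentences = [
    "Face book is a social networking company",
    "Face book and You Tube are websites",
]
# Strings are immutable, so collect the results in a new list.
cleaned = [clean(s, replacements) for s in sentences]
for line in cleaned:
    print(line)
```

Note that str.replace is case-sensitive, so 'face book' would be left alone unless you add it to the mapping as well.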


-1

In Python, this might look something like this. (Keep in mind this is only a rough idea; it won't be perfect in all cases.)

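----------

string = "I use Face book"
tokenized = string.split(" ")
i = 0
# A while loop re-checks the list length on every pass, so deleting a
# token mid-loop cannot push the index past the end of the list.
while i < len(tokenized) - 1:
    if tokenized[i].lower() == "face" and tokenized[i + 1].lower() == "book":
        # Merge the pair: drop "book" and overwrite "face" with the joined word.
        del tokenized[i + 1]
        tokenized[i] = "Facebook"
    i += 1
cleaned = " ".join(tokenized)
print(cleaned)  # I use Facebook

----------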
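
To run the token idea over many sentences and more than one word pair, one possible sketch is a small function that builds a new token list instead of deleting from the old one. The names merge_pairs and pairs are hypothetical, and the function assumes the words are separated by whitespace with no punctuation attached (so 'book.' would not match).

```python
def merge_pairs(sentence, pairs):
    """Join adjacent tokens whose lowercase (first, second) pair is a key in pairs."""
    tokens = sentence.split()
    merged = []
    i = 0
    while i < len(tokens):
        # Only look ahead when a next token exists.
        if i + 1 < len(tokens) and (tokens[i].lower(), tokens[i + 1].lower()) in pairs:
            merged.append(pairs[(tokens[i].lower(), tokens[i + 1].lower())])
            i += 2  # skip both tokens of the merged pair
        else:
            merged.append(tokens[i])
            i += 1
    return " ".join(merged)


# Hypothetical lookup table: lowercase token pairs -> merged word.
pairs = {("face", "book"): "Facebook"}

sentences = ["I use Face book", "face Book is a social networking company"]
for s in sentences:
    print(merge_pairs(s, pairs))
```

Because split() with no argument collapses runs of whitespace, the rejoined sentence always uses single spaces between words.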
