String and word manipulation in Python

Question

Example:

I have a sentence, 'Face book is a social networking company', which I want to clean by concatenating 'Face' and 'book' into 'Facebook'. I would like to check for this and apply the fix across numerous sentences. Any suggestions on how I can do this?

I thought of something along these lines: first tokenizing the sentence, then looping over every word and checking whether the token (word) after 'face' is 'book', and if so deleting the two elements and adding 'Facebook'.

sentence.replace("Face book", "Facebook")

– Prune, Commented Jun 15, 2018 at 22:18

Sunitha · Accepted Answer · 2018-06-15 22:30:45Z

1

Wouldn't a simple regex-based approach be sufficient?

>>> import re
>>> s='Face book is a social networking company'
>>> re.sub(r'[Ff]ace [Bb]ook', 'Facebook', s)
'Facebook is a social networking company'

edited Jun 15, 2018 at 22:30

answered Jun 15, 2018 at 22:15


2 Comments

Srini Over a year ago

You might want to generalize your regex so that it is genuinely more useful than a plain substring approach, e.g. allow it to handle Face book, face book and Face Book. :)

abarnert Over a year ago

If you're just using fixed strings like this, there's no reason to use regex; just use string operations.
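Following up on the first comment, here is a minimal sketch of a more general pattern. It assumes you want a case-insensitive match on the whole words "face" and "book" separated by any amount of whitespace; the names FACE_BOOK and fix_facebook are made up for illustration.

```python
import re

# Case-insensitive, whole-word match for "face book" with any whitespace between.
# (FACE_BOOK and fix_facebook are hypothetical names, not from the answer above.)
FACE_BOOK = re.compile(r"\bface\s+book\b", flags=re.IGNORECASE)

def fix_facebook(sentence):
    """Replace 'Face book', 'face book', 'Face Book', ... with 'Facebook'."""
    return FACE_BOOK.sub("Facebook", sentence)

print(fix_facebook("Face book is a social networking company"))
print(fix_facebook("I read it on face  Book yesterday"))
# Left unchanged, because 'face' here is only part of the word 'Surface':
print(fix_facebook("My Surface book is new"))
```

The \b word boundaries are what keep this from touching strings such as "Surface book" or "Face bookmark", which the simpler [Ff]ace [Bb]ook pattern (and a plain str.replace) would rewrite.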

user9948596 · Accepted Answer · 2018-06-15 22:23:39Z

The most straightforward way of doing this in Python, for me, would be to use a tuple. Just pack all your strings into a tuple and loop through it, applying the str.replace(old, new) method to each one. str.replace(old, new) returns a copy of the string with every occurrence of the substring old replaced by new. Example below:

Code:

string1 = "Face book is a social networking company1"
string2 = "Face book is a social networking company2"
string3 = "Face book is a social networking company3"
old = "Face book"
new = "Facebook"

superdupletuple = (string1, string2, string3)

for i in superdupletuple:
    print(i.replace(old, new))

Output:

Facebook is a social networking company1
Facebook is a social networking company2
Facebook is a social networking company3
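If there are several phrases to fix across many sentences, the same str.replace idea could be driven by a mapping. This is only a sketch: the CORRECTIONS dictionary, its "You Tube" entry, and the clean function are hypothetical and not part of the question.

```python
# Hypothetical mapping of split spellings to their corrected form.
CORRECTIONS = {
    "Face book": "Facebook",
    "You Tube": "YouTube",
}

def clean(sentence, corrections=CORRECTIONS):
    """Apply every fixed-string replacement to one sentence."""
    for old, new in corrections.items():
        sentence = sentence.replace(old, new)
    return sentence

sentences = [
    "Face book is a social networking company",
    "You Tube is a video platform",
]

# Build a new list; the original strings are left untouched.
cleaned = [clean(s) for s in sentences]
print(cleaned)
```

Note that str.replace is case-sensitive and matches substrings anywhere, so "face book" would be missed and "Surface book" would be altered.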

Sphinx · Accepted Answer · 2018-06-15 23:38:48Z

-1

In Python, this might look something like the following. (Keep in mind this is only a rough idea; it won't be perfect in all cases.)

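----------

string = "I use Face book"
tokenized = string.split(" ")
i = 0
# Use a while loop so the length is re-checked after each deletion;
# a fixed range() can run past the end of the shortened list.
while i < len(tokenized) - 1:
    if tokenized[i].lower() == "face" and tokenized[i + 1].lower() == "book":
        # Merge the two tokens into one.
        tokenized[i] = "Facebook"
        del tokenized[i + 1]
    i += 1
cleaned = " ".join(tokenized)  # "I use Facebook"

----------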
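The token-based idea could also be wrapped in a reusable function that builds a new list instead of deleting from the one being scanned. This is a sketch under assumptions: merge_pair and its parameters are hypothetical names, and tokens are split on whitespace only.

```python
def merge_pair(sentence, first, second, merged):
    """Merge adjacent tokens `first` and `second` (case-insensitive) into `merged`."""
    tokens = sentence.split()
    result = []
    i = 0
    while i < len(tokens):
        is_pair = (
            i + 1 < len(tokens)
            and tokens[i].lower() == first.lower()
            and tokens[i + 1].lower() == second.lower()
        )
        if is_pair:
            result.append(merged)
            i += 2  # skip both tokens of the pair
        else:
            result.append(tokens[i])
            i += 1
    return " ".join(result)

print(merge_pair("I use Face book", "face", "book", "Facebook"))
```

Because the split is on whitespace, a token with attached punctuation such as "book." will not match, and runs of spaces in the input are collapsed to single spaces in the result.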

edited Jun 15, 2018 at 23:38

Sphinx


answered Jun 15, 2018 at 22:20

Caleb H.

