String deleting

Question

I hope someone can help me. I am new to Python and just learning. I would like to know how to delete unwanted characters from a string.

For example,

I have some strings in a text file such as 'dogs op care 6A domain, cats op pv=2 domain 3, pig op care2 domain 3'

I don't need anything from 'op' onwards, i.e., what I would like to get is just 'dogs, cats, pig'.

I see 'op' as the common pattern in all these strings and therefore tried the code below:

import re
f = open('animalsop.txt','r')
s = f.read()
p = re.compile('op')
match = p.search(s)
print (s[:match.start()])

The output I get is just 'dog'.

Why do I not get the cat and pig as well, since they contain 'op' too?

Any help would be greatly appreciated, because I would like to use the code to analyse a large amount of similar data.

The above code was derived from "String splitting in Python using regex"; credits to Varuna and kragniz.

I'd suggest using dr jimbob's answer, since some other answers here might break depending on the input. For example, if you have a sentence that says "dog opportunities", some answers here may break. dr jimbob's answer looks for spaces on either side. If you do use regex, you should use \bop\b, which ensures that whatever precedes and follows op is a non-word character (not a-zA-Z0-9_), or ` op `, which does pretty much what dr jimbob's answer does but in regex.
– ctwheels, Commented Oct 3, 2017 at 15:12
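
To illustrate the point in this comment, here is a minimal sketch comparing a plain ' op' split with a \bop\b regex split. The sample strings and the helper name before_op are made up for illustration and are not from the question.

```python
import re

# Hypothetical helper: keep only the text before a standalone "op" token.
OP_WORD = re.compile(r'\s*\bop\b')

def before_op(clause):
    # Split at most once, at the first standalone "op", then tidy whitespace.
    return OP_WORD.split(clause, maxsplit=1)[0].strip()

# Made-up samples; the second one contains a word that starts with "op".
samples = ['dogs op care 6A domain', 'dog opportunities op care2 domain 3']

naive = [s.split(' op')[0] for s in samples]
safe = [before_op(s) for s in samples]

print(naive)  # ['dogs', 'dog']
print(safe)   # ['dogs', 'dog opportunities']
```

The plain split cuts inside "opportunities", while the word-boundary version only splits where "op" stands alone.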

dr jimbob · Accepted Answer · 2017-10-03 16:07:12Z

2

It's probably easiest to not use regular expressions to solve your problem.

Assuming a file named animalsop.txt that looks like:

dogs op care 6A domain
cats op pv=2 domain 3
pig op care2 domain 3

A pythonic solution to your problem would be something like:

with open('animalsop.txt', 'r') as f:
    for line in f:
        before_op = line.split(' op ')[0]
        print(before_op)

The nice thing about the with construct for opening files in Python is that it ensures the file is closed when you are done, even if an error occurs inside the block.

If instead, your animalsop.txt file is just one long line with various clauses separated by commas like:

dogs op care 6A domain, cats op pv=2 domain 3, pig op care2 domain 3

Then you could do something like:

with open('animalsop.txt', 'r') as f:
    for line in f:
        for clause in line.split(','):
            before_op = clause.strip().split(' op')[0]
            print(before_op)

(The clause.strip() removes whitespace if it's present after the comma).
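
If the goal is a single string like 'dogs, cats, pig' rather than one name per line, the pieces can be collected and joined. This is only a sketch: the text variable stands in for the contents of the file, and str.partition is used here in place of split, which is my substitution and not part of the answer above.

```python
# Sample text standing in for the contents of animalsop.txt (one long line).
text = 'dogs op care 6A domain, cats op pv=2 domain 3, pig op care2 domain 3'

names = []
for clause in text.split(','):
    # partition() splits at the first ' op ' only; element 0 is the part before it.
    names.append(clause.strip().partition(' op ')[0])

result = ', '.join(names)
print(result)  # dogs, cats, pig
```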

edited Oct 3, 2017 at 16:07

answered Oct 3, 2017 at 14:37


9 Comments

Tikku Over a year ago

Hi drjimbob, many thanks for the code. I did try that but the output looks like

Tikku Over a year ago

dog op, cat op, pig op

Tikku Over a year ago

Any suggestions on how I could get just dog, cat, pig without the 'op'? Many thanks.

Tikku Over a year ago

I am sorry if I have caused confusion, but when I try the code, it returns only 'dog' and not the cat and pig. Am I doing anything wrong here?

dr jimbob Over a year ago

@Tikku - are you sure? If I have a file that consists of three lines: dogs op care 6A domain, cats op pv=2 domain 3, pig op care2 domain 3 inside a file called animalsop.txt, and you paste the code snippet above, you'll get dogs, cats, and pig on three separate lines.


Petr Matuska · Accepted Answer · 2017-10-04 07:53:32Z

1

Based on the examples you have provided, I suggest using the simple .split() string method and selecting the first part, i.e. the part before " op".

partOfYourInterest = "dogs op care 6A domain".split(" op")[0]

For more strings, you can iterate over them, e.g. with a for loop:

text = ["dogs op care 6A domain","cats op pv=2 domain 3", "pig op care2 domain 3"]

for part in text:
    animal = part.split(" op")[0]
    print(animal)

And for your txt file it could look like:

with open('animalsop.txt', 'r') as f:
    for line in f:
        # use the loop variable `line` (the original used the undefined name `part`)
        animal = line.split(" op")[0]
        print(animal)
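
For a very large file, the same idea can be wrapped in a generator so that only one line is held in memory at a time. This is a rough sketch, not part of the original answer: the function name animals_before_op is hypothetical, and a temporary file stands in for animalsop.txt so the example runs on its own.

```python
import os
import tempfile

def animals_before_op(path):
    """Yield the text before ' op ' for each non-empty line of the file."""
    with open(path, 'r') as f:
        for line in f:  # iterates line by line instead of reading the whole file
            line = line.strip()
            if line:
                yield line.split(' op ')[0]

# Demo: a temporary file standing in for animalsop.txt.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('dogs op care 6A domain\ncats op pv=2 domain 3\npig op care2 domain 3\n')

animals = list(animals_before_op(tmp.name))
os.remove(tmp.name)
print(animals)  # ['dogs', 'cats', 'pig']
```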

edited Oct 4, 2017 at 7:53

answered Oct 3, 2017 at 14:36


3 Comments

Marvin Over a year ago

Good solution @Petr Matuska

Tikku Over a year ago

Many thanks Petr Matuska. I tried the code and got exactly what I wanted. However, I am wondering how to get strings such as these within quotes. It was easy to type text = ["dogs op care 6A domain","cats op pv=2 domain 3", "pig op care2 domain 3"], but could you suggest how I could do this for a huge text file? Many thanks.

Petr Matuska Over a year ago

Yes, you can open and read your txt file and process it line by line. I have edited my code.

Evan Nowak · Accepted Answer · 2017-10-03 14:38:18Z

0

If you want to use a regular expression you can use:

import re
re.findall(r'\w+?(?= op)', s)  # raw string so \w is not treated as a string escape

['dogs', 'cats', 'pig']
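
As for why the code in the question stops after the first animal: search() returns only the first match, so slicing up to that match can never reach the later clauses. A small sketch of the difference follows. The pattern (\w+)\s+op\b is my own variation that also requires "op" to end at a word boundary, not the regex from this answer.

```python
import re

s = 'dogs op care 6A domain, cats op pv=2 domain 3, pig op care2 domain 3'

# search() stops at the first match, so the slice contains only the first name.
first = re.search(r'op', s)
prefix = s[:first.start()]
print(prefix)  # prints "dogs " (note the trailing space)

# finditer() visits every match; group(1) is the word just before a standalone "op".
pattern = re.compile(r'(\w+)\s+op\b')
found = [m.group(1) for m in pattern.finditer(s)]
print(found)  # ['dogs', 'cats', 'pig']
```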

answered Oct 3, 2017 at 14:38


2 Comments

Tikku Over a year ago

Thanks Evan for your kind code. It's easier when I can pick out dogs, cats and pig myself; however, with large data sets I am wondering how to pick them.

Evan Nowak Over a year ago

The regex will work with any dataset; it just looks for the word before "op".

Ioannis Nasios · Accepted Answer · 2017-10-03 14:49:34Z

0

If you only want the first word of each clause, you can use the following, where string is your string:

string='dog fgfdggf fgs, cat afgfg, pig fggag'
strings=string.split(', ')
newstring=strings[0].split(' ', 1)[0]
for stri in strings[1:]:
    newstring=newstring+', '+stri.split(' ', 1)[0]
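
The same result can probably be written more compactly with a list comprehension and join. This is just an alternative sketch using the sample string above. The name first_words is made up for the example.

```python
string = 'dog fgfdggf fgs, cat afgfg, pig fggag'

# First whitespace-separated word of each non-empty comma-separated clause.
first_words = [clause.split()[0] for clause in string.split(',') if clause.strip()]
newstring = ', '.join(first_words)
print(newstring)  # dog, cat, pig
```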

answered Oct 3, 2017 at 14:49

