
Hope someone can help me. I am new to Python and just learning. I would like to know how to delete unwanted characters from a string.

For example,

I have some strings in a text file such as 'dogs op care 6A domain, cats op pv=2 domain 3, pig op care2 domain 3'

I don't need anything from 'op' onwards, i.e., what I would like to get is just 'dogs, cats, pig'.

I see 'op' as the common pattern in all these strings and therefore tried the code below:

import re
f = open('animalsop.txt','r')
s = f.read()
p = re.compile('op')
match = p.search(s)
print (s[:match.start()])

The output I get is just 'dog'.

Why do I not get cats and pig as well, since they contain 'op' too?
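As background for the answers below: re.search() returns only the first match, so match.start() is the index of the first 'op', and slicing up to it can only ever give the first animal. A minimal sketch, assuming the file holds the single line shown above (hard-coded here instead of read from animalsop.txt), contrasting search() with re.finditer(), which visits every match:

```python
import re

s = 'dogs op care 6A domain, cats op pv=2 domain 3, pig op care2 domain 3'

# re.search() stops at the FIRST match and returns only that match object.
match = re.search('op', s)
head = s[:match.start()]      # text before the first 'op' only

# re.finditer() yields one match object per non-overlapping match in the whole string.
positions = [m.start() for m in re.finditer(r'\bop\b', s)]

print(repr(head))   # 'dogs '
print(positions)    # [5, 29, 51]
```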

Any help would be greatly appreciated, because I would like to use the code to analyse a huge dataset of similar strings that I have.

The above code was derived from the question "String splitting in Python using regex".

Credits to Varuna and kragniz.

  • I'd suggest using dr jimbob's answer, since some other answers here might break depending on the input. For example, if you have a sentence that says dog opportunities, some answers here may break. dr jimbob's answer looks for spaces on either side of op. If you do use regex, you should use \bop\b, which ensures that whatever precedes or follows op is a non-word character (not a-zA-Z0-9_) or the start or end of the string, or ` op `, which does pretty much what dr jimbob's answer does but in regex. Commented Oct 3, 2017 at 15:12
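To make the comment's point concrete, here is a small sketch comparing splitting on ' op', on ' op ', and on the regex \bop\b. The input 'dog opportunities op care' is a made-up example, not from the question:

```python
import re

text = 'dog opportunities op care'

# Splitting on ' op' also fires inside the word 'opportunities'.
naive = text.split(' op')[0]

# Splitting on ' op ' requires a space on both sides of 'op'.
spaced = text.split(' op ')[0]

# \b marks a word boundary, so 'op' must be a whole word; strip() drops the leftover space.
bounded = re.split(r'\bop\b', text, maxsplit=1)[0].strip()

print(naive)    # dog
print(spaced)   # dog opportunities
print(bounded)  # dog opportunities
```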

4 Answers


It's probably easiest to not use regular expressions to solve your problem.

Assuming a file named animalsop.txt that looks like:

dogs op care 6A domain
cats op pv=2 domain 3
pig op care2 domain 3

A Pythonic solution to your problem would be something like:

with open('animalsop.txt', 'r') as f:
    for line in f:
        before_op = line.split(' op ')[0]
        print(before_op)

The nice thing about the with construct for opening files in Python is that it ensures the file is closed when the block ends, even if an exception is raised inside it.
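A quick way to see this for yourself is to check the file object's closed attribute inside and after the block. This sketch writes the sample lines to a temporary animalsop.txt first (the temporary location is an assumption, so the example runs anywhere):

```python
import os
import tempfile

# Create a throwaway copy of the sample file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'animalsop.txt')
with open(path, 'w') as f:
    f.write('dogs op care 6A domain\ncats op pv=2 domain 3\npig op care2 domain 3\n')

animals = []
with open(path, 'r') as f:
    for line in f:
        animals.append(line.split(' op ')[0])
    print(f.closed)   # False: still inside the with block

print(f.closed)       # True: leaving the with block closed the file
print(animals)        # ['dogs', 'cats', 'pig']
```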

If, instead, your animalsop.txt file is just one long line with various clauses separated by commas, like:

dogs op care 6A domain, cats op pv=2 domain 3, pig op care2 domain 3

Then you could do something like:

with open('animalsop.txt', 'r') as f:
    for line in f:
        for clause in line.split(','):
            before_op = clause.strip().split(' op')[0]
            print(before_op)

(clause.strip() removes any leading or trailing whitespace, such as the space after each comma and the newline at the end of the line.)
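If you want the result as a single 'dogs, cats, pig' string rather than printed lines, the same loop can be wrapped in a function that works on any text. A minimal sketch; the name animals_before_op is hypothetical, and it assumes every clause is separated by commas and/or newlines:

```python
def animals_before_op(text):
    """Hypothetical helper: return the part of each clause that precedes ' op'."""
    result = []
    for line in text.splitlines():
        for clause in line.split(','):
            clause = clause.strip()
            if clause:                  # skip empty clauses, e.g. from a trailing comma
                result.append(clause.split(' op')[0])
    return result

one_line = 'dogs op care 6A domain, cats op pv=2 domain 3, pig op care2 domain 3'
print(', '.join(animals_before_op(one_line)))  # dogs, cats, pig
```

Because it splits on newlines first, the same function also handles the one-clause-per-line layout.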


9 Comments

Hi drjimbob, many thanks for the code. I did try it, but the output looks like
dog op, cat op, pig op
Any suggestions on how I could have just dog, cat, pig without the 'op'? Many thanks
I am sorry if I have caused confusion, but when I try the code, it returns only 'dog' and not cat and pig. Am I doing anything wrong here, please?
@Tikku - are you sure? If I have a file called animalsop.txt that consists of three lines, dogs op care 6A domain, cats op pv=2 domain 3, and pig op care2 domain 3, and I paste the code snippet above, I get dogs, cats, and pig on three separate lines.

Based on the examples you have provided, I suggest using the simple .split() string method and selecting the first part, i.e. the part before " op".

partOfYourInterest = "dogs op care 6A domain".split(" op")[0]
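Two small variations on the same idea, sketched here as alternatives rather than part of the original answer: passing maxsplit=1 so split() stops after the first " op", and str.partition(), which also cuts at the first occurrence and tells you whether " op" was present at all:

```python
line = "dogs op care 6A domain"

# maxsplit=1 stops after the first " op" instead of splitting the whole line.
head = line.split(" op", 1)[0]

# str.partition() returns a 3-tuple: (text before, separator, text after).
# If " op" does not occur, sep is '' and before is the whole string.
before, sep, after = line.partition(" op")

print(head)        # dogs
print(before)      # dogs
print(sep != "")   # True, so the line really contained " op"
```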

To handle several strings, you can iterate over them, e.g. with a for loop:

text = ["dogs op care 6A domain","cats op pv=2 domain 3", "pig op care2 domain 3"]

for part in text:
    animal = part.split(" op")[0]
    print(animal)

And for your text file, it could look like this:

with open('animalsop.txt', 'r') as f:
    for line in f:
        # take everything before the first " op" on each line
        animal = line.split(" op")[0]
        print(animal)

3 Comments

Good solution @Petr Matuska
Many thanks Petr Matuska. I tried the code and got exactly what I wanted; however, I am wondering how to get strings like these within quotes. It was easy to type text = ["dogs op care 6A domain","cats op pv=2 domain 3", "pig op care2 domain 3"], but could you suggest how I could do this for a huge text file? Many thanks
Yes, you can open and read your text file and process it line by line; I have edited my code.

If you want to use a regular expression you can use:

re.findall(r'\w+?(?= op)', s)

['dogs', 'cats', 'pig']
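The (?= op) part is a lookahead: it requires " op" to follow the word without including it in the result. As the comment on the question notes, it also fires on words like "opportunities". A sketch of the same pattern with a trailing \b added, so only a standalone "op" counts ('dog opportunities' is an illustrative input, not from the question):

```python
import re

data = 'dogs op care 6A domain, cats op pv=2 domain 3, pig op care2 domain 3'
tricky = 'dog opportunities'

loose = r'\w+?(?= op)'      # the pattern from this answer, as a raw string
strict = r'\w+?(?= op\b)'   # \b: "op" must end at a word boundary

print(re.findall(loose, data))     # ['dogs', 'cats', 'pig']
print(re.findall(strict, data))    # ['dogs', 'cats', 'pig']
print(re.findall(loose, tricky))   # ['dog']
print(re.findall(strict, tricky))  # []
```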

2 Comments

Thanks Evan for your kind code. It's easy when I can pick out dogs, cats and pig myself; however, with large datasets I am wondering how to pick them out.
The regex will work with any dataset; it just looks for the word before " op".
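For a large file, one option is to apply the regex line by line so the whole file never has to be held in memory. A minimal sketch, assuming each animal and its " op" sit on the same line; animals_in_file is a hypothetical helper, and the demo writes a temporary animalsop.txt mixing both layouts from the question:

```python
import os
import re
import tempfile

# Word directly before a standalone "op"; \b keeps "opportunities" from matching.
WORD_BEFORE_OP = re.compile(r'\w+(?= op\b)')

def animals_in_file(path):
    """Hypothetical helper: collect every word that precedes ' op' in a text file."""
    found = []
    with open(path, 'r') as f:
        for line in f:              # read one line at a time
            found.extend(WORD_BEFORE_OP.findall(line))
    return found

# Demo on a small temporary file.
path = os.path.join(tempfile.mkdtemp(), 'animalsop.txt')
with open(path, 'w') as f:
    f.write('dogs op care 6A domain\n'
            'cats op pv=2 domain 3, pig op care2 domain 3\n')

print(animals_in_file(path))  # ['dogs', 'cats', 'pig']
```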

If you only want the first word of each comma-separated part, and string is your string, you can use:

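string = 'dog fgfdggf fgs, cat afgfg, pig fggag'
strings = string.split(', ')                  # one entry per comma-separated part
newstring = strings[0].split(' ', 1)[0]       # first word of the first part
for stri in strings[1:]:
    # append the first word of each remaining part
    newstring = newstring + ', ' + stri.split(' ', 1)[0]
print(newstring)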
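The same result can be built in one step with str.join() and a generator expression, which avoids the repeated string concatenation. A sketch equivalent to the loop above:

```python
string = 'dog fgfdggf fgs, cat afgfg, pig fggag'

# First space-separated word of every comma-separated part, joined back with ', '.
newstring = ', '.join(part.split(' ', 1)[0] for part in string.split(', '))

print(newstring)  # dog, cat, pig
```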
