I am having a difficult time with these sets of inputs and outputs:

input: so sh [/] she had a [^ wheee] .
output: so sh [/] she had a .

input: aah [!] [^ makes sound effects] .
output: aah.

input: and she say (.) I got it [^ repeats 2 times] .
output: and she say (.) I got it .

input: oh no[x 3] .
output: oh  no.


input: xxx [^ /bosolasafiso/]
output: xxx

input: hi [* med]
output: hi [* med]

I have tried regular expressions without success. I need exact conditions that satisfy all of these cases and return the resulting output.

All the inputs are read from a file, so please note that even if I use split(), a token like [^ whee] will be treated as two different words.

I need a condition where only the bracketed words that begin with [/ or [* are retained. Other words that start with "[" should be replaced with an empty string.

  • What regex pattern did you use? Commented Oct 6, 2018 at 5:06
  • I used ([*\s\w*]) and ([\/]) Commented Oct 6, 2018 at 5:11
  • @hjpotter92 I need help with a regular expression that returns the words that start exactly with "[" and end with "]" (or "[\" and "]"). There may be any number of words inside the "[ ]" or "[\ ]" pair. Commented Oct 6, 2018 at 5:15
  • You can regex replace [/X] and [*X] with a placeholder of your choice (say, {/X} and {*X}), then replace all [Y] with an empty string, and finally replace the curly braces with square brackets again. Commented Oct 6, 2018 at 5:18
  • @DYZ That's a good idea, but if you look at the inputs and outputs closely, [/x] is not a single word; it is two words, "[" and "x]". How can this logic handle that? I would have to change "[" to "{*", and the next word will not always be "X", so how do I change the end of the following word to "}"? Commented Oct 6, 2018 at 5:49

1 Answer

The following solution works, assuming that there are no curly braces in your original text. Otherwise, use some other pair of delimiters (e.g., << and >>).

s1 = 'so sh [/] [* med] she had a [^ wheee] .' 

First, replace the [ and ] in each [/ X] or [* X] fragment with { and }, respectively, to protect them from elimination. Then eliminate all surviving fragments in square brackets. Finally, replace all curly braces with square brackets again:

import re

protected = re.sub(r"\[([/*][^]]*)]", r"{\1}", s1)  # Rename [/ X] and [* X] to {/ X} and {* X}
cleaned = re.sub(r"\[[^]]*]", "", protected)  # Remove the remaining [Y] blocks
cleaned.replace("{", "[").replace("}", "]")  # Restore the original brackets
#'so sh [/] [* med] she had a  .'
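
Since the inputs are read from a file line by line, the three steps above could be wrapped in a small helper and applied to each line. This is only a sketch: the function name strip_bracket_tags and the samples list are made up for illustration, and it still assumes the text contains no curly braces. Note that it leaves the surrounding whitespace untouched, so the results are not character-for-character identical to the outputs listed in the question.

```python
import re

def strip_bracket_tags(line):
    """Drop every [...] fragment except those that start with [/ or [*.

    Hypothetical helper; assumes the input has no curly braces.
    """
    # Step 1: protect [/ X] and [* X] by renaming them to {/ X} and {* X}.
    protected = re.sub(r"\[([/*][^]]*)]", r"{\1}", line)
    # Step 2: remove every fragment that is still in square brackets.
    cleaned = re.sub(r"\[[^]]*]", "", protected)
    # Step 3: turn the protected fragments back into square brackets.
    return cleaned.replace("{", "[").replace("}", "]")

# Sample lines taken from the question.
samples = [
    "so sh [/] she had a [^ wheee] .",
    "oh no[x 3] .",
    "xxx [^ /bosolasafiso/]",
    "hi [* med]",
]

for line in samples:
    print(repr(strip_bracket_tags(line)))
# 'so sh [/] she had a  .'
# 'oh no .'
# 'xxx '
# 'hi [* med]'
```

If the leftover double spaces matter, they would need a separate clean-up step, which is not covered here.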

2 Comments

It worked, thanks a lot for your help! Can you please explain it or give me a reference link?
It's all about the functions re.sub() and str.replace(). In fact, the latter could be avoided because re.sub() is more universal. The documentation for both functions is available online; you can google it.
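
As an illustration of the point that str.replace() can be avoided, here is one possible single-pass variant that uses a negative lookahead instead of the placeholder trick. It is a sketch, not the answer's original method: the names BRACKET_TAG and strip_bracket_tags_single_pass are invented for this example. Because nothing is renamed, it does not depend on the text being free of curly braces.

```python
import re

# Match "[", but only when the next character is neither "/" nor "*",
# then everything up to and including the closing "]".
BRACKET_TAG = re.compile(r"\[(?![/*])[^\]]*\]")

def strip_bracket_tags_single_pass(line):
    """Remove [...] fragments unless they start with [/ or [* (hypothetical helper)."""
    return BRACKET_TAG.sub("", line)

s1 = 'so sh [/] [* med] she had a [^ wheee] .'
print(repr(strip_bracket_tags_single_pass(s1)))
# 'so sh [/] [* med] she had a  .'
```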
