Delete all xml tags between strings in string with python and regexp

Question

I have string:

...<w:t> Name</w:t></w:r><w:r><w:rPr><w:rFonts w:ascii="Cambria" w:hAnsi="Cambria"/><w:b/><w:sz w:val="28"/><w:szCs w:val="28"/></w:rPr><w:t>:</w:t></w:r><w:r><w:rPr></w:rPr><w:t xml:space="preserve"> </w:t></w:r><w:r><w:rPr><w:b/><w:bCs/></w:rPr><w:t>{{</w:t></w:r><w:r><w:rPr></w:rPr><w:t xml:space="preserve"> </w:t></w:r><w:r><w:rPr><w:i/><w:iCs/></w:rPr><w:t>test</w:t></w:r><w:r><w:rPr><w:i/><w:iCs/></w:rPr><w:t>.name</w:t></w:r><w:r><w:rPr></w:rPr><w:t xml:space="preserve"> </w:t></w:r><w:r><w:rPr><w:b/><w:bCs/></w:rPr><w:t>}} <w:t>....

And i need script which will delete all tags(<...>) between {{ and }} But do not delete between pairs of characters, e.g:

The result of:
{{ <wr> test.name1 <wr> }} <wr><wr> {{ <wr> test.name2 <wr> }} 
will be:
{{ test.name1 }} <wr><wr> {{ test.name2 }} 
not:
{{ test.name1 }} {{ test.name2 }}

Thank you in advance!

What have you tried so far? There are numerous websites that allow you to test your regex pretty quickly. E.g. regexr.com or freeformatter.com/regex-tester.html — jDo
– jDo, Commented Mar 10, 2016 at 11:46

Eugene Lisitsky · Accepted Answer · 2016-03-10 12:04:38Z

1

If you don't need single regular expression, you may combine substitutions:

    import re
    s='{{ <wr> test.name1 <wr> }} <wr><wr> {{ <wr> test.name2 <wr> }}'
    re.sub(r'({{[^{}]+}})', lambda x: re.sub(r'<[a-zA-Z0-9:-]+>', '', x.group(0)), s)
    '{{  test.name1  }} <wr><wr> {{  test.name2  }}'

answered Mar 10, 2016 at 12:04

Eugene Lisitsky

13k6 gold badges42 silver badges63 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

icdu91 Over a year ago

thx! But that solution do not work for string from question :(

MajorTom · Accepted Answer · 2016-03-10 12:02:46Z

0

You can do like this:

import re

TAG_RE = re.compile(r'\{\{(\s*<[^>]+>|\s*)(\s*.*?\s*)(<[^>]+>\s*|\s*)\}\}')

def remove_tags2(text):
    return TAG_RE.sub('{{ \g<2> }}', text)

remove_tags2("sdfgsd {{ <wr> blablalba sdf asf asga sfas asd </wr> }} <wr><wr> {{<wr>alsdfhaksdhfkajg<wr>}}")

Output:

'sdfgsd {{  blablalba sdf asf asga sfas asd  }} <wr><wr> {{ alsdfhaksdhfkajg }}'

answered Mar 10, 2016 at 12:02

MajorTom

3853 silver badges6 bronze badges

1 Comment

icdu91 Over a year ago

thx! But that solution do not work for string from question :(

icdu91 · Accepted Answer · 2016-03-10 17:38:52Z

0

Based on Eugene answer:

import re

s='...<w:t> Name</w:t></w:r><w:r><w:rPr><w:rFonts w:ascii="Cambria" w:hAnsi="Cambria"/><w:b/><w:sz w:val="28"/><w:szCs w:val="28"/></w:rPr><w:t>:</w:t></w:r><w:r><$

print re.sub(r'({{[^{}]+}})', lambda x: re.sub(r'<[^>]+>', '', x.group(0)), s)

output:

...<w:t> Name</w:t></w:r><w:r><w:rPr><w:rFonts w:ascii="Cambria" w:hAnsi="Cambria"/><w:b/><w:sz w:val="28"/><w:szCs w:val="28"/></w:rPr><w:t>:</w:t></w:r><w:r><w:rPr></w:rPr><w:t xml:space="preserve"> </w:t></w:r><w:r><w:rPr><w:b/><w:bCs/></w:rPr><w:t>{{ test.name }} <w:t>....

answered Mar 10, 2016 at 17:38

icdu91

336 bronze badges

Collectives™ on Stack Overflow

Delete all xml tags between strings in string with python and regexp

3 Answers 3

1 Comment

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related