I have a chunk of HTML that contains multiple <img> tags. The current format of the tag is:

<img width="580" height="183" src="/images/stories/acidalkalinetable.jpg" alt="acid alkaline table" title="Body pH Balance">

I want to go through the HTML and, for each <img> tag, change the format to:

<img width="580" height="183" src="{{media url="wysiwyg/acidalkalinetable.jpg"}}" alt="acid alkaline table" title="Body pH Balance">

You can see it's the src that's changing. I've kept the filename but changed the rest of the src.

If the img was a single string I could do something like:

content = '<img width="580" height="183" src="/images/stories/acidalkalinetable.jpg" alt="acid alkaline table" title="Body pH Balance">'

filename = re.search(r'/images/stories/\w+\.(jpg|png|gif)', content)

new_content = re.sub(r'/images/stories/\w+\.(jpg|png|gif)', '{{media url="wysiwyg/' + filename + '"}}', content)

(I haven't tested that)

But I'm not sure how to do that for each occurrence of the <img> tag in the HTML.

  • Are you sure about the quoting? In "{{media url="wysiwyg/acidalkalinetable.jpg"}}", the wysiwyg part is outside the quotes. Commented Mar 19, 2013 at 17:33
  • Yeah, I'm cleaning up the data to import into Magento. That's how it does its image tags. Commented Mar 19, 2013 at 17:35

1 Answer

You need to capture the filename as a group; you can then replace it in one go:

re.sub(r'/images/stories/([\w%]+\.(?:jpg|png|gif))', r'{{media url="wysiwyg/\1"}}', content)

This puts a capturing group, (...), around the whole filename including the extension. The extension alternatives now sit in a non-capturing (?:...) group, so the filename is group 1. The result:

>>> re.sub(r'/images/stories/([\w%]+\.(?:jpg|png|gif))', r'{{media url="wysiwyg/\1"}}', content)
'<img width="580" height="183" src="{{media url="wysiwyg/acidalkalinetable.jpg"}}" alt="acid alkaline table" title="Body pH Balance">'

The \1 in the replacement string is a backreference to the first capturing group; see the re.sub() documentation.

This re.sub() call will replace all matching /images/stories/.. paths with the {{media url="wysiwyg/.."}} syntax, so every <img> tag in the HTML is handled by the single call.
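To make the "all occurrences" point concrete, here is a minimal sketch that runs the same substitution over a chunk of HTML with more than one <img> tag. The sample HTML, the second filename, and the helper name rewrite_image_paths are made up for illustration; only the pattern and the replacement string come from the answer above.

```python
import re

# Hypothetical sample input: two <img> tags in one chunk of HTML.
html = (
    '<p><img width="580" height="183" '
    'src="/images/stories/acidalkalinetable.jpg" alt="acid alkaline table"></p>\n'
    '<p><img src="/images/stories/some%20chart.png" alt="chart"></p>'
)

# Same pattern as above, compiled once so it can be reused.
IMG_PATH = re.compile(r'/images/stories/([\w%]+\.(?:jpg|png|gif))')


def rewrite_image_paths(text):
    """Replace every /images/stories/<file> path with the media url syntax."""
    # \1 is the captured filename, including its extension.
    return IMG_PATH.sub(r'{{media url="wysiwyg/\1"}}', text)


new_html = rewrite_image_paths(html)
print(new_html)
```

Because re.sub() replaces every non-overlapping match by default, there is no need to loop over the tags yourself. Note that this rewrites matching paths wherever they appear in the text, not only inside src attributes.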

2 Comments

Thanks for the answer. Would you be able to update the regex so that it can handle possible spaces in the filename, i.e. something%20something.jpg?
@iamjonesy: done; all you needed to do was extend the \w to a character class that adds % as an option.
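A small sketch of why the character class matters, comparing the plain \w+ pattern with the [\w%]+ version on the URL-encoded filename from the comment. The variable names are made up for illustration, and the last pattern (which also allows hyphens) is an assumption about what other filenames might need, not something from the thread.

```python
import re

path = '/images/stories/something%20something.jpg'

# \w does not match '%', so the plain version fails on the encoded space.
narrow = re.search(r'/images/stories/(\w+\.(?:jpg|png|gif))', path)

# Adding % to the character class lets the filename match as a whole.
wide = re.search(r'/images/stories/([\w%]+\.(?:jpg|png|gif))', path)

# Assumption: if filenames may also contain hyphens, add '-' at the end
# of the class, where it is treated as a literal character.
hyphen = re.search(r'/images/stories/([\w%-]+\.(?:jpg|png|gif))',
                   '/images/stories/my-file.jpg')

print(narrow)
print(wide.group(1))
print(hyphen.group(1))
```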
