Replace a string using Notepad++ and regex

Question

I have strings like this:

<img src="http://www.example.com/app_res/emoji/1F60A.png" /><img src="http://www.example.com/app_res/emoji/1F389.png" />
<img src="http://www.example.com/app_res/emoji/1F61E.png" /><img src="http://www.example.com/app_res/emoji/1F339.png" />

I want them to be like this:

&#x1F60A; &#x1F389;
&#x1F61E; &#x1F339;

In Notepad++, I tried this:

Find what: ^\s*<img src="http://www.example.com/app_res/emoji/(1F.*).png" />

Replace with: &#x\1;

The result is not as expected:

&#x1F60A.png" /><img src="http://www.example.com/app_res/emoji/1F389;

How can I best isolate the regular expression?

Any help is welcome! Thank you.

Stack Overflow is for programming. For help with using programs alone, Super User should be your prime target.
– AmigoJack, Commented Feb 20, 2022 at 11:18

Tomalak · Accepted Answer · 2022-02-20 10:37:42Z

1

You're using the unspecific . together with the greedy star *. Don't do that here: .* matches as much as it can, so it overshoots the target and runs up to the last .png" /> on the line.

Be more specific.

The file name (in your case) does not contain dots. Let's use "anything except a dot" ([^.]*) instead of "anything" (.*):

^\s*<img src="http://www.example.com/app_res/emoji/(1F[^.]*).png" />
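The difference between the two patterns can be checked outside Notepad++ with a minimal Python sketch. Python's re module is only an assumed stand-in for Notepad++'s regex engine here, and the variable names are made up for illustration.

```python
import re

# One line from the question: two <img> tags back to back.
line = ('<img src="http://www.example.com/app_res/emoji/1F60A.png" />'
        '<img src="http://www.example.com/app_res/emoji/1F389.png" />')

# Original attempt: greedy .* inside the capture group.
greedy = r'^\s*<img src="http://www.example.com/app_res/emoji/(1F.*).png" />'
# Suggested fix: "anything except a dot" inside the capture group.
specific = r'^\s*<img src="http://www.example.com/app_res/emoji/(1F[^.]*).png" />'

greedy_result = re.sub(greedy, r'&#x\1;', line)
specific_result = re.sub(specific, r'&#x\1;', line)

print(greedy_result)
# &#x1F60A.png" /><img src="http://www.example.com/app_res/emoji/1F389;
print(specific_result)
# &#x1F60A;<img src="http://www.example.com/app_res/emoji/1F389.png" />
```

In this sketch the second tag on the line is left untouched by the more specific pattern, because ^ only matches at the start of the line. Converting every tag would presumably require dropping the ^\s* prefix.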

answered Feb 20, 2022 at 10:37

Tomalak

4 Comments

AmigoJack Over a year ago

All dots outside the [class] should be escaped.

Tomalak Over a year ago

@AmigoJack Yes, theoretically. For a one-off in an HTML file, the risk of those particular dots matching anything other than actual dots in the OP's file is very, very close to zero.

AmigoJack Over a year ago

For your score, the answer is sloppy, just a quick job: the regex could have been correct right away, and using apostrophes for plurals is questionable. Putting in a tiny bit of Unicode knowledge would make the regex even more robust.

AmigoJack Over a year ago

I broke multiple software's necks by using a tabulator where they were always expecting a dot to happen, always due to sloppy regexes. I am not going out of my way to search for typos; I spot them right away when reading. Being concerned about downvotes but still not considering editing the answer for the better is proof enough for me that you've earned it.
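The escaping concern raised in these comments can be illustrated with a small Python sketch. Python's re module is again only an assumed stand-in for Notepad++, and the tab-containing input is a made-up example, not something from the asker's file.

```python
import re

# Unescaped dot before "png": matches any single character.
unescaped = re.compile(r'emoji/(1F[^.]*).png')
# Escaped dot: matches a literal "." only.
escaped = re.compile(r'emoji/(1F[^.]*)\.png')

# Hypothetical malformed input with a tab where the dot should be.
odd = 'emoji/1F60A\tpng'

unescaped_hit = unescaped.search(odd) is not None  # True: "." accepts the tab
escaped_hit = escaped.search(odd) is not None      # False: no literal dot present

print(unescaped_hit, escaped_hit)
```

For the file in the question both patterns give the same result. The escaped form only matters if the input can contain something other than a dot in that position.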

Tim Biegeleisen · Accepted Answer · 2022-02-20 10:38:46Z

1

You may try the following find and replace, in regex mode:

Find:    <img src=".*?/([A-Z0-9]+)\.\w+"\s*/><img src=".*?/([A-Z0-9]+)\.\w+"\s*/>
Replace: &#x$1; &#x$2;

Here is a working regex demo.
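As a rough check of this paired find-and-replace, here is a minimal Python sketch. It assumes each capture group is meant to close right after the hex digits, before the file extension. Python's re module stands in for Notepad++, so the replacement uses \1 and \2 where Notepad++ accepts $1 and $2.

```python
import re

# The two sample lines from the question.
text = (
    '<img src="http://www.example.com/app_res/emoji/1F60A.png" />'
    '<img src="http://www.example.com/app_res/emoji/1F389.png" />\n'
    '<img src="http://www.example.com/app_res/emoji/1F61E.png" />'
    '<img src="http://www.example.com/app_res/emoji/1F339.png" />'
)

# Two <img> tags in a row, each capturing the file name without extension.
pair = (r'<img src=".*?/([A-Z0-9]+)\.\w+"\s*/>'
        r'<img src=".*?/([A-Z0-9]+)\.\w+"\s*/>')

result = re.sub(pair, r'&#x\1; &#x\2;', text)
print(result)
# &#x1F60A; &#x1F389;
# &#x1F61E; &#x1F339;
```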

answered Feb 20, 2022 at 10:38

Tim Biegeleisen

3 Comments

AmigoJack Over a year ago

That regex is just doubled. Why not incorporate repetition right away?

Tim Biegeleisen Over a year ago

@AmigoJack Good catch. I figured that perhaps in the general case they are not the same. Anyway, we need two separate capture groups, right?

AmigoJack Over a year ago

True. Most likely the OP wants to convert every occurrence and not only lines of pairs. Inserting spaces where there were none is already a questionable manipulation in semantic terms.
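The idea of converting every occurrence, rather than only lines of pairs, could be sketched with a single-tag pattern applied globally. This is an illustration under assumptions and not a pattern posted in the thread: the [0-9A-F]+ class assumes file names are uppercase hex code points, and Python's re module stands in for Notepad++.

```python
import re

# One line from the question: two <img> tags back to back.
line = ('<img src="http://www.example.com/app_res/emoji/1F60A.png" />'
        '<img src="http://www.example.com/app_res/emoji/1F389.png" />')

# One <img> tag: [^"]* cannot leave the src attribute, so each match
# stays inside a single tag; the dot before the extension is escaped.
single = r'<img src="[^"]*/([0-9A-F]+)\.\w+"\s*/>'

# re.sub replaces every match, however many tags a line holds.
converted = re.sub(single, r'&#x\1;', line)
print(converted)
# &#x1F60A;&#x1F389;
```

Unlike the output requested in the question, this version inserts no space between the entities, so the original spacing is preserved.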

Haji Rahmatullah · Accepted Answer · 2022-02-20 15:08:36Z

0

Try

Find: ^<.*?/(1\w+).*?/(1\w+).*
Replace: &#x$1; &#x$2;

answered Feb 20, 2022 at 15:08

Haji Rahmatullah
