0

I have strings like this:

<img src="http://www.example.com/app_res/emoji/1F60A.png" /><img src="http://www.example.com/app_res/emoji/1F389.png" />
<img src="http://www.example.com/app_res/emoji/1F61E.png" /><img src="http://www.example.com/app_res/emoji/1F339.png" />

I want them to be like this:

&#x1F60A; &#x1F389;
&#x1F61E; &#x1F339;

In Notepad++, I tried this:

Find what: ^\s*<img src="http://www.example.com/app_res/emoji/(1F.*).png" />

Replace with: &#x\1;

The result is not as expected:

&#x1F60A.png" /><img src="http://www.example.com/app_res/emoji/1F389;

How can I best isolate the part of the regular expression that I want to capture?

Any help is welcome! Thank you.

1
  • Stack Overflow is for programming questions. For help with using a program, Super User should be your first stop. Commented Feb 20, 2022 at 11:18

3 Answers

1

You're using the unspecific . together with the greedy star *. Don't do that here, as this tends to overshoot the target.

Be more specific.

The file name (in your case) does not contain dot's. Let's use "anything except a dot" ([^.]*) instead of "anything" (.*):

^\s*<img src="http://www.example.com/app_res/emoji/(1F[^.]*).png" />
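
To see the difference outside Notepad++, here is a minimal sketch in Python. It assumes Python's `re` module behaves like Notepad++'s regex engine for these two patterns, which are copied verbatim from the question and from this answer. The variable names are only illustrative.

```python
import re

# One line from the question: two <img> tags back to back.
line = ('<img src="http://www.example.com/app_res/emoji/1F60A.png" />'
        '<img src="http://www.example.com/app_res/emoji/1F389.png" />')

# Pattern from the question: greedy ".*" runs on to the LAST ".png" of the line.
greedy = re.compile(
    r'^\s*<img src="http://www.example.com/app_res/emoji/(1F.*).png" />')

# Pattern from this answer: "[^.]*" cannot cross a dot, so the capture
# stops at the first ".png".
specific = re.compile(
    r'^\s*<img src="http://www.example.com/app_res/emoji/(1F[^.]*).png" />')

greedy_result = greedy.sub(r'&#x\1;', line)
specific_result = specific.sub(r'&#x\1;', line)

print(greedy_result)
print(specific_result)
```

Because the pattern is anchored with `^`, only the first tag on each line is converted. The second tag needs either a second pass or a pattern without the anchor.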

4 Comments

All dots outside the [class] should be escaped.
@AmigoJack Yes, theoretically. For a one-off in an HTML file, the risk of those particular dots matching anything other than actual dots in the OP's file is very, very close to zero.
For your score, the answer is sloppy, just a quick job: the regex could have been correct right away, and using apostrophes for plurals is questionable. Adding a tiny bit of Unicode knowledge would make the regex even more robust.
I have broken several programs by using a tab where they always expected a dot, every time because of sloppy regexes. I am not deliberately hunting for typos; I spot them right away when reading. Being concerned about downvotes while still not considering editing the answer for the better is proof enough for me that you've earned it.
1

You may try the following find and replace, in regex mode:

Find:    <img src=".*?/([A-Z0-9]+)\.\w+"\s*/><img src=".*?/([A-Z0-9]+)\.\w+"\s*/>
Replace: &#x$1; &#x$2;

Here is a working regex demo.

3 Comments

That regex is just doubled. Why not incorporate repetition right away?
@AmigoJack Good catch. I figured that perhaps in the general case they are not the same. Anyway, we need two separate capture groups, right?
True. Most likely the OP wants to convert every occurrence, not only lines of pairs. Inserting spaces where there were none is already a questionable manipulation in semantic terms.
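
The idea raised in these comments, converting every tag on its own instead of matching pairs, can be sketched in Python as follows. This is an illustration under assumptions, not a method from the answer: the pattern `IMG_TAG` and the helper `to_entities` are hypothetical names, and the pattern assumes every file name is a run of hex digits ending in `.png`.

```python
import re

# Hypothetical pattern: a single <img> tag whose file name is hex digits + ".png".
# "[^\"]*/" stays inside the src attribute and backtracks to the last slash.
IMG_TAG = re.compile(r'<img src="[^"]*/([0-9A-Fa-f]+)\.png"\s*/>')

def to_entities(text: str) -> str:
    """Replace every matching <img> tag with a numeric character reference."""
    return IMG_TAG.sub(r'&#x\1;', text)

sample = (
    '<img src="http://www.example.com/app_res/emoji/1F60A.png" />'
    '<img src="http://www.example.com/app_res/emoji/1F389.png" />\n'
    '<img src="http://www.example.com/app_res/emoji/1F61E.png" />'
    '<img src="http://www.example.com/app_res/emoji/1F339.png" />'
)

converted = to_entities(sample)
print(converted)
```

This version inserts no spaces between the entities, since the source has none between the tags. If the spaced output from the question is wanted, a space could be appended to the replacement string.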
0

Try

Find:    ^<.*?/(1\w+).*?/(1\w+).*
Replace: &#x$1; &#x$2;
