
I have a Java String variable containing HTML in which I want to replace all the names of PNG images with another name.

Example input HTML

<html>
  <head>
    <link rel="stylesheet" media="screen" href="style.css"/>
  </head>
  <body>
    <img href="test1.png" />
    <img href="test2.png" />
  </body>
</html>

Expected output HTML

<html>
  <head>
    <link rel="stylesheet" media="screen" href="style.css"/>
  </head>
  <body>
    <img href="C:\foo\bar\test1.png" />
    <img href="C:\foo\bar\test2.png" />
  </body>
</html>

Currently I have this Java code, which gives me the new name by loading the image as a resource. However, I can't find the right regex to select all (and only) the image names (with the extension but without the quotes). Can anyone help me with that?

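Pattern imagePattern = Pattern.compile(" TODO ");
Matcher imageMatcher = imagePattern.matcher(taskHTML);

while (imageMatcher.find())
{
    String oldName = imageMatcher.group(1);
    String newName = "" + getClass().getResource("/images/" + oldName);

    // Strings are immutable: replace() returns a new String, so the result must be assigned
    // (the matcher keeps scanning the original string, so reassigning here is safe)
    taskHTML = taskHTML.replace(oldName, newName);
}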

The matcher should list the following elements:

[test1.png, test2.png]
  • I suggest you use an HTML parser for this. Commented Mar 24, 2015 at 10:42
  • @AchintyaJha Could you give more detail, please? Do you have a link with more information about HTML parsers? Commented Mar 24, 2015 at 10:48
  • You can google it and find many (jsoup.org). Commented Mar 24, 2015 at 10:51
  • Just google it. Commented Mar 24, 2015 at 10:51
  • An HTML parser seems a bit overkill for my needs, plus it adds a dependency to my project. Commented Mar 24, 2015 at 13:25

4 Answers


Like others have mentioned, I suggest you use an HTML parser like JSoup.

Usage:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Parse {

    public static void main(String[] args) {
        // your HTML (here, the example from the question)
        String webPage = "<html><head><link rel=\"stylesheet\" media=\"screen\" href=\"style.css\"/></head>"
                + "<body><img href=\"test1.png\" /><img href=\"test2.png\" /></body></html>";

        Document doc = Jsoup.parse(webPage);

        Elements imgLinks = doc.select("img[href]"); // all <img> elements that have an href attribute

        // for every <img> element
        for (Element link : imgLinks) {
            String imageName = link.attr("href"); // current href (your image name)
            link.attr("href", "C:\\foo\\bar\\" + imageName); // replace it with dir + imageName
        }
        System.out.println(doc.html()); // print the modified HTML
    }
}

Output:

<html>
    <head>
        <link rel="stylesheet" media="screen" href="style.css">
    </head>
    <body>
        <img href="C:\foo\bar\test1.png"> 
        <img href="C:\foo\bar\test2.png">
    </body>
</html>

If you have a local HTML file that you want to parse, you will want to replace the doc above with this:

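import java.io.File;

File in = new File(input); // input: path to your local HTML file
Document doc = Jsoup.parse(in, null); // null: detect charset from <meta>, else UTF-8 (throws IOException)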

Or if you want to directly connect to a page you can replace it with this:

Document doc = Jsoup.connect("http://stackoverflow.com/").get();

Note: you will need to add jsoup to your build path.


2 Comments

Useful, but I ended up using a regex (see my answer).
@Spotted Glad you found a solution! Just a little pointer: using regex to parse HTML is not advised. For the reasons, read this. Anyway, don't forget to accept an answer to close the question :)

Try this:

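// dir is the target directory, e.g. "C:\\foo\\bar\\"; quoteReplacement escapes any \ or $ in it
str = str.replaceAll("href=\"([^\"]*\\.png)\"", "href=\"" + java.util.regex.Matcher.quoteReplacement(dir) + "$1\"");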
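A minimal runnable sketch of this replaceAll approach follows. It assumes a hypothetical helper name, `prefixPngHrefs`, and a directory string passed in as `dir`. The pattern only matches href values ending in `.png`, so `style.css` in the question's `<link>` is left alone. `Matcher.quoteReplacement` makes backslashes (and any `$`) in the directory literal in the replacement.

```java
import java.util.regex.Matcher;

public class HrefPrefix {
    // Prefixes every double-quoted href value that ends in .png with dir.
    // Other hrefs (such as style.css) do not match the pattern and stay unchanged.
    // prefixPngHrefs is a hypothetical helper name used for illustration.
    static String prefixPngHrefs(String html, String dir) {
        return html.replaceAll("href=\"([^\"]*\\.png)\"",
                "href=\"" + Matcher.quoteReplacement(dir) + "$1\"");
    }

    public static void main(String[] args) {
        String html = "<link href=\"style.css\"/><img href=\"test1.png\" /><img href=\"test2.png\" />";
        System.out.println(prefixPngHrefs(html, "C:\\foo\\bar\\"));
    }
}
```

Because `[^"]*` cannot cross a quote, each match stays inside a single attribute value, even when several `<img>` tags share one line.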


If you need to modify HTML content, consider using XSLT instead of a regex. This requires the HTML to be well-formed XML, as the example in the question is.
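Here is a sketch of the XSLT route using only the JDK's built-in `javax.xml.transform` API. It assumes the input is well-formed XHTML, and the class name, method name, and `dir` stylesheet parameter are hypothetical. The stylesheet is an identity transform (it copies everything unchanged) plus one template that rewrites `href` attributes of `<img>` elements whose value ends in `.png`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltRewrite {
    // Identity transform plus a higher-priority template for img/@href values ending in ".png";
    // $dir is a stylesheet parameter (hypothetical name) holding the directory prefix.
    private static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:param name=\"dir\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"img/@href[substring(., string-length(.) - 3) = '.png']\">"
        + "<xsl:attribute name=\"href\"><xsl:value-of select=\"concat($dir, .)\"/></xsl:attribute>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet; the input must be well-formed XML (XHTML-style).
    static String rewrite(String xhtml, String dir) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setParameter("dir", dir);
            // serialize as XML instead of the html method an <html> root would select by default
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><head><link rel=\"stylesheet\" href=\"style.css\"/></head>"
                + "<body><img href=\"test1.png\"/><img href=\"test2.png\"/></body></html>";
        System.out.println(rewrite(html, "C:\\foo\\bar\\"));
    }
}
```

The `<link href="style.css"/>` is untouched because the override template only matches `href` attributes whose parent is an `img` element.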


I ended up using the following regular expression:

// [^"]+ stops at the closing quote, so two images on the same line are matched separately
Pattern.compile("\"([^\"]+\\.png)\"");

Then I read the name between the quotes from capturing group 1 of each match (group 0 is the whole match, quotes included):

matcher.group(1);
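Put together, a sketch of the whole regex approach might look like the code below. It lists the names and rewrites them in one pass with `appendReplacement`/`appendTail`, instead of calling `String.replace` inside the loop. The class name, method names, and the `resolver` function are hypothetical. In the question's code the resolver would be something like `name -> "" + getClass().getResource("/images/" + name)`; a fixed directory prefix is used here only so the example runs without resources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PngRewriter {
    // A double-quoted value ending in .png; group 1 is the name without the quotes.
    private static final Pattern PNG = Pattern.compile("\"([^\"]+\\.png)\"");

    // Collects the image names in document order.
    static List<String> findNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = PNG.matcher(html);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Replaces each quoted PNG name with resolver(name), keeping the quotes.
    static String rewrite(String html, Function<String, String> resolver) {
        Matcher m = PNG.matcher(html);
        StringBuffer sb = new StringBuffer(); // the StringBuffer overloads work on Java 8+
        while (m.find()) {
            String newName = resolver.apply(m.group(1));
            // quoteReplacement stops \ and $ in the new name from being read as group references
            m.appendReplacement(sb, Matcher.quoteReplacement("\"" + newName + "\""));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<link href=\"style.css\"/>\n<img href=\"test1.png\" />\n<img href=\"test2.png\" />";
        System.out.println(findNames(html));
        System.out.println(rewrite(html, name -> "C:\\foo\\bar\\" + name));
    }
}
```

Rewriting in a single pass also avoids a subtle problem with `replace(oldName, newName)`: it would rewrite every occurrence of the name anywhere in the string, not just the one the matcher found.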
