
I have a string like this in a file:

<script>
Evening</script>

I have written code to replace this string, but it does not recognize the newline character. I want to replace the string above with:

<h1>Done</h1>

The code looks like this:

package stringreplace;
import java.io.*;

import org.omg.CORBA.Request;

public class stringreplace {

    /**
     * @param args
     */
    public static void main(String[] args) {
        FileReader fr = null;
        BufferedReader br = null;

        try
        {
            fr = new FileReader("G://abc.html");
            br = new BufferedReader(fr);

            String newtext = "";
            String line = "";

            String matchExist1 = "<script>\r\nEvening</script>";
            String newpattern = "<h1>Done</h1>";

            String matchExist2 = "</body>";
            String newpattern2 = "<script>alpha</script></body>";

            StringBuffer sb = new StringBuffer();

            while ((line = br.readLine()) != null)
            {
                int ind2 = line.indexOf(matchExist1);
                System.out.println(ind2);
                int ind3 = line.indexOf(matchExist2);
                if ((ind2 == -1) || (ind3 == -1))
                {
                    line = line.replaceFirst(matchExist1, newpattern);
                    line = line.replaceFirst(matchExist2, newpattern2);
                    sb.append(line + "\n");
                }
                //sb.append(line+"\n");
                else if ((ind2 != -1) || (ind3 != -1))
                {
                    String tag = "</body>";
                    line = line.replaceFirst("</body>", tag);
                    sb.append(line + "\n");
                }
            }
            br.close();

            FileWriter fw = new FileWriter("G://abc.html");
            fw.write(sb.toString());
            fw.close();

            System.out.println("done");
            System.out.println(sb);
        }
        catch (Exception e)
        {
            System.out.println(e);
        }
    }

}

But it does not recognize the newline character.
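A minimal sketch of why indexOf keeps printing -1 in the loop above: BufferedReader.readLine() returns each line with its terminator ("\r\n", "\r" or "\n") already removed, so no single line can contain the two-line pattern. The class and method names are made up for illustration, and a StringReader stands in for the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {

    // Collects lines the same way the question's while loop does (hypothetical helper).
    static List<String> readAllLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line); // the "\r\n", "\r" or "\n" terminator is not included
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String pattern = "<script>\r\nEvening</script>";
        for (String line : readAllLines("<script>\r\nEvening</script>")) {
            // Neither line contains the two-line pattern, so this prints -1 twice.
            System.out.println(line.indexOf(pattern));
        }
    }
}
```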

  • Slightly tangential: if all you are trying to do is parse HTML, why not just use an XML parser (provided it's XHTML)? In my experience, writing a good regex-based parser for HTML is not worth the time. Commented Feb 22, 2012 at 7:32
  • Are you sure it's "\r\n" and not just a single "\n"? Commented Feb 22, 2012 at 7:33
  • As Pavan mentioned, and in my experience, I recommend jsoup.org. Commented Feb 22, 2012 at 8:01
  • Completely tangential: why import org.omg.CORBA.Request? Commented Feb 22, 2012 at 8:03
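To answer the "\r\n" versus "\n" question for a given file, one option is to read the raw contents (which keeps the terminators that readLine() strips) and look at the first terminator. A small sketch; the class name, the helper, and passing the file path as a command-line argument are all hypothetical, and UTF-8 encoding is assumed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LineEndingCheck {

    // Returns the first line terminator spelled out as escape text
    // ("\\r\\n", "\\r" or "\\n"), or "none" if the text has no terminator.
    static String firstLineEnding(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                boolean crlf = i + 1 < text.length() && text.charAt(i + 1) == '\n';
                return crlf ? "\\r\\n" : "\\r";
            }
            if (c == '\n') {
                return "\\n";
            }
        }
        return "none";
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the HTML file path as the first argument;
        // without one, an in-memory sample is checked instead.
        String text = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : "<script>\r\nEvening</script>";
        System.out.println(firstLineEnding(text));
    }
}
```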

2 Answers


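Since you are reading only one input line at a time, you can hardly expect to match a pattern that spans two lines. BufferedReader.readLine() also strips the line terminator, so no line it returns ever contains "\r\n". You must first change your reading so that the text you search contains at least both lines. Once you've done that, @sterna's answer will do the trick.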
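One way to do what this answer describes, sketched under the assumptions that the file is small enough to hold in memory and is UTF-8 encoded: read the whole file into one String so the line terminators survive, apply the \s+ pattern from @sterna's answer, then write the result back. The class and method names are hypothetical, and the </body> replacement assumes the question simply wants <script>alpha</script> inserted before the first </body>.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WholeFileReplace {

    // Works on the whole document at once, so the pattern can span lines.
    static String rewrite(String html) {
        // \\s+ accepts "\r\n", "\n" or any other run of whitespace between the tags.
        String result = html.replaceFirst("<script>\\s+Evening</script>", "<h1>Done</h1>");
        return result.replaceFirst("</body>", "<script>alpha</script></body>");
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // No file given: show the effect on an in-memory sample instead.
            System.out.println(rewrite("<body><script>\r\nEvening</script></body>"));
            return;
        }
        Path file = Paths.get(args[0]); // e.g. the question's G://abc.html
        String html = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Files.write(file, rewrite(html).getBytes(StandardCharsets.UTF_8));
    }
}
```

Reading everything first also avoids the question's pattern of opening a FileWriter on the same path only after the reader is closed, since the file is rewritten in a single call.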


1 Comment

+1, you have seen more. But the part about the Multiline flag is wrong: it only changes the behaviour of the ^ and $ anchors. What you probably meant was the DOTALL flag, which makes the dot also match newline characters. But that is not his problem here either.
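To illustrate the distinction this comment draws, a small sketch using java.util.regex.Pattern (the class and helper names are hypothetical): MULTILINE only changes where ^ and $ can match, while DOTALL lets . match line terminators.

```java
import java.util.regex.Pattern;

public class RegexFlagsDemo {

    static final String TEXT = "<script>\nEvening</script>";

    // Compiles the regex with the given flags and reports whether it occurs in TEXT.
    static boolean finds(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(TEXT).find();
    }

    public static void main(String[] args) {
        // Without DOTALL, '.' stops at '\n', so the tags cannot be bridged.
        System.out.println(finds("<script>.*</script>", 0));              // false
        // DOTALL lets '.' match the newline as well.
        System.out.println(finds("<script>.*</script>", Pattern.DOTALL)); // true
        // Without MULTILINE, '^' only matches at the very start of the input.
        System.out.println(finds("^Evening", 0));                         // false
        // With MULTILINE, '^' also matches right after the '\n'.
        System.out.println(finds("^Evening", Pattern.MULTILINE));         // true
    }
}
```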

I think you can't be sure what your newline looks like. So I would not match a specific sequence; instead, use \s+, which matches one or more whitespace characters, including \r and \n.

String matchExist1 = "<script>\\s+Evening</script>";
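A sketch of this pattern applied to several newline styles, compiled once with java.util.regex.Pattern (the class and method names are hypothetical). It only helps once the string being searched actually contains both lines.

```java
import java.util.regex.Pattern;

public class WhitespacePatternDemo {

    // \s+ means one or more whitespace characters: space, tab, \n, \r, and so on.
    static final Pattern SCRIPT_BLOCK = Pattern.compile("<script>\\s+Evening</script>");

    // Replaces the first matching block, leaving the rest of the text untouched.
    static String replace(String text) {
        return SCRIPT_BLOCK.matcher(text).replaceFirst("<h1>Done</h1>");
    }

    public static void main(String[] args) {
        String[] samples = {
            "<script>\r\nEvening</script>",   // Windows line ending
            "<script>\nEvening</script>",     // Unix line ending
            "<script>\n    Evening</script>"  // newline plus indentation
        };
        for (String s : samples) {
            System.out.println(replace(s));
        }
    }
}
```

Note that \s+ requires at least one whitespace character, so a one-line <script>Evening</script> is left unchanged; \s* would accept that case as well.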

Edit:
Of course, you first have to fix the problem mgc described (+1). Then you can make use of my answer!

1 Comment

line.indexOf(matchExist1) won't match a regex: indexOf searches for the literal characters.
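This comment's point, sketched with hypothetical names: String.indexOf looks for the exact characters, so the backslash and "s+" in a regex string are searched for literally, while Matcher.find() interprets the regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexOfVsRegex {

    static final String REGEX = "<script>\\s+Evening</script>";

    // String.indexOf treats its argument as plain text.
    static int literalIndex(String text) {
        return text.indexOf(REGEX);
    }

    // Matcher.find() interprets the regex; returns the match start, or -1 if none.
    static int regexIndex(String text) {
        Matcher m = Pattern.compile(REGEX).matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String text = "<p>Hi</p><script>\r\nEvening</script>";
        System.out.println(literalIndex(text)); // -1
        System.out.println(regexIndex(text));   // 9
    }
}
```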
