
I need to replace the word 'OR' with '||' in a given string. It should be replaced only when it is a complete word by itself, and it must not be replaced when it appears within quotes. For example, if the input string is

application.path="EXCEL.exe" OR application.path="EXCELSIOR.exe" OR application.path="XYZ OR ABC.exe"

the output should be

application.path="EXCEL.exe" || application.path="EXCELSIOR.exe" || application.path="XYZ OR ABC.exe"

Note that the OR in EXCELSIOR.exe and "XYZ OR ABC.exe" is not replaced.

The Java code I'm using is as follows:

String inputStr = "(quote.AGE was 24 AND (application.path = \"**\\acad.exe\" OR application.path = \"**\\dxfdwg.exe\" OR application.path = \"**\\EXCELSIOR.EXE\" OR application.path = \"**\\iges.exe\" OR application.path = \"**\\notepad.exe\" OR application.path = \"**\\run_journal.exe\" OR application.path = \"**\\AcroRd32.exe\" OR application.path = \"**\\dllhost.exe\" OR application.path = \"**\\powerpnt.exe\" OR application.path = \"**\\Edge.exe\" OR application.path = \"**\\step203ug.exe\" OR application.path = \"**\\step214ug.exe\" OR application.path = \"**\\VisView.exe\" OR application.path = \"**\\Teamcenter.exe\" OR application.path = \"**\\ug_convert_part.exe\" OR application.path = \"**\\ugraf.exe\" OR application.path = \"**\\ugtopv.exe\" OR application.path = \"**\\wmplayer.exe\" OR application.path = \"**\\winword.exe\" OR application.path = \"**\\wordpad.exe\" OR application.path = \"**\\vlc.exe\" OR application.path = \"**\\dwgviewr.exe\" OR application.name = \"RMS\" OR application.path = \"**\\acrobat.exe\" OR application.path = \"**\\Alias.exe\" OR application.path = \"**\\awtessd.exe\" OR application.path = \"**\\proe.exe\" OR application.path = \"**\\STPViewer.exe\" OR application.path = \"**\\gom_inspect.exe\" OR application.path = \"**\\gom_cad_server2.exe\" OR application.path = \"**\\sldworks.exe\" OR application.path = \"**\\sldworks_fs.exe\" OR application.path = \"**\\sldProcMon.exe\" OR application.path = \"**\\AdapplicationMgr.exe\" OR application.path = \"**\\AdapplicationMgrSvc.exe\" OR application.path = \"**\\SE3Dtrans.exe\" OR application.path = \"**\\stamp.exe\" OR application.path = \"**\\psolid.exe\" OR application.path = \"**\\mpid.exe\" OR application.path = \"**\\mpirun.exe\" OR application.path = \"**\\FS.exe\" OR application.path = \"**\\xtop.exe\" OR application.path = \"**\\pro_comm_msg.exe\" OR application.path = \"**\\nmsd.exe\" OR application.path = \"**\\creoagent.exe\" OR application.path = \"**\\parametric.exe\" 
OR application.path = \"**\\PDFEditor.exe\" OR application.path = \"**\\CNEXT.exe\" OR application.path = \"**\\drafter.exe\" OR application.path = \"**\\convert.exe\" OR application.path = \"**\\ActCut3D.exe\" OR application.path = \"**\\ppcbasic.exe\" OR application.path = \"**\\deltamesh_stamping.exe\" OR application.path = \"Xasfsf\" OR application.path = \"sfdsdf\"))";
String replacedStr = inputStr.replaceAll("(?m)\\bOR\\b(?=(?:\"[^\"]*\"|[^\"])*$)", "||");

This works fine for shorter strings, but once the length goes beyond 2000 characters, it throws the following error:

Exception in thread "main" java.lang.StackOverflowError
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3796)
    at java.util.regex.Pattern$Branch.match(Pattern.java:4604)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$BranchConn.match(Pattern.java:4568)
    at java.util.regex.Pattern$CharProperty.match(Pattern.java:3777)
    at java.util.regex.Pattern$Branch.match(Pattern.java:4604)

I read in some other threads (thread1, thread2) that Java's regex engine doesn't handle long strings very well. Can someone suggest how I can improve my regex to avoid the StackOverflowError?

  • Would String.replaceAll() work for you? Commented May 9, 2018 at 14:11

1 Answer

Can someone suggest how I can improve my regex to avoid the StackOverflowError?

Yes, I can give you two solutions; you just need to look at your problem from a different angle.

Here is a quick analysis of your problem and a quick solution. You can use this regex instead: (.*?\"\s+)\bOR\b(\s+application.*?)

Solution one

String inputStr = "..."; // that long String
String regex = "(.*?\"\\s+)\\bOR\\b(\\s+application.*?)";
String replacedStr = inputStr.replaceAll(regex, "$1||$2");

System.out.println(replacedStr);

I noticed that every OR you want to replace comes after a closing quote (") and whitespace, and is followed by whitespace and the word application. My regex matches only that OR and replaces it.

Output for the short example (the long string is handled the same way):

application.path="EXCEL.exe" || application.path="EXCELSIOR.exe" || application.path="XYZ OR ABC.exe"
                             ^^                          ^^      ^^                       ^^
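For anyone who wants to run Solution one directly, here is a minimal, self-contained sketch on the short example from the question. The class name SolutionOneDemo and the helper replaceOr are names made up for this illustration. It relies on the assumption stated above: each OR to be replaced sits between a closing quote and the word application. That holds for the inputs shown here, but it is not a general "outside quotes" test.

```java
class SolutionOneDemo {

    // Hypothetical helper name; wraps the replaceAll call from Solution one.
    static String replaceOr(String input) {
        // Group 1: lazily matches up to a quote followed by whitespace.
        // Group 2: whitespace plus the word "application" right after OR.
        String regex = "(.*?\"\\s+)\\bOR\\b(\\s+application.*?)";
        return input.replaceAll(regex, "$1||$2");
    }

    public static void main(String[] args) {
        String shortInput = "application.path=\"EXCEL.exe\" OR "
                + "application.path=\"EXCELSIOR.exe\" OR "
                + "application.path=\"XYZ OR ABC.exe\"";
        System.out.println(replaceOr(shortInput));
    }
}
```

The OR inside EXCELSIOR and the OR inside "XYZ OR ABC.exe" are left alone because neither is directly preceded by a quote plus whitespace.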

Solution two

If you are using Java 9+, you can use the regex application.path=(\"(.*?)\") to match everything of the form application.path="something here", then join the matches with ||:

String regex = "application.path=(\"(.*?)\")";
String text = Pattern.compile(regex)
        .matcher(inputStr).results().map(MatchResult::group)
        .collect(Collectors.joining(" || "));
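A self-contained sketch of Solution two, again on the short example. Matcher.results() is available from Java 9 onward; the class name SolutionTwoDemo and the helper joinClauses are made up for this illustration. Two caveats are worth checking against your own data. First, the pattern only matches application.path="..." written without spaces around the = sign. Second, anything outside the matches (for example the quote.AGE was 24 AND ( prefix in the long string) is dropped from the result.

```java
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SolutionTwoDemo {

    // Hypothetical helper name; collects every application.path="..." clause
    // and joins the clauses with " || " (needs Java 9+ for results()).
    static String joinClauses(String input) {
        String regex = "application.path=(\"(.*?)\")";
        return Pattern.compile(regex)
                .matcher(input)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.joining(" || "));
    }

    public static void main(String[] args) {
        String shortInput = "application.path=\"EXCEL.exe\" OR "
                + "application.path=\"EXCELSIOR.exe\" OR "
                + "application.path=\"XYZ OR ABC.exe\"";
        System.out.println(joinClauses(shortInput));
    }
}
```

Because this rebuilds the string from the matched clauses rather than editing it in place, it fits inputs that consist only of application.path clauses separated by OR.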

1 Comment

Thanks for the solutions @YCF_L
