I have a string like the one below:

MSH|^~\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|QPC0amoCwk+2uSHidYKB+Q|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^
MSA|AA|1234

I want to use a regex to replace everything between K11| and |P. The text between them changes from message to message. I thought this would be straightforward, but I can't get it to work.

I have tried var regEx5 = /K11\|\w*\|P/g and then used that regex to replace the text. The regex is bringing back QPC0amoCHidY, though. I can't understand why it is doing this. Is it because the string contains + symbols? I'm at a loss.

I also tried /K11\|[^|]*\|P/g and /K11\|(.*?)\|P/g, with no joy.

Here is the code that performs the regex replace:

var regEx5 = /K11\|([^|]+)\|P/g 
newText1 = newText1["replace"](regEx5, "K11|<IGNORE>|P");
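A minimal JavaScript sketch that runs each pattern from the question through String.prototype.replace, assuming the sample string is used exactly as shown (the backslash in ^~\& is escaped for a JS string literal). Under that assumption only the \w* attempt fails: \w matches only [A-Za-z0-9_], so it cannot cross the + in QPC0amoCwk+2uSHidYKB+Q, while [^|]*, (.*?) and [^|]+ all accept +. The object keys are just illustrative labels.

```javascript
// The sample message from the question; "^~\\&" in source is "^~\&" at runtime.
const msg =
  "MSH|^~\\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|QPC0amoCwk+2uSHidYKB+Q|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^";

// The patterns tried in the question (keys are illustrative labels only).
const attempts = {
  wordChars: /K11\|\w*\|P/g,  // \w is [A-Za-z0-9_]: it stops at the first "+", so no match
  notPipe: /K11\|[^|]*\|P/g,  // any run of non-pipe characters, "+" included
  lazyAny: /K11\|(.*?)\|P/g,  // the shortest run of any characters before "|P"
  inCode: /K11\|([^|]+)\|P/g, // the pattern from the question's code
};

// Apply each pattern with the replacement used in the question's code.
const results = {};
for (const [name, re] of Object.entries(attempts)) {
  results[name] = msg.replace(re, "K11|<IGNORE>|P");
}

for (const [name, out] of Object.entries(results)) {
  console.log(name, out === msg ? "(no match)" : out);
}
```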
  • What language are you using? Your 2nd and 3rd attempts work with the JavaScript regex engine. Commented Sep 22, 2023 at 9:23
  • This seems like an XY problem: your test string looks like a CSV line with pipe separators. Are you sure you need to extract the value between K11| and |P, rather than just extracting the nth column of that CSV string? Commented Sep 22, 2023 at 9:26
  • OK, but the 3rd attempt (/K11\|(.*?)\|P/g) works on all flavors listed on regex101, including ".NET (C#)". Maybe try it without the ?, i.e. K11\|(.*)\|P? Commented Sep 22, 2023 at 9:41
  • Are you using any code? If so, can you add it to the question? Commented Sep 22, 2023 at 10:21
  • @Endorium If you are using C#, then why not use Regex.Replace? See ideone.com/9UwCu0 Commented Sep 22, 2023 at 13:23
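One comment suggests treating the line as pipe-delimited fields instead of matching around K11| and |P. A rough JavaScript sketch of that idea, assuming the value to replace always sits at index 9 after splitting on | (the line looks like an HL7 v2 MSH segment, where that position would be MSH-10, the message control ID). The helper name replaceField is hypothetical.

```javascript
// Hypothetical helper: replace the pipe-separated field at `index` in one segment line.
function replaceField(line, index, value) {
  const fields = line.split("|");
  if (index >= fields.length) {
    throw new RangeError(`field ${index} does not exist`);
  }
  fields[index] = value;
  // split/join round-trips empty fields such as "||", so the rest of the line is unchanged.
  return fields.join("|");
}

const msh =
  "MSH|^~\\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|QPC0amoCwk+2uSHidYKB+Q|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^";

// Index 9 is the field right after "RSP^K11^RSP_K11"; no regex is involved,
// so "+" (or any other character) in the value needs no special care.
const updated = replaceField(msh, 9, "<IGNORE>");
console.log(updated);
```

The trade-off is that this relies on the field position being fixed rather than on the surrounding K11| and |P text.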

1 Answer


To replace a string that occurs between two other strings, a common approach is to capture the two bounding strings and have the replacement expression put them back, with the new text in between.

The RegEx (K11\|).*(\|P) captures the K11| and the |P in groups 1 and 2. The text between them is matched by the .* but is not captured.

The question is not clear on what the replacement should be, so let's assume that it is NewText.

The replacement expression should then be \1NewText\2 or $1NewText$2, depending on the RegEx flavor being used (.NET and JavaScript, for example, use the $1 form).
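Since the regex literals in the question look like JavaScript, here is a sketch of the same capture-and-put-back approach there; String.prototype.replace understands $1 and $2 in the replacement string. The second replace is an alternative, not the method described above: a lookbehind (?<=...) and a lookahead (?=...) check for K11| and |P without consuming them, so nothing has to be put back. Lookbehind assumes an ES2018-capable engine.

```javascript
const msg =
  "MSH|^~\\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|QPC0amoCwk+2uSHidYKB+Q|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^";

// Groups 1 and 2 capture the bounding "K11|" and "|P"; $1 and $2 write them back.
const viaGroups = msg.replace(/(K11\|).*(\|P)/, "$1NewText$2");

// Alternative: lookarounds assert the bounds without making them part of the match.
const viaLookaround = msg.replace(/(?<=K11\|)[^|]*(?=\|P)/, "NewText");

console.log(viaGroups === viaLookaround); // true for this input
```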

C# code to perform the change could be as follows. Note that backslash characters need to be doubled when putting them into ordinary C# string literals (a verbatim string such as @"(K11\|).*(\|P)" avoids the doubling).

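using System;
using System.Text.RegularExpressions;

string source = "MSH|^~\\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|QPC0amoCwk+2uSHidYKB+Q|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^";
// Groups 1 and 2 capture the bounding "K11|" and "|P"; the .* between them is matched but not captured.
string regex = "(K11\\|).*(\\|P)";
// $1 and $2 put the captured bounding text back around the new value.
string replace = "$1NewText$2";
string output = Regex.Replace(source, regex, replace);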

Console.WriteLine($"Was: '{source}'");
Console.WriteLine($"Now: '{output}'");

The output from this code is:

Was: 'MSH|^~\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|QPC0amoCwk+2uSHidYKB+Q|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^'
Now: 'MSH|^~\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|NewText|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^'
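One caveat with the greedy .*: it runs to the last |P on the line, not the first. The sample line has only one |P after K11|, so the output above is as expected, but if a later field also started with P, the match would swallow the fields in between. A JavaScript sketch using a made-up line (the extra PX field is hypothetical) compares .* with the lazy .*? and with [^|]*, which cannot cross a field separator at all.

```javascript
// A made-up line in which a later field ("PX") also starts with P.
const line = "RSP^K11^RSP_K11|ABC+1|P|2.5.1|PX|END";

// Greedy: .* runs to the LAST "|P", swallowing "|P|2.5.1" along the way.
const greedy = line.replace(/(K11\|).*(\|P)/, "$1NewText$2");
// Lazy: .*? stops at the FIRST "|P" after "K11|".
const lazy = line.replace(/(K11\|).*?(\|P)/, "$1NewText$2");
// Field-bounded: [^|]* cannot cross a "|", so the match stays inside one field.
const bounded = line.replace(/(K11\|)[^|]*(\|P)/, "$1NewText$2");

console.log(greedy);  // RSP^K11^RSP_K11|NewText|PX|END
console.log(lazy);    // RSP^K11^RSP_K11|NewText|P|2.5.1|PX|END
console.log(bounded); // RSP^K11^RSP_K11|NewText|P|2.5.1|PX|END
```

The [^|]* form has the extra property of not matching at all when the field after K11| is not immediately followed by |P, whereas .*? would keep searching across later fields.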

A comment on the question states that

K11\|(.*)\|P still returns QPC0amoCHidY

Here, QPC0amoCHidY is what comes back for the text between K11| and |P. In this RegEx the captured group is the text that should be replaced, so if the whole match is replaced by the new text alone, the original K11| and |P are lost. Note also that QPC0amoCHidY is not simply the start of the value QPC0amoCwk+2uSHidYKB+Q: the two pieces wk+2uS and KB+Q, each containing a +, are missing. I do not know why they do not appear, but I suspect that something extra is being done in the code.
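For comparison, a small JavaScript check of what K11\|(.*)\|P captures from the sample line, assuming JavaScript since the asker's environment is unclear. Group 1 holds the whole value QPC0amoCwk+2uSHidYKB+Q, + characters included, which fits the suspicion that the shortened text comes from something outside the regex. It also shows how replacing the whole match drops the bounding text.

```javascript
const msg =
  "MSH|^~\\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|QPC0amoCwk+2uSHidYKB+Q|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^";

const m = msg.match(/K11\|(.*)\|P/);
if (!m) throw new Error("no match");
// m[0] is the whole match, including the bounding "K11|" and "|P";
// m[1] is only the captured value between them.
console.log(m[0]); // K11|QPC0amoCwk+2uSHidYKB+Q|P
console.log(m[1]); // QPC0amoCwk+2uSHidYKB+Q

// Replacing the whole match with the new value alone drops the bounds...
const lost = msg.replace(/K11\|(.*)\|P/, "NewText");
// ...so they have to be written back explicitly (or via $1/$2 groups).
const kept = msg.replace(/K11\|(.*)\|P/, "K11|NewText|P");
```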


2 Comments

There isn't anything extra being done in the code. It's as simple as the piece I added to the question. With one + in the string it works fine, but as soon as the string contains two + symbols, it doesn't work. For instance, qfwf+fwef+ewfw does not work, while qfwf+fwef works fine. It's as if the regex is treating the second + as a special character.
@Endorium The two lines of code shown at the bottom of the question do not make any sense to me. The types of the various parts are not shown. The line var regEx5 = /K11\|([^|]+)\|P/g is not C# code, despite the C# tag on the question, and the line newText1 = newText1["replace"](regEx5, "K11|<IGNORE>|P"); is also hard to understand as C# code. I am sure that there is more to the code being used than is shown in the question; two lines from a larger piece of code are not enough to go on.
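A quick JavaScript check of the example values from the comment above, assuming the pattern from the question's code and a K11|...|P context like the sample line. A + in the input string is an ordinary character; only a + written in the pattern acts as a quantifier. Under these assumptions both values are replaced the same way, so if the real code behaves differently, the cause is likely outside this regex. The helper name redact is hypothetical.

```javascript
const pattern = /K11\|([^|]+)\|P/g;

// Hypothetical helper: wrap a value in a K11|...|P context and apply the question's replace.
// replace() resets lastIndex on a global regex, so reusing `pattern` here is safe.
function redact(value) {
  const line = `RSP^K11^RSP_K11|${value}|P|2.5.1`;
  return line.replace(pattern, "K11|<IGNORE>|P");
}

// One "+" and two "+" are both just ordinary characters to [^|]+.
console.log(redact("qfwf+fwef"));      // RSP^K11^RSP_K11|<IGNORE>|P|2.5.1
console.log(redact("qfwf+fwef+ewfw")); // RSP^K11^RSP_K11|<IGNORE>|P|2.5.1
```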
