1

I have a String in the following format: -----BEGIN MESSAGE-----, followed by a variable length encrypted session key, followed by a newline, followed by an encrypted message, followed by a newline, followed by a digital signature, followed by -----END MESSAGE-----.

-----BEGIN MESSAGE-----
SNyeWtz8QD8AKdioMG11wu7U6gG2wD9tekvVrx6VYW+6oJj4Wl8NE+7i5MHbu4Au
+vN1Z886lOWka7ekgPF8N7t9MpiFo2pBPHuFcOsaY5ETYuEyk5gaX7BYP7qT6wKG
BRILmX6DblWqGxG2tKs/AdcHDqQ5QBXrP03uhN68wgo=

U2FsdGVkX18gtpQSqyH4H5242SZzcZrb0oH7FWw7/MSCxo7h7BVaesZV2N38sr9y

kVr+wabiNn4RfAB4nNi9gAZHQLok4uxRMALGF2kZk2zpVNPQo6jcdz85fy68gylX
OCQIIdk8JPIwxzHfVvRZqNHDRADZRlNHUMYScjRPU+DB8avghYAVKMJhLgA/2Tdp
a59uBMBg/yB1yqA5FivxPzOhq92Y4nZuP1R9/yGE9O8K
-----END MESSAGE-----

What is the best way to parse out the three pieces of information (the session key, encrypted message, and digital signature)?

I tried using the Scanner class, but I couldn't figure out what to use as the delimiter. I also tried using the Pattern class, but couldn't figure that method out either. Thank you!

2
  • I just did something similar. The question is, do you want those three pieces of data in one match with 3 capture groups, or in 3 separate matches? Commented Apr 10, 2013 at 0:25
  • Suamere, I want 3 separate matches. Jaynathan, I tried using "\n" as a delimiter, but it wouldn't work because every line is followed by a newline. For example, the encrypted session key is 3 lines long, each line followed by a newline. I even tried using "\n\n" as a delimiter, but that failed as well. Commented Apr 10, 2013 at 0:30

5 Answers

1

You actually have newlines embedded in the various parts. What delimits them is the blank line—two newlines in a row. I assume you want each part with the line breaks removed. I'd suggest a brute force approach:

StringBuilder sb = new StringBuilder();
String[] parts = input.split("\\r?\\n\\r?\\n"); // should be 3 long
// strip out header and newlines from session key
String[] lines = parts[0].split("\\r?\\n");
for (int i = 1; i < lines.length; ++i) { // skip first line
    sb.append(lines[i]);
}
parts[0] = sb.toString();
// strip out newlines from message (there is no header line to skip here)
sb.setLength(0);
lines = parts[1].split("\\r?\\n");
for (int i = 0; i < lines.length; ++i) {
    sb.append(lines[i]);
}
parts[1] = sb.toString();
// finally, deal with the signature: strip newlines and skip the trailer (last line)
sb.setLength(0);
lines = parts[2].split("\\r?\\n");
for (int i = 0; i < lines.length - 1; ++i) {
    sb.append(lines[i]);
}
parts[2] = sb.toString();

Not elegant, but it makes clear what's happening.
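The same two-stage idea (split on blank lines first, then remove the line breaks inside each part) can be written more compactly. This is only a sketch: the class name `BlankLineSplit` and the method `parse` are made up for illustration, and it assumes the header and trailer read exactly as in the question. The atomic group `(?>...)` keeps a single `\r\n` from being mistaken for two line terminators.

```java
import java.util.Arrays;

class BlankLineSplit {

    // Returns the parts found between the BEGIN and END lines,
    // with the line breaks inside each part removed.
    static String[] parse(String input) {
        // Drop the header and trailer lines (marker text taken from the question).
        String body = input
                .replaceFirst("^\\s*-----BEGIN MESSAGE-----\\s*", "")
                .replaceFirst("\\s*-----END MESSAGE-----\\s*$", "");
        // A blank line: two line terminators, optionally with spaces or tabs between them.
        String[] parts = body.split("(?>\\r\\n|\\r|\\n)[ \\t]*(?>\\r\\n|\\r|\\n)");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].replaceAll("\\s+", ""); // join the wrapped lines
        }
        return parts;
    }

    public static void main(String[] args) {
        // Short stand-in data with Windows line endings, not the real message.
        String input = "-----BEGIN MESSAGE-----\r\nAAAA\r\nBBBB\r\n\r\nCCCC\r\n\r\n"
                + "DDDD\r\nEEEE\r\n-----END MESSAGE-----\r\n";
        System.out.println(Arrays.toString(parse(input))); // [AAAABBBB, CCCC, DDDDEEEE]
    }
}
```

The caller should still check that exactly three parts came back before using them.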

An alternative approach would be to use a Scanner to read each line and decide what to do with it. Three kinds of line (the header, the trailer, and a blank line) would get special treatment and affect the processing. Otherwise, just append each line as you read it to a StringBuilder.
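A minimal sketch of that line-by-line approach might look like the following. The class name `ScannerParse` and the method `parse` are hypothetical, and the header and trailer strings are assumed to match the question exactly. `Scanner.nextLine()` already copes with `\r\n`, `\n`, and `\r` line endings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerParse {

    // Reads the input line by line and collects one string per part.
    static List<String> parse(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine().trim();
                if (line.equals("-----BEGIN MESSAGE-----")) {
                    continue; // header: nothing to collect
                }
                boolean trailer = line.equals("-----END MESSAGE-----");
                if (line.isEmpty() || trailer) {
                    // A blank line or the trailer closes the current part.
                    if (current.length() > 0) {
                        parts.add(current.toString());
                        current.setLength(0);
                    }
                    if (trailer) {
                        break;
                    }
                } else {
                    current.append(line); // ordinary line: part of the current piece
                }
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString()); // input ended without a trailer
        }
        return parts;
    }

    public static void main(String[] args) {
        // Short stand-in data, not the real message.
        String input = "-----BEGIN MESSAGE-----\nAAAA\nBBBB\n\nCCCC\n\nDDDD\nEEEE\n-----END MESSAGE-----\n";
        System.out.println(parse(input)); // [AAAABBBB, CCCC, DDDDEEEE]
    }
}
```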


6 Comments

I was hoping there would be a way of solving this with regex, but your method most definitely works. Thank you!
@Andy - There probably is a way using regex, but I was too lazy to work one out. :)
A note for the OP and Ted: in regex, the best way to match line breaks is [\r\n]+, which means any combination of line feeds and carriage returns, in any order, one or more times. It's much cleaner than \r?\n\r?\n, or the "classic" (\r*\n\r*|\n*\r\n*)+. Cleaner still is \s+, especially in this case, where there is no whitespace within each string, only between them.
@Suamere - Good points. However, for my code, the first split must be only on two consecutive line breaks. The single line breaks must not match; however, [\r\n]+ (or \s+) will match a single line break. It won't help to make it [\r\n]{2} (or \s\s) because \r\n will match and you can't make it [\r\n]{4} (or \s{4}) because \r or \n may be missing from the line terminators (depending on the server). For the second splitting, \s+ is perfect. I used \r?\n out of inertia.
All of your statements are correct. I only have my thought pattern because as far as I know, there are only single linebreaks for formatted viewing. When parsing an actual cert, if you don't count the Begin and End lines, any whitespace at all separates pieces of data. But I think your answer is more literal toward the example the OP gave, so I can't argue with you there.
1

Right, remove the BEGIN and END lines like Sergii said. Then do a regex split against "\s+", e.g. in .NET:

Regex.Split(Regex.Replace(strCert, "(?i)\s*-{5}(BEGIN|END)\sMESSAGE-{5}\s*", ""), "\s+")

That is, assuming the only reason your example has single line breaks within the body of each piece of data is formatting, because as far as I know, those don't exist in the actual cert. The actual cert would look like:

-----BEGIN MESSAGE-----
SNyeWtz8QD8AKdioMG11wu7U6gG2wD9tekvVrx6VYW+6oJj4Wl8NE+7i5MHbu4Au+vN1Z886lOWka7ekgPF8N7t9MpiFo2pBPHuFcOsaY5ETYuEyk5gaX7BYP7qT6wKGBRILmX6DblWqGxG2tKs/AdcHDqQ5QBXrP03uhN68wgo=

U2FsdGVkX18gtpQSqyH4H5242SZzcZrb0oH7FWw7/MSCxo7h7BVaesZV2N38sr9y

kVr+wabiNn4RfAB4nNi9gAZHQLok4uxRMALGF2kZk2zpVNPQo6jcdz85fy68gylXOCQIIdk8JPIwxzHfVvRZqNHDRADZRlNHUMYScjRPU+DB8avghYAVKMJhLgA/2Tdpa59uBMBg/yB1yqA5FivxPzOhq92Y4nZuP1R9/yGE9O8K
-----END MESSAGE-----

Ya?
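Since the question is about Java, a rough Java equivalent of the .NET one-liner could look like this. It is a sketch only: `WhitespaceSplit` and `parse` are illustrative names, and it relies on the same assumption as above, namely that no part contains whitespace or line breaks of its own.

```java
import java.util.Arrays;

class WhitespaceSplit {

    // Strips the BEGIN/END lines, then splits the rest on runs of whitespace.
    static String[] parse(String cert) {
        return cert
                .replaceAll("(?i)\\s*-{5}(BEGIN|END)\\sMESSAGE-{5}\\s*", "")
                .split("\\s+");
    }

    public static void main(String[] args) {
        // Short stand-in data with each part on a single line.
        String cert = "-----BEGIN MESSAGE-----\nAAAA\n\nBBBB\n\nCCCC\n-----END MESSAGE-----\n";
        System.out.println(Arrays.toString(parse(cert))); // [AAAA, BBBB, CCCC]
    }
}
```

If the parts are wrapped across several lines, as in the question's sample, this returns one element per line rather than three parts.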

3 Comments

This will split at every line break, including those line breaks embedded in each part of the text. OP needs to separate first on blank lines (two consecutive line terminator sequences). For robustness, it needs to work for all varieties of line terminator sequences: \r\n (Windows, HTTP standard), \r (classic Mac OS), or \n (Unix).
Not true. \s+ will gather one or more whitespace as one. Which means linebreaks and possible spaces on those "empty" lines. So if the Begin and End were manually removed, all that would be left is a collection of the three areas. \s+ will work for all varieties of terminator sequences. The only reason \s+ may not work is if one of those lines of information included whitespace, which according to the rules of certs, never would. But if there also would never be whitespace on the "empty" line, [\r\n]+ would be good too, but \s+ is still more "robust"
Good point that line breaks should be absent from the cert, message, and signature. I'm not so sure that this requirement applies to the message, however. Also good point about using \s+ to be robust against white space in the supposedly blank lines between the parts.
0

newline ?

And delete -----BEGIN MESSAGE----- from first value and -----END MESSAGE----- from last value.

1 Comment

I agree. If you're just dealing with one certificate and there are three lines, just replace the begin/end message lines with nothing, then do a string or regex split on \s+. (I don't believe there is whitespace within each string, just between the strings.)
0

Code:

public class MessageParser {

   public static void main( String[] args ) {
      String message =
         "-----BEGIN MESSAGE-----\n" +
         "SNyeWtz8QD8AKdioMG11wu7U6gG2wD9tekvVrx6VYW+6oJj4Wl8NE+7i5MHbu4Au\n" +
         "+vN1Z886lOWka7ekgPF8N7t9MoiFo2pBPHuFcOsaY5ETYuEyk5gaX7BYP7qT6wKG\n" +
         "BRILmX6DblWqGxG2tKs/AdcHDqQ5QBXrP03uhN68wgo=\n" +
         "\n" +
         "U2FsdGVkX18gtpQSqyH4H5242gZzcZrb0oH7FWw7/MSCxo7h7BVaesZV2N38sr9y\n" +
         "\n" +
         "kVr+wabiNn4RfAB4nNi9gAZHQLok4uxRMALGF2kZk2zpVNPQo6jcdz85fy68gylX\n" +
         "OCQIIdk8JPIwxzHfVvRZqNHDRFDZRlNHUMYScjRPU+DB8avghYAVKMJhLgA/2Tdp\n" +
         "a59uBMBg/yB1yqA5FivxPzOhq92Y4nZuP1R9/yGE9O8K\n" +
         "-----END MESSAGE-----\n";
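      // Accept both \n and \r\n; split() drops trailing empty strings
      String[] lines = message.split( "\\r?\\n" );
      int i = 1; // index 0 is the BEGIN line
      // Session key: every line up to the first blank line
      StringBuilder sessionKey = new StringBuilder();
      while( i < lines.length && lines[i].length() > 0 ) {
         sessionKey.append( lines[i++] );
      }
      i++; // skip the blank separator line
      // Encrypted message: every line up to the next blank line
      StringBuilder encryptedMessage = new StringBuilder();
      while( i < lines.length && lines[i].length() > 0 ) {
         encryptedMessage.append( lines[i++] );
      }
      i++; // skip the blank separator line
      // Digital signature: every line up to the END line
      StringBuilder digitalSignature = new StringBuilder();
      while( i < lines.length && ! lines[i].equals( "-----END MESSAGE-----" )) {
         digitalSignature.append( lines[i++] );
      }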
      System.out.println( "sessionKey      : " + sessionKey );
      System.out.println( "encryptedMessage: " + encryptedMessage );
      System.out.println( "digitalSignature: " + digitalSignature );
   }
}

Output:

sessionKey      : SNyeWtz8QD8AKdioMG11wu7U6gG2wD9tekvVrx6VYW+6oJj4Wl8NE+7i5MHbu4Au+vN1Z886lOWka7ekgPF8N7t9MoiFo2pBPHuFcOsaY5ETYuEyk5gaX7BYP7qT6wKGBRILmX6DblWqGxG2tKs/AdcHDqQ5QBXrP03uhN68wgo=
encryptedMessage: U2FsdGVkX18gtpQSqyH4H5242gZzcZrb0oH7FWw7/MSCxo7h7BVaesZV2N38sr9y
digitalSignature: kVr+wabiNn4RfAB4nNi9gAZHQLok4uxRMALGF2kZk2zpVNPQo6jcdz85fy68gylXOCQIIdk8JPIwxzHfVvRZqNHDRFDZRlNHUMYScjRPU+DB8avghYAVKMJhLgA/2Tdpa59uBMBg/yB1yqA5FivxPzOhq92Y4nZuP1R9/yGE9O8K

1 Comment

You forced \n's into the example in order to parse by them, though.
0
String[] parts = string.split("\r?\n");
sessionKey = parts[1];
encryptedMessage = parts[3]; 
digitalSignature = parts[5]; 

The \r? allows Windows EOLs (\r\n) or Unix EOLs (\n).

2 Comments

Newlines can't always be relied upon to be \r\n. \s+ would be perfect with the assertion that there are no spaces within each part itself.
@Suamere - Each part has newlines in it. What distinguishes the parts is the blank lines.
