I have the following log file from a server, and I want to extract the XML documents from it.

2:00:11 PM >>Response: <?xml version="1.0" encoding="UTF-8"?>

<HotelML xmlns="http://www.xpegs.com/v2001Q3/HotelML"><Head><Route Destination="TR" Source="00"><Operation Action="Create" App="UltraDirect-d1c1_" AppVer="V1_1" DataPath="/HotelML" StartTime="2013-07-31T08:33:13.223+00:00" Success="true" TotalProcessTime="711"/></Route>............

</HotelML>


3:00:11 PM >>Response: <?xml version="1.0" encoding="UTF-8"?>

<HotelML xmlns="http://www.xpegs.com/v2001Q3/HotelML"><Head><Route Destination="TR" Source="00"><Operation Action="Create" App="UltraDirect-d1c1_" AppVer="V1_1" DataPath="/HotelML" StartTime="2013-07-31T08:33:13.223+00:00" Success="true" TotalProcessTime="711"/></Route>............

</HotelML>

5:00:11 PM >>Response: <?xml version="1.0" encoding="UTF-8"?>

<HotelML xmlns="http://www.xpegs.com/v2001Q3/HotelML"><Head><Route Destination="TR" Source="00"><Operation Action="Create" App="UltraDirect-d1c1_" AppVer="V1_1" DataPath="/HotelML" StartTime="2013-07-31T08:33:13.223+00:00" Success="true" TotalProcessTime="711"/></Route>............

</HotelML>

I have written the following regular expression for this, but it matches only the first entry in the string. I want to return all the XML strings as a collection.

(?<= Response:).*>.*</.*?>
  • All your example log rows start with "Response: <?xml" and end with "</HotelML>". Is that true for all log rows, or just the example? Commented Jul 31, 2013 at 11:45
  • @Puggan Se: Yes, you are right. Commented Jul 31, 2013 at 11:46
  • Is there any reason that you cannot just assume that the content after 'Response: ' represents an XML document? From there, just validate it against a schema and then load it as normal. Commented Jul 31, 2013 at 11:49
  • @PugganSe, actually, once you have found the "Response" line, it is easy to extract the encoding with a regex: ".*encoding="(.*)".*", where the first group contains the encoding. That is, if you want to be defensive about it... Commented Jul 31, 2013 at 11:50

2 Answers

Why aren't you matching from <HotelML to </HotelML>?

Something like:

<HotelML .*</HotelML>
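
As written, the pattern above is greedy, and in .NET `.` does not match a newline by default, so on a multi-entry log it can either fail to span the line breaks or run from the first `<HotelML` to the last `</HotelML>`. A minimal sketch of one way around that, assuming `HotelML` elements are never nested: use the lazy `.*?` together with `RegexOptions.Singleline`. The class and method names (`HotelXmlExtractor`, `ExtractAll`) are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
static class HotelXmlExtractor
{
    // Lazy ".*?" stops at the first closing tag it reaches;
    // Singleline lets "." match line breaks as well.
    private static readonly Regex HotelMl =
        new Regex(@"<HotelML\b.*?</HotelML>", RegexOptions.Singleline);

    // Returns every <HotelML>...</HotelML> fragment found in the log text.
    public static List<string> ExtractAll(string log)
    {
        return HotelMl.Matches(log)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

Note that the returned fragments start at `<HotelML`, so the `<?xml ...?>` declaration is not included.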

Or, just go through the file line by line, and whenever you find a line matching

^.* PM >>Response:.*$

read the following lines as xml until the next matching line...
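
The line-by-line idea could be sketched roughly as follows. This is only an illustration under a few assumptions: every entry starts with a line ending its timestamp in `AM` or `PM` followed by `>>Response:`, and everything up to the next such line belongs to the same XML document. `ResponseLogReader` and `ReadResponses` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
static class ResponseLogReader
{
    // Marker line such as "2:00:11 PM >>Response: <?xml ...?>".
    // Group 1 captures whatever follows "Response:" on that line.
    private static readonly Regex Marker =
        new Regex(@"^.* [AP]M >>Response:(.*)$");

    public static List<string> ReadResponses(TextReader reader)
    {
        var results = new List<string>();
        StringBuilder current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            var m = Marker.Match(line);
            if (m.Success)
            {
                // A new entry begins: flush the one collected so far.
                if (current != null) results.Add(current.ToString().Trim());
                current = new StringBuilder();
                current.AppendLine(m.Groups[1].Value);
            }
            else if (current != null)
            {
                // Any other line belongs to the entry being collected.
                current.AppendLine(line);
            }
        }
        // Flush the last entry, which has no following marker line.
        if (current != null) results.Add(current.ToString().Trim());
        return results;
    }
}
```

It can be fed from a file with `File.OpenText(...)` or from a string with `new StringReader(...)`; each returned string starts at the `<?xml` declaration and can be handed to an XML parser.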

4 Comments

If OP can guarantee that <?xml version="1.0" encoding="UTF-8"?> never changes, then this would be a better option than trying to remove Response.
You should use the encoding of the log file; the <?xml version="1.0" encoding="UTF-8"?> only says something about the encoding of the message.
But that information may still be relevant to the program OP is working on.
It matches perfectly for a single node, but for multiple nodes it's not working.

Here's another approach which should leave you with a List<XDocument>:

using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

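class Program
{
    static void Main(string[] args)
    {
        // Read the whole log so each entry can be sliced up to the next marker.
        var input = File.ReadAllText("text.txt");
        var xmlDocuments = Regex
            // Matches the "h:mm:ss AM/PM >>Response: " prefix of each entry.
            .Matches(input, @"([0-9AMP: ]*>>Response: )")
            .Cast<Match>()
            .Select(match =>
                {
                    // The XML starts right after the marker...
                    var currentPosition = match.Index + match.Length;
                    var nextMatch = match.NextMatch();
                    // ...and runs to the next marker, or to the end of the input.
                    return nextMatch.Success
                        ? input.Substring(currentPosition,
                            nextMatch.Index - currentPosition)
                        : input.Substring(currentPosition);
                })
            // Trim so stray whitespace cannot precede the XML declaration.
            .Select(s => XDocument.Parse(s.Trim()))
            .ToList();
    }
}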
