
I have the following xml.

string xmlstring= <z:row ows_Article_x0020_Tags='14;#cricket;#21;#Headlines;#19;#Videos' ows__ModerationStatus='0'      ows__Level='1' ows_Last_x0020_Modified='9;#2013-11-26 01:33:01' ows_ID='9' ows_UniqueId='9;#{FEA534D1-F63B-464D-97DE-     AC60798B72D6}' ows_owshiddenversion='9' ows_FSObjType='9;#0' ows_Created_x0020_Date='9;#2013-11-24 22:59:53'  ows_ProgId='9;#' ows_FileLeafRef='9;#Pablo-Ferrero.aspx' ows_PermMask='0x7fffffffffffffff' ows_Modified='2013-11-26     01:33:01' ows_FileRef='9;#sites/Gaslines/NewsAndEvents/Pages/Pablo-Ferrero.aspx' ows_DocIcon='aspx'     ows_Editor='24;#Harshini P Hegde' />\r\n   
<z:row ows_Article_x0020_Tags='20;#Charity;#14;#cricket' ows__ModerationStatus='0' ows__Level='1'   ows_Last_x0020_Modified='10;#2013-11-26 01:30:11' ows_ID='10' ows_UniqueId='10;#{C8D042AE-466F-44E8-940B-   0C9A64130923}' ows_owshiddenversion='8' ows_FSObjType='10;#0' ows_Created_x0020_Date='10;#2013-11-24 23:01:50'  ows_ProgId='10;#' ows_FileLeafRef='10;#Debra-L-Reed.aspx' ows_PermMask='0x7fffffffffffffff' ows_Modified='2013-11-  26 01:3:10' ows_FileRef='10;#sites/Gaslines/NewsAndEvents/Pages/Debra-L-Reed.aspx' ows_DocIcon='aspx'   ows_Editor='24;#Harshini P Hegde' />\r\n   
<z:row ows_Article_x0020_Tags='' ows__ModerationStatus='3' ows__Level='255' ows_Last_x0020_Modified='13;#2013-11-26     01:45:12' ows_ID='13' ows_UniqueId='13;#{81236BD1-AF3B-4D97-BA14-5492F8013251}' ows_owshiddenversion='5'    ows_FSObjType='13;#0' ows_Created_x0020_Date='13;#2013-11-26 01:28:45' ows_ProgId='13;#'    ows_FileLeafRef='13;#TestTagCloudPage.aspx' ows_PermMask='0x7fffffffffffffff' ows_Modified='2013-11-26 01:45:13'    ows_CheckoutUser='24;#Harshini P Hegde' ows_FileRef='13;#sites/Gaslines/NewsAndEvents/Pages/TestTagCloudPage.aspx'  ows_DocIcon='aspx' ows_Editor='24;#Harshini P Hegde' />\r\n</rs:data>\r\n</xml>"

The XML string also contains the following before the rows shown above:

<xml xmlns:s='uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882'\r\n     xmlns:dt='uuid:C2F41010-65B3-11d1-A29F-00AA00C14882'\r\n     xmlns:rs='urn:schemas-microsoft-com:rowset'\r\n     xmlns:z='#RowsetSchema'>\r\n
<s:Schema id='RowsetSchema'>\r\n   
<s:ElementType name='row' content='eltOnly' rs:CommandTimeout='30'>\r\n      
<s:AttributeType name='ows_Article_x0020_Tags' rs:name='Article Tags' rs:number='1'>\r\n         

I need to get the output as

  string result= 14;#cricket;#21;#Headlines;#19;#Videos;20;#Charity;#14;#cricket

i.e. I need the text lying between

`<z:row ows_Article_x0020_Tags=" and " ows__ModerationStatus=`

I tried using LINQ, but I was not able to do it, so I want to do it using regex. Is it possible to delete everything in the string except the result using regex?

  • I think regex is not the best way to parse XML. Do you have all the XML data, or only these three lines? You are missing some opening tags and namespace definitions. Commented Nov 27, 2013 at 7:54
  • This is not the whole XML. I will add the rest now. Commented Nov 27, 2013 at 9:18
  • This is still invalid XML, and it doesn't even have the rows tags. Commented Nov 27, 2013 at 9:23
  • The whole XML is too large to post here. Is it OK if I add it here? Commented Nov 27, 2013 at 9:26
  • I've updated my answer with your XML namespace, so now both options will work: either with HtmlAgilityPack or with LINQ to XML (I recommend this option). Commented Nov 27, 2013 at 9:27

3 Answers


Since you don't have valid XML here, you can treat this string as HTML and parse it with HtmlAgilityPack (available from NuGet):

HtmlDocument hdoc = new HtmlDocument();
hdoc.LoadHtml(xmlstring);
var tags = hdoc.DocumentNode.Descendants()
               .Select(r => r.GetAttributeValue("ows_Article_x0020_Tags", ""));

string result = String.Join("", tags);
// 14;#cricket;#21;#Headlines;#19;#Videos20;#Charity;#14;#cricket

With valid XML, the recommended tool for parsing is LINQ to XML. The parsing should look like this:

XDocument xdoc = XDocument.Parse(validXmlString);
XNamespace z = "#RowsetSchema";
var tags = xdoc.Descendants(z + "row")
               .Select(r => (string)r.Attribute("ows_Article_x0020_Tags"));
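
As a rough end-to-end sketch of the LINQ to XML approach: the sample string below is a hypothetical, shortened stand-in for the real response (only the two namespace declarations it needs and one attribute per row are kept), and it deliberately contains literal `\r\n` text, as you get when copying a string from the debugger. Joining with `;` and skipping empty values are assumptions based on the desired output in the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, trimmed-down input. "\\r\\n" is a literal backslash-r-backslash-n,
// imitating a string copied out of the debugger.
string rawXml =
    "<xml xmlns:rs='urn:schemas-microsoft-com:rowset' xmlns:z='#RowsetSchema'>\\r\\n" +
    "<rs:data>\\r\\n" +
    "<z:row ows_Article_x0020_Tags='14;#cricket;#21;#Headlines;#19;#Videos' />\\r\\n" +
    "<z:row ows_Article_x0020_Tags='20;#Charity;#14;#cricket' />\\r\\n" +
    "<z:row ows_Article_x0020_Tags='' />\\r\\n" +
    "</rs:data>\\r\\n</xml>";

// Remove the literal \r\n text so that the string becomes well-formed XML.
string validXmlString = rawXml.Replace("\\r\\n", "");

XDocument xdoc = XDocument.Parse(validXmlString);
XNamespace z = "#RowsetSchema";

// Read the attribute from every z:row element and drop missing or empty values.
var tags = xdoc.Descendants(z + "row")
               .Select(r => (string)r.Attribute("ows_Article_x0020_Tags"))
               .Where(t => !string.IsNullOrEmpty(t));

// Assumption: values are joined with ';' as in the question's desired output.
string result = string.Join(";", tags);
Console.WriteLine(result);
// 14;#cricket;#21;#Headlines;#19;#Videos;20;#Charity;#14;#cricket
```

If the real string also contains escaped quotes (`\"`), they would presumably need the same kind of cleanup before `XDocument.Parse` accepts it.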

4 Comments

The tags are null. The entire XML output is available here: pastebin.com/9K8GRZg0
Try running a .Replace() to remove the literal \r\n characters (validXMLString.Replace("\\r\\n", "")).
@Jinxed it works just fine with your XML if you remove the \r\n strings from it (I believe you have them due to copy-paste from the debugger). You also have escaped quotes, as in ItemCount=\"8\".
I really don't know what happened, but it worked. Thanks a ton. :) Thank you so much.

I can't stress enough how bad an idea it is to extract values from XML using regex, but if you really want to, this should work:

        Regex regex = new Regex("ows_Article_x0020_Tags='([^']*)'");
        var matches = regex.Matches(xmlstring);
        Console.WriteLine(matches[0].Groups[1].Value);
        Console.WriteLine(matches[1].Groups[1].Value);
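
If the goal is a single string without indexing into the matches one by one, something along these lines might do. This is only a sketch: the input is a hypothetical, shortened version of the rows from the question, and joining with `;` while skipping empty values is an assumption based on the desired output.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical shortened input; only the attribute of interest matters here.
string xmlstring =
    "<z:row ows_Article_x0020_Tags='14;#cricket;#21;#Headlines;#19;#Videos' ows__ModerationStatus='0' />" +
    "<z:row ows_Article_x0020_Tags='20;#Charity;#14;#cricket' ows__ModerationStatus='0' />" +
    "<z:row ows_Article_x0020_Tags='' ows__ModerationStatus='3' />";

Regex regex = new Regex("ows_Article_x0020_Tags='([^']*)'");

// Project every match to its first capture group and drop empty values.
var values = regex.Matches(xmlstring)
                  .Cast<Match>()
                  .Select(m => m.Groups[1].Value)
                  .Where(v => v.Length > 0);

// Assumption: values are joined with ';' as in the question's desired output.
string joined = string.Join(";", values);
Console.WriteLine(joined);
// 14;#cricket;#21;#Headlines;#19;#Videos;20;#Charity;#14;#cricket
```

The pattern still relies on the attribute values being wrapped in single quotes, so it stays as fragile as any other regex over XML.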

1 Comment

Thank you for this piece of code. Is it possible to remove everything else from the string and keep just the values, in order to avoid looping?

I generally use LINQ to fetch values from XML; it makes things much easier.

Example 1: LINQ to read XML

Example 2: I use the code below to get a list of questions and answers for a quiz app

    public List<QuizQuestions> GetQuiz(int level)
    {
        string docName = "DataModel/Level" + level.ToString() + ".xml";
        XDocument xdoc = XDocument.Load(docName); 
        List<QuizQuestions> book = (from list in xdoc.Descendants("Question")
                                    select new QuizQuestions(list.Element("Quest").Value
                                                             , list.Element("A").Value
                                                             , list.Element("B").Value
                                                             , list.Element("C").Value
                                                             , list.Element("D").Value
                                                             , list.Element("Answer").Value)
                                                             ).OrderBy(a => Guid.NewGuid()).ToList();
        return book;
    }

UPDATE: This will work only with valid XML.
