I need to delete all nodes with a specific attribute value. Using XDocument.Descendants with a Where clause and an inline function, it's a snap to delete nodes based on a specific attribute value.

That works great.

I also need to delete all nodes that have a specific descendant value. It made sense to me to use the same approach: check a descendant node of each element and, when that descendant's value (inner text) matches a given value, delete the element.

It almost works.

My code:

Dim Xmlstring As String =
    "<?xml version=" & Chr(34) & "1.0" & Chr(34) & " encoding=" & Chr(34) & "UTF-8" & Chr(34) & "?>" & vbNewLine &
    "<domain:DOGs " & vbNewLine &
    "      page=" & Chr(34) & "1" & Chr(34) & vbNewLine &
    "      ofPages=" & Chr(34) & "1" & Chr(34) & vbNewLine &
    "      xmlns:domain=" & Chr(34) & "urn:cat:np.domain.2111-14-42" & Chr(34) & vbNewLine &
    "      xmlns:vs=" & Chr(34) & "urn:cat:vs.2111-15-31" & Chr(34) & vbNewLine &
    "      xmlns:base=" & Chr(34) & "urn:cat:base.2111-15-31" & Chr(34) & ">" & vbNewLine &
    "   <domain:DOG name=" & Chr(34) & "Fido" & Chr(34) & ">" & vbNewLine &
    "      <description>Good</description>" & vbNewLine &
    "   </domain:DOG>" & vbNewLine &
    "   <domain:DOG name=" & Chr(34) & "Abby" & Chr(34) & ">" & vbNewLine &
    "      <description>Bad</description>" & vbNewLine &
    "   </domain:DOG>" & vbNewLine &
    "   <domain:DOG name=" & Chr(34) & Chr(34) & ">" & vbNewLine &
    "      <description>Ugly</description>" & vbNewLine &
    "   </domain:DOG>" & vbNewLine &
    "   <domain:DOG name=" & Chr(34) & "Bruno" & Chr(34) & ">" & vbNewLine &
    "      <description>Sweaty</description>" & vbNewLine &
    "   </domain:DOG>" & vbNewLine &
    "   <domain:DOG name=" & Chr(34) & "Shep" & Chr(34) & ">" & vbNewLine &
    "      <description>Good</description>" & vbNewLine &
    "   </domain:DOG>" & vbNewLine &
    "</domain:DOGs>"

Dim Xdoc As XDocument = XDocument.Parse(Xmlstring)
Dim iFoundCount As Integer = 0

'This works:
With Xdoc.Descendants().Descendants()
    iFoundCount = .Where(Function(e) e.Attributes("name").Any(Function(a) a = "")).Count
    .Where(Function(e) e.Attributes("name").Any(Function(a) a = "")).Remove()
End With
Dim sResultFile As String = "c:\0\1st_result_" & iFoundCount.ToString & ".xml"
Xdoc.Save(sResultFile)
'The result:
'<?xml version="1.0" encoding="utf-8"?>
'<domain:DOGs page="1" ofPages="1" xmlns:domain="urn:cat:np.domain.2111-14-42" xmlns:vs="urn:cat:vs.2111-15-31" xmlns:base="urn:cat:base.2111-15-31">
'  <domain:DOG name="Fido">
'    <description>Good</description>
'  </domain:DOG>
'  <domain:DOG name="Abby">
'    <description>Bad</description>
'  </domain:DOG>
'  <domain:DOG name="Bruno">
'    <description>Sweaty</description>
'  </domain:DOG>
'  <domain:DOG name="Shep">
'    <description>Good</description>
'  </domain:DOG>
'</domain:DOGs>


'This looks like it would work as there aren't any errors in the code
'and it almost works.
Xdoc = Nothing
Xdoc = XDocument.Parse(My.Computer.FileSystem.ReadAllText(sResultFile))
iFoundCount = 0
With Xdoc.Descendants().Descendants()
    iFoundCount = .Where(Function(e) e.Descendants("description").Value.Any(Function(a) a = "Sweaty")).Count '<-- error occurs here
    .Where(Function(e) e.Descendants("description").Value.Any(Function(a) a = "Sweaty")).Remove()
End With

sResultFile = "c:\0\2nd_result_" & iFoundCount.ToString & ".xml"
Xdoc.Save(sResultFile)

I get a popup pointing to the 'iFoundCount =' line that reads

    ArgumentNullException was unhandled
    Value cannot be null.
    Parameter name: source

What am I missing?

What do I do?

2 Answers

I usually find it easier to use XPath queries instead of LINQ syntax, but maybe that's just me. It does need an XmlNamespaceManager, but that's easy to set up, as you can see below.

Please make sure to set Option Strict to On. It tells Visual Studio to help you to get all variable types lined up correctly: the part that you commented as "This works" does not actually work with Option Strict On, so I have added code which does work.

Here is what I ended up with:

Option Infer On
Option Strict On

Imports System.Xml
Imports System.Xml.XPath

Module Module1

    Sub Main()
        Dim xmlstring = "<?xml version=""1.0"" encoding=""utf-8""?>
<domain:DOGs page=""1"" ofPages=""1"" 
    xmlns:domain=""urn:cat:np.domain.2111-14-42"" 
    xmlns:vs=""urn:cat:vs.2111-15-31"" 
    xmlns:base=""urn:cat:base.2111-15-31""
>
  <domain:DOG name=""Fido"">
    <description>Good</description>
  </domain:DOG>
  <domain:DOG name=""Abby"">
    <description>Bad</description>
  </domain:DOG>
  <domain:DOG name="""">
    <description>Ugly</description>
  </domain:DOG>
  <domain:DOG name=""Bruno"">
    <description>Sweaty</description>
  </domain:DOG>
  <domain:DOG name=""Shep"">
    <description>Good</description>
  </domain:DOG>
</domain:DOGs>"

        Dim xdoc = XDocument.Parse(xmlstring)

        Dim nsm = New XmlNamespaceManager(New NameTable())
        nsm.AddNamespace("domain", "urn:cat:np.domain.2111-14-42")
        nsm.AddNamespace("vs", "urn:cat:vs.2111-15-31")
        nsm.AddNamespace("base", "urn:cat:base.2111-15-31")

        Dim nFound = 0

        ' Select for an empty-or-whitespace name:
        Dim namelessDogs = xdoc.XPathSelectElements("//domain:DOG[normalize-space(@name)='']", nsm)

        nFound = namelessDogs.Count
        namelessDogs.Remove()

        Dim resultFile = String.Format("C:\Temp\1st_result_{0}.xml", nFound)

        xdoc.Save(resultFile)

        ' ###############

        xdoc = XDocument.Load(resultFile)

        Dim excludeDescription = "Good"

        ' Select <domain:DOG> elements which contain a <description> element with a specified value:
        Dim excludedDogs = xdoc.XPathSelectElements(String.Format("//description[text() = '{0}']/parent::domain:DOG", excludeDescription), nsm)

        nFound = excludedDogs.Count
        excludedDogs.Remove()

        resultFile = String.Format("C:\Temp\2nd_result_{0}.xml", nFound)

        xdoc.Save(resultFile)

        Console.ReadLine()

    End Sub

End Module

I chose to test it with a value which appears more than once - sometimes you can get it working with a single value and then discover it only works for one occurrence of that value.

If you do want to allow the name to be spaces, then you can remove the normalize-space function like this: Dim namelessDogs = xdoc.XPathSelectElements("//domain:DOG[@name='']", nsm).

Also, later versions of Visual Studio are available for free - there's no reason from that aspect to be stuck with VS2010.

1 Comment

Option Strict On -- thanks for that, I always forget to do that. Demo fiddle of your solution here: dotnetfiddle.net/LRU0UN

Your basic problem here is as follows:

  1. You seem to be confusing the methods Elements() and Descendants():

    • Elements() returns a [filtered] enumeration of the immediate child elements of the specified input(s).

    • Descendants() returns a [filtered] recursive enumeration of elements contained somewhere under the specified input(s).

    Thus when you do Xdoc.Descendants().Descendants() you are actually enumerating all elements in or under the document, then for each element you are enumerating all elements in or under it -- which results in duplicate enumerations for nested nodes.

    I can't imagine any reason to do Xdoc.Descendants().Descendants(). What you probably wanted to do was to iterate over all immediate child elements of the root element -- i.e. the <domain:DOG> elements. You could do this via Xdoc.Root.Elements().

  2. Because of your duplicate nested enumerations, you are modifying the node hierarchy (inside the inner enumeration) while still iterating through it (via the outer enumeration), resulting in undefined behavior. Specifically, your innermost

    .Where(Function(e) e.Descendants("description").Value.Any(Function(a) a = "Sweaty")).Remove()
    

    modifies the document while the outermost enumeration is still in progress. Sometimes it will work by chance; more often it will not.

    For confirmation, see the doc remarks for XNode.Remove():

    In LINQ to XML programming, you should not manipulate or modify a set of nodes while you are querying for nodes in that set. In practical terms, this means that you should not iterate over a set of nodes and remove them. Instead, you should materialize them into a List<T> by using the ToList extension method. Then, you can iterate over the list to remove the nodes. For more information, see Mixed Declarative Code/Imperative Code Bugs (LINQ to XML).
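The difference between Elements() and Descendants() described in point 1 can be illustrated with a small sketch. The miniature document and the module name below are made up for illustration and are not the XML from the question; the sketch only counts what each call enumerates.

```vbnet
Option Strict On
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module ElementsVersusDescendants
    Sub Main()
        ' Hypothetical miniature document: a root with two children, each holding one child.
        Dim doc = XDocument.Parse(
            "<root><dog><description>Good</description></dog>" &
            "<dog><description>Bad</description></dog></root>")

        ' Immediate child elements of the root only.
        Dim children = doc.Root.Elements().Count()

        ' Every element in the document at any depth, including the root itself.
        Dim allElements = doc.Descendants().Count()

        ' Descendants of every element: nested elements are yielded once per ancestor.
        Dim nested = doc.Descendants().Descendants().Count()
        Dim nestedDistinct = doc.Descendants().Descendants().Distinct().Count()

        Console.WriteLine("Root.Elements():            {0}", children)
        Console.WriteLine("Descendants():              {0}", allElements)
        Console.WriteLine("Descendants().Descendants(): {0} ({1} distinct)", nested, nestedDistinct)
    End Sub
End Module
```

For this document the last count should come out larger than the number of distinct elements, because each description element is reached once through the root and once through its parent.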

Now, it's a bit annoying that Extensions.Remove<T>(this IEnumerable<T?> source) where T : XNode doesn't return the count of nodes removed, so let's write an extension method that does:

Imports System.Runtime.CompilerServices

Module XObjectExtensions
    <Extension()>
    Function CountAndRemove(Of TNode As XNode)(ByVal nodes As IEnumerable(Of TNode)) As Integer
        Dim count As Integer = 0
        Dim list = nodes.ToList()

        For Each node In list
            If node IsNot Nothing Then
                node.Remove()
                count += 1
            End If
        Next

        Return count
    End Function
End Module

Having done that, your first operation can be rewritten as:

Dim iFoundCount = Xdoc.Root.Elements().Where(
    Function(e) e.Attributes("name").Any(Function(a) a.Value = "")
).CountAndRemove()

And your second as:

Dim iFoundCount = Xdoc.Root.Elements().Where(
    Function(e) e.Elements("description").Any(Function(d) d.Value = "Sweaty")
).CountAndRemove()

Demo .NET 4.8 fiddle here.
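As a possible additional explanation for the ArgumentNullException itself: as far as I know, the VB XML Value extension property on a collection of XElement returns the value of the first element, or Nothing when the collection is empty. Since Descendants().Descendants() also yields the description elements, which have no description descendants of their own, the original query would end up calling Any on Nothing. The sketch below reproduces that under this assumption; the module and variable names are made up for illustration.

```vbnet
Option Strict On
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module NullSourceSketch
    Sub Main()
        Dim leaf = XElement.Parse("<description>Sweaty</description>")

        ' A <description> element contains no <description> descendants,
        ' so the Value extension property is assumed to return Nothing here.
        Dim firstValue As String = leaf.Descendants("description").Value
        Console.WriteLine("Value is Nothing: {0}", firstValue Is Nothing)

        Try
            ' A String is an IEnumerable(Of Char); Enumerable.Any rejects a null source.
            Console.WriteLine(firstValue.Any(Function(c) c = "S"c))
        Catch ex As ArgumentNullException
            Console.WriteLine("ArgumentNullException for parameter: {0}", ex.ParamName)
        End Try
    End Sub
End Module
```

Note also that Any on a String tests individual characters, so even with a non-null value the original predicate would not compare the whole description text against "Sweaty". Comparing d.Value on each description element, as in the rewritten queries above, avoids both issues.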
