I am writing an XML comparison tool and need to ignore the order of objects (repeated elements), not just the order of properties within them. For instance:

XML 1.

<Root>
     <Vehicles>
          <Vehicle>
               <Registration>ABC123</Registration>
               <Make>Ford</Make>
               <Model>Focus</Model>
               <Doors>4</Doors>
          </Vehicle>
          <Vehicle>
               <Registration>DEF789</Registration>
               <Make>BMW</Make>
               <Model>330E</Model>
               <Doors>4</Doors>
          </Vehicle>
     </Vehicles>
</Root>

XML 2.

<Root>
     <Vehicles>
          <Vehicle>
               <Registration>DEF789</Registration>
               <Make>BMW</Make>
               <Model>330E</Model>
               <Doors>4</Doors>
          </Vehicle>
          <Vehicle>
               <Registration>ABC123</Registration>
               <Make>Ford</Make>
               <Model>Focus</Model>
               <Doors>4</Doors>
          </Vehicle>
     </Vehicles>
</Root>

The order of the Vehicle objects is not important, so I want the two XML documents above to be treated as matching.

However, XmlUnit's Diff reports mismatches because it compares the Vehicle at index 0 on each side, so it reports differences for Registration ABC123 vs DEF789, Make BMW vs Ford, and so on.

I found plenty of examples explaining how to ignore property order, but none showing how to match up objects before comparing their properties. With some experimenting I can match objects by a field, for example by pairing Vehicles on Registration on either side. However, this requires a separate rule per list of objects: Registration for the list of Vehicle above, Forename for a driver collection, and so on.

I am hoping there is a generic approach, because the XMLs being compared are insurance quote hub requests from aggregators. They contain hundreds of collections that may be in different orders: vehicles, drivers, claims, convictions, medical conditions, and numerous enrichment score collections for credit scoring, fraud scoring, birth data checks and so on.

Any ideas?

I tried overriding DefaultNodeMatcher without much luck.

  • One way to make that work with minimal effort is to run them both through an XSLT that sorts those repeated items by some common value (e.g. sort Vehicle by its Registration value, or even just the computed string value of the element) and THEN diff those normalized docs. Commented Aug 22 at 13:26
  • Why are you ignoring order though? Order can be very important in many XML files. Commented Aug 22 at 13:52
  • When asking questions, it's useful to use the correct terminology. They are elements and attributes, not objects and properties. Commented Aug 22 at 14:26
  • I'd "instantiate" both; re-order the vehicles if need be; then use reflection to compare. Commented Aug 22 at 15:46

2 Answers

1

The simplest approach would be to use XSLT to sort both files so the vehicles are in registration order, and then compare the results after sorting.

The deep-equal() function in XPath 4.0 has options to do what you want, but it's not yet a mature spec.
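
The sort-then-compare idea can also be sketched directly in C#. What follows is a minimal sketch that swaps XSLT for LINQ to XML (System.Xml.Linq); the XmlNormalizer class and its method names are hypothetical, made up for this illustration. It assumes that sibling order never matters anywhere in the document and that no element mixes text with child elements. It sorts every set of sibling elements by their serialized content, so no per-collection key such as Registration is needed.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of XmlUnit or the .NET libraries.
public static class XmlNormalizer
{
    // Returns a copy of the element with attributes and child elements
    // sorted recursively, so that sibling order no longer matters.
    public static XElement Normalize(XElement element)
    {
        // Attribute order is not significant in XML, so sort it as well.
        var attributes = element.Attributes()
            .OrderBy(a => a.Name.ToString(), StringComparer.Ordinal);

        if (!element.HasElements)
        {
            // Leaf element: keep the name, attributes and trimmed text.
            return new XElement(element.Name, attributes, element.Value.Trim());
        }

        // Assumption: no mixed content, so text next to child elements is dropped.
        // Children are normalized first, then ordered by their serialized form.
        var children = element.Elements()
            .Select(Normalize)
            .OrderBy(e => e.ToString(SaveOptions.DisableFormatting), StringComparer.Ordinal);

        return new XElement(element.Name, attributes, children);
    }

    // True when both documents are equal once sibling order is ignored.
    public static bool EqualIgnoringOrder(string xml1, string xml2)
    {
        var left = Normalize(XDocument.Parse(xml1).Root!);
        var right = Normalize(XDocument.Parse(xml2).Root!);
        return XNode.DeepEquals(left, right);
    }
}
```

This only gives a yes/no answer. To get a difference report, one option is to serialize the two normalized trees with ToString() and pass those strings to the diff tool instead of the raw documents. Note that this sketch also ignores the order of properties such as Make and Model inside each Vehicle, which may or may not be acceptable.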

1

As mentioned, you can do this with the XmlUnit library.

Below is the full code that achieves your goal:

using System;
using Org.XmlUnit.Builder;
using Org.XmlUnit.Diff;

var xml1 = @"<Root>
    <Vehicles>
         <Vehicle>
              <Registration>ABC123</Registration>
              <Make>Ford</Make>
              <Model>Focus</Model>
              <Doors>4</Doors>
         </Vehicle>
         <Vehicle>
              <Registration>DEF789</Registration>
              <Make>BMW</Make>
              <Model>330E</Model>
              <Doors>4</Doors>
         </Vehicle>
    </Vehicles>
</Root>";

var xml2 = @"<Root>
    <Vehicles>
         <Vehicle>
              <Registration>DEF789</Registration>
              <Make>BMW</Make>
              <Model>330E</Model>
              <Doors>4</Doors>
         </Vehicle>
         <Vehicle>
              <Registration>ABC123</Registration>
              <Make>Ford</Make>
              <Model>Focus</Model>
              <Doors>4</Doors>
         </Vehicle>
    </Vehicles>
</Root>";

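// Compare xml1 (control) against xml2 (test)
Diff diff = DiffBuilder.Compare(xml1)
    .WithTest(xml2)
    .WithNodeMatcher(new DefaultNodeMatcher(ElementSelectors
        .ConditionalBuilder()
        // Pair Vehicle elements by the text of their Registration child
        .WhenElementIsNamed("Vehicle")
        .ThenUse(ElementSelectors.ByXPath("./Registration", ElementSelectors.ByNameAndText))
        // Match every other element by name only
        .ElseUse(ElementSelectors.ByName)
        .Build()))
    .IgnoreWhitespace()
    // Treat "similar" results (such as a different child order) as matches
    .CheckForSimilar()
    .Build();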

if (!diff.HasDifferences())
    Console.WriteLine("XMLs match (ignoring Vehicle order).");
else
    Console.WriteLine("Differences found:\n" + diff.ToString());

What is happening here:

ElementSelectors.ByXPath("./Registration", ElementSelectors.ByNameAndText)

pairs up Vehicle elements whose Registration child has the same text, so the comparison works regardless of the order of the Vehicle elements.

All other elements (Make, Model, etc.) are matched by name and compared as normal.
