
I am working on a C# deserializer for an XML file format used by a program I have no control over. Unfortunately, as far as I can tell, the file's structure breaks XML conventions in two major ways, and that has been complicating my deserialization patterns.

The file is used to define user interface components within a video game. There is a main XML element called "layout" that is always the root element, with two child elements: <hierarchy> and <components>.

<layout
  version="137"
  comment="Agh!">
  <hierarchy>
    <!-- see below -->
  </hierarchy>
  <components>
    <!-- see below -->
  </components>
</layout>

The hierarchy element contains the hierarchical structure of the UI tree defined by each layout file. It always starts with a single "root" node, and the root node can have a single child, but beneath that any node can have as many children as needed.

<hierarchy>
  <root this="1234">
    <main_child this="5678">
      <child_a/>
      <child_b/>
      <child_c>
        <grandchild_a/>
      </child_c>
    </main_child>
  </root>
</hierarchy>

The <components> element has a similarly unusual structure: every child of <components> is one of two very similar types, but, as in <hierarchy>, the actual tag name of each node varies.

<components>
  <root this="1234" id="root">
    <!-- etc... -->
  </root>
  <main_child
    this="5678"
    id="main_child"
    state="active">
  </main_child>
  <!-- and so on, with one "primary" node for each UI component in this file -->
</components>

These quirks make it difficult to parse the child elements of both <hierarchy> and <components>, and I'm having trouble getting XmlSerializer to recognize any of the children in either section. I can easily get "root" recognized by the hierarchy class, but I haven't been able to make that work recursively, and I'm also having difficulty getting the array beneath <components> to work.

Originally, I was developing a manual converter using XDocument, but that proved to be far too much work, since it required handling every single attribute individually; there are hundreds of them in this file, and they can change between "versions" of it.

I've been testing this by using the various XML attributes that XmlSerializer accepts as hints:

public class LayoutModel {
  [XmlAttribute("version")]
  public uint Version { get; set; }

  // etc ...
  
  [XmlElement("hierarchy")]
  public HierarchyModel Hierarchy { get; set; }

  [XmlElement("components", typeof(ComponentModel))]
  public ComponentModel[] Components { get; set; }
}

public class HierarchyModel {
  // This converts fine, but the children beneath it are not being deserialized.
  [XmlElement("root")]
  public HierarchyNodeModel RootNode { get; set; }
}

public class HierarchyNodeModel {
  [XmlArrayItem(Type = typeof(HierarchyNodeModel))]
  [XmlArray]
  public HierarchyNodeModel[] ChildNodes { get; set; }

  [XmlAttribute("this")]
  public string GUID { get; set; }
}

public class ComponentModel {
  [XmlAttribute("this")]
  public string GUID { get; set; }

  [XmlAttribute("id")]
  public string Id { get; set; }
}

Using the above, LayoutModel deserializes fine and I get the Hierarchy -> Root connection, but ChildNodes on the root is null, so nothing beneath it is deserialized. Likewise, the Components array is empty (length 0).

A simple example XML file for the problem at hand:

<?xml version="1.0"?>
<layout
    version="137"
    comment=""
    precache_condition="">
    <hierarchy>
        <root this="2A19D461-6F9E-45F7-977F41D42D07FDB0">
            <template_row_header this="F3416CFD-8BC7-4276-86996FD67D7F6A75">
                <dy_title this="77E6B934-1E3B-40E5-BC14C7F643672167"/>
            </template_row_header>
        </root>
    </hierarchy>
    <components>
        <root
            this="2A19D461-6F9E-45F7-977F41D42D07FDB0"
            id="root">
        </root>
        <template_row_header
            this="F3416CFD-8BC7-4276-86996FD67D7F6A75"
            id="template_row_header"
            offset="0.00,4.00">
        </template_row_header>
        <dy_title
            this="77E6B934-1E3B-40E5-BC14C7F643672167"
            id="dy_title"
            offset="0.00,0.00">
        </dy_title>
    </components>
</layout>
  • Do the elements that appear under <hierarchy> have the same schema when they appear under <components>? E.g. <template_row_header> appears under both but has more attributes under <components>. Are those attributes optional? Could a <template_row_header> element have those same attributes no matter where it appears? Commented Sep 11, 2024 at 16:37
  • Also, were you provided an XSD for the XML you need to deserialize? Commented Sep 11, 2024 at 19:51

3 Answers


You could try using Newtonsoft.Json to convert the XML to a JSON string, then deserialize that JSON string.

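using System.Xml.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Convert the XML document to a JSON string (attributes become "@name" properties)
var doc = XDocument.Load("test.xml");
var jsonString = JsonConvert.SerializeXNode(doc);

// Parse to a JObject and navigate by element name
var jObject = JsonConvert.DeserializeObject<JObject>(jsonString);
var value = jObject["layout"]["hierarchy"]["root"]["template_row_header"]["dy_title"]["@this"];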


The XmlSerializer approach is very cumbersome if you have dynamic XML: since class definitions have to be written at compile time, you don't get much flexibility at runtime.

But there's another class for handling XML: XDocument.

It does not require a predefined class to deserialize into, and it allows a more dynamic approach.

Here's a code snippet that traverses the whole XML document:

using System.Xml.Linq;

var rawJson = @"<?xml version=""1.0""?>
<layout
    version=""137""
    comment=""""
    precache_condition="""">
    <hierarchy>
        <root this=""2A19D461-6F9E-45F7-977F41D42D07FDB0"">
            <template_row_header this=""F3416CFD-8BC7-4276-86996FD67D7F6A75"">
                <dy_title this=""77E6B934-1E3B-40E5-BC14C7F643672167""/>
            </template_row_header>
        </root>
    </hierarchy>
    <components>
        <root
            this=""2A19D461-6F9E-45F7-977F41D42D07FDB0""
            id=""root"">
        </root>
        <template_row_header
            this=""F3416CFD-8BC7-4276-86996FD67D7F6A75""
            id=""template_row_header""
            offset=""0.00,4.00"">
        </template_row_header>
        <dy_title
            this=""77E6B934-1E3B-40E5-BC14C7F643672167""
            id=""dy_title""
            offset=""0.00,0.00"">
        </dy_title>
    </components>
</layout>";

var xDoc = XDocument.Parse(rawJson);

TraverseElement(xDoc.Root);

void TraverseElement(XElement node, string currentPath = "")
{
    if(!node.HasElements)
        Console.WriteLine($"Value for path {currentPath} is {node.Value}, first attribute value: {node.FirstAttribute?.Value}");

    // Elements() returns only direct children; Descendants() would revisit
    // every nested element once per ancestor and print bogus paths.
    foreach (var item in node.Elements())
    {
        TraverseElement(item, currentPath + " -> " + item.Name);
    }
}

And the output is:

Value for path  -> hierarchy -> root -> template_row_header -> dy_title is , first attribute value: 77E6B934-1E3B-40E5-BC14C7F643672167
Value for path  -> components -> root is , first attribute value: 2A19D461-6F9E-45F7-977F41D42D07FDB0
Value for path  -> components -> template_row_header is , first attribute value: F3416CFD-8BC7-4276-86996FD67D7F6A75
Value for path  -> components -> dy_title is , first attribute value: 77E6B934-1E3B-40E5-BC14C7F643672167
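Since the question needs the nested <hierarchy> structure itself rather than printed paths, the same recursive idea can be turned into a small tree builder. This is a minimal sketch, assuming a hypothetical LayoutNode record and LayoutTree helper (neither is part of any library) that keep each element's tag name, all of its attributes as strings, and its child nodes, so no attribute needs individual handling:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical node type: tag name, every attribute as a string, and child nodes.
public sealed record LayoutNode(
    string Name,
    IReadOnlyDictionary<string, string> Attributes,
    IReadOnlyList<LayoutNode> Children);

public static class LayoutTree
{
    // Recursively converts an element and everything beneath it into LayoutNodes.
    // Elements() yields only direct children, so each element is visited exactly once.
    public static LayoutNode Build(XElement element) =>
        new LayoutNode(
            element.Name.LocalName,
            element.Attributes().ToDictionary(a => a.Name.LocalName, a => a.Value),
            element.Elements().Select(Build).ToList());

    // Returns the single top-level node under <hierarchy> (normally <root>), or null if absent.
    public static LayoutNode? BuildHierarchy(XDocument doc) =>
        doc.Root?.Element("hierarchy")?.Elements().Select(Build).FirstOrDefault();

    // Flattens the children of <components> into a lookup keyed by their "this" attribute.
    // ToDictionary throws if two components share the same "this" value.
    public static Dictionary<string, LayoutNode> BuildComponents(XDocument doc) =>
        (doc.Root?.Element("components")?.Elements() ?? Enumerable.Empty<XElement>())
            .Select(Build)
            .Where(n => n.Attributes.ContainsKey("this"))
            .ToDictionary(n => n.Attributes["this"]);
}
```

In the sample XML, the `this` GUID of each <hierarchy> node matches the `this` of a <components> entry, so while walking the tree a caller could look up `components[node.Attributes["this"]]` to get that node's full attribute set.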

UPDATE: Below are more examples of how to get information out of XML using the XDocument class. See the comments in the code:

var xDoc = XDocument.Parse(rawJson);

// Direct children of <components> only (one element per UI component)
var componentChildren = xDoc.Descendants("components").Elements();

foreach (var child in componentChildren)
{
    var tagName = child.Name;

    Console.WriteLine($"tagName={tagName} which has following attributes:");

    PrintAllAttributes(child);

    PrintSpecificAttribute(child, "this");
}

// This is the way to access all attributes of an element.
void PrintAllAttributes(XElement child)
{
    var attributes = child.Attributes();

    foreach (var attribute in attributes)
        Console.WriteLine(attribute);
}
// And here's how you can access specific attribute and get its value.
void PrintSpecificAttribute(XElement child, string attributeName)
{
    var attribute = child.Attribute(attributeName);
    // Attribute() returns null if the element has no such attribute
    Console.WriteLine(attribute?.Value);
}
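XDocument and XmlSerializer can also be combined, so that only the irregular sections need LINQ to XML code. One possible hybrid, sketched here under assumptions rather than as a tested recipe: XDocument locates each irregularly named child of <components>, and XmlSerializer maps its attributes onto a POCO by reading the element through XElement.CreateReader(), with XmlRootAttribute overriding the expected root element name. ComponentItem and HybridLoader are hypothetical names:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Xml.Linq;
using System.Xml.Serialization;

// Hypothetical POCO for the attributes shared by every child of <components>.
public class ComponentItem
{
    [XmlAttribute("this")]
    public string? This { get; set; }

    [XmlAttribute("id")]
    public string? Id { get; set; }

    // Attributes not mapped above (e.g. offset) are collected here instead of being dropped.
    [XmlAnyAttribute]
    public System.Xml.XmlAttribute[]? OtherAttributes { get; set; }
}

public static class HybridLoader
{
    // The XmlSerializer(Type, XmlRootAttribute) constructor does not reuse its generated
    // serialization assembly, so keep one serializer per element name.
    private static readonly ConcurrentDictionary<string, XmlSerializer> Serializers = new();

    public static List<ComponentItem> LoadComponents(XDocument doc)
    {
        var result = new List<ComponentItem>();
        var container = doc.Root?.Element("components");
        if (container == null)
            return result;

        foreach (var element in container.Elements())
        {
            // XDocument finds the element whatever its tag; XmlSerializer maps its attributes.
            var serializer = Serializers.GetOrAdd(
                element.Name.LocalName,
                name => new XmlSerializer(typeof(ComponentItem), new XmlRootAttribute(name)));
            using var reader = element.CreateReader();
            result.Add((ComponentItem)serializer.Deserialize(reader)!);
        }
        return result;
    }
}
```

The rest of the file (e.g. the <layout> attributes) could still be mapped with plain XmlSerializer attributes; only the sections with variable tag names go through this loop.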

1 Comment

Hey Michael, thanks for the insight. I had originally tried using XDocument, but the overhead of the operation started to really complicate the code. Do you have any suggestions for how I could utilize XDocument for these irregular sections, while still being able to use XmlSerializer for the rest of the standard POCO mapping?

In a situation where you have a sequence of child elements that have different element names but similar schemas, you can use XmlSerializer's support for polymorphism to map the elements to a collection of C# polymorphic types. This corresponds to a sequence of <xsd:choice> elements in an XSD schema. The basic idea is to:

  1. Define some base type TBaseType with the properties common to all elements. (It could be object if there are no common properties.)

  2. Add derived types TDerivedType corresponding to each specific element name derivedElementName, and apply [XmlType("derivedElementName")] to each.

  3. In each containing type that has a sequence of TBaseType elements, add a List<TBaseType> property. Then inform the serializer of all possible derived types by applying [XmlElement(typeof(TDerivedType))] for all derived types.

Thus, for your specific XML, first define the following data model:

// The root model <layout>
[XmlType("layout"), XmlRoot("layout")]
public class LayoutModel 
{
    [XmlAttribute("version")]
    public uint Version { get; set; }

    [XmlAttribute("comment")]
    public string Comment { get; set; } = "";

    [XmlAttribute("precache_condition")]
    public string PrecacheCondition { get; set; } = "";

    [XmlElement("hierarchy")]
    public LayoutContainerModel? Hierarchy { get; set; }

    [XmlElement("components")]
    public LayoutContainerModel? Components { get; set; }
}

// The container model for <hierarchy> and <components>.
public class LayoutContainerModel
{
    [XmlElement("root")]
    public LayoutItemRoot? Root { get; set; }

    // The same list of polymorphic children must appear in LayoutItemBase.Children
    [XmlElement(typeof(DyTitle)),
     XmlElement(typeof(TemplateRowHeader))
     // Add others as required
    ]
    public List<LayoutItemBase> Children { get; set; } = new();
}

// The type hierarchy for the polymorphic sequence of child elements
public abstract class LayoutItemBase
{
    [XmlAttribute("this")]
    public string? This { get; set; }
    
    [XmlAttribute("id")]
    public string? Id { get; set; }
    
    // The same list of polymorphic children must appear in LayoutContainerModel.Children
    [XmlElement(typeof(DyTitle)),
     XmlElement(typeof(TemplateRowHeader))
     // Add others as required
    ]
    public List<LayoutItemBase> Children { get; set; } = new();
}

[XmlType("root")]
public class LayoutItemRoot : LayoutItemBase;

[XmlType("dy_title")]
public class DyTitle : LayoutItemBase
{
    [XmlAttribute("offset")]
    public string? Offset { get; set; }
}

[XmlType("template_row_header")]
public class TemplateRowHeader : LayoutItemBase
{
    [XmlAttribute("offset")]
    public string? Offset { get; set; }
}

Then, given a static helper method like:

using System.IO;
using System.Xml.Serialization;

public static class XmlExtensions
{
    public static T? LoadFromFile<T>(string path, XmlSerializer? serial = null)
    {
        using var stream = File.OpenRead(path);
        return (T?)(serial ?? new XmlSerializer(typeof(T))).Deserialize(stream);
    }
}

You will be able to load your LayoutModel from a file path as follows:

var model = XmlExtensions.LoadFromFile<LayoutModel>(path);
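If the XML is already in memory (for example in a unit test), a string-based variant of the same helper is a small change. This is a sketch; XmlStringExtensions and LoadFromString are hypothetical names that simply swap the file stream for a StringReader:

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical companion to LoadFromFile<T> that deserializes from an in-memory string.
public static class XmlStringExtensions
{
    public static T? LoadFromString<T>(string xml, XmlSerializer? serial = null)
    {
        using var reader = new StringReader(xml);
        return (T?)(serial ?? new XmlSerializer(typeof(T))).Deserialize(reader);
    }
}
```

Usage would look like `var model = XmlStringExtensions.LoadFromString<LayoutModel>(xmlText);`. Creating a new XmlSerializer(typeof(T)) on each call is acceptable because that constructor reuses its generated assembly; overloads that take an XmlRootAttribute or XmlAttributeOverrides do not, so serializers built that way are worth caching and passing in via `serial`.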

Notes:

  • If the polymorphic element names differ from collection to collection, you can apply [XmlElement(string? derivedElementName, Type? type)] to the collection property, instead of [XmlType(derivedElementName)] to the derived type, to specify both the polymorphic names and types on a per-collection basis.

  • I chose to use the same C# type for elements with identical structure. Thus your HierarchyModel and the ComponentModel[] array used for <components> were replaced with a single LayoutContainerModel.

  • If you were provided an XSD for your XML file, then xsd.exe will construct a polymorphic type hierarchy for you as long as the XSD contains <xsd:choice> elements. xsd.exe will not, however, infer a polymorphic type hierarchy from some sample XML file (and will sometimes infer an incorrect type hierarchy when presented with polymorphic XML.)

  • For more on XmlSerializer polymorphism, see

Demo fiddle here.
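To illustrate the note about per-collection element names, here is a minimal sketch with a hypothetical, cut-down model (ItemBase, TitleItem, HeaderItem and ItemContainer are illustrative names, and the <title> element is invented for the example). The derived types carry no [XmlType] name; each collection property maps element names to types itself, so the same type can appear under different tags in different collections:

```csharp
using System.Collections.Generic;
using System.Xml.Serialization;

// Hypothetical minimal model: no [XmlType] names on the derived types.
public abstract class ItemBase
{
    [XmlAttribute("id")]
    public string? Id { get; set; }
}

public class TitleItem : ItemBase { }

public class HeaderItem : ItemBase
{
    [XmlAttribute("offset")]
    public string? Offset { get; set; }
}

[XmlRoot("container")]
public class ItemContainer
{
    // In this collection, TitleItem is read from <dy_title> elements...
    [XmlElement("dy_title", typeof(TitleItem))]
    [XmlElement("template_row_header", typeof(HeaderItem))]
    public List<ItemBase> Items { get; set; } = new();

    // ...while in this (hypothetical) collection the same type is read from <title>.
    [XmlElement("title", typeof(TitleItem))]
    public List<ItemBase> OtherItems { get; set; } = new();
}
```

Deserializing `<container><dy_title id="t"/><template_row_header id="h" offset="0.00,4.00"/><title id="t2"/></container>` with `new XmlSerializer(typeof(ItemContainer))` should put a TitleItem and a HeaderItem into Items and a second TitleItem into OtherItems.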
