3

I'm looking to transform a flat list into a hierarchical structure. How can I do this in a readable way that still performs acceptably, and are there any .NET libraries I can take advantage of? I think this is called a "facet" in some terminologies (in this case, faceting by Industry).

public class Company
{        
    public int CompanyId { get; set; }
    public string CompanyName { get; set; }
    public Industry Industry { get; set; }
}

public class Industry
{
    public int IndustryId { get; set; }
    public string IndustryName { get; set; }
    public int? ParentIndustryId { get; set; }
    public Industry ParentIndustry { get; set; }
    public ICollection<Industry> ChildIndustries { get; set; }
}

Now let's say I have a List<Company> and I want to transform it into a List<IndustryNode>:

//Hierarchical data structure
public class IndustryNode
{
    public string IndustryName { get; set; }
    public double Hits { get; set; }
    public IndustryNode[] ChildIndustryNodes { get; set; }
}

The resulting object should look like the following after it is serialized:

{
    IndustryName: "Industry",
    ChildIndustryNodes: [
        {
            IndustryName: "Energy",
            ChildIndustryNodes: [
                {
                    IndustryName: "Energy Equipment & Services",
                    ChildIndustryNodes: [
                        { IndustryName: "Oil & Gas Drilling", Hits: 8 },
                        { IndustryName: "Oil & Gas Equipment & Services", Hits: 4 }
                    ]
                },
                {
                    IndustryName: "Oil & Gas",
                    ChildIndustryNodes: [
                        { IndustryName: "Integrated Oil & Gas", Hits: 13 },
                        { IndustryName: "Oil & Gas Exploration & Production", Hits: 5 },
                        { IndustryName: "Oil & Gas Refining & Marketing & Transportation", Hits: 22 }
                    ]
                }
            ]
        },
        {
            IndustryName: "Materials",
            ChildIndustryNodes: [
                {
                    IndustryName: "Chemicals",
                    ChildIndustryNodes: [
                        { IndustryName: "Commodity Chemicals", Hits: 24 },
                        { IndustryName: "Diversified Chemicals", Hits: 66 },
                        { IndustryName: "Fertilizers & Agricultural Chemicals", Hits: 22 },
                        { IndustryName: "Industrial Gases", Hits: 11 },
                        { IndustryName: "Specialty Chemicals", Hits: 43 }
                    ]
                }
            ]
        }
    ]
}

Where "Hits" are the number of companies that fall into that group.

To clarify: I need to transform a List<Company> into a List<IndustryNode>, NOT serialize a List<IndustryNode>.

4
  • What do you mean by efficiency? The most readable and maintainable or the best performing? Commented Oct 15, 2013 at 15:29
  • Sorry I didn't make it clear. It needs to be efficient, but I'm willing to make some trade-offs for readability and maintainability. Commented Oct 15, 2013 at 15:30
  • eventually it will be serialized Commented Oct 15, 2013 at 15:31
  • Why do you need performance? Your list seems pretty small. Commented Oct 15, 2013 at 15:31

4 Answers

1

Try this:

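    // Requires: using System.Collections.Generic; using System.Linq;

    // Yields the industry itself followed by all of its descendants.
    private static IEnumerable<Industry> GetAllIndustries(Industry ind)
    {
        yield return ind;
        // ChildIndustries may be null if the navigation property was never populated.
        foreach (var item in ind.ChildIndustries ?? Enumerable.Empty<Industry>())
        {
            foreach (var inner in GetAllIndustries(item))
            {
                yield return inner;
            }
        }
    }

    // Recursively converts the children of an industry into IndustryNode form.
    private static IndustryNode[] GetChildIndustries(Industry i)
    {
        return (i.ChildIndustries ?? Enumerable.Empty<Industry>()).Select(ii => new IndustryNode()
        {
            IndustryName = ii.IndustryName,
            Hits = counts[ii],
            ChildIndustryNodes = GetChildIndustries(ii)
        }).ToArray();
    }


    // Occurrence count per industry (keyed by object reference).
    private static Dictionary<Industry, int> counts;
    static void Main(string[] args)
    {
        List<Company> companies = new List<Company>();
        //...
        var allIndustries = companies.SelectMany(c => GetAllIndustries(c.Industry)).ToList();
        HashSet<Industry> distinctInd = new HashSet<Industry>(allIndustries);
        // Count in a single pass instead of rescanning the list for every distinct industry.
        counts = allIndustries.GroupBy(i => i).ToDictionary(g => g.Key, g => g.Count());
        var listTop = distinctInd.Where(i => i.ParentIndustry == null)
                        .Select(i =>  new IndustryNode()
                                {
                                    ChildIndustryNodes = GetChildIndustries(i),
                                    Hits = counts[i],
                                    IndustryName = i.IndustryName
                                }
                        )
                        .ToList(); // materialize; the query is lazy otherwise
    }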

untested

2 Comments

distinctInd.Where(i => i.ParentIndustry == null) doesn't match any elements because the companies never reference any top-level Industry elements. I've been trying to make it work some other way but am still having a lot of difficulty.
Try distinctInd.Where(i => i.ChildIndustries == null || i.ChildIndustries.Count == 0)
0

You are looking for a serializer. Microsoft has one built in, but I prefer Newtonsoft's, which is free. MSFT documentation and examples are here, Newtonsoft documentation is here.

Newtonsoft is free, easy and faster.

3 Comments

I really don't like someone giving me a minus one with no reason. If you don't have a reason, don't vote it down.
I didn't downvote but the answer is not helpful. I'll already be using JSON.NET to serialize but I still need to get it into the proper structure.
That wasn't clear in the original post (as evidenced in half the answers). It sounded like you were looking for performance. Sorry I misunderstood your question. I still think it's lousy to minus one anything and not explain your reason.
0

Try using a JSON serializer for this. Your data structure looks OK, so this is just a matter of serialization.

var industryNodeInstance = LoadIndustryNodeInstance();

var json = new JavaScriptSerializer().Serialize(industryNodeInstance);

If you want to choose between serializers please see this: http://www.servicestack.net/benchmarks/#burningmonk-benchmarks

The LoadIndustryNodeInstance method would:

  • Build a List<Industry>

  • Convert it into the industry tree, a List<IndustryNode>

  • Implement tree methods such as Traverse. Try looking up how a tree data structure is implemented in C#
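As a rough sketch of the convert and traverse steps listed above (not tested against the asker's real data): ToNode and Traverse are hypothetical helper names, and hitsById is an assumed lookup of company counts keyed by IndustryId. Only the Industry members the sketch needs are repeated here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Industry
{
    public int IndustryId { get; set; }
    public string IndustryName { get; set; }
    public ICollection<Industry> ChildIndustries { get; set; }
}

public class IndustryNode
{
    public string IndustryName { get; set; }
    public double Hits { get; set; }
    public IndustryNode[] ChildIndustryNodes { get; set; }
}

public static class IndustryTree
{
    // Converts an Industry subtree into IndustryNode form.
    // hitsById is an assumed lookup (IndustryId -> company count);
    // industries missing from it get 0 hits.
    public static IndustryNode ToNode(Industry industry, IDictionary<int, int> hitsById)
    {
        int hits;
        hitsById.TryGetValue(industry.IndustryId, out hits); // hits stays 0 when absent
        var children = industry.ChildIndustries ?? new List<Industry>();
        return new IndustryNode
        {
            IndustryName = industry.IndustryName,
            Hits = hits,
            ChildIndustryNodes = children.Select(c => ToNode(c, hitsById)).ToArray()
        };
    }

    // Depth-first, pre-order traversal: a node is yielded before its children.
    public static IEnumerable<IndustryNode> Traverse(IndustryNode root)
    {
        yield return root;
        foreach (var child in root.ChildIndustryNodes ?? new IndustryNode[0])
        {
            foreach (var descendant in Traverse(child))
            {
                yield return descendant;
            }
        }
    }
}
```

The lookup could presumably be built from the company list with something like companies.GroupBy(c => c.Industry.IndustryId).ToDictionary(g => g.Key, g => g.Count()). This only works if you can get hold of the top-level Industry objects with ChildIndustries populated.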

1 Comment

The question concerns what happens in LoadIndustryNodeInstance(). I have a List<Company> not a List<IndustryNode>
0

Here is some pseudo code that might get you along the way. I create a map/dictionary index and populate it from the company list. Then we extract the top-level nodes from the index. Note that there may be edge cases (for example, the index may need to be partially filled up front, since it doesn't seem that any of your companies ever reference the very top-level nodes, so those will have to be filled in some other way).

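// Requires: using System.Collections.Generic; using System.Linq;
// Nodes indexed by industry name, plus the names of nodes that have no parent.
Dictionary<string, IndustryNode> index = new Dictionary<string, IndustryNode>();
HashSet<string> topLevelNames = new HashSet<string>();

public void insert(Company company)
{
    Industry industry = company.Industry;
    IndustryNode node;
    if (index.TryGetValue(industry.IndustryName, out node))
    {
        node.Hits++;
        return;
    }

    node = new IndustryNode
    {
        IndustryName = industry.IndustryName,
        Hits = 1,
        ChildIndustryNodes = new IndustryNode[0]
    };
    index[node.IndustryName] = node;

    if (industry.ParentIndustry == null)
    {
        topLevelNames.Add(node.IndustryName);
    }
    else if (index.ContainsKey(industry.ParentIndustry.IndustryName))
    {
        // IndustryNode keeps its children in an array, so append by copying.
        IndustryNode parent = index[industry.ParentIndustry.IndustryName];
        parent.ChildIndustryNodes = parent.ChildIndustryNodes.Concat(new[] { node }).ToArray();
    }
}

// After every company has been inserted, extract the top-level nodes.
List<IndustryNode> topLevelNodes = index
    .Where(kvp => topLevelNames.Contains(kvp.Key))
    .Select(kvp => kvp.Value)
    .ToList();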

4 Comments

This solution will not take into account children of children of an industry if it is not affected to a company.
@AhmedKRAIEM True, those would have to be inserted initially.
Thanks for the answer, if this method took an Industry instead where could recursion be applied to handle the children of children case?
Can you explain further? The way the data is currently presented, recursion isn't immediately usable. Eg. You could search a tree via recursion but that won't be different from a linear search since there's no stated search order.
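
For what it's worth, here is one way recursion might be applied if the insert step worked on an Industry: create the node for an industry only after recursively making sure its parent's node exists. This is a sketch, assuming the ParentIndustry navigation property is populated all the way up to the root; IndustryTreeBuilder, GetOrAdd and Build are hypothetical names, and the classes are trimmed to the members used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Industry
{
    public int IndustryId { get; set; }
    public string IndustryName { get; set; }
    public Industry ParentIndustry { get; set; }
}

public class Company
{
    public int CompanyId { get; set; }
    public Industry Industry { get; set; }
}

public class IndustryNode
{
    public string IndustryName { get; set; }
    public double Hits { get; set; }
    public IndustryNode[] ChildIndustryNodes { get; set; }
}

public class IndustryTreeBuilder
{
    private readonly Dictionary<int, IndustryNode> nodesById = new Dictionary<int, IndustryNode>();
    private readonly Dictionary<int, List<IndustryNode>> childrenById = new Dictionary<int, List<IndustryNode>>();
    private readonly List<IndustryNode> roots = new List<IndustryNode>();

    // Counts the company against its own industry only (ancestors keep 0 hits).
    public void Insert(Company company)
    {
        GetOrAdd(company.Industry).Hits++;
    }

    // Returns the node for an industry, creating it and any missing ancestors.
    private IndustryNode GetOrAdd(Industry industry)
    {
        IndustryNode node;
        if (nodesById.TryGetValue(industry.IndustryId, out node))
        {
            return node;
        }

        node = new IndustryNode { IndustryName = industry.IndustryName };
        nodesById[industry.IndustryId] = node;
        childrenById[industry.IndustryId] = new List<IndustryNode>();

        if (industry.ParentIndustry == null)
        {
            roots.Add(node);
        }
        else
        {
            // Recursive step: make sure the whole parent chain exists first.
            GetOrAdd(industry.ParentIndustry);
            childrenById[industry.ParentIndustry.IndustryId].Add(node);
        }
        return node;
    }

    // Freezes the collected child lists into arrays and returns the top-level nodes.
    public List<IndustryNode> Build()
    {
        foreach (var pair in nodesById)
        {
            pair.Value.ChildIndustryNodes = childrenById[pair.Key].ToArray();
        }
        return roots.ToList();
    }
}
```

Calling Insert once per company and then Build should give the top-level nodes even though no company references them directly. Industries that no company falls under (directly or through a descendant) would still be missing and would need to be seeded separately.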
