I have a set of data carrier objects represented using vertices in a strict tree hierarchy. I also have the following requirements:
- some vertices should be "linked" to other vertices directly
(think undirected edge) - the "links" themselves should be completely separate from the tree hierarchy
- "links" can be either 1-1 or 1-many
- each link has a "source" and a "target". 1-1 relationships have a single target, while 1-many have a list of targets.
- a "source" vertex must be able to reach the linked "target" vertex (by querying the link), and vice versa
- the setup needs to work in a multithreaded environment where different threads can work on different parts of the tree
- threads can freely add/modify/delete these "links". The 1-1 links only permit changing of targets, but the 1-many links allow addition/removal of specific targets. I've been assured that such ops will be atomic so I'm not worried about race conditions.
- I am required to ensure that link addition/modification/removals are resolved (e.g. if a "target" in a 1-many link is removed, the affected "target" should not have a "link" back to the "source", but the other links should remain unaffected )
- the overall structure will undergo serialization/deserialization. As such, the tree hierarchy as well as "link" connectivity must either be maintained (via state) or inferrable (via graph traversal or some other means). I am only looking at the "link" part of the equation here.
- both the tree itself, and the number of "links" could become quite large (around 100,000 or more vertices)
Here's an image showing what an example might look like:

I thought up of three different ways to do this:
- have "links" be separate data carrier objects that hold "source" and "targets". This would require a separate graph structure of sorts for quick lookup
- "links" only store
targetreferences and get stored on the "source" node directly. Traversal from the "target" to "source" nodes is handled by a separate helper class (so reverse connectivity is inferred) - "links" only store
targetreferences and get stored on the "source" as well as the "target" nodes. So the source node stores a reference to a target node, and the target node stores a reference to the source node.
Approach 1 is a non-starter for several reasons I can't share here. That leaves me with approaches 2 and 3. I've drawn up an image below showing how they might look like:
I'm trying to suss out which of these approaches might be better overall. Some of my thoughts:
- Approach 3 is good for speed as connectivity is stored in state. But this approach also increases the amount of state in the system which isn't good given it's multithreaded.
- Approach 2 is better from a state pov (less state overall), but it shifts complexity to the helper method/class. The "reverse" link of approach 3 still needs to be inferred if not explicitly stored.
- In both approaches I need to resolve link addition/modification/removals. Since both approaches require me to infer the "inverse" relationship, it's basically the same level of complexity (just handled in different places).
My questions (finally) are:
- which of the two would be the better approach given my constraints ?
- what would be the issues (if any)
- is there a better approach than the 3 discussed here ?
Edit :
Re these points:
- threads can freely add/modify/delete these "links". The 1-1 links only permit changing of targets, but the 1-many links allow addition/removal of specific targets. I've been assured that such ops will be atomic so I'm not worried about race conditions.
- I am required to ensure that link addition/modification/removals are resolved (e.g. if a "target" in a 1-many link is removed, the affected "target" should not have a "link" back to the "source", but the other links should remain unaffected )
From what I've been told - before any mutative op (including "link" ops), each thread will bring relevant vertices into its scope and those parts of the tree that are within the thread's scope will be locked. So if a thread wishes to add a link between A and B, both of them would be brought into scope and locked before adding said link.
Also, I forgot to mention this - I'm allowed to traverse both the tree and the links for the purposes of link generation/mutation. Approach 2 did hint at this, but I should have made it explicit in the OP.
Finally, re the "helper" in approach 2 - My thinking was to have it internally use a threadsafe hashmap and maintain mappings between vertices and their backreferenced targets.
