Java Map vs Array (Temp) Sort of Last Index

Question

I have looked at similar questions that detail the sorting of Maps and sorting of arrays of primitive data types, but no question directly details the difference between a one-time sort of a Java Map vs primitive data type array ([]).

Primary note* I know that 'TreeMap' is the sorted version (by key) of Map in Java, but I don't know how much about the 'behind-the-scenes' of how TreeMap sorts the keys (either while data is being added, or after the data is FINISHED being added)?

Primary note 2* Dijkstra's algorithm in this case is not an EXACT implementation. We are just finding the shortest path of weighted edges in a graph G of size M nodes. This means that adjacency matrix (format seen below) is of size M x M. This is not a SMART implementation. Pretty much just as base-line as you can get... sorry for the confusion!

We are given an adjacency matrix, where elements are related to each other ('connected') in the following example:

0,1,5   // 0 is connected to 1, and the weight of the edge is 5
0,2,7   // 0 is connected to 2, and the weight of the edge is 7
0,3,8   // 0 is connected to 3, and the weight of the edge is 8
1,2,10  // 1 is connected to 2, and the weight of the edge is 10
1,3,7   // 1 is connected to 3, and the weight of the edge is 7
2,3,3   // 2 is connected to 3, and the weight of the edge is 3

But never mind the input, just assume that we have a matrix of values to manipulate.

We are looking at storing all the possible paths in a "shortest path" algorithm (I'm sure 75% or more of people on SO know Dijkstra's algorithm). This IS for homework, but an implementation question, not a "solve this for me" question.

ASSUME that the size of the matrix is very large (size M x M), maybe more than 50x50 in size. This would result in [50-1]!/2 = 1.52 × 10^64 results in the result list assuming that our algorithm was smart enough to pick out duplicates and not find the length of a duplicate path (which it is not, because we are noobs at Graph Theory and Java, so please don't suggest any algorithm to avoid duplicates...).

My friend says that a temp sort (using a temporary variable) on an index of int[n] in a List, where int[n] is the last index and value of the shortest path (ALGORITHM_1) may be faster than TreeMap (ALGORITHM_2) where the key of the Map is the value of the shortest path.

We were debating as to what implementation would be faster in trying to find ALL lengths of the shortest path. We can store it as the last index of each path (have an int[] where the last element is the value (sum) of the shortest path (all elements int the array) (ALGORITHM_1), OR we can store that sum as the KEY of the Map (ALGORITHM_2).

Because this is a shortest path algorithm (albeit not a great one...), we NEED to sort the results of each path by length, which is the sum of each edge in the graph, in order to find the shortest path.

So the real question is: what would be faster in sorting the results ONLY ONE TIME? Through a Map.sort() algorithm (built into the Java standard library) or through creating a temporary variable to hold the value of the most recent 'length' in each int[]? For example:

myMap.sort(); // Unless TreeMap in Java does 'behind=the-scenes' sorting on keys...
myMap.get(0); // This would return the first element of the map, which is the shortest path

OR

int temp = myList.get(0)[m]; // Store a temp variable that is the 'shortest path'
for( int[] i in myList<int[]>) {
    if (temp > myList.get(i)[m]) { // Check if the current path is shorter than the previous
        temp = myList.get(i)[m]; // Replace temp if current path is shorter
    }
}

Note that I haven't actually tested the implementations yet, nor have I checked my own Java syntax, so I don't know if these statements are declared correctly. This is just a theoretical question. Which would run faster? This is my 3rd year of Java and I don't know the underlying data structures used in HashMap, nor the Big O notation of either implementation.

Perhaps someone who knows the Java standard could describe what kind of data structures or implementations are used in HashMap vs (Primitive data type)[], and what the differences in run times might be in a ONE-TIME-ONLY sort of the structures.

I hope that this inquiry makes sense, and I thank anyone who takes the time to answer my question; I always appreciate the time and effort generous people such as yourselves put into helping to educate the newbies!

Regards, Chris

That was a lot of questions. But I can say that TreeMap puts the elements where they belong when they are added and maintains itself in such a state as to make an iterator over its elements return the elements in order. That is to say it would have more overhead to maintain that tree if you have a set of data that you know up front and to which you will not add any data (or do so only rarely). Also, Map.get(0) returns the element associated with the key "0" which is not the same as the zeroth element. — scott_fakename
– scott_fakename, Commented Apr 10, 2013 at 3:47
Doesn't dijkstra find a shortest path in unvisited nodes? During the relax, the next shortest edge could be found. So, why sort again? — Chengxi Li
– Chengxi Li, Commented Apr 10, 2013 at 3:49
@ scott_fakename and @iamsleepy ... sorry for the confusion. I added an edit 'Primary note 2*' that differentiates the difference between a REAL Dijkstra's algorithm and our REQUIRED implementation... the idea is the same, but the implementation is not. We are not designing a smart algorithm that does exactly what Dijkstra's algorithm does, because we are not Graph Theorists. Just Discrete Mathematics students trying to learn about Graph Theory through coding of small applications that can determine simple attributes of graphs :) — Chris Cirefice
– Chris Cirefice, Commented Apr 10, 2013 at 3:55
I also updated the 'code' to reflect that the myMap.get(0) is actually just to get the length of the very first path, which is used int he TEMP VARIABLE algorithm (ALGORITHM 1). This is, of course, not related to the shortest path directly, but rather just getting the first length to compare all other lengths to. EDIT: I'm sorry... I initially thought that myMap.get(0) was used int the TEMP VARIABLE algorithm, which it is not. I just assumed that TreeMap was auto-sorted as elements were added, I do not know the exact implementation; I cannot exact the results of my comments... I'm sorry =/ — Chris Cirefice
– Chris Cirefice, Commented Apr 10, 2013 at 3:57

Zim-Zam O'Pootertoot · Accepted Answer · 2013-04-10 04:02:10Z

2

It may not be necessary to sort your data in order to find the shortest path. Instead, you could iterate through the data and keep track of the shortest path that you've encountered.

Assuming the data is stored in an array of Data objects, with data.pathLength giving the path length,

Data[] data; // array of data
Data shortest = data[0]; // initialize shortest variable
for(int i = 1; i < data.length; i++) {
    if(data[i].pathLength < shortest.pathLength)
        shortest = data[i];
}

That said, TreeMap is a Red-Black Tree, which is a form of balanced binary tree. Unlike a standard binary tree, a balanced binary tree will rotate its branches in order to ensure that it is approximately balanced, which ensures log(n) lookups and insertions. A red-black tree ensures that the longest branch is no more than twice the length of the shortest branch; an AVL Tree is a balanced binary tree with even tighter restrictions. Long story short, a TreeMap will sort its data in n*log(n) time (log(n) for each insertion, times n data points). Your one-time array sort will also sort its data in n*log(n) time, assuming you're using Mergesort or Quicksort or Heapsort etc (as opposed to Bubblesort or another n^2 sort algorithm). You cannot do better than n*log(n) with a comparison sort; incidentally, you can use a transformation sort like Radix Sort that has a big-oh of O(n), but transformation sorts are usually memory hogs and exhibit poor cache behavior, so you're usually better off with one of the n*log(n) comparison sorts.

Since TreeMap and your custom sort are both n*log(n), this means that there's not much theoretical advantage to either one, so just use the one that's easier to implement. TreeMap's complex data structure does not come free, however, so your custom sorting algorithm will probably exhibit slightly better performance, e.g. maybe a factor of 2; this probably isn't worth the complexity of implementing a custom sort as opposed to using a TreeMap, especially for a one-shot sort, but that's your call. If you want to play around with boosting your program's performance, then implement a sorting algorithm that's amenable to parallelization (like Mergesort) and see how much of an improvement that'll get you when you split the sorting task up among multiple threads.

edited Apr 10, 2013 at 4:02

answered Apr 10, 2013 at 3:55

Zim-Zam O'Pootertoot

18.2k4 gold badges45 silver badges71 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Chris Cirefice Over a year ago

I learned about Red-Black Tress in CIS 263 last semester. I didn't fully understand how they functioned (by adjusting themselves based on what values are on the next 'level' to maintain balance)... I understand that this implementation occurs in TreeMaps, as it makes sense entirely. I also understand that my 'custom sort' based on the last value of an int[] are both O(NlogN). Running through some custom data, I recognize these Big O patterns. Your idea seems the same as my friends, where you simply keep track of the shortest path by looking at the previous shortest path. This seems efficient.

Zim-Zam O'Pootertoot Over a year ago

NOBODY fully understands how red-black trees work, except for the ten people in the world who implement them for language libraries.

Chris Cirefice Over a year ago

Continuation from my previous comment... from what you've said, it seems that the implementation of TreeMap (red-black tree) in Java is very similar to keeping track of the latest value in the shortest pat (which is the last index of the int[] in ALGORITHM_1). This is all arbitrary of course because it's just a simple homework assignment based on gathering ~10 nodes from an input file, but I just wanted to clarify the difference between TreeMap sorting and int[] sorting. I remember the AVL and red-black trees now, so both methods seem easily applicable to the context. Thank you very much :)

Chris Cirefice Over a year ago

I suppose that you're right Maybe my CIS 263 instructor understood them a little bit... but it was still really really difficult to understand the basics of how R-B Trees function. Hopefully in the future I will figure out how all of these abstract data structures function, but in reality I think it might be impossible to learn them all. After all, the human brain can store what... approximately 2 Petabytes of data? Obviously, these data structures take up more space than that. What kind of human brain could keep track of all that data... ;)

Steven Hood · Accepted Answer · 2013-04-10 04:01:54Z

0

If you want the shortest path, sorting isn't necessary. Just track the shortest path as you finish each path, and update your shortest path when you encounter a shorter path. That'll wind up giving you an O(n), where n is the number of paths. You couldn't realistically store 10^64 paths anyway, so some truncation of the result set will be required.

answered Apr 10, 2013 at 4:01

Steven Hood

6885 silver badges18 bronze badges

3 Comments

Chris Cirefice Over a year ago

@ Steven Hood : This is a similar solution to @Zim-Zam O'Pootertoot 's solution (which is essentially ALGORITHM_2). I understand that both solutions will be equally applicable to the application domain (based on his answer and description of the underlying data structures of TreeMap in Java), so your answer is also applicable to my question (but of a duplicate nature). I believe that keeping track of the temp variable as paths are found is more efficient and easier to implement, especially if one does not understand Maps very well (my friend, for example). Thank you for your input :)

Steven Hood Over a year ago

Right, I'm additionally suggesting that (assuming it isn't required) you don't track the resultant paths, since it isn't strictly necessary to find even the shortest N (where N << M*M) paths.

Chris Cirefice Over a year ago

If I understand you correctly, we don't need to keep track of 'duplicate' paths, which include reverse iterations of the initial paths that we track. For example... and I'm sorry for the unwanted formatting in comments... if we had the data set 0, 1, 2, 3, 4, 0m then 0, 4, 3, 2, 1, 0 would ALSO be a duplicate entry, that of course, has the exact same 'shortest path' value. This is why I said our algorithm would NOT be conforming to the 'hopeful' (n-1)!/2 number of results. It is sad, but I don't have enough time or experience to code something that checks for duplicates and skips... =/

prashantsunkari · Accepted Answer · 2013-04-10 04:11:12Z

0

 how much about the 'behind-the-scenes' of how TreeMap sorts the keys (either while data   
 is being added, or after the data is FINISHED being added)?

TreeMap uses RedBlack tree algorithm (a variant of BST) where operations containsKey, get, put and remove operations take O(log(n)) time which is the very good. The keys gets sorted after each add of element as the TreeMap definition (in the link) explains it. Sorting will take O(nlog(n))

I am not sure why you are comparing a Map type - which uses key, value pair against Array. You have mentioned about using length of shortest paths as keys in TreeMap but what is that you are putting as values? If you just want to store "length of paths", I would suggest put them in array and sort them using Arrays.Sort() which will also sort in O(nlog(n)) using a different algorithm Dual-Pivot Quicksort.

Hope this helps!

answered Apr 10, 2013 at 4:11

prashantsunkari

9597 silver badges21 bronze badges

2 Comments

Chris Cirefice Over a year ago

Sorry for no clarifying fully... I understand the TreeMap implementation, which makes sense. I remember it from last semester in CIS 263. However, the dispute was between TreeMap and the LAST value of an int[], where the last value was the length of the shortest path. Compare this to the Map version, where the length of the shortest path was the key. The question was between the implementation of a Map sort algorithm after all paths have been found or a temp variable for shortest_path after each path has been found.

Chris Cirefice Over a year ago

It would stand to reason that the temp variable solution would be faster because only 1-2 lines of code (1-2 instructions) would be executed every time a path was found, whereas in a Map implementation, who knows how many lines need to be executed for TreeMap in Java to constantly sort the map as values are added. I assume that this may be slower simply because of the amount of instructions that need to be addressed for each element in the Map... Hopefully this makes sense.

Collectives™ on Stack Overflow

Java Map vs Array (Temp) Sort of Last Index

3 Answers 3

4 Comments

3 Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

4 Comments

3 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related