3

How does cache locality impact the performance of ArrayList compared to LinkedList in Java?

I've often heard that ArrayList has an advantage in terms of cache locality, but I don't fully understand why. Since Java stores objects in memory as references, wouldn't accessing elements in either list require jumping to random locations in memory?

7
  • 6
    "In java, after all, any list is a list of links, that is, even after downloading several consecutive links, we will have to search for an object in a random place in memory for each link" - citation needed. Commented Apr 26 at 19:53
  • 1
    @Sören I'm not sure if this counts as proof, but here's what I could find stackoverflow.com/questions/51598148/… Commented Apr 26 at 19:59
  • 3
    OP is actually correct that both ArrayList and LinkedList store references to heap-allocated objects, so compared to a language like C++ which stores elements in-place, there is an extra level of indirection in Java. The difference between ArrayList and LinkedList is that the former stores those references in consecutive memory, while the latters stores them in separate heap allocated node objects. So iterating over a LinkedList requires 2 heap accesses per element, while ArrayList requires only 1. So yes, ArrayList is probably faster in most cases. But it's still slow compared to C++. Commented Apr 26 at 20:20
  • 2
    @MaksVerver I think I understood, since the LinkedList is a list of nodes in which the object itself is also stored by reference, then during the iteration we will access memory for both the object itself and the next node. And there will be the same in the array, but accessing the next "node" will be fast, since it is not a random place in memory. Did I understand correctly? Commented Apr 26 at 20:49
  • 1
    Yeah, I think your on the right direction. In simpler terms: when iterating over a LinkedList, both the object reference and the next node reference need to be resolved, often causing more cache misses than an ArrayList, where the next element is simply the next position in memory. Commented Apr 26 at 21:29

3 Answers 3

9

Hold on, let's first illustrate how java works, because without that knowledge you can't make heads or tails of these conflicting statements about 'everything is a link'.

In java, every variable is fixed size and very small. This is highly convenient, because it allows making such grand, sweeping statements such as "whenever you call a method, every parameter is copied to the method you are calling; that method can do whatever it wants with these copies, and the copies poof out of existence once that method has finished execution". After all, if you make that statement and a variable is 2 GB large, then passing it around will very very swiftly cause out of memory errors.

But, how does that work? Surely List<String> list = enumerateEveryWordInTheCollectedWorksOfShakespeare(); is not "fixed, small size".

That's where the second part kicks in: In java, you have primitives, which is this hardcoded list of types: int, long, short, byte, double, float, char, boolean. (That's been the list since forever, this list will never change, you cannot make your own primitives ^1) - each and every one of those is 'fixed, small size' (specifically, they are all 64 bits or less). And everything else is a reference. A pointer.

When you write:

String x = "hello";

Then it is incorrect to say "x is hello". It's not. x is a reference to it.

It's like "Hello" is a house (think about it: Strings can be arbitrarily long. They can be the entire collected works of shakespeare. Very much not 'fixed, small size'), and x is merely a page in an address book. It's an instruction that tells you how to get to the house. Given that page, if you wanna know if the house is red, you can do it - just.. go there, and look. All you need is the page. But you do need to spend the time, so to speak.

ArrayList

ArrayList is 'a list of links' in that exact sense: By definition, you can't have an arraylist of primitive values (you can have a List<Integer>; Integer is the boxed, i.e. 'reference' version of int. List<int> is not valid java), hence, it must be a list of references. That means an ArrayList is an address book. It's a list of addresses.

The list is 'contiguous' - meaning, the address book is in one piece, and in the order you expect it to be. If you're on page 5 of the address book, and you're intrigued about the address on page 6, then I can guarantee you page 6 is extremely closeby. Simply.. flip the page, and there it is. Guaranteed.

One downside of arraylists is that, like actual address books, they have a fixed size. So what happens when you fill up your address book? Well, the implementation of ArrayList will play some secret magic to make it appear as if the list doesn't actually have a 'whoops it is full' issue: It buys a new, larger address book, manually copies over each and every address in the old one, swiftly replaces your address book with this new one, and chucks the old one in the garbage, all just as part of its normal operations. Calling methods isn't so much 'doing' a thing to an object, it's asking the object to do it for you. ArrayList is an address book that is capable of understanding how to go out, buy a new larger one, copy itself into the new one, transfer its consciousness into the new one and then chuck its old self in the bin.

LinkedList

LinkedList is like an address book that you rip each page out of, and then just scatter them all across the room. You know where page 1 of the address book is. But that is all you know. Fortunately, each and every page lists the location where you hid the next page, as well as the previous page. So, if you want to, say, go to the house listed in your address book on page 5, because you want to know what colour it is painted, you find page 1, and it says "You shoved page 2 behind the couch", you go find page 2, which says "page 3 is on top of the refridgerator", you find page 3, which says "page 4 is on the desk", and page 4 says "page 5 is also on the desk", and then finally you can go over to that house.

This is a ridiculously time consuming process. It does have one single advantage: You never need to buy a new address book and copy the old full one over to it. After all, with this scattered pages system, if you run out of pages, just scatter some more blank pages across the room and keep going. This isn't much of an advantage, but it is something, I guess.

Now, if you so happen to create a linked list and immediately fill the first 500 pages of it, then the situation is likely that all 500 pages are still in a stack on your desk more or less sorted exactly. But because you can't be sure, even if this is the case, if you want to go to the house listed on page 250, you still have to go through each page in turn, whereas with the arraylist address book you can just go right to page 250 in one go.

But LinkedList does not require that you fill it up neatly as you make it. If you fill it up over time, then you get the 'pages are all over the place' scenario.

... it gets worse!

Each 'page' of our linkedlist address book actually stores 3 things. Not one thing. It stores the location of the previous page, the location of the next page, and, of course, the address of the house. Each of those things is a 'fixed small thing' (a reference). But those 3 together - that's not how java works, that's too much.

So in actual fact a LinkedList is more like 'each page of the address book is actually a tiny little address book with 3 pages in it, one explaining where the previous postit is, one where the next postit is, and one with the address of the house', and those mini-address books are all over the place, as well as a bunch of postits that explain where to find the mini address books.

and now back to how it works in java terms

A LinkedList consists of a reference to the first node and the last node. A node is a small object that consists of a reference to the next node, to the previous node, and to the object this entry in the list represents. To e.g. iterate through the first 10 items of a linked list and e.g. print them all, the JVM has to resolve the reference to the linked list, from there resolve the reference to the first node, from there resolve the reference to the object and print that, then resolve the reference to the next node, resolve the reference to the object and print that, resolve the reference to the next node, and so on.

Each little node object is created as you add an object, and unless you so happen to all do that neatly all in one go, these node objects are scattered all over your heap. And of course, the actual objects that the list contains may or may not be scattered all over the heap, depends on when they were made.

In contrast, with an ArrayList, you simply have a guaranteed consecutive series of references to the actual objects contained in the list. Those objects may not be consecutive, but at least the refs to them are.

So what does this mean?

It boils down to just 2 words, which are all you must know as a java programmer about linked lists.

LinkedList BAD.

That's it. That's all you really need to know about java.util.LinkedList. The number of cases where an LL is the correct answer is vanishingly small. ArrayList is often better, but certainly not always; however, some other collection variant will be better than LL in pretty much every imaginable use case.

Various language agnostic takes on LL make claims that might apply to a LinkedList but they do not apply to java's take on them. For example, 'the advantage of a linked list is that, given an object in the list, it is very fast to insert an object right after it'. This is false - given an entry in a linked list you can't "get back" to the linked list's node in java. In essence the one and only way to even begin to enjoy the benefits of LLs in java is to use the very rarely used .listIterator() method which indeed can do a quick insertion of a value in a way ArrayList cannot compete with. Without listIterator the only thing LL can do that AL cannot do, is 'fast add/remove both from start and from end.

But if that's what you want, use ArrayDeque - much more efficient at the job of 'a list thing that lets you fast add/remove from either end'.

A note about the concept 'linked list'

Many of these rules do not apply to the concept of linked lists. You are free to take some class definition, add the fields private SelfType prev, next; and have an object itself track its previous and next element. This offers a few advantages. For example, given some object you can now cheaply insert an object right behind it or right in front of it. This concept also suffers less from 'the JVM is travelling all over the heap thus running into lots of delay due to cache page misses', if you are iterating over every object and actually need data from the object in addition to the link to the next object in the list.

It also has downsides: Such a thing does not implement java.util.List, and an object can only ever be in one list. If this is what you're interested in, most of this answer does not apply.


[1] Various aspects of valhalla and panama mean these statements are going to need a lot of caveats soon. As of JDK24, this is accurate. If you know what Project Valhalla and Panama are - yes, change is on the horizon.

Sign up to request clarification or add additional context in comments.

8 Comments

I admire you for simply staying to make an answer so detailed and educative.
It’s important to emphasize that “java's take on linked lists” refers to java.util.LinkedList which implements a general purpose java.util.List. There are other possibilities, even in Java. For example, application specific objects having a next/previous pointer, (like Instruction objects in ASM’s Tree API) which eliminates the indirection but also lacks the general purpose aspect of the Collection API (e.g. such objects can only be in one list at a time).
Another approach would be to expose the node objects of a general purpose list. Then, you could do things like cutting a list in the middle and cheaply append one half to another list (not a thing in the Collection API). In contrast, removing one half of a java.util.LinkedList and adding it to another implies abandoning all associated internal node objects and creating new ones for the target list. Which creates more load to the garbage collector and linked lists are less garbage collector friendly in general.
@rzwitserloot that’s reasonable, for sure. But still I know, sooner or later, readers will show up, confusing the specific class and the concept. It wasn’t meant to say that your answer was missing something in the context of this specific question.
@Holger Could be. Just in case that happens, I've added 1 extra chapter at the end and cleared up the 'linkedlist bad' section to make extra clear that's referring specifically to j.u.LinkedList and not the concept.
|
2

I am going to answer this in two ways: with a technical answer and with a cool and easy to follow example. I had to do some research before answering this and simplifying things always help me, so let's first introduce the cache terminology:

Cache: a cache is like a special reading desk where the most frequently used books are placed. Instead of running deep into the shelves every time, the computer can grab the specific book from the desk.

Now how does this relate to your question?

The ArrayList stores data in a row in the reading desk. So when the computer reads one book, it already has the next one nearby, making reading super fast and efficient. Therefore, making sequential access faster.

The LinkedList is like a scattered set of books where you have to follow notes leading to each next book. Every time the computer needs a new book, it has to walk through another shelf.

Now you might want a more "technical" answer, so here it is:

In ArrayList:

Since all elements are on a single block, accessing one element automatically loads nearby elements into the cache. This obviously makes iteration much faster, as the CPU retrieves more elements on one go.

In LinkedList:

Since nodes are scattered across memory, accessing one node doesn’t guarantee the next one is nearby. Each node access involves a pointer dereference, leading to higher cache misses.

Please note: While ArrayList stores its references contiguously in memory, the actual objects they point to can be scattered throughout the heap. This means accessing the object itself may require additional lookups, potentially reducing cache efficiency. However, when iterating over an ArrayList, accessing sequential elements is still faster than LinkedList since those references are stored in order, leveraging cache locality. (this was pointed out by @David Conrad)

To sum everything up: when iterating over a LinkedList, both the object reference and the next node reference need to be resolved, often causing more cache misses than an ArrayList, where the next element is simply the next position in memory.

Here is an article that helped me: Why does cache locality matter for array performance?. This discussion on Stack Overflow is similar. It dives deeper into the advantages of cache locality in arrays over LinkedLists.

2 Comments

While ArrayList does store a block of data, that data is references (pointers) to the objects in the list, which are not stored in the ArrayList, but scattered across memory.
@DavidConrad, I mentioned you in my edit, as it was you that pointed this out. Here is what I added: "Please note: While ArrayList stores its references contiguously in memory, the actual objects they point to can be scattered throughout the heap. This means accessing the object itself may require additional lookups, potentially reducing cache efficiency. However, when iterating over an ArrayList, accessing sequential elements is still faster than LinkedList since those references are stored in order, leveraging cache locality. (this was pointed out by David Conrad)"
1

Yes, ArrayList stores references in a contiguous array, making sequential access faster for the CPU cache. LinkedList nodes are scattered in memory, causing more cache misses.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.