In a Java 8 stream with a filter condition, every element in the collection is passed to the filter so the condition can be checked. Here I have written two different filter conditions, and they behave differently.

public static void main(String[] args) {

    List<String> asList = Arrays.asList("a", "b", "c", "d", "e", "a", "b", "c");

    //line 1
    asList.stream().map(s -> s).filter(distinctByKey(String::toString)).forEach(System.out::println);

    Predicate<String> strPredicate = (a) -> {
        System.out.println("inside strPredicate method--");
        return a.startsWith("a");
    };

    //line 2
    asList.stream().filter(strPredicate).forEach(System.out::println);
}

public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    System.out.println("inside distinctByKey method...");
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> seen.add(keyExtractor.apply(t));
}

In the sample code above, the print statement for the line 1 filter (inside distinctByKey) executes only once, but the one for line 2 executes for every element in the collection.

I thought the distinctByKey method would execute for every element in the collection, but that is not the case. Why?

Also, is the Set referenced by the variable seen created only once? How does the flow work?

  • distinctByKey() runs only once because it creates a new lambda for the predicate, which is then executed on every element. Commented Aug 22, 2018 at 12:42
  • .map(s -> s) does literally nothing, by the way. Commented Aug 22, 2018 at 12:43
  • You should also say that this code is taken literally from an answer by Stuart Marks. Commented Aug 22, 2018 at 12:51

2 Answers

distinctByKey is a lambda factory method. It returns a Predicate<T>.

So when you execute filter(distinctByKey(String::toString)), you are in fact calling the distinctByKey method first, and it returns a Predicate. That predicate is then executed for every element. Only the factory method itself runs just once.

If you add a System.out.println inside the returned lambda, you'll get the print statements you expected:

public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    System.out.println("inside distinctByKey method...");
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> {
        System.out.println("inside distinctByKey.lambda method... ");
        return seen.add(keyExtractor.apply(t));
    };
}
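
To make the two call counts visible without reading console output, here is a minimal sketch that counts invocations instead of printing. The class name FactoryVsPredicateDemo, the run() helper, and the two counters are hypothetical additions for illustration and are not part of the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class FactoryVsPredicateDemo {

    // Hypothetical counters, added only to make the call counts observable.
    static final AtomicInteger factoryCalls = new AtomicInteger();
    static final AtomicInteger predicateCalls = new AtomicInteger();

    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        factoryCalls.incrementAndGet();        // runs once per call to distinctByKey
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> {
            predicateCalls.incrementAndGet();  // runs once per element tested
            return seen.add(keyExtractor.apply(t));
        };
    }

    // Hypothetical helper: builds the same pipeline as "line 1" and collects the result.
    static List<String> run() {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e", "a", "b", "c");
        return input.stream()
                .filter(distinctByKey(String::toString)) // the factory is called here, once
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run());
        System.out.println("factory calls: " + factoryCalls.get());
        System.out.println("predicate calls: " + predicateCalls.get());
    }
}
```

With the eight-element input from the question, the factory counter should end at 1 and the predicate counter at 8, since filter receives the already-built Predicate as an ordinary argument.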

seen is captured by the lambda expression and stays alive as long as the lambda does. Once distinctByKey returns the Predicate, Predicate::test is called multiple times, each time with the same instance of seen.
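
The capture can be seen in isolation with a small sketch. The class CapturedStateDemo and the helper firstTimeSeen() are hypothetical names used only for this illustration; the point is that each call to the factory creates its own set, while repeated calls to one returned predicate share a single set.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

class CapturedStateDemo {

    // Hypothetical helper: the returned predicate is true only the first time it sees a value.
    static Predicate<String> firstTimeSeen() {
        Set<String> seen = new HashSet<>();  // created once per call to firstTimeSeen()
        return s -> seen.add(s);             // the lambda holds a reference to that same set
    }

    public static void main(String[] args) {
        Predicate<String> p1 = firstTimeSeen();
        Predicate<String> p2 = firstTimeSeen();

        System.out.println(p1.test("a")); // true: "a" is new to p1's set
        System.out.println(p1.test("a")); // false: p1 remembers "a"
        System.out.println(p2.test("a")); // true: p2 has its own, separate set
    }
}
```

This sketch uses a plain HashSet, so it assumes single-threaded use; the original code uses ConcurrentHashMap.newKeySet() instead.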

4 Comments

Now this answer adds to your answer, @Lino. The concept of capturing a local variable inside a lambda well deserves a new answer. This same technique can be applied to implement different functional programming concepts, e.g. memoization. +1
@FedericoPeraltaSchaffner should we also add that it's something one has to get used to? It's not trivial at all to understand this when you first see it.
Yes, that's so true. It's not intuitive. When I first saw this technique, I also thought that my method was going to be called for every element of the stream. But once you understand the difference between imperative and declarative, you see the light.
If one understands the concept of closures, e.g. from JavaScript, then this may be more familiar/intuitive.
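
As a rough illustration of the memoization idea mentioned in the comments, the same capture technique might look like the sketch below. The names MemoizeDemo, memoize, and slowSquare are hypothetical, and this is only one possible way to do it, not something taken from the thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class MemoizeDemo {

    // Hypothetical counter, added only to show how often the real computation runs.
    static final AtomicInteger computations = new AtomicInteger();

    // Factory method: runs once, returns a function that captures its own cache.
    // Assumes non-null keys and results, and a non-recursive fn.
    static <K, V> Function<K, V> memoize(Function<K, V> fn) {
        Map<K, V> cache = new ConcurrentHashMap<>();
        return key -> cache.computeIfAbsent(key, fn);
    }

    // Hypothetical stand-in for an expensive computation.
    static Integer slowSquare(Integer n) {
        computations.incrementAndGet();
        return n * n;
    }

    public static void main(String[] args) {
        Function<Integer, Integer> square = memoize(MemoizeDemo::slowSquare);
        System.out.println(square.apply(4)); // 16, computed
        System.out.println(square.apply(4)); // 16, served from the captured cache
        System.out.println("computations: " + computations.get());
    }
}
```

As with distinctByKey, the cache is created once when memoize is called and is then shared by every invocation of the returned function.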
