
For example, I have a table written to a .txt file like:

Column_A  Column_B
Cell_1    Cell_2
Cell_3    Cell_4

Can I obtain the first and last lines in a single stream pipeline? Additionally, how can I collect this data into a map like:

{ "Column_A": "Cell_3", "Column_B": "Cell_4" }

I currently solve this problem with an ordinary loop, which works well, but I wonder if there is another solution using the Stream API.

I know how to get the first and last lines of different stream pipelines.

String[] keys = br.lines().findFirst().get().split(" ");

String[] values = br.lines().reduce((el1, el2) -> el2).get().split(" ");

Next, I can create a map using these arrays. But this code is messy, I think.
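For reference, the map-building step from the two arrays is short on its own; a minimal sketch (assuming both arrays have equal length and that column order should be preserved, hence the LinkedHashMap):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ZipToMap {
    // Pairs keys[i] with values[i]; a LinkedHashMap keeps column order.
    static Map<String, String> zip(String[] keys, String[] values) {
        return IntStream.range(0, keys.length)
                .boxed()
                .collect(Collectors.toMap(
                        i -> keys[i],
                        i -> values[i],
                        (a, b) -> b,
                        LinkedHashMap::new));
    }
}
```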

I'm wondering if there's a solution like:

HashMap<String, String> result = br.lines().filterFirstAndLast().flatMap(el -> el.split(" ")).collect(Collectors.toMap(keys, values));
  • The only way of reading the "last line" (assuming there are variable length lines terminated by some kind of newline character) is reading through the end of the file, as you did with the loop. The Stream API (or any other API) will do the same. In your proposed (still not working) code, you're reading ALL the file into an array, with no need, just for discarding everything but the first and last lines. I wouldn't do that. Commented Aug 14, 2024 at 17:44
    @DiegoFerruchelli Not true if your filesystem allows skipping to arbitrary points in the file data. You could read backwards from the end until you reached the first encountered line terminator character, rather than reading the entire file from the beginning. But that wouldn't be using the lines() stream method. Commented Aug 14, 2024 at 18:17
  • @M.Justin You're absolutely right, and most filesystems support a negative argument to fseek() (or similar). Actually, the problem I see is not in the lines() method (after all, it doesn't really read anything at this point), but in the loop implied by the reduce(). Using Stream.skip(Stream.count()-1) may be a cleaner (but probably not faster) solution. It's a pity Stream.skip() doesn't support negative arguments. The other option would be to go low level using a SeekableByteChannel(). Commented Aug 14, 2024 at 18:47
  • I think an old-school loop is fine. Readability counts. Commented Aug 15, 2024 at 1:56
  • I misunderstood what the OP was looking for. I thought he was looking for performance (which also counts), and my first comment was about the impossibility of reading a Stream backwards (because of the very nature of streams). If we're just talking about files, then it's a different story. Commented Aug 16, 2024 at 17:37
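To make the seek-backwards idea from the comments concrete: the last line can be found without scanning the whole file by reading backwards from the end until a line terminator is hit. A rough sketch with a hypothetical readLastLine helper (assumes a single-byte-compatible encoding such as ASCII, and ignores \r\n subtleties; a production version would read in buffered chunks rather than byte by byte):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class TailReader {
    // Seeks backwards from EOF until a '\n' is found, then returns what follows it.
    static String readLastLine(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long pos = raf.length() - 1;
            // Skip a trailing newline, if any.
            if (pos >= 0) {
                raf.seek(pos);
                if (raf.read() == '\n') pos--;
            }
            StringBuilder sb = new StringBuilder();
            for (; pos >= 0; pos--) {
                raf.seek(pos);
                int b = raf.read();
                if (b == '\n') break;
                sb.append((char) b);
            }
            return sb.reverse().toString();
        }
    }
}
```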

3 Answers


Here is one way. In my opinion, using streams incurs unnecessary overhead. So I used an imperative approach. It uses a scanner to read in the lines, saving the column headings and then skipping the rest until the last line is reached and saved. Speed will certainly vary but on an old Intel i7 running Windows it reads over 1M lines in less than 1 second.

File f = new File("F:/ColumnData.txt");
Map<String, String> result = getLastLine(f);
result.entrySet().forEach(System.out::println);

prints something similar to the following:

Column_A=Cell_3
Column_B=Cell_4

public static Map<String, String> getLastLine(File file) {
    String[] headers = null;
    String[] data = null; 
    try (Scanner scanner = new Scanner(file)) {
        if (scanner.hasNextLine()) {
            headers = scanner.nextLine().split("\\s+");
        }
        String lastLine = "";
        while (scanner.hasNextLine()) {
            lastLine = scanner.nextLine();
        }
        data = lastLine.split("\\s+");
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
    return Map.of(headers[0], data[0], headers[1], data[1]);
}

Note that the returned Map is unmodifiable. If it needs to be modified, you can do the following:

return new HashMap<>(Map.of(headers[0], data[0], headers[1], data[1]));



Your reduce((a,b) -> b) approach to getting the last element points toward a technique that can get both. Recall that you can likewise use reduce((a,b) -> a) instead of findFirst(). Now combine them to get both:

record FirstAndLast(String first, String last) {}

FirstAndLast result = br.lines()
    .map(s -> new FirstAndLast(s, s))
    .reduce((a, b) -> new FirstAndLast(a.first(), b.last()))
    .orElse(new FirstAndLast(null, null));

but since br is hinting that your source is a BufferedReader, there’s a much simpler solution:

String first = br.readLine(),
       last = first == null ? null : br.lines().reduce((a, b) -> b).orElse(first);

Note that splitting the two strings into arrays and creating a map from the elements at the same index is an entirely unrelated task. There is no way to merge the first task of getting the intended lines with the second such that the result becomes simpler or more efficient.

I’m sure there are already existing Q&A about creating such a map from two arrays.
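Putting this answer's simpler variant together with the map-building step, an end-to-end sketch (assumes a non-empty reader, whitespace-separated columns, and header and data rows of equal length; the FirstLastToMap name is just for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class FirstLastToMap {
    static Map<String, String> parse(BufferedReader br) throws IOException {
        String first = br.readLine();                               // header line
        String last = br.lines().reduce((a, b) -> b).orElse(first); // last line (or header if nothing follows)
        String[] keys = first.split("\\s+");
        String[] values = last.split("\\s+");
        // Zip the two arrays by index, keeping column order.
        return IntStream.range(0, keys.length)
                .boxed()
                .collect(Collectors.toMap(i -> keys[i], i -> values[i],
                        (a, b) -> b, LinkedHashMap::new));
    }
}
```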



I don't particularly recommend the following approach, because it's probably a bit overkill, and there are more efficient ways to get this data than reading the entire file a line at a time.

However, if you really want to get the first and last elements in a single pass over the stream, one solution would be to create a custom Gatherer using the Stream Gatherers API (JEP 485, finalized in Java 24; available as a preview feature since Java 22) to capture the first and last elements of the stream and throw away the rest.

public class FirstAndLast {
    // For production-quality code, add and use getters, setters, and constructors
    public String first, last;
}
Gatherer<String, ?, FirstAndLast> firstAndLastGatherer =
        Gatherer.of(
                FirstAndLast::new,
                Gatherer.Integrator.ofGreedy((state, element, downstream) -> {
                    if (state.first == null) {
                        state.first = Objects.requireNonNull(element);
                    }
                    state.last = Objects.requireNonNull(element);
                    return true;
                }),
                (s1, s2) -> {
                    s1.last = s2.last;
                    return s1;
                },
                (state, downstream) -> downstream.push(state)
        );

FirstAndLast firstAndLast = 
        br.lines().gather(firstAndLastGatherer).findFirst().orElseThrow();

String firstLine = firstAndLast.first;
String lastLine = firstAndLast.last;

This custom gatherer stores the first and most recently encountered elements as first and last, overwriting the last variable for each new element. It then emits a single FirstAndLast element to the stream, resulting in a stream with just the one single element containing the first and last items.


Alternatively, a gatherer could be written that keeps the first and last elements, and discards the elements in between, resulting in a 2-element stream:

class State {
    String first, last;
}
Gatherer<String, ?, String> firstAndLastGatherer = Gatherer.of(
        State::new,
        Gatherer.Integrator.ofGreedy((state, element, downstream) -> {
            if (state.first == null) {
                state.first = Objects.requireNonNull(element);
            } else {
                state.last = Objects.requireNonNull(element);
            }
            return true;
        }),
        (s1, s2) -> {
            s1.last = s2.last;
            return s1;
        },
        (state, downstream) -> {
            if (state.first != null) {
                downstream.push(state.first);
            }
            if (state.last != null) {
                downstream.push(state.last);
            }
        }
);

List<String> firstAndLast = br.lines().gather(firstAndLastGatherer).toList();
String first = firstAndLast.getFirst();
String last = firstAndLast.getLast();

Javadocs

Gatherer:

An intermediate operation that transforms a stream of input elements into a stream of output elements, optionally applying a final action when the end of the upstream is reached. […]

[…]

There are many examples of gathering operations, including but not limited to: grouping elements into batches (windowing functions); de-duplicating consecutively similar elements; incremental accumulation functions (prefix scan); incremental reordering functions, etc. The class Gatherers provides implementations of common gathering operations.

API Note:

A Gatherer is specified by four functions that work together to process input elements, optionally using intermediate state, and optionally perform a final action at the end of input. They are: […]

Stream.gather(Gatherer<? super T,?,R> gatherer):

Returns a stream consisting of the results of applying the given gatherer to the elements of this stream.

Gatherer.of(initializer, integrator, combiner, finisher)

Returns a new, sequential, Gatherer described by the given initializer, integrator, combiner and finisher.

