0

I tried to read a file in a way that is performant enough for my purpose. I have a list of file ids, names and line indexes (ordered) and for each pair (file_id, file_name, line_index) I need to open the file, find the line by index and print.

To be more performant (I know the input is ordered) I'd like to cache the BufReader that reads by line and let the file stay open if possible.

fn main() {
    // positions in file
    // structure: (file_id, file_name, line_index_in_file)
    let positions = &vec![
        (1, String::from("file1"), 1), 
        (1, String::from("file1"), 2), 
        (1, String::from("file1"), 20), 
        (2, String::from("file2"), 15)];

    print_lines_from_file(&positions);
}

fn print_lines_from_file(found: &Vec<(i32, String, i32)>) {
    let mut last_file_id = -1;

    //let mut last_file_name = None;
    let mut open_file = None;
    let mut open_reader = None;

    for &(file_id, ref file_name, pos_in_file) in found {
        println!("{} {}", file_id, pos_in_file);

         if last_file_id < file_id {
            last_file_id = file_id;
            //last_file_name = file_ids.get(&file_id);

            if let Some(to_close) = open_file {
                drop(open_reader.unwrap());
                drop(to_close);
            }
            //let file = File::open(last_file_name.unwrap()).unwrap();
            let file = File::open(file_name).unwrap();
            open_file = Some(file);
            open_reader = Some(BufReader::new(&file));
        }

        // use reader to find the line in file and process
    }
}

I'm facing this problem:

main.rs:40:48: 40:52 error: `file` does not live long enough
main.rs:40             open_reader = Some(BufReader::new(&file));

main.rs:40:48: 40:52 error: use of moved value: `file` [E0382]
main.rs:40             open_reader = Some(BufReader::new(&file));

It's obvious (file's lifetime is really short), but I don't know how to workaround it. The BufReader depends on File, but I need to close the File later in loop when file_id changes.

Also I don't feel very comfortable calling drop this way in loop as it looks to me like I try to fool the compiler. Is that approach ok?

Please even if you know better solution (e.g. how to close the file through BufReader, I would appreciate the general insight how to solve it).

3 Answers 3

3

You can pass the File by value to the BufReader. This way you only have a single variable that owns a file handle. You can use take on the Option to move the inner value out of it and leave a None behind. This way you ensure that the file handle is freed before the next one is taken (so if you re-open the same file it doesn't panic)

let mut open_reader = None;

for &(file_id, ref file_name, pos_in_file) in found {
    println!("{} {}", file_id, pos_in_file);

     if last_file_id < file_id {
        last_file_id = file_id;
        //last_file_name = file_ids.get(&file_id);

        // take the value out of the `open_reader` to make sure that
        // the file is closed, so we don't panic if the next statement
        // tries to open the same file again.
        open_reader.take();
        //let file = File::open(last_file_name.unwrap()).unwrap();
        let file = File::open(file_name).unwrap();
        open_reader = Some(BufReader::new(file));
    }

    // use reader to find the line in file and process
}
Sign up to request clarification or add additional context in comments.

5 Comments

The explicit drop isn't needed, it seems.
@Shepmaster: but it's more explicit, that's why I included it. Alternatively one could add a comment stating the intend... Which is probably better :)
I edited the text to include a comment instead of the drop
Still I don't understand exactly what's the mechanism that ensures closing the reader. Is it because we take the value from Option, don't store it anywhere, so rust is clever enough to make a cleanup and release it immediatelly?
correct. You could simply overwrite the open_reader, but that would fail if you'd open the same file again, because the file is already open.
2

You're giving ownership of the file to the BufReader (which is obvious since it's passed by value), rather than lending it - it's now BufReader's job to close the file. When it's dropped, the File it owns will be dropped in turn; so you can just lose open_file completely.

The compiler is successfully stopping you from possibly destroying the file under the BufReader's feet.

Comments

2

I'd like to cache the BufReader that reads by line and let the file stay open if possible.

The easiest way to do that is to group the data ahead of time:

use std::collections::HashMap;

fn print_lines_from_file(found: &[(i32, String, i32)]) {
    let mut index = HashMap::new();
    for line in found {
        let name = &line.1;
        index.entry(name).or_insert_with(Vec::new).push(line);
    }

    for (file_name, lines) in &index {
        let file = File::open(file_name).unwrap();

        for &&(file_id, _, line_index) in lines {
            // do something with `file`
            println!("processing ID {} ({}) line {}", file_id, file_name, line_index);
        }
    }
}

Note that this frees you from having to have a special sentinel value for file_id (which could also be done with an Option). Additionally, even though you say that the data is sorted, this allows you to handle the cases where the file_ids are not. You could also handle the case of unsorted line_indexes by sorting the vector after it's complete.

Additionally:

  1. You have a double reference in main — you don't need to say &vec![...].
  2. You should accept a &[T] instead of a &Vec<T>.

An even more beautiful solution, IMHO, is to use itertools, specifically group_by_lazy:

extern crate itertools;

use itertools::Itertools;
use std::fs::File;
use std::io::BufReader;

fn main() {
    // structure: (file_id, file_name, line_index_in_file)
    let positions = [
        (1, String::from("file1"), 1),
        (1, String::from("file1"), 2),
        (1, String::from("file1"), 20),
        (2, String::from("file2"), 15)
    ];

    print_lines_from_file(&positions);
}

fn print_lines_from_file(found: &[(i32, String, i32)]) {
    for (filename, positions) in &found.iter().group_by_lazy(|pos| &pos.1) {
        println!("Opening file {}", filename);
        // let file = File::open(file_name).expect("Failed to open the file");
        // let file = BufReader::new(file);

        for &(id, _, line) in positions {
            println!("Processing ID {}, line {}", id, line);
        }
    }
}

3 Comments

This is nice idea. It takes more memory than necessary, but it might be useful. (there is more data than in the sample I made via vec![..] - I wated to make the sample simple enough)
@stej new version that looks pretty nice ;-)
Great idea. I accepted ker's answer as it is more related to my question, but yours solution is nice. I would upvoted it twice if possible ;)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.