1

I have a huge .txt file whose format looks like this:

29 clueweb12-1500wb-39-00001
19 clueweb12-1500wb-39-00002
20 clueweb12-1500wb-39-00003

I need to read the file line by line and split each line into two parts. The first part is a score (29, 19, 20) and the second part is a docId (e.g. clueweb12-1500wb-39-00001). I read the txt file line by line using a stream, but how can I store each of the two parts in its own String?

Stream<String> lines = Files.lines(Paths.get("path-to-file"));
lines.forEach(s -> s.split(" "));


2
  • What should be the output? Commented Mar 28, 2017 at 14:31
  • Actually, I want to put these parts into a Map<Integer, List<String>>; to do this I need the two of them separately. Commented Mar 28, 2017 at 14:37

4 Answers

2

To make the code clearer, you could use a simple forEach loop:

Stream<String> lines = Files.lines(Paths.get("path-to-file"));
lines.forEach(s -> s.split(" "));

/**
 * Takes a stream of lines, splits each line on the space,
 * and groups the second part by the first part.
 */
public Map<Integer, List<String>> split(Stream<String> a) {

    Map<Integer, List<String>> result = new HashMap<>();

    a.forEach(s -> {
        String[] pair = s.split(" ");

        Integer key = Integer.valueOf(pair[0]);
        String value = pair[1];

        // as 4castle suggested - to avoid unnecessary computation
        result.computeIfAbsent(key, k -> new ArrayList<>());

        result.get(key).add(value);
    });

    return result;
}

Or you can map your input directly in the stream processing:

a.map(s -> s.split(" "))
 .forEach(pair -> {
     Integer key = Integer.valueOf(pair[0]);
     String value = pair[1];

     result.putIfAbsent(key, new ArrayList<>());    
     result.get(key).add(value);
 });
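
One thing the snippets above leave out: Files.lines keeps the file open until the stream is closed, so it is usually wrapped in try-with-resources. Here is a minimal sketch of that, assuming space-separated lines as in the question. The class name ScoreFileReader, the method readScores, and the temporary file are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

public class ScoreFileReader {

    // Hypothetical helper: reads "score docId" lines and groups docIds by score.
    public static Map<Integer, List<String>> readScores(Path file) throws IOException {
        Map<Integer, List<String>> result = new HashMap<>();
        // try-with-resources closes the underlying file when the block exits
        try (Stream<String> lines = Files.lines(file)) {
            lines.map(s -> s.split(" ", 2))
                 .forEach(pair -> result
                         .computeIfAbsent(Integer.valueOf(pair[0]), k -> new ArrayList<>())
                         .add(pair[1]));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Temporary file holding the sample lines from the question
        Path file = Files.createTempFile("scores", ".txt");
        Files.write(file, Arrays.asList(
                "29 clueweb12-1500wb-39-00001",
                "19 clueweb12-1500wb-39-00002",
                "20 clueweb12-1500wb-39-00003"));
        System.out.println(readScores(file));
        Files.delete(file);
    }
}
```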

2 Comments

In order to avoid creating a new ArrayList<>() on the iterations where it's not needed, you can use result.computeIfAbsent(key, k -> new ArrayList<>())
@4castle yes, absolutely right! I will edit the answer.
1

Use Collectors.groupingBy with a downstream collector which gets the second part of the split line before collecting to a list.

Map<Integer, List<String>> table =
    Files.lines(Paths.get("path-to-file"))
         .map(line -> line.split(" ", 2))
         .collect(Collectors.groupingBy(
             parts -> Integer.valueOf(parts[0]),
             Collectors.mapping(parts -> parts[1], Collectors.toList())
         ));


1

The Java streams way, I believe, is:

    Map<Integer, List<String>> parts = lines.map(s -> s.split(" "))
            .collect(Collectors.groupingBy(splitLine -> Integer.valueOf(splitLine[0]),
                    Collectors.mapping(splitLine -> splitLine[1], Collectors.toList())));

This gives you the following map:

{19=[clueweb12-1500wb-39-00002], 20=[clueweb12-1500wb-39-00003], 29=[clueweb12-1500wb-39-00001]}

Its toString method doesn’t give you the most readable output, but I believe it’s the map you asked for. For now there is only one string in each list, but if multiple lines have the same score, there will be more.
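
To illustrate the last point, here is a small self-contained sketch of the same grouping run on in-memory lines. The fourth input line (a second document with score 29) is made up and does not come from the question. The class name GroupByScore and the TreeMap map factory are my additions too; the TreeMap only keeps the scores sorted when the map is printed.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GroupByScore {

    // Groups docIds by score; TreeMap::new is an optional map factory for sorted keys.
    public static Map<Integer, List<String>> group(Stream<String> lines) {
        return lines.map(s -> s.split(" ", 2))
                .collect(Collectors.groupingBy(
                        splitLine -> Integer.valueOf(splitLine[0]),
                        TreeMap::new,
                        Collectors.mapping(splitLine -> splitLine[1], Collectors.toList())));
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> parts = group(Stream.of(
                "29 clueweb12-1500wb-39-00001",
                "19 clueweb12-1500wb-39-00002",
                "20 clueweb12-1500wb-39-00003",
                "29 clueweb12-1500wb-39-00004")); // hypothetical extra line

        System.out.println(parts);
        // prints {19=[clueweb12-1500wb-39-00002], 20=[clueweb12-1500wb-39-00003], 29=[clueweb12-1500wb-39-00001, clueweb12-1500wb-39-00004]}
    }
}
```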


0

You can load the data into a HashMap like this: read the file, split each line with String.split, and store the two parts as a key-value pair.

public static HashMap<Integer, String>  readFile(String fileName) throws IOException {
    BufferedReader br = new BufferedReader(new FileReader(fileName));
    try {
        HashMap<Integer, String> fileData = new HashMap<>(); 
        String line = br.readLine();

        while (line != null) {
            String[] lineData = line.split(" ");
            System.out.println(lineData[0]+" "+lineData[1]);
            fileData.put(Integer.valueOf(lineData[0]), lineData[1]);
            line = br.readLine();
        }
        return fileData;
    } finally {
        br.close();
    }
}
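
One caveat worth checking against your data: a Map<Integer, String> holds a single value per key, so if two lines share a score, the later docId replaces the earlier one. A minimal sketch of that behaviour follows; the class name OverwriteDemo, the load helper, and the second line with score 29 are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OverwriteDemo {

    // Hypothetical helper mirroring the put(...) call in readFile above.
    public static Map<Integer, String> load(List<String> lines) {
        Map<Integer, String> fileData = new HashMap<>();
        for (String line : lines) {
            String[] lineData = line.split(" ");
            // put replaces any value already stored under the same key
            fileData.put(Integer.valueOf(lineData[0]), lineData[1]);
        }
        return fileData;
    }

    public static void main(String[] args) {
        Map<Integer, String> fileData = load(Arrays.asList(
                "29 clueweb12-1500wb-39-00001",
                "29 clueweb12-1500wb-39-00004")); // hypothetical duplicate score
        System.out.println(fileData);
        // prints {29=clueweb12-1500wb-39-00004}
    }
}
```

If every docId has to be kept, the Map<Integer, List<String>> from the other answers is the safer shape.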

