1

I am trying to read a huge file which contains a word(different length) per line. I want to read it with multi-threading depends on the string length.

For example, thread one reads lines which has one length word, thread two reads two lengths and ...

Is there any way to achieve this? If it is, how will be affected the performance?

I found this examples, but I can't put together.

Reference 1 : Multithread file reading

Reference 2 : How to read files in multithreaded mode?

1
  • 1
    No, it would be like in your reference #2. One thread will read the file, and if the processing is complex, you might pass the lines to different threads for processing. Performance may or may not improve. Commented Feb 1, 2017 at 20:33

2 Answers 2

5

You can use multiple threads, however it won't be any faster. To find all the lines of a given length you have to read all the other lines.

Is there any way to achieve this?

Read all the lines and ignore the ones you filter out.

What you can do is to process different lines in different threads however it depends on how CPU intensive this is as to whether it helps or is slower.

Sign up to request clarification or add additional context in comments.

6 Comments

I am trying to compare words, are they anagram of each other and ı was thinking that categorizing words while reading will help to be faster. However, as you mentioned, reading all the lines to find lengths is an obstacle.
I guess, I have to focus reading file fragmentally like in reference#2. Do you have any suggestions to speed up?
@user1060251 if you want to check if many words are an anagram, sort the letters, and index them in a Map of sorted letters to all the words which are an anagram. This will give you O(n) time complexity.
My program is doing exactly same thing with single thread O(n) for loop and nlog(n) for sort.However, let say 10 billion lines how would I scale this?
@user1060251 However, let say 10 billion lines how would I scale this? Buy a faster storage system.
|
2

Reading a file in multithreading mode can only make things slower, since disk drive has to move heads between multiple points of reading. Instead, transfer computational work from the reading thread to worker thread(s).

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.