
I have a number of files, and I need to cut a few columns from each of them to generate new files on Unix. I tried doing this in a loop that picks up each file in the directory and generates a new file, but since the directory has 100 such files, it takes a long time.

Can anyone help me process 10 files in parallel to generate 10 new files, then move on to the next set of 10, so that the total time is reduced?

I need a sample Unix code block for this:

cut -b 1-10,25-50,65-79 file1.txt > file_cut1.txt

cut -b 1-10,25-50,65-79 file2.txt > file_cut2.txt
  • You can start 10 instances of your processing code and send them to the background (see &), then monitor them. When they are all done, start the next batch of 10. Or, more sophisticated, start a new instance as soon as one completes. That will run faster but is more difficult to code. Get started, and when you hit a specific issue, you can post a new question. Commented Sep 30, 2018 at 17:05
  • 100 files is nothing, but of course that all depends on what sort of processing you are doing, how big the files are, your machine, and the method you are using. Can you post the code you are currently using? Commented Sep 30, 2018 at 17:07
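
A minimal sketch of the batch approach described in the first comment, assuming bash and the cut command from the question. The scratch directory, the 25 sample files, and the fileN_cut.txt output names are made up for the demonstration and are not from the original post:

```shell
#!/usr/bin/env bash
# Sketch only: run cut on files in batches of 10 background jobs.
set -euo pipefail

workdir=$(mktemp -d)   # hypothetical scratch directory for the demo
cd "$workdir"

# Create 25 sample inputs, each holding one 80-character line.
for i in $(seq 1 25); do
    printf 'ABCDEFGHIJ%.0s' 1 2 3 4 5 6 7 8 > "file$i.txt"
    echo >> "file$i.txt"
done

batch=10   # number of jobs to run at the same time
count=0
for f in file*.txt; do
    # The trailing & sends this cut to the background.
    cut -b 1-10,25-50,65-79 "$f" > "${f%.txt}_cut.txt" &
    count=$((count + 1))
    if [ $((count % batch)) -eq 0 ]; then
        wait   # block until every job in the current batch has finished
    fi
done
wait   # pick up the last, possibly partial, batch
```

The glob is expanded once before the loop starts, so the newly created _cut.txt files are not picked up as inputs during the same run.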
  • Your example is not very helpful. If you run it 100 times, you will re-write the same xyz.txt 100 times. Please be more specific about input and output filenames. Commented Sep 30, 2018 at 17:37

1 Answer


You can do that quite simply with GNU Parallel like this:

parallel 'cut -b 1-10,25-50,65-79 {} > {.}_cut.txt' ::: file*txt

where:

  • {} represents the current filename, and
  • {.} represents the current filename without its extension.
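
To make the two placeholders concrete, here is a rough sequential equivalent in plain bash, where "$f" plays the role of {} and "${f%.*}" plays the role of {.}. It is only an illustration of the naming, not how GNU Parallel works internally; the scratch directory and the two sample files are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch only: sequential loop showing what {} and {.} stand for.
set -euo pipefail

demo=$(mktemp -d)   # hypothetical scratch directory for the demo
cd "$demo"

# Two sample inputs, each holding one 80-character line.
printf 'ABCDEFGHIJ%.0s' 1 2 3 4 5 6 7 8 > file1.txt
echo >> file1.txt
cp file1.txt file2.txt

for f in file*txt; do
    out="${f%.*}_cut.txt"   # strip the last extension, like {.}, then add the suffix
    echo "$f -> $out"       # e.g. file1.txt -> file1_cut.txt
    cut -b 1-10,25-50,65-79 "$f" > "$out"
done
```

Note that the pattern file*txt also matches the generated file1_cut.txt, so a second run in the same directory would process the outputs as well.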

Make a backup of the files in your directory before trying this, or any unfamiliar commands.

It will process your files in parallel, doing N at a time, where N is the number of cores in your CPU. If you want it to do, say, 8 jobs at a time, use:

parallel -j 8 ...

If you want to see what it would do, without actually doing anything, use:

parallel --dry-run ...

1 Comment

Adjust -j8 as needed, and be aware that more parallelism is not always faster: oletange.wordpress.com/2015/07/04/parallel-disk-io-is-it-faster
