
Using Chrome, I am trying to read and process a large (>4GB) binary file on my local disk. It looks like the FileReader API will only read the entire file, but I need to be able to read the file progressively as a stream.

This file contains a sequence of frames containing a 1-byte type identifier, a 2-byte frame length, an 8-byte time stamp, and then some binary data that has a format based on the type. The content of these frames will be accumulated, and I'd like to use HTML5+JavaScript to generate graphs and display other metrics as real-time playback based on the content of this file.

Anybody have any ideas?

1 Answer


Actually, Files are Blobs, and Blob has a slice() method, which we can use to grab smaller chunks of large files.

I wrote the following snippet last week to filter large log files, but it shows the pattern you can use to loop sub-section by sub-section through big files.

  1. file is the file object
  2. fnLineFilter is a function that accepts one line of the file and returns a truthy value (e.g. the line itself) to keep it
  3. fnComplete is a callback where the collected lines are passed as an array
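For example, a line filter might look like this (a hypothetical predicate; note that the collected value is whatever the filter returns, so returning the line itself, possibly transformed, keeps it, while a falsy return drops it):

```javascript
// Hypothetical line filter: keep only lines containing "ERROR",
// trimmed; returning null drops the line from the collection.
function errorLineFilter(line) {
    return /ERROR/.test(line) ? line.trim() : null;
}
```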

Here is the code I used:

 function fileFilter(file, fnLineFilter, fnComplete) {
     var BUFF_SIZE = 262144,        // 256 KB per slice
         mx = file.size,
         i = 0,
         collection = [],
         lineCount = 0,
         remainder = "",            // partial line carried between chunks
         d1 = +new Date();

     function grabNextChunk() {

         // slice() returns a Blob covering just this byte range; only the
         // range is read into memory, never the whole file:
         var myBlob = file.slice(BUFF_SIZE * i, (BUFF_SIZE * i) + BUFF_SIZE, file.type);
         i++;

         var fr = new FileReader();

         fr.onload = function(e) {

             // prepend the partial line left over from the previous chunk:
             var str = remainder + e.target.result,
                 r = str.split(/\r?\n/);

             // the last element may be an incomplete line; save it for later:
             remainder = r.pop();
             lineCount += r.length;

             // run line filter; keep the truthy return values:
             var rez = r.map(fnLineFilter).filter(Boolean);
             if (rez.length) {
                 [].push.apply(collection, rez);
             }

             if ((BUFF_SIZE * i) >= mx) {
                 // done; run the filter on the final partial line, if any:
                 if (remainder) {
                     var last = fnLineFilter(remainder);
                     if (last) { collection.push(last); }
                     lineCount++;
                 }
                 fnComplete(collection);
                 console.log("filtered " + file.name + " in " + (+new Date() - d1) + "ms");
             } else {
                 // yield to the event loop before reading the next slice:
                 setTimeout(grabNextChunk, 0);
             }

         };
         fr.readAsText(myBlob); // defaults to UTF-8
     } /* end grabNextChunk() */

     grabNextChunk();
 } /* end fileFilter() */

Obviously, you can get rid of the line splitting and just grab pure byte ranges instead; I wasn't sure what type of data you need to dig through. The important thing is the slice mechanics, which are well demonstrated by the text-focused code above.
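Since your file is binary rather than line-based, the same slice loop can feed readAsArrayBuffer instead of readAsText, and each chunk can be parsed with a DataView. Here is a minimal sketch of the frame parsing described in your question, assuming the 2-byte length counts only the payload and that all fields are little-endian (both guesses; adjust to the real format):

```javascript
// Sketch: parse as many complete frames as possible from an ArrayBuffer
// chunk (e.g. the result of FileReader.readAsArrayBuffer on a slice).
// Assumed frame layout, per the question: 1-byte type, 2-byte payload
// length, 8-byte timestamp, then `length` bytes of payload.
function parseFrames(buffer) {
    var HEADER_SIZE = 1 + 2 + 8;
    var view = new DataView(buffer);
    var frames = [];
    var pos = 0;
    while (pos + HEADER_SIZE <= buffer.byteLength) {
        var type = view.getUint8(pos);
        var length = view.getUint16(pos + 1, true);       // little-endian guess
        var timestamp = view.getBigUint64(pos + 3, true); // 8-byte stamp
        if (pos + HEADER_SIZE + length > buffer.byteLength) {
            break; // partial frame: leave it for the next chunk
        }
        frames.push({
            type: type,
            timestamp: timestamp,
            payload: new Uint8Array(buffer, pos + HEADER_SIZE, length)
        });
        pos += HEADER_SIZE + length;
    }
    // Return the parsed frames plus how many bytes were consumed, so the
    // caller can carry the unconsumed tail over to the next slice --
    // the binary equivalent of the `remainder` string above.
    return { frames: frames, bytesConsumed: pos };
}
```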


1 Comment

Works like a champ. Thanks!!
