How to read a binary file in chunks in Scala

This is what I tried:

import java.io.{DataInputStream, FileInputStream}

val fileInput = new FileInputStream("tokens")
val dis = new DataInputStream(fileInput)
var value = dis.readInt()
var i = 0
println(value)

The value printed is a huge number, whereas the first value in the file should be 1.

  • Be sure to have the file in the correct endianness. Commented Feb 17, 2012 at 21:00
  • Which large number is it exactly? Does it happen to be 16777216? If so, you've got an endian problem. Commented Feb 17, 2012 at 21:02
  • Yes, it's 16777216. It's getting the endianness wrong; it should be 1. How do I correct it? Commented Feb 17, 2012 at 21:03

1 Answer

Because you're seeing 16777216 where you'd expect a 1, it sounds like the byte order (endianness) of the file differs from what DataInputStream expects. (That is, DataInputStream always reads multi-byte values in big endian/network byte order, and your file contains numbers in little endian.)
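
To see where 16777216 comes from, here is a minimal sketch that interprets the same four bytes in both byte orders. It uses java.nio.ByteBuffer rather than the DataInputStream from the question, and the names ByteOrderDemo, littleEndianOne and readInt are made up for illustration.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object ByteOrderDemo {
  // The four bytes of a little-endian 1: least significant byte first.
  val littleEndianOne: Array[Byte] = Array[Byte](1, 0, 0, 0)

  // Interpret the first four bytes as an Int in the given byte order.
  def readInt(bytes: Array[Byte], order: ByteOrder): Int =
    ByteBuffer.wrap(bytes).order(order).getInt()

  def main(args: Array[String]): Unit = {
    // Read as big endian (what DataInputStream does): 0x01000000 = 16777216
    println(readInt(littleEndianOne, ByteOrder.BIG_ENDIAN))
    // Read as little endian (what the file actually contains): 1
    println(readInt(littleEndianOne, ByteOrder.LITTLE_ENDIAN))
  }
}
```

The byte 01 lands in the most significant position when read as big endian, which is exactly 2^24 = 16777216.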

That's a common problem with a well-established range of solutions.

  • For example this page has a class that wraps the input stream and makes the problem go away.

  • Alternatively this page has functions that will read from a DataInputStream.

  • This StackOverflow answer has various snippets that will simply convert an int, if that's all you need to do.

  • Here's a Scala snippet that will add methods to read little endian numbers from the file.

The simplest fix is to swap the bytes around as you read them. You could do that by replacing your line that looks like

var value = dis.readInt()

with

var value = java.lang.Integer.reverseBytes(dis.readInt())
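
Since the question is about reading a whole file, the same byte swap can be applied in a loop. The following is only a sketch under a few assumptions: the file is taken to consist solely of 4-byte little-endian ints, and ReadAllLittleEndian and readAllIntsLE are hypothetical names. It is demonstrated on an in-memory stream so it runs without a file.

```scala
import java.io.{ByteArrayInputStream, DataInputStream, EOFException, InputStream}

object ReadAllLittleEndian {
  // Read 4-byte little-endian ints until the stream is exhausted.
  // Trailing bytes that do not form a whole int are dropped.
  def readAllIntsLE(in: InputStream): Vector[Int] = {
    val dis = new DataInputStream(in)
    val out = Vector.newBuilder[Int]
    try {
      // readInt() reads big endian, so swap the bytes of each value.
      while (true) out += java.lang.Integer.reverseBytes(dis.readInt())
    } catch {
      case _: EOFException => // end of input reached; stop reading
    } finally {
      dis.close()
    }
    out.result()
  }

  def main(args: Array[String]): Unit = {
    // Two little-endian ints: 1 and 2.
    val bytes = Array[Byte](1, 0, 0, 0, 2, 0, 0, 0)
    println(readAllIntsLE(new ByteArrayInputStream(bytes))) // Vector(1, 2)
  }
}
```

For the file in the question, the call would presumably look like readAllIntsLE(new java.io.BufferedInputStream(new java.io.FileInputStream("tokens"))); the buffering is a suggestion to avoid byte-by-byte disk reads, not something the original code did.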

If you wanted to make that a bit more concise, you could either implicitly add readXLE() methods to DataInput or subclass DataInputStream to add readXLE() methods. Unfortunately, the readX() methods of DataInputStream are declared final, so we can't override them to provide a transparent reader for little endian files.

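import java.io.{DataInput, DataInputStream, InputStream}
import scala.language.implicitConversions // avoids the feature warning on Scala 2.10+

object LittleEndianImplicits {
  implicit def dataInputToLittleEndianWrapper(d: DataInput): DataInputLittleEndianWrapper =
    new DataInputLittleEndianWrapper(d)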

  class DataInputLittleEndianWrapper(d: DataInput) {
    def readLongLE(): Long = java.lang.Long.reverseBytes(d.readLong())
    def readIntLE(): Int = java.lang.Integer.reverseBytes(d.readInt())
    def readCharLE(): Char = java.lang.Character.reverseBytes(d.readChar())
    def readShortLE(): Short = java.lang.Short.reverseBytes(d.readShort())
  }
}

class LittleEndianDataInputStream(i: InputStream) extends DataInputStream(i) {
  def readLongLE(): Long = java.lang.Long.reverseBytes(super.readLong())
  def readIntLE(): Int = java.lang.Integer.reverseBytes(super.readInt())
  def readCharLE(): Char = java.lang.Character.reverseBytes(super.readChar())
  def readShortLE(): Short = java.lang.Short.reverseBytes(super.readShort())
}

object M {
  def main(a: Array[String]): Unit = {
    println("// Regular DIS")
    val d = new DataInputStream(new java.io.FileInputStream("endian.bin"))
    println("Int 1: " + d.readInt())
    println("Int 2: " + d.readInt())

    println("// Little Endian DIS")
    val e = new LittleEndianDataInputStream(new java.io.FileInputStream("endian.bin"))
    println("Int 1: " + e.readIntLE())
    println("Int 2: " + e.readIntLE())

    import LittleEndianImplicits._
    println("// Regular DIS with readIntLE implicit")
    val f = new DataInputStream(new java.io.FileInputStream("endian.bin"))
    println("Int 1: " + f.readIntLE())
    println("Int 2: " + f.readIntLE())
  }
}

The "endian.bin" file mentioned above contains a big endian 1 followed by a little endian 1. Running the above M.main() prints:

// Regular DIS
Int 1: 1
Int 2: 16777216
// Little Endian DIS
Int 1: 16777216
Int 2: 1
// Regular DIS with readIntLE implicit
Int 1: 16777216
Int 2: 1
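
To reproduce the output above, a test file like "endian.bin" could be generated along these lines. This is a sketch only: MakeEndianBin and testBytes are hypothetical names, and the byte layout is inferred from the description of the file (a big endian 1 followed by a little endian 1).

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}
import java.nio.file.{Files, Paths}

object MakeEndianBin {
  // Bytes for a big-endian 1 followed by a little-endian 1.
  def testBytes(): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    out.writeInt(1)                                  // big endian:    00 00 00 01
    out.writeInt(java.lang.Integer.reverseBytes(1))  // little endian: 01 00 00 00
    out.flush()
    buffer.toByteArray
  }

  def main(args: Array[String]): Unit = {
    // Write the eight bytes to the file name used in the example.
    Files.write(Paths.get("endian.bin"), testBytes())
  }
}
```

DataOutputStream always writes big endian, so the second value is byte-swapped before writing to get a little-endian 1 on disk.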

3 Comments

@Gaurav, it may be obvious, but you might consider extending DataInputStream.
@EdStaub, would you believe that Java makes it rather inconvenient to extend DataInputStream because that class marks readInt(), etc. as final?
Oops - should have checked for that, I've been burned enough with finality. Delegate, then, if it's worth it. In the example, where few DataInputStream methods are used, it probably is.
