
I have a binary file (actually a dBF) that I would like to read with Java. I am using a FileInputStream wrapped in a BufferedReader, then reading the required bytes into a char[].

    FileInputStream fis;

    char[] header = new char[32];

    try {
        fis = new FileInputStream(source_url);

        BufferedReader br;
        String line;

        br = new BufferedReader(new InputStreamReader(fis, Charset.forName("UTF-8")));
        br.read(header);
        ....

The problem is that the values I read into the array aren't always exactly what is in the file. For example, the value 0xE1 is read as 0xFD. I have tried different character sets with no change. I have also tried reading the value as various types (long, int, byte) and formatting it as hex; in all cases it comes out as 0xFD.

The values are definitely wrong: I can read the file correctly in a C++ program, because it understands unsigned ints, and I can see the right values in a hex file viewer.

Am I using the correct classes to read binary data? Am I missing something? I'm trying to avoid using external libraries as I'm just trying to read the file which should be pretty simple.

  • If it's a binary file, then it's NOT UTF-8, and various perfectly acceptable byte sequences in your file will be misinterpreted as multi-byte UTF-8 chars. Commented Mar 20, 2014 at 2:01
  • Classes called XxxxReader are for reading text. Classes called XxxxxInputStream are for reading binary data. Commented Mar 20, 2014 at 2:05

2 Answers

5

If this is a binary file, do NOT use a Reader of any kind.

A Reader takes a sequence of bytes and tries to decode it into characters to the best of its ability (which depends on the encoding).

And as this is a binary file, there will be many byte sequences which will not be translatable. As a result, you'll lose data...
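As a hedged illustration of what probably happened to the 0xE1 byte: in UTF-8, 0xE1 announces a three-byte sequence, so when the following bytes are not valid continuation bytes the decoder substitutes the replacement character U+FFFD, whose low byte is 0xFD. The sketch below reproduces that with an in-memory stream; the class and method names (DecodeDemo, decodeFirstChar) and the sample bytes are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

final class DecodeDemo {
    // Pushes raw bytes through a UTF-8 Reader and returns the first decoded char.
    static char decodeFirstChar(byte[] raw) throws IOException {
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(raw), Charset.forName("UTF-8"));
        try {
            return (char) reader.read();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // 0xE1 followed by two bytes that are not UTF-8 continuation bytes.
        byte[] raw = { (byte) 0xE1, 0x41, 0x42 };
        char c = decodeFirstChar(raw);
        // The original byte is gone; only the replacement character remains.
        System.out.printf("decoded char = U+%04X, low byte = 0x%02X%n", (int) c, c & 0xFF);
    }
}
```

Once the byte has been replaced this way, no cast or format string can recover the original value, which would explain why every attempt showed 0xFD.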

"I can read okay in a C++ program, because it understands unsigned ints, and can read in hex fileviewer."

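This has nothing to do with unsigned at all. Java's primitive integer types (except for char, see below) are signed, yes; but they are still just bits. There is no such thing as a signed bit: signedness only changes how a bit pattern is interpreted, not which bits are read from the file.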

Note that in Java, a char is NOT a byte. It is a 16-bit unsigned integer type expressly designed to hold characters.
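To make the "still bits" point concrete, here is a small sketch (the class and method names are invented for the example). A byte holding 0xE1 prints as a negative number, but masking with 0xFF widens it to an int carrying the same bit pattern as a non-negative value.

```java
final class UnsignedDemo {
    // Widens a signed Java byte to the 0..255 value its bits represent.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Formats a byte as two upper-case hex digits.
    static String hex(byte b) {
        return String.format("%02X", toUnsigned(b));
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE1;
        System.out.println(b);              // signed view of the bits: -31
        System.out.println(toUnsigned(b));  // unsigned view of the same bits: 225
        System.out.println(hex(b));         // E1
    }
}
```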

To read binary data efficiently, use Files.newByteChannel(), or FileChannel.open(). With the latter you can map the file into memory if you wish to... See also Files.readAllBytes().
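For a file small enough to fit in memory, a minimal sketch with Files.readAllBytes() could look like the following. The class and method names are invented, and both the little-endian byte order and the offset used in main are assumptions for illustration, not something taken from the question's file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class WholeFileDemo {
    // Loads the whole file and wraps it in a buffer for typed access.
    // ASSUMPTION: multi-byte fields are little-endian.
    static ByteBuffer load(Path file) throws IOException {
        byte[] all = Files.readAllBytes(file);
        return ByteBuffer.wrap(all).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Reads a 32-bit field and returns it as a non-negative long.
    static long unsignedIntAt(ByteBuffer buf, int offset) {
        return buf.getInt(offset) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer data = load(Paths.get(args[0]));
        System.out.printf("first byte = 0x%02X%n", data.get(0) & 0xFF);
        // Offset 4 is only an example position.
        System.out.println("uint32 at offset 4 = " + unsignedIntAt(data, 4));
    }
}
```

No charset is involved anywhere, so every byte arrives exactly as it is stored on disk.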

Especially if your binary data is structured, use FileChannel.open() since a FileChannel implements ScatteringByteChannel.
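A scattering read fills several buffers in order from one channel, which suits a file made of fixed-size blocks. The sketch below reads two consecutive blocks; the class name, method name, and the 32-byte sizes used in main are illustrative assumptions.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class ScatterDemo {
    // Reads two consecutive blocks from the start of the file.
    static ByteBuffer[] readTwoBlocks(Path file, int firstSize, int secondSize)
            throws IOException {
        ByteBuffer first = ByteBuffer.allocate(firstSize);
        ByteBuffer second = ByteBuffer.allocate(secondSize);
        ByteBuffer[] parts = { first, second };
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // One call may not fill everything, so loop until the last buffer is full.
            while (second.hasRemaining()) {
                if (channel.read(parts) < 0) {
                    throw new EOFException("file is shorter than the requested blocks");
                }
            }
        }
        first.flip();   // switch both buffers from writing to reading
        second.flip();
        return parts;
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer[] parts = readTwoBlocks(Paths.get(args[0]), 32, 32);
        System.out.printf("block 1 starts with 0x%02X%n", parts[0].get(0) & 0xFF);
        System.out.printf("block 2 starts with 0x%02X%n", parts[1].get(0) & 0xFF);
    }
}
```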

Obligatory link <-- keep that open and read what you need to :) All the classnames in this answer are documented there.


2 Comments

Thanks heaps. I had been using Java 6, which doesn't seem to have java.nio.file.Files. I still have to mess around converting what Java thinks are negative numbers to unsigned ints, but I can live with that.
You "have been" using? And what about now? You should move to Java 7 as fast as you can; it is already old now, since Java 8 is out.
0

If you are reading binary data, then you do not want it decoded as UTF-8.

You also do not want a BufferedReader.

Try:

    fis = new FileInputStream(source_url);

    int b;
    // read() returns the next byte as a value in 0..255, or -1 at end of stream
    while ((b = fis.read()) != -1) {
        // save (byte) b to a byte array
    }
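If only a fixed-size header is needed, a bulk read avoids the byte-by-byte loop and also works on Java 6. This is a sketch using DataInputStream.readFully(); the class name, method name, and parameters are invented for the example.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

final class StreamHeaderReader {
    // Reads exactly `size` bytes from the start of the file.
    // readFully throws EOFException if the file is shorter than that.
    static byte[] readHeader(String path, int size) throws IOException {
        byte[] header = new byte[size];
        DataInputStream in = new DataInputStream(new FileInputStream(path));
        try {
            in.readFully(header);
        } finally {
            in.close();
        }
        return header;
    }

    public static void main(String[] args) throws IOException {
        byte[] header = readHeader(args[0], 32);
        System.out.printf("first byte = 0x%02X%n", header[0] & 0xFF);
    }
}
```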

2 Comments

I don't think you want InputStreamReader, but even so you should probably use the other read overload that reads into a byte array for efficiency.
eeks I missed that - will fix
