Java IO classes - troubles with file IO

Question

I initialise a BufferedReader as such:

Reader reader = new BufferedReader(new InputStreamReader(new FileInputStream(filename), "UTF-8"));

where filename is any given string.

When I read from it in a loop like this:

int k;
while ((k = reader.read()) != -1) {
    String entry;
    if (dict.containsKey(k))
        entry = dict.get(k);
    else if (k == mapSize)
        entry = w + w.charAt(0);
    else
        throw new IllegalArgumentException("Bad compressed k: " + k);
    this.fos.write(entry);
    result += entry;

    // Add w+entry[0] to the dictionary.
    dict.put(mapSize++, w + entry.charAt(0));

    w = entry;
}

It only reads 65536 characters before hitting EOF. Does anyone know what's going on here?

Don't use the ready() method, and show us how you actually read your chars.
– JB Nizet, Commented Apr 29, 2012 at 14:10
@Jeffrey, yeah, my bad. Max for a 16-bit number though :) I changed it from using the reader.ready() method but I am still encountering the same problem.
– Sam P, Commented Apr 29, 2012 at 14:18
Probably not related to your problem, but your code seems to treat every UTF-8 character in the file as an arbitrary number. Are you sure you really have UTF-8 characters in this file?
– JB Nizet, Commented Apr 29, 2012 at 14:25

Peter Lawrey · Accepted Answer · 2012-04-29 14:12:38Z

2

You don't need to call ready(). Just read the lines or the data directly:

String line;
while((line = reader.readLine()) != null) {
    //process, LZW algorithm
}

or

// buffer is redundant if you are reading large blocks.
Reader reader = new InputStreamReader(new FileInputStream(filename), "UTF-8");
char[] buffer = new char[8*1024];
int len;
while((len = reader.read(buffer)) > 0) {
    // process text
}


5 Comments

Sam P Over a year ago

Still having the same problem. For example, using while((k = reader.read()) != -1) and then directly processing each character read still only outputs 65k characters.

Peter Lawrey Over a year ago

Can you print new File(filename).length()?

Sam P Over a year ago

System.out.println(new File(filename).length()) prints 64398

Sam P Over a year ago

This is compressed data, btw.

Sam P Over a year ago

The original, uncompressed file is 81362 bytes long, the compressed file is 64398 bytes long, and the decompressor only reads 25381 characters, extracting 65536 characters.

jtahlborn · Accepted Answer · 2012-04-29 15:04:31Z

0

You are attempting to read binary data as character data. That's going to go badly. UTF-8 is a multi-byte character encoding, which means the number of characters you read from the file may not equal the number of bytes in the file. If you are trying to implement a decompression algorithm, you should be using an InputStream and reading bytes, not chars.
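
To illustrate the mismatch described above, here is a minimal sketch. The class name BytesVersusChars, its two helper methods, and the six sample bytes are made up for illustration and are not taken from the question's code. It counts the same data once through a UTF-8 InputStreamReader and once through a plain InputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BytesVersusChars {

    // Counts what a UTF-8 Reader yields: one read() per decoded char.
    static int countChars(byte[] data) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int count = 0;
            while (reader.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts what an InputStream yields: one read() per raw byte (0..255).
    static int countBytes(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data: 0x41, then a 2-byte and a 3-byte UTF-8 sequence.
        byte[] data = {0x41, (byte) 0xC3, (byte) 0xA9,
                       (byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        System.out.println("bytes: " + countBytes(data)); // prints "bytes: 6"
        System.out.println("chars: " + countChars(data)); // prints "chars: 3"
    }
}
```

With a real file you would probably wrap new FileInputStream(filename) in a BufferedInputStream and call read() on that. Each call returns one byte as an int from 0 to 255, or -1 at the end of the stream, so no decoding step can merge or alter the compressed data.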


7 Comments

Sam P Over a year ago

This is true, but when we read a single byte and the phrase number is over 255, how are we supposed to know that what we are reading is a two byte number?

jtahlborn Over a year ago

@SamP - I would assume your file format has some sort of specification for multi-byte numbers, if you need to be able to write numbers > 255.
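
One simple format of this kind, sketched here as an assumption and not as the asker's actual format, stores every code as exactly two bytes, high byte first. The reader then never has to guess how wide a number is. The class name FixedWidthCodes and its methods are hypothetical; DataOutputStream.writeShort and DataInputStream.readUnsignedShort are standard java.io calls.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class FixedWidthCodes {

    // Writes every code as exactly two bytes, high byte first.
    static byte[] encode(int[] codes) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int code : codes) {
                out.writeShort(code); // only the low 16 bits are written
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads two bytes at a time until the data runs out.
    static List<Integer> decode(byte[] data) {
        List<Integer> codes = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                codes.add(in.readUnsignedShort()); // value in 0..65535
            }
        } catch (EOFException endOfData) {
            return codes; // normal termination; a trailing odd byte is ignored
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] packed = encode(new int[] {65, 256, 4095, 65535});
        System.out.println(packed.length);  // prints 8
        System.out.println(decode(packed)); // prints [65, 256, 4095, 65535]
    }
}
```

This wastes space whenever the codes need fewer than 16 bits, which is the trade-off that bit-packed formats address.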

jtahlborn Over a year ago

@SamP - A quick glance at the LZW spec indicates that the numbers are 12 bits. You need to read the data as multiple bytes at a time and use bit shifting to get each "number". Also, if this question is homework, which I'm assuming it is, you should tag it as such.

Sam P Over a year ago

I was unaware of the need to tag it as homework - I will do so from now on. It is homework, but very interesting homework as it is.

Sam P Over a year ago

The numbers in my specific version do not have a fixed width of 12 bits - the width slides as it goes along, saving on empty space (my current implementation ignores this optimisation, :s)
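
The bit shifting mentioned in this thread could look roughly like the sketch below. It assumes codes are packed most significant bit first with no padding between them, which may not match the asker's format. The class BitReader and its readBits method are hypothetical names. Because the width is passed on each call, the same reader would cover both fixed 12-bit codes and a width that changes as the dictionary grows.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BitReader {
    private final InputStream in;
    private int current;      // byte currently being consumed
    private int bitsLeft = 0; // unread bits remaining in current

    BitReader(InputStream in) {
        this.in = in;
    }

    // Returns the next `width` bits (1..31), most significant bit first,
    // or -1 if the stream ends before a full code has been read.
    int readBits(int width) {
        int value = 0;
        for (int i = 0; i < width; i++) {
            if (bitsLeft == 0) {
                try {
                    current = in.read();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                if (current == -1) {
                    return -1; // any partial code is discarded
                }
                bitsLeft = 8;
            }
            bitsLeft--;
            // Shift the result left and append the next bit of the current byte.
            value = (value << 1) | ((current >> bitsLeft) & 1);
        }
        return value;
    }

    public static void main(String[] args) {
        // Three bytes hold exactly two 12-bit codes: 0x123 and 0x456.
        byte[] data = {0x12, 0x34, 0x56};
        BitReader reader = new BitReader(new ByteArrayInputStream(data));
        System.out.println(Integer.toHexString(reader.readBits(12))); // prints 123
        System.out.println(Integer.toHexString(reader.readBits(12))); // prints 456
        System.out.println(reader.readBits(12));                      // prints -1
    }
}
```

Reading one bit per loop iteration is slow but easy to check by hand. A decoder for a variable-width format would track the current width alongside the dictionary size and pass it to readBits on each call.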
