I initialise a BufferedReader like this:

Reader reader = new BufferedReader(new InputStreamReader(new FileInputStream(filename), "UTF-8"));

where filename is any given string.

When I process the input in a loop like this:

int k;
while((k = reader.read()) != -1){
        String entry;
        if (dict.containsKey(k))
            entry = dict.get(k);
        else if (k == mapSize)
            entry = w + w.charAt(0);
        else
            throw new IllegalArgumentException("Bad compressed k: " + k);
        this.fos.write(entry);
        result += entry;

        // Add w+entry[0] to the dictionary.
        dict.put(mapSize++, w + entry.charAt(0));

        w = entry;
}

it only reads 65,536 characters before hitting EOF. Does anyone know what's going on here?

  • Integer.MAX_VALUE is actually 2147483647. Commented Apr 29, 2012 at 14:09
  • Don't use the ready() method, and show us how you actually read your chars. Commented Apr 29, 2012 at 14:10
  • @Jeffrey, yeah, my bad. It is the max for a 16-bit number though :) I changed it from using the reader.ready() method but I am still encountering the same problem. Commented Apr 29, 2012 at 14:18
  • Probably not related to your problem, but your code seems to treat every UTF-8 character in the file as an arbitrary number. Are you sure you really have UTF-8 characters in this file? Commented Apr 29, 2012 at 14:25
  • Not all numbers are valid UTF-8 characters. Commented Apr 29, 2012 at 15:04

2 Answers

You don't need to call ready(). Just read the data, for example line by line:

String line;
while((line = reader.readLine()) != null) {
    // process the line (LZW algorithm)
}

or in blocks:

// A BufferedReader is redundant if you are reading large blocks.
Reader reader = new InputStreamReader(new FileInputStream(filename), "UTF-8");
char[] buffer = new char[8*1024];
int len;
while((len = reader.read(buffer)) > 0) {
    // process text
}

5 Comments

Still having the same problem. For example: while((k = reader.read()) != -1) and then directly processing each character read still only outputs 65k characters
Can you print new File(filename).length()?
System.out.println(new File(filename).length()) prints 64398
This is compressed data, btw.
The original precompressed file's length is 81362 bytes, the compressed file has 64398 bytes, and the decompressor only reads 25381 characters, extracting 65536 characters.

You are attempting to read binary data as character data, and that is going to go badly. UTF-8 is a multi-byte character encoding, which means the number of characters you read from the file may not equal the number of bytes in the file. If you are trying to implement a decompression algorithm, you should be using an InputStream and reading bytes, not chars.
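
To illustrate the mismatch, here is a minimal sketch that reads the same four bytes once through an InputStream and once through a UTF-8 Reader. The class name, method names and sample bytes are made up for the example and are not taken from the question's code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BytesVersusChars {

    // Counts the values returned by read() when the data is treated as raw bytes.
    static int countBytes(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts the values returned by read() when the data is decoded as UTF-8 text.
    static int countChars(byte[] data) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int count = 0;
            while (reader.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: 0xC3 0xA9 is the two-byte UTF-8 encoding of 'é'.
        byte[] data = {0x41, (byte) 0xC3, (byte) 0xA9, 0x42};
        System.out.println(countBytes(data)); // 4
        System.out.println(countChars(data)); // 3
    }
}
```

The Reader merges the two middle bytes into a single char, so the decompressor never sees them as separate values. Byte sequences that are not valid UTF-8 are replaced by a substitute character when decoded this way, so the original values cannot be recovered afterwards.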

7 Comments

This is true, but when we read a single byte and the phrase number is over 255, how are we supposed to know that what we are reading is a two byte number?
@SamP - I would assume your file format has some sort of specification for multi-byte numbers, if you need to be able to write numbers > 255.
@SamP - A quick glance at the LZW spec indicates that the numbers are 12 bits. You need to read the data as multiple bytes at a time and use bit shifting to get each "number". Also, if this question is homework, which I'm assuming it is, you should tag it as such.
I was unaware of that necessity for tagging it as homework - I will do as such from now. It is homework but very interesting homework as it is.
The numbers in my specific version do not have a set 12 bits - it slides as it goes along, saving on empty space (my current implementation ignores this optimisation, :s)
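
The bit shifting mentioned in the comments above could look roughly like the sketch below. It assumes fixed-width 12-bit codes packed most significant bit first, two codes per three bytes. That layout is an assumption for illustration, not something the question specifies, and the asker's format uses a sliding code width, which would need the width to change as the dictionary grows. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TwelveBitCodes {

    // Unpacks fixed-width 12-bit codes from a byte array (assumed MSB-first packing).
    static List<Integer> unpack(byte[] data) {
        List<Integer> codes = new ArrayList<>();
        int bitBuffer = 0; // bits read so far but not yet emitted as a code
        int bitCount = 0;  // number of valid bits in bitBuffer
        for (byte b : data) {
            // Append the next 8 bits; the mask avoids sign extension of the byte.
            bitBuffer = (bitBuffer << 8) | (b & 0xFF);
            bitCount += 8;
            if (bitCount >= 12) {
                bitCount -= 12;
                codes.add((bitBuffer >> bitCount) & 0xFFF);
                // Keep only the leftover low bits for the next code.
                bitBuffer &= (1 << bitCount) - 1;
            }
        }
        // Any trailing bits (fewer than 12) are treated as padding and dropped.
        return codes;
    }

    public static void main(String[] args) {
        // Codes 256 (0x100) and 65 (0x041) packed into three bytes.
        byte[] packed = {0x10, 0x00, 0x41};
        System.out.println(unpack(packed)); // [256, 65]
    }
}
```

In a real decompressor the bytes would come from an InputStream such as a FileInputStream rather than from an in-memory array, but the shifting logic stays the same.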