I have a pretty interesting topic, at least for me. Given a ByteArrayOutputStream with bytes in, for example, UTF-8, I need a function that can "translate" those bytes into another, new ByteArrayOutputStream in, for example, UTF-16, or ASCII, or you name it. My naive approach would have been to use an InputStreamReader and pass in the desired encoding, but that didn't work because it reads into a char[] and I can only write byte[] to the new BAOS.

public byte[] convertStream(Charset encoding) {
    ByteArrayInputStream original = new ByteArrayInputStream(raw.toByteArray());
    InputStreamReader contentReader = new InputStreamReader(original, encoding);
    ByteArrayOutputStream converted = new ByteArrayOutputStream();

    int readCount;
    char[] buffer = new char[4096];
    while ((readCount = contentReader.read(buffer, 0, buffer.length)) != -1)
        converted.write(buffer, 0, readCount);

    return converted.toByteArray();
}

Now, this obviously doesn't work and I'm looking for a way to make this scenario possible, without building a String out of the byte[].

@Edit: Since it seems rather hard to read the obvious things: 1) raw: a ByteArrayOutputStream containing the bytes of a BINARY object sent to us from clients. The bytes usually come in UTF-8 as part of an HTTP message. 2) The goal here is to send this BINARY data forward to an internal System that's not flexible (well, it is an internal System) and that accepts such attachments in UTF-16. I don't know why, don't even ask; it just does.

So to justify my question: Is there a way to convert a byte array from Charset A to Charset B, or an encoding of your choice? Once again, building a String is NOT what I'm after.

Thank you and hope that clears up questionable parts :).

  • What is raw? You've only given us part of the information. I'd expect to just convert the bytes to a string, and then convert back from a string to a byte array. No need to use streams at all. Commented Dec 22, 2015 at 10:32
  • Well, raw is obviously a ByteArrayOutputStream containing the bytes of binary data in whatever encoding was used by our client. We have to transfer this data to our System in UTF-8 format, so we need to convert the whatever to UTF-8 or whatever. I hope that clears it up. Building a string is out of the question right now. Commented Dec 22, 2015 at 10:35
  • Why is building a string out of the question? If the most obvious approach is inappropriate, you need to explain why that's the case. And the benefit of a short but complete example is that what you consider "obvious" is spelled out in the code. Far too often I've made assumptions that seem "obvious" to me, but turn out not to be... and when you're now adding restrictions as to what is feasible and what isn't, that adds to the confusion. Commented Dec 22, 2015 at 10:38
  • if it concerns charset, see this: stackoverflow.com/questions/229015/encoding-conversion-in-java Commented Dec 22, 2015 at 10:41
  • But the answer building a string up does answer your original question. There was nothing in that original question to explain why you wouldn't want to do that. You still haven't said why you refuse to create a string. And being rude to people trying to help you is a really, really bad idea. Commented Dec 22, 2015 at 10:47

1 Answer

As mentioned in comments, I'd just convert to a string:

String text = new String(raw.toByteArray(), encoding);
byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
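
For reference, here is a minimal self-contained sketch of the same string-based approach with the target charset made a parameter. The class and method names (StringTranscode, transcode) are made up for illustration, and raw is passed in as an argument rather than being a field as in the question:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringTranscode {

    // Hypothetical helper: decode the buffered bytes with the source charset,
    // then re-encode the resulting text with the target charset.
    static byte[] transcode(ByteArrayOutputStream raw, Charset source, Charset target) {
        String text = new String(raw.toByteArray(), source);
        return text.getBytes(target);
    }

    public static void main(String[] args) {
        // "h\u00e9llo" is five characters; the e-acute takes two bytes in UTF-8.
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        raw.write(utf8, 0, utf8.length);

        byte[] utf16be = transcode(raw, StandardCharsets.UTF_8, StandardCharsets.UTF_16BE);
        System.out.println(utf16be.length); // prints 10 (5 chars x 2 bytes, no BOM)

        byte[] utf16 = transcode(raw, StandardCharsets.UTF_8, StandardCharsets.UTF_16);
        System.out.println(utf16.length); // prints 12 (2-byte BOM + 10 bytes)
    }
}
```

One thing to watch when the target is UTF-16: StandardCharsets.UTF_16 writes a big-endian byte-order mark when encoding, so pick UTF_16BE or UTF_16LE if the receiving system expects no BOM.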

However, if that's not feasible (for some unspecified reason...) what you've got now is nearly there - you just need to add an OutputStreamWriter into the mix:

// Nothing here should throw IOException in reality - work out what you want to do.
public byte[] convertStream(Charset encoding) throws IOException {       
    ByteArrayInputStream original = new ByteArrayInputStream(raw.toByteArray());
    InputStreamReader contentReader = new InputStreamReader(original, encoding);

    int readCount;
    char[] buffer = new char[4096];
    try (ByteArrayOutputStream converted = new ByteArrayOutputStream()) {
        try (Writer writer = new OutputStreamWriter(converted, StandardCharsets.UTF_8)) {
            while ((readCount = contentReader.read(buffer, 0, buffer.length)) != -1) {
                writer.write(buffer, 0, readCount);
            }
        }
        return converted.toByteArray();
    }
}

Note that you're still creating an extra temporary copy of the data in memory, admittedly in UTF-8 rather than UTF-16... but fundamentally this is hardly any more efficient than creating a string.

If memory efficiency is a particular concern, you could perform multiple passes in order to work out how many bytes will be required, create a byte array of the right length, and then adjust the code to write straight into that byte array.
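
A rough sketch of that multi-pass idea, assuming the input is already available as a byte[]. The first pass only counts the encoded bytes, and the second writes into an array of exactly that size. All names here (TwoPassTranscoder, CountingOutputStream, FixedArrayOutputStream, pump) are hypothetical helpers written for this example, not JDK classes:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class TwoPassTranscoder {

    /** First-pass sink: counts the bytes written without storing them. */
    static final class CountingOutputStream extends OutputStream {
        int count;
        @Override public void write(int b) { count++; }
        @Override public void write(byte[] b, int off, int len) { count += len; }
    }

    /** Second-pass sink: writes straight into a pre-sized array. */
    static final class FixedArrayOutputStream extends OutputStream {
        final byte[] target;
        int pos;
        FixedArrayOutputStream(int size) { target = new byte[size]; }
        @Override public void write(int b) { target[pos++] = (byte) b; }
        @Override public void write(byte[] b, int off, int len) {
            System.arraycopy(b, off, target, pos, len);
            pos += len;
        }
    }

    /** Decodes input with the source charset and re-encodes it into out. */
    private static void pump(byte[] input, Charset source, Charset target, OutputStream out)
            throws IOException {
        char[] buffer = new char[4096];
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), source);
             Writer writer = new OutputStreamWriter(out, target)) {
            int readCount;
            while ((readCount = reader.read(buffer, 0, buffer.length)) != -1) {
                writer.write(buffer, 0, readCount);
            }
        }
    }

    static byte[] transcode(byte[] input, Charset source, Charset target) {
        try {
            // Pass 1: work out how many bytes the target encoding needs.
            CountingOutputStream counter = new CountingOutputStream();
            pump(input, source, target, counter);

            // Pass 2: encode again, this time into an array of the right length.
            FixedArrayOutputStream sink = new FixedArrayOutputStream(counter.count);
            pump(input, source, target, sink);
            return sink.target;
        } catch (IOException e) {
            // Not expected for in-memory streams; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is that the input is decoded and encoded twice, so this spends CPU time to avoid the growing internal buffer of ByteArrayOutputStream plus the copy made by toByteArray().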

1 Comment

Perfect, OutputStreamWriter was the answer! That would have been enough for me!