0

I'm implementing a bencoding system for a torrent downloading system I'm making.

Bencoding a string is very easy: you take a string, for instance "hello", and encode it by writing the string's length, then a ':' character, then the string itself. Bencoded "hello" is "5:hello".

This is my current code:
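To make the rule concrete, here is a minimal encoding sketch. The class and method names are made up for illustration, and UTF-8 is an assumed choice for turning text into bytes. Note that the length prefix counts bytes of the payload, which only equals the number of Java chars for plain ASCII text.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the code in the question.
class BencodeEncodeSketch {
    // Encodes text as "<byte length>:<payload>".
    // Assumption: the text is converted to bytes as UTF-8.
    static byte[] encode(String text) {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        byte[] prefix = (payload.length + ":").getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[prefix.length + payload.length];
        System.arraycopy(prefix, 0, out, 0, prefix.length);
        System.arraycopy(payload, 0, out, prefix.length, payload.length);
        return out;
    }

    public static void main(String[] args) {
        // prints "5:hello"
        System.out.println(new String(encode("hello"), StandardCharsets.UTF_8));
    }
}
```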

public BencodeString(String string) {
    this.string = string;
}

public static BencodeString parseBencodeString(String string) {
    byte[] bytes = string.getBytes();
    int position = 0;
    int size = 0;
    StringBuilder sb = new StringBuilder();
    while (bytes[position] >= '0' && bytes[position] <= '9') {
        sb.append((char) bytes[position]);
        position++;
    }
    if (bytes[position] != ':')
        return null;

    size = Integer.parseInt(sb.toString());
    System.out.println(size);
    if (size <= 0)
        return null;
    return new BencodeString(string.substring(position + 1, size + position
            + 1));
}

It works, but I have the feeling that it could be done much better. What is the best way to do this?

Note: the string can be any size, so the length prefix may have more than one digit.

Solved already, thanks to everybody that replied here :)

4
  • This sounds like a question that would be better answered on codereview.stackexchange.com Commented Sep 15, 2012 at 23:16
  • You could parse the length manually while you're scanning for the end of it. That would avoid building a new string that you're only going to parse anyway. Commented Sep 15, 2012 at 23:17
  • What you need is the String.indexOf function to find the ':'. Commented Sep 15, 2012 at 23:18
  • @Edc String$indexOf will do, thank you Commented Sep 15, 2012 at 23:23
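The suggestion in the comments to accumulate the length while scanning could look roughly like the sketch below. The class name is hypothetical, and it still works on a String as the question does. Unlike the code in the question, it accepts a zero-length string ("0:") and checks bounds before reading.

```java
// Hypothetical sketch of the "parse the length while scanning" idea.
class LengthScanSketch {
    // Returns the payload of a bencoded string, or null if the input is malformed.
    static String parse(String input) {
        int pos = 0;
        long length = 0;
        // Accumulate the decimal length digit by digit, without a StringBuilder.
        while (pos < input.length() && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
            length = length * 10 + (input.charAt(pos) - '0');
            if (length > Integer.MAX_VALUE) {
                return null; // length prefix too large to index a String
            }
            pos++;
        }
        // Need at least one digit, followed by ':'.
        if (pos == 0 || pos >= input.length() || input.charAt(pos) != ':') {
            return null;
        }
        int start = pos + 1;
        if (input.length() - start < length) {
            return null; // declared length runs past the end of the input
        }
        return input.substring(start, start + (int) length);
    }
}
```

Because the declared length is honoured, colons inside the payload and any data after the payload are handled correctly.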

4 Answers

2

Java strings represent Unicode character sequences. You shouldn't use them for bencoding/decoding since you'll run into encoding issues that way.

For example, dictionary keys must be sorted by their raw byte representation, not by Java string ordering. Bencoded data has no inherent character set: values can contain raw binary (such as hashes) or UTF-8 encoded strings, and UTF-8 places constraints on which byte sequences are valid, so arbitrary binary data will not survive a round trip through a String.
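As a small illustration of why the two orderings can disagree, the sketch below compares keys as unsigned bytes using Arrays.compareUnsigned, which requires Java 9 or later. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: ordering keys by raw bytes rather than by String.compareTo.
class KeyOrderSketch {
    // Compares two keys lexicographically, treating each byte as unsigned (0..255).
    static int compareKeys(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b); // Java 9+
    }

    public static void main(String[] args) {
        String first = "\uFF5E";        // U+FF5E, three bytes in UTF-8
        String second = "\uD83D\uDE00"; // U+1F600, a surrogate pair in Java, four bytes in UTF-8
        byte[] a = first.getBytes(StandardCharsets.UTF_8);
        byte[] b = second.getBytes(StandardCharsets.UTF_8);
        // The two comparisons give opposite signs for this pair.
        System.out.println("byte order:   " + Integer.signum(compareKeys(a, b)));
        System.out.println("String order: " + Integer.signum(first.compareTo(second)));
    }
}
```

String.compareTo works on UTF-16 code units, so it places the surrogate pair before U+FF5E, while the UTF-8 bytes sort the other way round.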

You should work with ByteBuffers or plain byte[] arrays instead.
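A minimal sketch of what that might look like with a ByteBuffer follows. The class and method names are invented for illustration, and throwing IllegalArgumentException on bad input is just one possible design choice.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a byte-based reader; not taken from any library.
class ByteBufferParserSketch {
    // Reads one bencoded byte string at the buffer's current position and
    // leaves the position just past it, so the next value can be read.
    static byte[] readByteString(ByteBuffer buf) {
        long length = 0;
        int digits = 0;
        while (buf.hasRemaining()) {
            byte b = buf.get();
            if (b == ':') {
                if (digits == 0 || length > buf.remaining()) {
                    throw new IllegalArgumentException("bad length prefix");
                }
                byte[] payload = new byte[(int) length];
                buf.get(payload); // copies exactly payload.length bytes
                return payload;
            }
            if (b < '0' || b > '9') {
                throw new IllegalArgumentException("expected digit or ':'");
            }
            length = length * 10 + (b - '0');
            digits++;
            if (length > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("length too large");
            }
        }
        throw new IllegalArgumentException("missing ':'");
    }
}
```

Since the buffer keeps track of the position, calling readByteString twice on a buffer holding "4:spam3:egg" yields "spam" and then "egg", and the payload is never forced through a character set.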


0

string.substring(string.indexOf(':') + 1)

That's all you need to do. You already know the size of the data since you have it all in a String.

2 Comments

Can't believe I didn't think of that haha, thanks good sir.
So he doesn't even need the ":" and the integer before it. He can simply use the string itself.
0

Use string.split(":") to get the two sides of the string,

then parse both however you wish.

2 Comments

Is there any way to split only at the first occurrence of ':' with regex? (The string could be a URL, which could contain colons.)
Yes, use the other form of split() which takes a limit parameter: docs.oracle.com/javase/1.4.2/docs/api/java/lang/…
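For reference, a rough sketch of the two-argument split(regex, limit) form is below. With a limit of 2, the input is split at most once, so colons in the payload stay intact. The class name is hypothetical, and the length check is an addition that the answer itself does not include.

```java
// Hypothetical sketch of splitting only at the first ':'.
class SplitLimitSketch {
    // Returns the payload, or null if the input is malformed.
    static String parse(String input) {
        // Limit 2: at most two parts, split at the first ':' only.
        String[] parts = input.split(":", 2);
        if (parts.length != 2 || !parts[0].matches("[0-9]+")) {
            return null;
        }
        int length;
        try {
            length = Integer.parseInt(parts[0]);
        } catch (NumberFormatException e) {
            return null; // digits only, but too large for an int
        }
        if (length > parts[1].length()) {
            return null; // declared length runs past the end of the input
        }
        return parts[1].substring(0, length);
    }
}
```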
0

How about something like:

public static BencodeString parseBencodeString(String string)
{
    int colon = string.indexOf(":");
    if(colon >= 0)
    {
        return new BencodeString(string.substring(colon+1));
    }
    return null;
}
