I'm generating a random 20-byte array and want to convert it to a string, so I can use it as a random token for an API call.
However, when I convert it back to a byte array (just for testing), I get a different array than the original.
Here is my code:
var rand = new Random();

string generateId()
{
    byte[] bytes_buff = new byte[20];
    rand.NextBytes(bytes_buff);

    // Print the original random bytes
    foreach (byte b in bytes_buff)
        Console.Write("{0, 5}", b);
    Console.WriteLine();

    // Decode the bytes as UTF-8 text
    string converted = System.Text.Encoding.UTF8.GetString(bytes_buff);
    foreach (char character in converted)
        Console.Write("{0, 5}", character);
    Console.WriteLine();

    // Re-encode the string back into bytes
    byte[] recoded = System.Text.Encoding.UTF8.GetBytes(converted);
    foreach (byte b in recoded)
        Console.Write("{0, 5}", b);
    Console.WriteLine();

    return converted;
}
And it produces this output:
162 108 161 7 212 200 169 171 205 89 240 122 194 173 223 253 57 148 125 76
? l ? ? ? ? ? Y ? z - ? ? 9 ? } L
239 191 189 108 239 191 189 7 239 191 189 200 169 239 191 189 239 191 189 89 239 191 189 122 194 173 239 191 189 239 191 189 57 239 191 189 125 76
I've noticed that for larger values (bigger than 127) GetString() produces the "?" character, and GetBytes() converts each "?" back into the three bytes 239 191 189.
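For instance, this small check (an illustrative snippet using the same System.Text.Encoding.UTF8 calls, not part of my program) reproduces the effect with a single byte:

byte[] invalid = { 162 };                        // 0xA2 is not valid UTF-8 on its own
string s = System.Text.Encoding.UTF8.GetString(invalid);
Console.WriteLine((int)s[0]);                    // 65533, i.e. U+FFFD, which prints as "?"
byte[] roundTrip = System.Text.Encoding.UTF8.GetBytes(s);
Console.WriteLine(string.Join(" ", roundTrip));  // 239 191 189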
From this post I've learned that UTF-8 is not a one-to-one mapping between bytes and characters, but then how are we supposed to generate tokens as strings and send them across the internet?
Isn't UTF-8 the standard encoding on the internet?
Also, if we can't use the full 0-255 range for every character in a token, what is the actual range of usable characters (a-z, A-Z, 0-9, etc.)?
Any explanation is appreciated. Thanks in advance!
Random bytes are usually not valid UTF-8: values above 127 only have meaning as part of a specific multi-byte sequence, so a stray byte such as 162 is malformed input. By default, Encoding.UTF8.GetString() "fixes" such input by substituting the Unicode replacement character U+FFFD (displayed as "?"), and U+FFFD itself encodes to the three bytes 239 191 189. That's why it's different when you convert it back to a byte sequence.
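The usual solution is to never decode random bytes as text. Instead, encode them with a byte-to-text scheme such as Base64 (or hex), which maps every possible byte value onto a fixed alphabet of safe ASCII characters (for Base64: A-Z, a-z, 0-9, '+' and '/', with '=' padding), so the round trip is lossless. A minimal sketch of that approach (it assumes .NET 6+ for the static RandomNumberGenerator.GetBytes; on older versions, fill a buffer with RandomNumberGenerator.Create().GetBytes(buffer) instead):

using System;
using System.Security.Cryptography;

class TokenExample
{
    static string GenerateId()
    {
        // Cryptographically strong randomness is preferable for API tokens
        byte[] buffer = RandomNumberGenerator.GetBytes(20);

        // Base64 turns arbitrary bytes into safe ASCII, losslessly
        return Convert.ToBase64String(buffer);
    }

    static void Main()
    {
        string token = GenerateId();
        Console.WriteLine(token);               // e.g. "omyhB9TIq6vNWfB6wq3f/TmUfUw="

        // Round trip: decoding recovers exactly the original 20 bytes
        byte[] recovered = Convert.FromBase64String(token);
        Console.WriteLine(recovered.Length);    // 20
    }
}

If the token has to live in a URL, note that Base64's '+' and '/' need escaping; hex encoding (Convert.ToHexString in .NET 5+) avoids that at the cost of a longer string.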