1

I have a method that turns a string into an integer by encoding it as bytes:

public long Encode(string input)
{
    var bytes = Encoding.Unicode.GetBytes(input.ToLowerInvariant());
    return BitConverter.ToInt64(bytes,0);
}

Why is this integer the same for different input strings?

For example:

input = "http://www.google.com" => 31525695615402088

and

input = "http://www.microsoft.com" => 31525695615402088

2
  • 7
    because the first 8 bytes are the same? Commented Dec 29, 2011 at 22:50
  • 3
    What exactly are you trying to accomplish? Commented Dec 29, 2011 at 22:54

6 Answers

6

Because 64 bits is 8 bytes, and so ToInt64 consumes only the first 8 bytes of the input array. What are the first eight bytes of the strings you've used?

And, as alexm notes, Encoding.Unicode specifies UTF-16, in which each character is actually two bytes (usually), so only the first 4 characters count.
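
A minimal sketch of the point above, using the two URLs from the question. The class and method names (`PrefixDemo`, `FirstEightBytes`) are illustrative and not part of the original post. It prints the first eight UTF-16 bytes of each string, which are the only bytes ToInt64 reads:

```csharp
using System;
using System.Linq;
using System.Text;

public static class PrefixDemo
{
    // Returns the first 8 bytes of the UTF-16 (little-endian) encoding,
    // i.e. the only bytes that BitConverter.ToInt64(bytes, 0) looks at.
    public static byte[] FirstEightBytes(string input)
    {
        var bytes = Encoding.Unicode.GetBytes(input.ToLowerInvariant());
        return bytes.Take(8).ToArray();
    }

    public static void Main()
    {
        var a = FirstEightBytes("http://www.google.com");
        var b = FirstEightBytes("http://www.microsoft.com");

        Console.WriteLine(BitConverter.ToString(a)); // 68-00-74-00-74-00-70-00
        Console.WriteLine(BitConverter.ToString(b)); // 68-00-74-00-74-00-70-00
        Console.WriteLine(a.SequenceEqual(b));       // True
    }
}
```

Both strings start with "http", so the eight bytes match and everything after the fourth character is ignored.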


2 Comments

It is also worth mentioning that a Unicode (UTF-16) char takes two bytes.
It's also worth noting that the first four characters are "http"!

3
'h' == 0x68
't' == 0x74
'p' == 0x70

Little-endian, two-byte characters, so "http" gives you an array that starts with:

{ 0x68, 0x00, 0x74, 0x00, 0x74, 0x00, 0x70, 0x00 ...

Interpret this as a little-endian 64-bit integer, and you get:

0x0070007400740068

which, of course, is equal to 31525695615402088.
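
The arithmetic above can be checked by hand. The following is a sketch only: `FromLittleEndian` is a hypothetical helper written for illustration, not a framework method. It assembles the eight bytes with the least significant byte first, which matches what BitConverter.ToInt64 returns on a little-endian machine:

```csharp
using System;

public static class LittleEndianDemo
{
    // Assembles the first 8 bytes into a 64-bit integer, treating
    // bytes[0] as the least significant byte (little-endian order).
    public static long FromLittleEndian(byte[] bytes)
    {
        long value = 0;
        for (int i = 7; i >= 0; i--)
        {
            value = (value << 8) | bytes[i];
        }
        return value;
    }

    public static void Main()
    {
        // UTF-16 little-endian bytes of "http"
        byte[] http = { 0x68, 0x00, 0x74, 0x00, 0x74, 0x00, 0x70, 0x00 };
        long value = FromLittleEndian(http);

        Console.WriteLine("0x{0:X16}", value); // 0x0070007400740068
        Console.WriteLine(value);              // 31525695615402088
    }
}
```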


2

An int64 is 8 bytes. I'm sure you can figure it out from there.


2

This occurs because a 64-bit integer uses 8 bytes of memory, and BitConverter converts only the first 8 bytes of the byte array you pass, starting from position 0. Each sample input you provided starts with the same 8 bytes.

For what it's worth, it's not possible to losslessly encode a string of variable length into an integer data type of 4 to 8 bytes. You may be looking for a hashing algorithm that represents your data in a fixed number of bytes.
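
One possible shape for the hashing suggestion, as a sketch only: the answer does not name an algorithm, so SHA-256 is an assumption here, and `Hash64` is a hypothetical helper. It hashes the whole string and keeps the first 8 bytes of the digest, so every character influences the result. Distinct strings can still collide, because a 64-bit value cannot represent every possible string:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HashDemo
{
    // Hypothetical helper: hashes the entire string with SHA-256 and
    // interprets the first 8 bytes of the digest as a 64-bit integer.
    public static long Hash64(string input)
    {
        var bytes = Encoding.Unicode.GetBytes(input.ToLowerInvariant());
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(bytes);
            return BitConverter.ToInt64(digest, 0);
        }
    }

    public static void Main()
    {
        // The two URLs from the question now give different values.
        Console.WriteLine(Hash64("http://www.google.com"));
        Console.WriteLine(Hash64("http://www.microsoft.com"));
    }
}
```

The result is stable for a given input but is not reversible: the original string cannot be recovered from the number.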


0

Well, ToInt64 uses 8 bytes, which is 4 UTF-16 characters.


0

Because BitConverter.ToInt64 takes only the first 8 bytes of your byte array, which are the same for both of your strings. Try the input strings "google.com" and "yahoo.com".
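
A quick way to try this, assuming the question's method with its return type widened to long (the wrapper class `EncodeDemo` is illustrative). Strings that differ within the first four characters give different values, while strings that share a four-character prefix do not:

```csharp
using System;
using System.Text;

public static class EncodeDemo
{
    // The question's method, returning long to match ToInt64.
    // Note: ToInt64 throws ArgumentException for inputs shorter than
    // 4 characters, because fewer than 8 bytes are available.
    public static long Encode(string input)
    {
        var bytes = Encoding.Unicode.GetBytes(input.ToLowerInvariant());
        return BitConverter.ToInt64(bytes, 0);
    }

    public static void Main()
    {
        // "goog" vs "yaho": the first 8 bytes differ.
        Console.WriteLine(Encode("google.com") == Encode("yahoo.com")); // False

        // Both start with "http": the first 8 bytes are identical.
        Console.WriteLine(Encode("http://google.com") == Encode("http://yahoo.com")); // True
    }
}
```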

