
I have the following code, converted to C# from an old VB6 program. The VB6 program used the old Winsock control, which could accept a String argument, but the C# program uses System.Net.Sockets.Socket, which wants a byte array.

byte[] msg = Encoding.UTF8.GetBytes(tempString); 
_TCPConn.Send(msg);

tempString contains these seven characters:

0x0002 (' ')
0x0000 ('\0')
0x0000 ('\0')
0x0000 ('\0')
0x0080 (' ')
0x006d ('m')
0x0068 ('h') 

But msg ends up with an extra byte (eight bytes in total):

0x02 
0x00
0x00
0x00
**0xc2**
0x80
0x6d
0x68

Where is that "c2" coming from?

  • What is the receiver expecting? An ANSI string or Unicode string? Commented Jan 7, 2013 at 18:48
  • VB6 allowed storing bytes in a string but those days are over. In particular Unicode normalization can randomly destroy the content, before you even get to converting it back to bytes. You'll need to fix this problem at the core and stop using a string. Commented Jan 7, 2013 at 18:55

2 Answers


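That is what UTF-8 does. Code points from 0x80 to 0x7FF are encoded as 2 bytes, and code points from 0x800 to 0xFFFF are encoded as 3 bytes. Only 0x00 to 0x7F are encoded as a single byte. The sequence 0xC2 0x80 is the 2-byte encoding of the single character 0x80, so a UTF-8 decoder turns it back into just 0x80.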
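To see the byte counts concretely, here is a minimal sketch that dumps the UTF-8 bytes of the string from the question and of a few boundary characters. The class name Utf8Demo and the helper Utf8Hex are made up for illustration; the string literal is an assumed reconstruction of tempString from the character dump in the question.

```csharp
using System;
using System.Text;

public static class Utf8Demo
{
    // Hypothetical helper: returns the UTF-8 bytes of a string as a hex dump.
    public static string Utf8Hex(string s)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(s));
    }

    public static void Main()
    {
        // Assumed contents of tempString: U+0002, three NULs, U+0080, 'm', 'h'.
        string tempString = "\u0002\0\0\0\u0080mh";

        Console.WriteLine(tempString.Length);    // 7
        Console.WriteLine(Utf8Hex(tempString));  // 02-00-00-00-C2-80-6D-68
        Console.WriteLine(Utf8Hex("\u007F"));    // 7F       (1 byte)
        Console.WriteLine(Utf8Hex("\u0080"));    // C2-80    (2 bytes)
        Console.WriteLine(Utf8Hex("\u07FF"));    // DF-BF    (2 bytes)
        Console.WriteLine(Utf8Hex("\u0800"));    // E0-A0-80 (3 bytes)
    }
}
```

Seven characters become eight bytes because only the 0x0080 character falls outside the single-byte range.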

Edit: If the receiver is only expecting the low byte of each character and character values 0x80-0xFF are valid, you will have to convert each character one at a time.

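// Keep only the low byte of each character.
// Assumes every character in tempString is in the range 0x0000-0x00FF.
int len = tempString.Length;
byte[] msg = new byte[len];
for (int idx = 0; idx < len; ++idx)
{
    msg[idx] = (byte)tempString[idx];
}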
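As a possible alternative to the hand-written loop, the ISO-8859-1 (Latin-1) encoding maps characters 0x00-0xFF to the identical byte values, so it should produce the same result in one call. The sketch below compares the two approaches; Latin1Demo and ToSingleBytes are hypothetical names, and the range check is an addition that the loop above does not have (a plain cast silently drops the high byte of any character above 0xFF).

```csharp
using System;
using System.Text;

public static class Latin1Demo
{
    // Hypothetical helper: one byte per character, rejecting anything above 0xFF
    // instead of silently truncating it.
    public static byte[] ToSingleBytes(string s)
    {
        byte[] result = new byte[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] > 0xFF)
                throw new ArgumentException("Character at index " + i + " does not fit in one byte.");
            result[i] = (byte)s[i];
        }
        return result;
    }

    public static void Main()
    {
        // Assumed contents of tempString from the question.
        string tempString = "\u0002\0\0\0\u0080mh";

        // ISO-8859-1 maps U+0000..U+00FF to the same byte values.
        byte[] viaEncoding = Encoding.GetEncoding("ISO-8859-1").GetBytes(tempString);
        byte[] viaLoop = ToSingleBytes(tempString);

        Console.WriteLine(BitConverter.ToString(viaEncoding)); // 02-00-00-00-80-6D-68
        Console.WriteLine(BitConverter.ToString(viaLoop));     // 02-00-00-00-80-6D-68
    }
}
```

Either byte array could then be passed to _TCPConn.Send. Note that the encoding replaces characters above 0xFF with a substitute byte rather than throwing, so the checked loop is the stricter of the two.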

4 Comments

I've used Encoding.UTF8.GetBytes(" \0\0\0 mh") and printed 7 bytes without 0xc2. Am I missing something?
Yikes! So how can I just convert my string to a byte array?
Is every character in tempString guaranteed to be 0x0000 - 0x00FF?
Yes, guaranteed - the device it's talking to is a piece of industrial machinery also manufactured by this company that assumes strictly 8-bit ASCII characters. I've discussed with management the long term consequences of assuming 8-bit characters versus the amount of work it would take to rip out all the old VB6 logic and make something more up to date, and they are satisfied to stay 8-bit for this part of the product.

This is done by the UTF-8 encoding itself, and it is expected behavior.

You can then use the Encoding.UTF8.GetString(byte[]) method to decode the bytes back into the original string.
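A minimal round-trip sketch of that idea follows, using the assumed contents of tempString from the question; RoundTripDemo and RoundTrip are illustrative names. It only shows that .NET can undo its own encoding. It does not help if the receiving device reads raw 8-bit characters, as the asker describes in the comments above.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Hypothetical helper: encodes a string as UTF-8 and decodes it again.
    public static string RoundTrip(string s)
    {
        byte[] encoded = Encoding.UTF8.GetBytes(s);
        return Encoding.UTF8.GetString(encoded);
    }

    public static void Main()
    {
        // Assumed contents of tempString from the question.
        string tempString = "\u0002\0\0\0\u0080mh";

        byte[] encoded = Encoding.UTF8.GetBytes(tempString);
        Console.WriteLine(encoded.Length);                      // 8
        Console.WriteLine(RoundTrip(tempString) == tempString); // True
    }
}
```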
