1

I want a clarification about the SHIFT-JIS character set. Is ASCII a subset of the SHIFT-JIS character set, as it is of UTF-8? If a file has a mix of SHIFT-JIS and ASCII, how can we read it using Qt codecs?

2 Answers

2

Is ASCII a subset of the SHIFT-JIS character set, as it is of UTF-8?

No, not exactly: in the original SHIFT-JIS definition, byte 0x5C is not a backslash; it is a yen currency symbol.

If a file has a mix of SHIFT-JIS and ASCII, how can we read it using Qt codecs?

Use QTextCodec to decode each piece properly; however, detecting how each part is encoded is up to you.
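As a rough illustration of checking that a mixed file decodes as expected, here is a minimal sketch in Python rather than Qt (an assumption made so the snippet runs without a Qt build). The helper name `decode_mixed` and the sample string are hypothetical; `cp932` is Python's name for the Windows variant of Shift-JIS. In Qt 4/5 the analogous step would presumably be `QTextCodec::codecForName("Shift-JIS")` followed by `toUnicode()`.

```python
def decode_mixed(raw: bytes, codec: str = "cp932") -> str:
    """Decode bytes that mix ASCII and Shift-JIS text with a single codec.

    errors="strict" makes malformed input raise UnicodeDecodeError
    instead of silently substituting replacement characters.
    """
    return raw.decode(codec, errors="strict")


# Hypothetical sample: ASCII keys and digits around Japanese text.
sample = "name=日本語; id=42"
raw = sample.encode("cp932")

# ASCII letters and digits keep their single-byte ASCII values ...
assert raw.startswith(b"name=") and raw.endswith(b"; id=42")
# ... and the whole buffer round-trips, Japanese characters included.
assert decode_mixed(raw) == sample
```

Comparing the decoded string against known text (or printing its code points) is one way to confirm that the Japanese characters, not just the ASCII ones, were read correctly.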


1 Comment

If I use QTextCodec("SHIFT-JIS") and pass it a file that has both ASCII and SHIFT-JIS, does it read properly? I tested this and it seems to read all the ASCII characters, but I am not able to verify whether the Japanese characters are read properly.
0

At least according to Wikipedia, there are multiple variants of Shift-JIS.

The original Shift-JIS was based on JIS X 0201, which is almost but not quite an extension of ASCII. Two codes differ: 0x5C is a backslash in ASCII but a yen sign in the original Shift-JIS, and 0x7E is a tilde in ASCII but an overline in Shift-JIS.

However, "code page 932", the Windows variant of Shift-JIS, maps the ASCII range to the ASCII Unicode code points. The HTML5 spec follows the same procedure as Windows. In turn, many Japanese fonts have the yen sign and overline glyphs at positions 0x5C and 0x7E, so a decoded backslash may still be displayed as a yen sign.

You may need to experiment to find the particular behaviour of whatever encoding/decoding library you use and to decide if that behaviour is appropriate for your application.
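One way to run such an experiment is to decode the two disputed bytes on their own and look at the resulting code points. The sketch below uses Python's `shift_jis` and `cp932` codecs purely as an example library; the helper names `probe` and `describe` are hypothetical, and Qt or any other library may map these bytes differently.

```python
def probe(codec_name: str) -> dict:
    """Return {byte value: decoded character} for the bytes 0x5C and 0x7E."""
    return {b: bytes([b]).decode(codec_name) for b in (0x5C, 0x7E)}


def describe(ch: str) -> str:
    """Format a single character as a Unicode code point, e.g. U+005C."""
    return f"U+{ord(ch):04X}"


if __name__ == "__main__":
    # U+005C is backslash, U+00A5 is yen sign,
    # U+007E is tilde, U+203E is overline.
    for name in ("shift_jis", "cp932"):
        for byte, ch in probe(name).items():
            print(f"{name}: 0x{byte:02X} -> {describe(ch)}")
```

If a codec reports U+005C and U+007E, it treats the low range as plain ASCII; U+00A5 and U+203E would indicate the JIS X 0201 interpretation.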
