2

I have a String like this:

String str = "\u0e04\u0e38\u0e13\u0e23\u0e39\u0e49\u0e21\u0e31\u0e49\u0e22\u0e44\u0e14\u0e42\u0e19";

It actually looks like ช1: คุณรู้มั้ยไดโนเสาร์ตั

What I want is to keep the string in its escaped text form, so that str.charAt(3) is 'e' rather than a strange character.

How can I do this?

Further explanation: I get this string from a file. I read a line of the file into a string, and that line appears as "\u0e04\u0e38\u0e13\u0e23\u0e39\u0e49\u0e21\u0e31\u0e49\u0e22\u0e44\u0e14\u0e42\u0e19". So this is what the string looks like in memory.

Code here:

FileReader fr = new FileReader("sample2.json");
BufferedReader br = new BufferedReader(fr);

String line;
while ((line = br.readLine()) != null)
{
    JSONObject data = new JSONObject(line);
    String text = data.getString("text");
}

This line in the file is "\u0e04\u0e38\u0e13\u0e23\u0e39\u0e49\u0e21\u0e31\u0e49\u0e22\u0e44\u0e14\u0e42\u0e19"

Now I want to keep the string text in its original format.

3
  • You posted the actual rendered text instead of a screenshot, which is going to be tricky here. Can you confirm that the font being used has the actual correct glyph you're wanting to display? Commented Nov 22, 2013 at 22:28
  • charAt(3) as index starts with 0. Commented Nov 22, 2013 at 22:32
  • What is the actual input and what is the desired output? Imagine File -> JSON -> HTML. Commented Nov 23, 2013 at 22:37

3 Answers

5

You just need to escape every backslash:

String str = "\\u0e04\\u0e38...";
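As a minimal sketch of the difference, the class below compares a literal with doubled backslashes against one with single backslashes. The class and method names are made up for illustration, and only the first two escapes of the string are used to keep it short.

```java
public class EscapedLiteralDemo {
    // Doubled backslashes in the source: the string holds the literal
    // escape text (backslash, 'u', four hex digits), six chars per escape.
    static String escaped() {
        return "\\u0e04\\u0e38";
    }

    // Single backslashes in the source: the compiler decodes each escape,
    // so the string holds one Thai character per escape.
    static String decoded() {
        return "\u0e04\u0e38";
    }

    public static void main(String[] args) {
        System.out.println(escaped().length());   // 12
        System.out.println(escaped().charAt(3));  // e
        System.out.println(decoded().length());   // 2
    }
}
```

Index 3 is 'e' in the escaped version because indexes 0 to 2 hold the backslash, 'u', and '0'.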

2 Comments

Standard answer, nonstandard question. (I didn't downvote, but you answered "how to keep Java from treating these as Unicode escapes", which is what OP actually wants in this case.)
Upvoted. This is the most elegant solution for now, and it meets the OP's requirement that str.charAt(3) is 'e'. Each doubled backslash occupies one char slot.
1

I guess you read this string from a file or stream. It seems you read it using the wrong encoding (not the one the String was encoded with when it was written to that file/stream). I think that is why you get this issue.

We don't worry about encodings when Strings are in memory (in the memory of the JVM for example). Encodings start to matter when you need to write your in-memory data/String to file/stream or to read it from file/stream.
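A rough sketch of naming the encoding explicitly at the read and write boundaries follows. It assumes the file is UTF-8, which the question does not confirm, and the class name, method names, and temp file are purely illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetDemo {
    // Reads the first line of a file, decoding its bytes as UTF-8
    // (an assumption; use whatever encoding the file was written in).
    static String readFirstLine(Path file) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return br.readLine();
        }
    }

    // Writes text to a temp file as UTF-8 and reads it back with the same charset.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("sample", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readFirstLine(tmp);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String thai = "\u0e04\u0e38\u0e13";
        // True when the same charset is used for writing and reading.
        System.out.println(thai.equals(roundTrip(thai)));
    }
}
```

Unlike new FileReader(name), which falls back to the platform default charset, this states the charset in the code.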

3 Comments

Thank you, Peter. Yes, I read this string from a file to do some pre-processing. Now I want to write this string to a file. What can I do to keep the string in its original format?
Well, you just need to find out the encoding of the file (is it UTF-8, is it UTF-16, is it Windows-1252, etc.). Whoever created the file usually defines the encoding. Then once you know this, you just have to specify that same encoding explicitly in the Java code which you have and which reads the String from this file in the memory of your JVM.
See also the answer of JB Nizet. Seems more concrete than mine and seems useful.
0

Okay, this looks dumb, but it will work in your case:

Instead of:

JSONObject data = new JSONObject(line);

use:

JSONObject data = new JSONObject(line.replaceAll("\\\\", "\\\\\\\\"));

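The problem is that the JSON parser decodes the \uXXXX escapes into the actual Unicode characters for your 'convenience'. Doubling each backslash first makes the parser see an escaped backslash followed by plain text, so the escape sequence survives as literal characters.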
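To see what the replaceAll call does to the line before it reaches the parser, here is a small sketch using only the standard library. The class name, method names, and sample line are illustrative, and the JSON parsing step itself is left out.

```java
public class BackslashDoublingDemo {
    // Regex form from the answer: the pattern matches one backslash,
    // and the replacement emits two.
    static String doubleWithRegex(String line) {
        return line.replaceAll("\\\\", "\\\\\\\\");
    }

    // Equivalent literal form: String.replace does no regex processing,
    // so fewer backslashes are needed in the source.
    static String doubleLiteral(String line) {
        return line.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        // One line as read from the file, holding a single literal backslash.
        String line = "{\"text\":\"\\u0e04\"}";
        System.out.println(line);                  // one backslash before u0e04
        System.out.println(doubleWithRegex(line)); // two backslashes before u0e04
        System.out.println(doubleWithRegex(line).equals(doubleLiteral(line))); // true
    }
}
```

A JSON parser reads the doubled backslash as one escaped backslash, so the text after it stays as ordinary characters.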

2 Comments

JSON still recognizes it as UTF-8.
Odd. I'm wondering what I'm doing differently from you. What do you get when you run System.out.println(line)? Also, what is the exact content of your file (perhaps it is online somewhere)?
