
I have a file like this:

aarónico
aaronita
ababol
abacá
abacería
abacero
ábaco
# more words, some with non-ASCII chars

When I read that file and print it to the console, it prints exactly as expected, but when I do:

f.write(json.dumps({word: RAELookup(word)}))

This is saved instead:

{"aar\u00f3nico": ["Stuff"]}

What I expected:

{"aarónico": ["Stuff"]}

I need to get the same text back when I json.loads() it, but I don't know where or how to do the encoding, or whether it's needed at all to get this to work.
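For reference, a minimal Python 3 sketch (the question's code is Python 2) showing that `\u00f3` in the output is only JSON's escape for `ó`: `json.loads()` turns it back into the same string, and passing `ensure_ascii=False` to `json.dumps()` writes the character itself instead. The `["Stuff"]` value is placeholder data.

```python
import json

word = "aarónico"
entry = {word: ["Stuff"]}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(entry)
# ensure_ascii=False keeps the characters themselves in the JSON text.
literal = json.dumps(entry, ensure_ascii=False)

print(escaped)   # {"aar\u00f3nico": ["Stuff"]}
print(literal)   # {"aarónico": ["Stuff"]}

# Both JSON texts decode to the same dictionary, so key lookups work either way.
assert json.loads(escaped) == json.loads(literal) == entry
assert word in json.loads(escaped)
```

If you write the `ensure_ascii=False` form to a file, open that file with an explicit encoding such as UTF-8.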

EDIT

This is the code that saves the data to a file:

with open(LEMARIO_FILE, "r") as flemario:
    with open(DATA_FILE, "w") as f:
        while True:
            word = flemario.readline().strip()
            if word == "":
                break
            print word #this is correct
            f.write(json.dumps({word: RAELookup(word)}))
            f.write("\n")
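One possible Python 3 version of this writing loop, sketched under the assumption that the word list is UTF-8 encoded. `rae_lookup()` is a hypothetical stand-in for `RAELookup()`, and the paths are placeholders. Iterating over the file object replaces the `readline()` loop; note that this version skips blank lines instead of stopping at the first one.

```python
import json


def rae_lookup(word):
    # Hypothetical stand-in for RAELookup(); returns placeholder data.
    return ["Stuff"]


def write_json_lines(lemario_path, data_path, lookup=rae_lookup):
    """Write one JSON object per line: {word: lookup(word)}."""
    # Explicit encodings, so the result does not depend on the platform default.
    with open(lemario_path, encoding="utf-8") as flemario, \
            open(data_path, "w", encoding="utf-8") as f:
        for line in flemario:
            word = line.strip()
            if not word:
                continue  # skip blank lines rather than ending the loop
            f.write(json.dumps({word: lookup(word)}))
            f.write("\n")
```

With the default `ensure_ascii=True`, every line in the output is plain ASCII, so the output encoding only matters if you switch that option off.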

And this one loads the data and returns the dictionary object:

    with open(DATA_FILE, "r") as f:
        while True:
            new = f.readline().strip()
            if new == "":
                break
            print json.loads(new) #this is not
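A matching Python 3 sketch for reading the file back, assuming it is UTF-8 with one JSON object per line. `load_entries` is a hypothetical helper name; it merges every line into one dictionary, and the keys it returns are ordinary `str` objects that compare equal to the words that were written, whether or not the file spells them with `\uXXXX` escapes.

```python
import json


def load_entries(data_path):
    """Merge every line of a one-object-per-line JSON file into one dict."""
    entries = {}
    with open(data_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                # Each line is a complete JSON object such as {"aar\u00f3nico": [...]}.
                entries.update(json.loads(line))
    return entries
```

This keeps every entry in memory at once, which is simple but may be too much for very large files.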

I cannot look up entries in the dictionaries if the keys are not the same as the ones I saved.

EDIT 2

>>> import json
>>> f = open("test", "w")
>>> f.write(json.dumps({"héllö": ["stuff"]}))
>>> f.close()
>>> f = open("test", "r")
>>> print json.loads(f.read())
{u'h\xe9ll\xf6': [u'stuff']}
>>> "héllö" in {u'h\xe9ll\xf6': [u'stuff']}
False
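What the `False` above reflects, sketched in Python 3 terms: in Python 2, `"héllö"` is a byte string (its bytes depend on the source or terminal encoding), while `json.loads()` returns `unicode` keys, and the two types do not compare equal for non-ASCII data. In Python 3, string literals are already Unicode, so the same check succeeds; the mismatch only comes back when comparing against encoded bytes. UTF-8 is assumed here.

```python
import json

loaded = json.loads(json.dumps({"héllö": ["stuff"]}))

# Python 3 str literals are Unicode text, so the key round-trips intact.
print("héllö" in loaded)              # True

# Encoded bytes are a different type and never equal the text key.
raw = "héllö".encode("utf-8")
print(raw)                            # b'h\xc3\xa9ll\xc3\xb6'
print(raw in loaded)                  # False

# Decoding the bytes first restores the match.
print(raw.decode("utf-8") in loaded)  # True
```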
  • You are looking at the JSON encoding of a Unicode character. That is normal. This is fully compliant RFC 4627 JSON (see section 2.5 on string values). Commented Mar 3, 2013 at 10:17
  • JSON is saving the data correctly. Unicode strings are converted to the format you have shown. Commented Mar 3, 2013 at 10:17
  • But when I load it again, it doesn't convert back to what it was before. Commented Mar 3, 2013 at 10:18
  • @gcq: Are you certain that you are not looking at the Python string literal representation? Commented Mar 3, 2013 at 10:20
  • >>> print u'h\xe9ll\xf6' gives héllö. You are looking at the Python string literal representation. Your code is working. Commented Mar 3, 2013 at 11:00
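To illustrate the point about RFC 4627 section 2.5: a JSON string may contain a character either literally or as a `\uXXXX` escape (characters outside the Basic Multilingual Plane become a surrogate pair of escapes), and a conforming decoder treats both spellings as the same string. A short Python 3 check:

```python
import json

# Two JSON texts that spell the same string differently.
escaped_text = '"aar\\u00f3nico"'   # backslash-u escape, pure ASCII
literal_text = '"aarónico"'          # the character itself

assert json.loads(escaped_text) == json.loads(literal_text) == "aarónico"

# Outside the BMP, the escaped form is a UTF-16 surrogate pair.
print(json.dumps("\U0001F600"))      # "\ud83d\ude00"
assert json.loads(json.dumps("\U0001F600")) == "\U0001F600"
```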

1 Answer

This is normal and valid JSON behaviour. The \uxxxx escape notation is also used by Python, so make sure you don't confuse Python literal representations with the contents of the string.

Demo in Python 3.3:

>>> import json
>>> print('aar\u00f3nico')
aarónico
>>> print(json.dumps('aar\u00f3nico'))
"aar\u00f3nico"
>>> print(json.loads(json.dumps('aar\u00f3nico')))
aarónico

In Python 2.7:

>>> import json
>>> print u'aar\u00f3nico'
aarónico
>>> print(json.dumps(u'aar\u00f3nico'))
"aar\u00f3nico"
>>> print(json.loads(json.dumps(u'aar\u00f3nico')))
aarónico
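In Python 2, `repr()` of a `unicode` value escapes every non-ASCII character, and containers such as dicts display their contents through `repr()`; that is why the question's session shows `u'h\xe9ll\xf6'`. Python 3's `repr()` keeps printable characters, but `ascii()` gives the same kind of escaped representation. A small Python 3 sketch of contents versus representation:

```python
s = "h\xe9ll\xf6"             # the same text as "héllö", written with escapes

print(s)                      # héllö  (the contents)
print(ascii(s))               # 'h\xe9ll\xf6'  (escaped literal representation)
print(repr(s))                # 'héllö'  (Python 3 keeps printable characters)

# The escapes are only notation: the string has five characters.
print(len(s), s == "héllö")   # 5 True
```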

When reading from and writing to files, and when using plain byte strings (in Python 2, "héllö" is a byte string), you are not dealing with Unicode data. You need to learn about the differences between encoded and Unicode data first. I strongly recommend you read at least 2 of the following 3 articles:

You were lucky with your "héllö" byte string: Python managed to decode it automatically for you (Python 2's json.dumps() decodes byte strings as UTF-8 by default). The value read back from the file is perfectly normal and correct:

>>> print u'h\xe9ll\xf6'
héllö
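The difference between encoded and Unicode data can be sketched in Python 3: text (`str`) is encoded to bytes on the way into a file and decoded on the way out, and decoding with the wrong codec can silently produce different text. UTF-8 and Latin-1 are only example codecs here; use whatever encoding your files are actually in.

```python
text = "héllö"

# Encoding turns text into bytes; the byte count depends on the codec.
utf8 = text.encode("utf-8")
latin1 = text.encode("latin-1")
print(len(text), len(utf8), len(latin1))   # 5 7 5

# Decoding with the matching codec restores the original text.
assert utf8.decode("utf-8") == text

# Decoding with the wrong codec does not fail here; it just gives mojibake.
print(utf8.decode("latin-1"))              # hÃ©llÃ¶
```

In Python 3, `open(path, "w", encoding="utf-8")` applies the encoding step when writing and `open(path, encoding="utf-8")` applies the decoding step when reading.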

8 Comments

@gcq: You are opening a file without specifying the encoding. Is this Python 2 or 3, and do you know what encoding is used for the files?
@gcq: You need to give us much more detail as to what you see when you do this (use repr() on the input and output, give us a Python session like in my answer for example, so we can help you debug).
@Martijn It's Python 2, and no, I don't know the encoding used in this file, but I never had problems before writing and reading non-ASCII characters from files.
@gcq: Which is what I'd ask you to do anyway. :-) You can read bytes from files just fine; you just won't get \uxxxx Unicode character escapes that way.
@Martijn Then what can I do to retrieve the JSON from the file? I cannot use json.dump() and json.load() because every line in the file is one JSON object. To do that I would need to load about 88,000 dictionaries into memory, and I don't want to do that.
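On the memory concern in the last comment: a one-object-per-line file can be processed lazily, one line at a time, so all ~88,000 dictionaries never need to be held in memory together. A minimal Python 3 sketch, assuming the file is UTF-8 and each line holds one small object as written by the question's code; `iter_entries` and `find_entry` are hypothetical helper names.

```python
import json


def iter_entries(data_path):
    """Yield (word, value) pairs one line at a time; nothing is accumulated."""
    with open(data_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                # Each line is a small dict such as {"aar\u00f3nico": [...]}.
                for word, value in json.loads(line).items():
                    yield word, value


def find_entry(data_path, wanted):
    """Return the value stored for `wanted`, or None if it is not in the file."""
    for word, value in iter_entries(data_path):
        if word == wanted:
            return value
    return None
```

Each `find_entry` call rescans the file from the start, so for many repeated lookups an index or a proper key-value store would likely be faster; which trade-off fits depends on how the data is used.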