2

I want to read a Python dictionary string using Java. Example string:

{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}

This is not valid JSON. I want to convert it into proper JSON using Java code.

  • 5
    Interesting assignment. And what is your question? And I agree with the following comment: why spend energy parsing a non-standard format instead of making sure you emit JSON on the Python side?! Commented Apr 26, 2017 at 12:33
  • As this is not proper JSON, I am not able to load it in Java. Basically, I am using Scala and the json4s library. Commented Apr 26, 2017 at 12:34
  • @GhostCat It is not possible in my case. These strings are saved in a DB. Commented Apr 26, 2017 at 12:34
  • 1
    @Devarata Then convert them to JSON as they go into the database. Saving non-standard formats into a DB spells trouble. Commented Apr 26, 2017 at 12:36
  • 2
    Perhaps you should use Jython to allow you to pass values to a python interpreter within Java and let it return that JSON to you. Commented Apr 26, 2017 at 12:45

3 Answers

5

Well, the best way would be to pass it through a Python script that reads that data and outputs valid JSON:

>>> import ast, json
>>> json.dumps(ast.literal_eval("{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}"))
'{"name": "Shivam", "otherInfo": [[0], [1]], "isMale": true}'

so you could create a script that only contains:

import sys, json, ast; print(json.dumps(ast.literal_eval(sys.argv[1])))

then you can make it a Python one-liner like so:

python -c "import sys, ast, json ; print(json.dumps(ast.literal_eval(sys.argv[1])))" "{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}"

that you can run from your shell, meaning you can run it from within java the same way:

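// Requires java.io.BufferedReader and java.io.InputStreamReader; exec() and readLine() throw IOException.
String PythonData = "{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}";

String[] cmd = {
    "python", "-c", "import sys, ast, json ; print(json.dumps(ast.literal_eval(sys.argv[1])))",
    PythonData
    };
Process process = Runtime.getRuntime().exec(cmd);
// The one-liner prints the JSON on a single line of its standard output.
try (BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
    String json = reader.readLine();
}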

and as output you'll have a proper JSON string.

This solution is the most reliable way I can think of: because it uses Python's own parser, it safely parses any Python literal, without opening a window for code injection.

But I wouldn't recommend using it, because you'd be spawning a python process for each string you parse, which would be a performance killer.
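
The safety claim above can be checked with a minimal sketch. The helper name `to_json` is my own choice for illustration, not part of any library. It shows that `ast.literal_eval` builds literals but refuses a string that would call a function:

```python
import ast
import json


def to_json(python_literal: str) -> str:
    """Convert a Python literal string (dict, list, ...) to a JSON string."""
    # literal_eval only builds literal values; it never calls functions.
    return json.dumps(ast.literal_eval(python_literal))


print(to_json("{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}"))

try:
    to_json("__import__('os').system('echo pwned')")
except ValueError as err:
    # A function call is not a literal, so it is rejected instead of executed.
    print("rejected:", err)
```

Note that values with no JSON equivalent (a set or a bytes object, for example) pass `literal_eval` but make `json.dumps` raise `TypeError`, so a caller would probably want to handle that case as well.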

As an improvement on top of that first approach, you could use Jython to run that Python code in the JVM for a bit more performance.

PythonInterpreter interpreter = new PythonInterpreter();
// An assignment is a statement, so it has to go through exec(), not eval().
interpreter.exec("import ast, json");
interpreter.exec("to_json = lambda d: json.dumps(ast.literal_eval(d))");
PyObject ToJson = interpreter.get("to_json");
PyObject result = ToJson.__call__(new PyString(PythonData));
String realResult = (String) result.__tojava__(String.class);

The above is untested (so it's likely to fail and spawn dragons 👹) and I'm pretty sure you can make it more elegant. It's loosely adapted from this answer. I'll leave it up to you as an exercise to see how you can include the Jython environment in your Java runtime ☺.


P.S.: Another solution would be to try to fix every pattern you can think of using one gigantic regexp or several smaller ones. That might work on simpler cases, but I would advise against it: regex is the wrong tool for the job, as it isn't expressive enough and you'll never be comprehensive. It's only a good way to plant the seed of a bug that'll bite you at some point in the future.


P.S.2: Whenever you need to parse code from an external source, always make sure that the data is sanitized and safe. Never forget about Little Bobby Tables.


10 Comments

This makes a lot of sense actually
Nice and straightforward solution ... and I think together with my suggestions, it becomes even more interesting. Any feedback is welcome ...
Though I would be cautious about taking data from a database and shoving it into an exec...
@cruncher I've been thinking of ways to circumvent it, but the exec runs Python with exactly the one-liner written above, and the variable is passed as an argv argument to the literal_eval function, so this code is pretty safe against the usual exploits.
Well, it's a Python interpreter that runs in the JVM. So the upside is that you can reuse the interpreter instance and avoid the cost of spawning it for each of the 10M strings. The downside is that it's still a huge overhead. You'd be better off connecting to the database using Python, creating a new field next to the one you've got, and, for each Python dict row, building the JSON and storing it in the new field. If that's an operation you only run once over the whole database, it'll be the most efficient.
1

In conjunction with the other answer: it is straightforward to simply invoke that Python one-liner to "translate" a python-dict-string into a standard JSON string.

But spawning a new process for each row in your database might quickly turn into a performance killer.

Thus there are two options that you should consider on top of that:

  • establish some small "Python server" that keeps running; its only job is to do that translation for JVMs that can connect to it
  • you can look into Jython, meaning: simply enable your JVM to run Python code. In other words: instead of writing your own python-dict-string parser, you add "Python powers" to your JVM and rely on existing components to do that translation for you.
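
The first option could be sketched roughly as follows. This is only an illustration under assumptions of my own: the "server" is a long-running filter that receives one Python literal per line and answers with one JSON document per line, and the names `translate_line` and `serve` are hypothetical. A JVM could start it once and keep writing to and reading from its streams, instead of spawning a process per row.

```python
import ast
import io
import json


def translate_line(line: str) -> str:
    """Return the JSON form of one Python literal, or a JSON error object."""
    try:
        return json.dumps(ast.literal_eval(line.strip()))
    except (ValueError, SyntaxError, TypeError) as err:
        # Report the problem in-band so one bad row does not stop the loop.
        return json.dumps({"error": str(err)})


def serve(infile, outfile) -> None:
    """Translate every input line until the input stream is closed."""
    for line in infile:
        outfile.write(translate_line(line) + "\n")
        outfile.flush()  # reply immediately so the caller is not left waiting


# Demo with in-memory streams; a real deployment would call
# serve(sys.stdin, sys.stdout) and be started once by the JVM.
demo_out = io.StringIO()
serve(io.StringIO("{'isMale': True}\n[None, 1]\n"), demo_out)
print(demo_out.getvalue(), end="")
```

The one-literal-per-line protocol is the simplest thing that works with pipes; a socket or HTTP endpoint would follow the same shape if several JVMs need to share the translator.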


0

Hacky solution

Do a string replace ' -> ", True -> true, False -> false, and None -> null, then parse the result as JSON. If you are lucky (and are willing to bet on remaining lucky in the future), this can actually work in practice.

See rh-messaging/cli-rhea/blob/main/lib/formatter.js#L240-L249 (in JavaScript; note that this function goes in the opposite direction, from JSON notation to Python notation)

static replaceWithPythonType(strMessage) {
    return strMessage.replace(/null/g, 'None').replace(/true/g, 'True').replace(/false/g, 'False').replace(/undefined/g, 'None').replace(/\{\}/g, 'None');
}
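
To make the "if you are lucky" part concrete, here is a small Python sketch of the replacement described above. The function name `naive_to_json` is hypothetical, and the inputs are made up for illustration. It works on tame input, fails on the `u'...'` prefix from the question's own example, and silently rewrites string contents:

```python
import json


def naive_to_json(python_literal: str) -> str:
    """Blind textual replacement; this is NOT a parser."""
    return (python_literal.replace("'", '"')
            .replace("True", "true")
            .replace("False", "false")
            .replace("None", "null"))


# Works when strings contain no quotes, prefixes, or the words True/False/None.
print(json.loads(naive_to_json("{'isMale': True, 'otherInfo': [[0], [1]]}")))

# Breaks on the question's example: the u prefix survives as u"Shivam".
try:
    json.loads(naive_to_json("{'name': u'Shivam', 'isMale': True}"))
except json.JSONDecodeError as err:
    print("invalid JSON:", err)

# Silently corrupts data: a string value containing "None" is rewritten.
print(naive_to_json("{'note': 'None of the above'}"))
```

The last case is the dangerous one, because the output is still valid JSON and nothing signals that the data changed.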

Skylark solution

Skylark is a subset (data-definition) language based on Python. There are parsers in Go, Java, Rust, C, and Lua listed on the project's page. The problem is that the Java artifacts aren't published anywhere, as discussed in Q: How do I include a Skylark configuration parser in my application?

Graal Python

Possibly this, https://github.com/oracle/graalpython/issues/96#issuecomment-1662566214

DIY Parsers

I was not able to find a parser specific to the Python literal notation. The ANTLR samples contain a Python grammar that could plausibly be cut down to work for you: https://github.com/antlr/grammars-v4/tree/master/python/python3
