97

I'm developing a web game in pure Python, and want some simple scripting available to allow for more dynamic game content. Game content can be added live by privileged users.

It would be nice if the scripting language could be Python. However, it can't run with access to the environment the game runs on since a malicious user could wreak havoc which would be bad. Is it possible to run sandboxed Python in pure Python?

Update: In fact, since true Python support would be way overkill, a simple scripting language with Pythonic syntax would be perfect.

If there aren't any Pythonic script interpreters, are there any other open source script interpreters written in pure Python that I could use? The requirements are support for variables, basic conditionals and function calls (not definitions).

2

9 Answers 9

66

This is really non-trivial.

There are two ways to sandbox Python. One is to create a restricted environment (i.e., very few globals etc.) and exec your code inside this environment. This is what Messa is suggesting. It's nice but there are lots of ways to break out of the sandbox and create trouble. There was a thread about this on Python-dev a year ago or so in which people did things from catching exceptions and poking at internal state to break out to byte code manipulation. This is the way to go if you want a complete language.

The other way is to parse the code and then use the ast module to kick out constructs you don't want (e.g. import statements, function calls etc.) and then to compile the rest. This is the way to go if you want to use Python as a config language etc.

Another way (which might not work for you since you're using GAE), is the PyPy sandbox. While I haven't used it myself, word on the intertubes is that it's the only real sandboxed Python out there.

Based on your description of the requirements (support for variables, basic conditionals and function calls, not definitions), you might want to evaluate approach 2 and kick out everything else from the code. It's a little tricky but doable.
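As a rough illustration of approach 2, here is a minimal sketch that parses a script with the ast module and rejects every node type outside a small whitelist before compiling it. The names ALLOWED_NODES, check_script, run_script and the heal function are made up for this example, and the whitelist itself is an assumption about what a game script needs, not a vetted security policy.

```python
import ast

# Node types the mini-language accepts; everything else is rejected.
# Illustrative assumption: variables, arithmetic, comparisons, if, and plain calls.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.If, ast.Pass,
    ast.Name, ast.Load, ast.Store, ast.Constant,
    ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not, ast.USub,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.Call,
)


def check_script(source, allowed_functions):
    """Parse source and raise ValueError on any construct outside the whitelist."""
    tree = ast.parse(source, mode="exec")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed construct: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id.startswith("_"):
            raise ValueError(f"disallowed name: {node.id}")
        if isinstance(node, ast.Call):
            # Only direct calls to whitelisted names, e.g. heal(5); no obj.method().
            func = node.func
            if not (isinstance(func, ast.Name) and func.id in allowed_functions):
                raise ValueError("disallowed function call")
    return tree


def run_script(source, api, variables=None):
    """Validate source, then run it with only api and variables visible."""
    tree = check_script(source, allowed_functions=set(api))
    env = {"__builtins__": {}}
    env.update(api)
    env.update(variables or {})
    exec(compile(tree, "<script>", "exec"), env)
    return env


# Hypothetical game API: the script may only call heal().
healed = []
script = "bonus = 2\nif hp < 10:\n    heal(5 + bonus)\n"
run_script(script, {"heal": healed.append}, {"hp": 3})
print(healed)  # [7]
```

Because attribute access, imports, loops and definitions are all absent from the whitelist, scripts such as `import os` or `().__class__` are rejected at the check step. Whether that is enough depends on what the exposed API functions themselves allow.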


8 Comments

Hmm yeah I was thinking about what would happen if you start digging in code objects... I guess you can escape the exec that way... PyPy is what Google App Engine is using already though, isn't it? I wonder if the pure Python version of PyPy can run in GAE... I'll mess around with it a bit.
I think GAE has a variant of Unladen Swallow. It's not PyPy AFAIK.
Do you think this code is a good start? code.activestate.com/recipes/496746
Can't give you a total guarantee but a cursory look tells me that it's decent code. One place I know which does this in "production" is the Templetor templating engine used by web.py. You might want to take a look at that.
@Blixt: they always used CPython. The mechanism for 2.5.2 was done entirely in pure Python. For 2.7.5, they compiled Python for NaCl-glibc, a sandbox which runs at the C level.
22

Roughly ten years after the original question, Python 3.8.0 comes with auditing. Can it help? Let's limit the discussion to hard-drive writing for simplicity - and see:

from sys import addaudithook
def block_mischief(event, arg):
    if 'WRITE_LOCK' in globals() and ((event == 'open' and arg[1] != 'r')
            or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
        raise IOError('file write forbidden')

addaudithook(block_mischief)

So far exec could easily write to disk:

exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))

But we can forbid it at will, so that no wicked user can access the disk from the code supplied to exec(). Pythonic modules like numpy or pickle eventually use Python's file access, so they are banned from writing to disk, too. External program calls have been explicitly disabled as well.

WRITE_LOCK = True
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("open('/tmp/FILE','a').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("numpy.savetxt('/tmp/FILE', numpy.eye(3))", dict(locals()))
exec("import subprocess; subprocess.call('echo PWNED >> /tmp/FILE', shell=True)",     dict(locals()))

An attempt to remove the lock from within exec() seems to be futile, since the auditing hook uses a different copy of locals that is not accessible to the code run by exec. Please prove me wrong.

exec("print('muhehehe'); del WRITE_LOCK; open('/tmp/FILE','w')", dict(locals()))
...
OSError: file write forbidden

Of course, the top-level code can enable file I/O again.

del WRITE_LOCK
exec("open('/tmp/FILE','w')", dict(locals()))

Sandboxing within CPython has proven extremely hard and many previous attempts have failed. This approach is also not entirely secure, e.g. for public web access:

  1. Compiled modules that make direct OS calls (perhaps hypothetical) cannot be audited by CPython - whitelisting the safe pure-Python modules is recommended.

  2. There is definitely still the possibility of crashing or overloading the CPython interpreter.

  3. Maybe there remain even some loopholes to write files to the hard drive, too. But I could not use any of the usual sandbox-evasion tricks to write a single byte. We can say the "attack surface" of the Python ecosystem reduces to a rather narrow list of events to be (dis)allowed: https://docs.python.org/3/library/audit_events.html

I would be thankful to anybody pointing me to the flaws of this approach.


EDIT: So this is not safe either! I am very thankful to @Emu for his clever hack using exception catching and introspection:

#!/usr/bin/python3.8
from sys import addaudithook
def block_mischief(event,arg):
    if 'WRITE_LOCK' in globals() and ((event=='open' and arg[1]!='r') or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
        raise IOError('file write forbidden')

addaudithook(block_mischief)
WRITE_LOCK = True
exec("""
import sys
def r(a, b):
    try:
        raise Exception()
    except:
        del sys.exc_info()[2].tb_frame.f_back.f_globals['WRITE_LOCK']
import sys
w = type('evil',(object,),{'__ne__':r})()
sys.audit('open', None, w)
open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')""", dict(locals()))

I guess that auditing combined with running the code in a subprocess is the way to go, but do not use it on production machines:

https://bitbucket.org/fdominec/experimental_sandbox_in_cpython38/src/master/sandbox_experiment.py
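The linked script is not reproduced here. As a generic sketch of only the subprocess half of that idea, the untrusted code can be handed to a separate interpreter with a time limit, so that a crash or an endless loop does not take the main process down. The function name run_untrusted and the two-second default are assumptions for illustration. A child process started this way still has normal file and network access unless the operating system confines it.

```python
import subprocess
import sys


def run_untrusted(source, timeout_seconds=2.0):
    """Run source in a separate interpreter; return (exit_code, stdout, stderr)."""
    try:
        result = subprocess.run(
            # -I is isolated mode: ignores PYTHON* environment variables
            # and the user site-packages directory.
            [sys.executable, "-I", "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return None, "", "timed out"
    return result.returncode, result.stdout, result.stderr


print(run_untrusted("print(6 * 7)"))
print(run_untrusted("while True: pass", timeout_seconds=0.5))
```

The first call returns the child's exit code and output; the second returns the timeout marker instead of hanging the caller.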

1 Comment

From python docs sys.addaudithook: "Note that audit hooks are primarily for collecting information about internal or otherwise unobservable actions, whether by Python or libraries written in Python. They are not suitable for implementing a “sandbox”. In particular, malicious code can trivially disable or bypass hooks added using this function. At a minimum, any security-sensitive hooks must be added using the C API PySys_AddAuditHook() before initialising the runtime, and any modules allowing arbitrary memory modification (such as ctypes) should be completely removed or closely monitored."
16

I'm not sure why nobody mentions this, but Zope 2 has a thing called Python Script, which is exactly that - restricted Python executed in a sandbox, without any access to filesystem, with access to other Zope objects controlled by Zope security machinery, with imports limited to a safe subset.

Zope in general is pretty safe, so I would imagine there are no known or obvious ways to break out of the sandbox.

I'm not sure exactly how Python Scripts are implemented, but the feature has been around since about 2000.

And here's the magic behind PythonScripts, with detailed documentation: http://pypi.python.org/pypi/RestrictedPython - it even looks like it doesn't have any dependencies on Zope, so can be used standalone.

Note that this is not for safely running arbitrary python code (most of the random scripts will fail on first import or file access), but rather for using Python for limited scripting within a Python application.

This answer is from my comment to a question closed as a duplicate of this one: Python from Python: restricting functionality?

3 Comments

The latest release of RestrictedPython is only compatible with Python 2.3, 2.4, 2.5, 2.6, and 2.7. No support for Python 3, yet.
It seems like it does support Python 3.6, 3.7 & 3.8 restrictedpython.readthedocs.io/en/latest/index.html
This solution deserves far more upvotes.
11

Update: This technique does not prevent creating custom code objects. See the comments.


AFAIK it is possible to run code in a completely isolated environment:

exec somePythonCode in {'__builtins__': {}}, {}

But in such an environment you can do almost nothing :) (you cannot even import a module; but a malicious user can still run an infinite recursion or cause the process to run out of memory.) You would probably want to add some modules that serve as the interface to your game engine.
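The exec line above uses Python 2 statement syntax. In Python 3, exec is a function, so a rough equivalent looks like the sketch below. The helper name run_restricted and the roll function are invented for this example. As the update at the top of this answer says, emptying the builtins is not a real security boundary; it only hides names.

```python
def run_restricted(source, api):
    """Execute source with no builtins and only the names in api visible."""
    global_names = {"__builtins__": {}, **api}
    local_names = {}
    exec(source, global_names, local_names)
    return local_names  # variables assigned by the script


# Hypothetical game API: a deterministic stand-in for a dice roll.
api = {"roll": lambda sides: sides}
print(run_restricted("damage = roll(6) + 2", api))  # {'damage': 8}

try:
    run_restricted("import os", api)
except ImportError as error:
    print("import blocked:", error)
```

The import fails because the import statement looks up `__import__` in the builtins, which are empty here.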

6 Comments

Hm, interesting. I'll try it out! Since all code is already sandboxed from the system (I'm developing on GAE), I can detect an infinite recursion/heavy memory usage and stop the script from being run again.
that's smart. Is this absolutely safe ?
not exactly, try running exec [ i for i in ().__class__.__base__.__subclasses__() if i.__name__ == 'code'][0](0, 5, 8, 0, 'hello world', (), (), (), '', '', 0, '')
@MichałZieliński, can you explain why this creates a segfault? I understand the part where you create a code object, but not what the arguments mean.
4

I would look into a two-server approach. The first server is the privileged web server where your code lives. The second server is a very tightly controlled server that only provides a web service or RPC service and runs the untrusted code. You provide your content creator with your custom interface. For example, if you allowed the end user to create items, you would have a lookup that called the server with the code to execute and the set of parameters.

Here's an abstract example for a healing potion.

{function_id='healing potion', action='use', target='self', inventory_id='1234'}

The response might be something like

{hp='+5' action={destroy_inventory_item, inventory_id='1234'}}
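To make the exchange a little more concrete, here is a minimal sketch of what the restricted server's dispatch might look like if the messages were JSON. The handler table, the function names and the exact message fields are assumptions based on the abstract example above, not a defined protocol, and the transport (HTTP, RPC) is left out.

```python
import json


def use_healing_potion(request):
    """Hypothetical content script: heal 5 HP and consume the item."""
    return {
        "hp": "+5",
        "action": {
            "type": "destroy_inventory_item",
            "inventory_id": request["inventory_id"],
        },
    }


# Lookup from (function_id, action) to the code that handles it.
HANDLERS = {("healing potion", "use"): use_healing_potion}


def handle_request(raw_request):
    """Decode one request, run the matching handler, return a JSON reply."""
    request = json.loads(raw_request)
    key = (request.get("function_id"), request.get("action"))
    handler = HANDLERS.get(key)
    if handler is None:
        return json.dumps({"error": "unknown function or action"})
    return json.dumps(handler(request))


raw = json.dumps({
    "function_id": "healing potion",
    "action": "use",
    "target": "self",
    "inventory_id": "1234",
})
print(handle_request(raw))
```

The privileged server would treat the reply as untrusted data too, validating it before applying any change to the game state.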

5 Comments

Yeah, my game already has an RPC API, I just want certain events, when a player is playing, to be more dynamic... So scripting feels like a natural choice :) I guess that worst case scenario is that I'll have to make a simple interpreter myself.
You wouldn't necessarily need to create a complex API. You could do something as simple as serializing a data structure and passing it to the RPC server (running Python), which would load the structure and run the end user code (Python). The end user modifies it and sends it back. Regardless, you are going to have to create guidelines as to how to access your data.
This is in my opinion the best approach, since it reduces the problem to the App Engine's sandbox capability: at worst, the code can mess up the data in the dummy application that just runs the Python code. I don't even think you would need any persistent data for that app.
This is really a non-answer. What does "tightly controlled" mean? You have to choose a sandboxing technology to restrict access on that server.
@Glyph It really depends on the OS; it could be a chroot jail. I figured I would leave it to the implementer to figure out what worked for them. I personally would be wary of using any of the offered parsing and compiling solutions due to the high chance you might miss something and leave a big hole. Take the issues with rexec and Bastion as examples. Considering wiki.python.org/moin/SandboxedPython lists chroot jails as a possibility, I would say that this is a valid answer.
1

Hmm. This is a thought experiment, I don't know of it being done:

You could use the compiler package to parse the script. You can then walk this tree, prefixing all identifiers - variables, method names, etc. (also hasattr, getattr and setattr invocations and so on) - with a unique preamble so that they cannot possibly refer to your variables. You could also ensure that the compiler package itself was not invoked, and perhaps other blacklisted things such as opening files. You then emit the Python code for this, and compiler.compile it.

The docs note that the compiler package is not in Python 3.0, but does not mention what the 3.0 alternative is.

In general, this is parallel to how forum software and such try to whitelist 'safe' JavaScript or HTML, etc. And they historically have a bad record of stomping all the escapes. But you might have more luck with Python :)

1 Comment

Please don't do that. There are many ways of executing arbitrary code without directly using the packages you want to check for. For example, you could walk over the entries of ().__class__.__base__.__subclasses__() and search for the "code" entry, which then can be used to run code from a string. If you take normal Python code and check it for malicious things, you can never be sure that you did not forget to check for something that can be exploited.
1

I think your best bet is going to be a combination of the replies thus far.

You'll want to parse and sanitise the input - removing any import statements for example.

You can then use Messa's exec sample (or something similar) to allow the code execution against only the builtin variables of your choosing - most likely some sort of API defined by yourself that provides the programmer access to the functionality you deem relevant.

3 Comments

I totally concur. This does seem to be the right way to go. I'm sceptical about how much you can accomplish though.
Hmm, which cases would I need to sanitize the input using Messa's method? I've tried to import modules or otherwise access external values, but it doesn't seem easy. Import statements etc. are already disabled since no built-in functions are available (the import statement calls the __import__ function).
You should try to fish out the thread on Python-dev discussing this. It had everyone break the sandbox. Lots of ways there. I can't find it.
1

You can simply disallow "dunder" access and restrict the builtins and other globals:

if "__" not in code:
    eval(code, {'__builtins__': {}}, {})

All mechanisms for evading sandboxes require dunder access. At this point you can add back in the globals (and even allowed imports) that you want the user to have access to carefully.

For example:

if "__" not in code:
    eval(code, {'__builtins__': {'__import__': my_safe_importer}}, {})

Sanitization + restriction should be enough. I've read through a number of blogs and articles on evading sandboxes and 100% of the techniques use dunder access.

Alternatively, this module does the hard work for you (preventing dunder access) with a more nuanced and tested approach:

https://restrictedpython.readthedocs.io/en/latest/

It's well maintained:

from RestrictedPython import compile_restricted

source_code = "1+1"

byte_code = compile_restricted(
    source_code,
    filename='<inline code>',
    mode='eval'
)
eval(byte_code, {'__builtins__': {}}, {})
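One caveat on the plain substring check shown earlier in this answer: Python normalizes identifiers with NFKC while parsing, so a character that is not an ASCII underscore can still become one. The sketch below demonstrates this with U+FE33 and shows a stricter check that normalizes the text first. The function name has_dunder is made up for this example, and the check closes only this particular gap; it is not offered as proof that the filter is then sufficient.

```python
import unicodedata


def has_dunder(code):
    """Return True if code contains '__' after NFKC normalization."""
    return "__" in unicodedata.normalize("NFKC", code)


# U+FE33 (PRESENTATION FORM FOR VERTICAL LOW LINE) normalizes to '_'.
sneaky = "()._\ufe33class_\ufe33"

naive_check_passes = "__" not in sneaky          # the substring test sees no dunder
result = eval(sneaky, {"__builtins__": {}}, {})  # yet this is ().__class__

print(naive_check_passes)  # True
print(result)              # <class 'tuple'>
print(has_dunder(sneaky))  # True
```

Normalizing the whole source is stricter than what the parser does, since it also affects string literals, so it may reject some harmless input.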

1 Comment

"All mechanisms for evading sandboxes require dunder access." Disallowing '__' isn't enough. For example, the character '\ufe33' normalizes to '_' under NFKC, so '_\ufe33' can be used in place of '__'. Then, normal escape techniques can be used. Proof of concept: gist.github.com/nickodell/df8a8b42f8026ffcfe824e597a8d4f09
0

I know I'm late to the party, but there is seccomp in Linux.

from seccomp import *
# Only allow the following syscalls.
filter_ = SyscallFilter(KILL) # TODO: Report violations back instead of killing the process.
filter_.add_rule(ALLOW, 'mmap')
filter_.add_rule(ALLOW, 'munmap')
filter_.add_rule(ALLOW, 'select')
filter_.add_rule(ALLOW, 'read')
filter_.add_rule(ALLOW, 'write')
filter_.add_rule(ALLOW, 'close')
filter_.add_rule(ALLOW, 'futex')
filter_.add_rule(ALLOW, 'getrusage')
filter_.add_rule(ALLOW, 'mprotect')
filter_.add_rule(ALLOW, 'rt_sigaction')
filter_.add_rule(ALLOW, 'rt_sigreturn')
filter_.add_rule(ALLOW, 'clock_gettime')
filter_.add_rule(ALLOW, 'madvise')
filter_.add_rule(ALLOW, 'prctl')
filter_.add_rule(ALLOW, 'exit')
filter_.add_rule(ALLOW, 'exit_group')
filter_.load()
# untrusted code follows here...
