How can I sandbox Python in pure Python?

Question

I'm developing a web game in pure Python, and want some simple scripting available to allow for more dynamic game content. Game content can be added live by privileged users.

It would be nice if the scripting language could be Python. However, it can't run with access to the environment the game runs on since a malicious user could wreak havoc which would be bad. Is it possible to run sandboxed Python in pure Python?

Update: In fact, since true Python support would be way overkill, a simple scripting language with Pythonic syntax would be perfect.

If there aren't any Pythonic script interpreters, are there any other open source script interpreters written in pure Python that I could use? The requirements are support for variables, basic conditionals and function calls (not definitions).

Copied from a link-only answer: How can I run an untrusted Python script safely (i.e. Sandbox)
– Trenton McKinney, Commented Jun 9, 2022 at 17:22
Good answer in a related thread. Related thread on SWE: Best practices for execution of untrusted code
– ggorlen, Commented Jan 8, 2024 at 2:00

Aaron Digulla · Accepted Answer · 2013-11-04 14:40:32Z

66

This is really non-trivial.

There are two ways to sandbox Python. One is to create a restricted environment (i.e., very few globals etc.) and exec your code inside this environment. This is what Messa is suggesting. It's nice, but there are lots of ways to break out of the sandbox and create trouble. There was a thread about this on Python-dev a year or so ago in which people broke out using everything from catching exceptions and poking at internal state to bytecode manipulation. This is the way to go if you want a complete language.

The other way is to parse the code and then use the ast module to kick out constructs you don't want (e.g. import statements, function calls etc.) and then to compile the rest. This is the way to go if you want to use Python as a config language etc.

Another way (which might not work for you since you're using GAE), is the PyPy sandbox. While I haven't used it myself, word on the intertubes is that it's the only real sandboxed Python out there.

Based on your description of the requirements (support for variables, basic conditionals and function calls, not definitions), you might want to evaluate approach 2 and kick out everything else from the code. It's a little tricky but doable.
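As a rough illustration of approach 2, the sketch below parses a script with the ast module and rejects every node type that is not on a small whitelist before compiling it. The names `ALLOWED_NODES`, `validate`, `run` and the `heal` function are hypothetical, and the whitelist is an assumption about what "variables, basic conditionals and function calls" would need; it is not a vetted sandbox and does nothing about CPU or memory exhaustion.

```python
import ast

# Node types the mini-language permits (an assumed whitelist): variables,
# literals, if-statements, comparisons, arithmetic and plain calls.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.If, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.Call, ast.Compare, ast.BoolOp, ast.And, ast.Or,
    ast.UnaryOp, ast.Not, ast.USub, ast.BinOp, ast.Add, ast.Sub, ast.Mult,
    ast.Div, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)


def validate(source, allowed_functions):
    """Parse source and raise ValueError on anything outside the whitelist."""
    tree = ast.parse(source, mode="exec")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"forbidden construct: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id.startswith("_"):
            raise ValueError(f"forbidden name: {node.id}")
        if isinstance(node, ast.Call):
            # Only direct calls to whitelisted names, e.g. heal(5).
            if not isinstance(node.func, ast.Name) or node.func.id not in allowed_functions:
                raise ValueError("forbidden call")
    return tree


def run(source, api):
    """Validate source, then execute it with only the api names visible."""
    tree = validate(source, set(api))
    namespace = {"__builtins__": {}, **api}
    exec(compile(tree, "<script>", "exec"), namespace)
    return namespace


# Hypothetical game API: the script may call heal(), nothing else.
healed = []
namespace = run("hp = 10\nif hp < 20:\n    heal(5)\n", {"heal": healed.append})
```

Because attribute access, imports, definitions and subscripts are simply absent from the whitelist, they are rejected by default, which is the main appeal of whitelisting node types over blacklisting them.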

edited Nov 4, 2013 at 14:40

Aaron Digulla

answered Jun 18, 2010 at 9:21

Noufal Ibrahim

8 Comments

Blixt Over a year ago

Hmm yeah I was thinking about what would happen if you start digging in code objects... I guess you can escape the exec that way... PyPy is what Google App Engine is using already though, isn't it? I wonder if the pure Python version of PyPy can run in GAE... I'll mess around with it a bit.

Noufal Ibrahim Over a year ago

I think GAE has a variant of Unladen Swallow. It's not PyPy AFAIK.

Blixt Over a year ago

Do you think this code is a good start? code.activestate.com/recipes/496746

Noufal Ibrahim Over a year ago

Can't give you a total guarantee but a cursory look tells me that it's decent code. One place I know which does this in "production" is the Templetor templating engine used by web.py. You might want to take a look at that.

user2284570 Over a year ago

@Blixt: they always used CPython. The mechanism for 2.5.2 was done entirely in pure Python. For 2.7.5, they compiled Python for NaCl-glibc, a sandbox which runs at the C level.

5 revs · Accepted Answer · 2020-07-02 15:37:19Z

Roughly ten years after the original question, Python 3.8.0 comes with auditing. Can it help? Let's limit the discussion to hard-drive writing for simplicity - and see:

from sys import addaudithook

def block_mischief(event, arg):
    # Enforce only while the top-level code has WRITE_LOCK defined.
    if 'WRITE_LOCK' in globals() and (
            (event == 'open' and arg[1] != 'r')
            or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
        raise IOError('file write forbidden')

addaudithook(block_mischief)

So far exec could easily write to disk:

exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))

But we can forbid it at will, so that no wicked user can access the disk from the code supplied to exec(). Python modules like numpy or pickle ultimately go through Python's file access, so they are barred from writing to disk, too. External program calls are explicitly disabled as well.

WRITE_LOCK = True
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("open('/tmp/FILE','a').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("numpy.savetxt('/tmp/FILE', numpy.eye(3))", dict(locals()))
exec("import subprocess; subprocess.call('echo PWNED >> /tmp/FILE', shell=True)", dict(locals()))

An attempt to remove the lock from within exec() seems to be futile, since the auditing hook uses a different copy of locals that is not accessible to the code run by exec. Please prove me wrong.

exec("print('muhehehe'); del WRITE_LOCK; open('/tmp/FILE','w')", dict(locals()))

...
OSError: file write forbidden

Of course, the top-level code can enable file I/O again.

del WRITE_LOCK
exec("open('/tmp/FILE','w')", dict(locals()))

Sandboxing within CPython has proven extremely hard, and many previous attempts have failed. This approach is also not entirely secure, e.g. for public web access:

Hypothetical compiled modules that make direct OS calls perhaps cannot be audited by CPython; whitelisting the safe pure-Python modules is recommended.
There is definitely still the possibility of crashing or overloading the CPython interpreter.
There may even remain some loopholes for writing files to the hard drive. But I could not use any of the usual sandbox-evasion tricks to write a single byte. We can say the "attack surface" of the Python ecosystem reduces to a rather narrow list of events to be (dis)allowed: https://docs.python.org/3/library/audit_events.html

I would be thankful to anybody pointing me to the flaws of this approach.

EDIT: So this is not safe either! I am very thankful to @Emu for his clever hack using exception catching and introspection:

#!/usr/bin/python3.8
from sys import addaudithook
def block_mischief(event,arg):
    if 'WRITE_LOCK' in globals() and ((event=='open' and arg[1]!='r') or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
        raise IOError('file write forbidden')

addaudithook(block_mischief)
WRITE_LOCK = True
exec("""
import sys
def r(a, b):
    # Called as __ne__ from inside the hook: step back into the hook's frame
    # and delete the lock from its globals.
    try:
        raise Exception()
    except:
        del sys.exc_info()[2].tb_frame.f_back.f_globals['WRITE_LOCK']
w = type('evil',(object,),{'__ne__':r})()
sys.audit('open', None, w)
open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')""", dict(locals()))

I guess that auditing+subprocessing is the way to go, but do not use it on production machines:

https://bitbucket.org/fdominec/experimental_sandbox_in_cpython38/src/master/sandbox_experiment.py
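To make the subprocess half of that idea concrete, here is a minimal sketch that runs a script in a separate interpreter with a timeout. The helper name `run_untrusted` and its parameters are hypothetical and are not taken from the linked experiment. On its own this gives only process separation and a time limit; the child can still touch the filesystem unless the operating system restricts it.

```python
import subprocess
import sys


def run_untrusted(code, timeout_s=5.0):
    """Run code in a child interpreter; return its stdout, or None on timeout or error."""
    try:
        result = subprocess.run(
            # -I is isolated mode: ignores PYTHON* environment variables
            # and the user site directory.
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return None
    return result.stdout if result.returncode == 0 else None


output = run_untrusted("print(1 + 1)")
runaway = run_untrusted("while True: pass", timeout_s=1.0)
```

A crash or an infinite loop in the child then costs the parent one failed call rather than the whole game server.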

From python docs sys.addaudithook: "Note that audit hooks are primarily for collecting information about internal or otherwise unobservable actions, whether by Python or libraries written in Python. They are not suitable for implementing a “sandbox”. In particular, malicious code can trivially disable or bypass hooks added using this function. At a minimum, any security-sensitive hooks must be added using the C API PySys_AddAuditHook() before initialising the runtime, and any modules allowing arbitrary memory modification (such as ctypes) should be completely removed or closely monitored."
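In line with that note, a hook that only observes is the use the documentation describes. The sketch below records "open" events without ever raising; `audit_log` and `record_open` are hypothetical names, and a hook installed this way stays active for the rest of the process.

```python
import os
import sys

audit_log = []


def record_open(event, args):
    # Observe only: never raise, so the audited operation proceeds unchanged.
    if event == "open":
        audit_log.append((event, args))


sys.addaudithook(record_open)

# Opening a file (here the null device) now leaves a trace in audit_log.
with open(os.devnull) as handle:
    handle.read()

opened_paths = [args[0] for event, args in audit_log]
```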

Aaron Hall · Accepted Answer · 2017-11-02 14:50:05Z

16

I'm not sure why nobody mentions this, but Zope 2 has a thing called Python Script, which is exactly that - restricted Python executed in a sandbox, without any access to filesystem, with access to other Zope objects controlled by Zope security machinery, with imports limited to a safe subset.

Zope in general is pretty safe, so I would imagine there are no known or obvious ways to break out of the sandbox.

I'm not sure exactly how Python Scripts are implemented, but the feature has been around since about 2000.

And here's the magic behind PythonScripts, with detailed documentation: http://pypi.python.org/pypi/RestrictedPython - it even looks like it doesn't have any dependencies on Zope, so can be used standalone.

Note that this is not for safely running arbitrary python code (most of the random scripts will fail on first import or file access), but rather for using Python for limited scripting within a Python application.

(This answer is from my comment on a question closed as a duplicate of this one: Python from Python: restricting functionality?)

edited Nov 2, 2017 at 14:50

Aaron Hall♦

answered Jul 6, 2012 at 9:34

Sergey

3 Comments

colidyre Over a year ago

The latest release of RestrictedPython is only compatible with Python 2.3, 2.4, 2.5, 2.6, and 2.7. No support for Python 3, yet.

Guy Korland Over a year ago

It seems like it does support Python 3.6, 3.7 & 3.8 restrictedpython.readthedocs.io/en/latest/index.html

fbmd Over a year ago

This solution deserves far more upvotes.

Messa · Accepted Answer · 2024-04-24 10:58:31Z

11

Update: This technique does not prevent creating custom code objects. See the comments.

AFAIK it is possible to run a code in a completely isolated environment:

exec(somePythonCode, {'__builtins__': {}}, {})

But in such an environment you can do almost nothing :) (you cannot even import a module; but a malicious user can still run an infinite recursion or cause the process to run out of memory.) You would probably want to add some modules that will be the interface to your game engine.
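A small sketch of that idea in Python 3 syntax: the script sees no builtins, only the functions deliberately passed in. `run_script`, `api` and `heal` are hypothetical stand-ins for a game-engine interface. As the update at the top of this answer says, this does not stop escapes through object introspection, so treat it as a convenience layer rather than a security boundary.

```python
def run_script(source, api):
    """Execute source with empty builtins; only the names in api are visible."""
    namespace = {"__builtins__": {}, **api}
    exec(source, namespace)
    return namespace


events = []
api = {"heal": lambda amount: events.append(("heal", amount))}

# The script can call the exposed function...
run_script("heal(5)", api)

# ...but an import statement fails because __import__ is not available.
import_error = None
try:
    run_script("import os", api)
except ImportError as exc:
    import_error = exc
```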

edited Apr 24, 2024 at 10:58

answered Jun 18, 2010 at 8:48

Messa

6 Comments

Blixt Over a year ago

Hm, interesting. I'll try it out! Since all code is already sandboxed from the system (I'm developing on GAE), I can detect an infinite recursion/heavy memory usage and stop the script from being run again.

Ali Over a year ago

That's smart. Is this absolutely safe?

Michał Zieliński Over a year ago

not exactly, try running

exec [ i for i in ().__class__.__base__.__subclasses__() if i.__name__ == 'code'][0](0, 5, 8, 0, 'hello world', (), (), (), '', '', 0, '')

Christian Oudard Over a year ago

@MichałZieliński, can you explain why this creates a segfault? I understand the part where you create a code object, but not what the arguments mean.

Hernan Over a year ago

@ChristianOudard Take a look at nedbatchelder.com/blog/201206/eval_really_is_dangerous.html

Philip Tinney · Accepted Answer · 2010-06-18 09:30:41Z

4

I would look into a two-server approach. The first server is the privileged web server where your code lives. The second server is a very tightly controlled server that only provides a web service or RPC service and runs the untrusted code. You provide your content creator with your custom interface. For example, if you allowed the end user to create items, you would have a lookup that called the server with the code to execute and the set of parameters.

Here's an abstract example for a healing potion.

{function_id='healing potion', action='use', target='self', inventory_id='1234'}

The response might be something like

{hp='+5' action={destroy_inventory_item, inventory_id='1234'}}
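The exchange above could be carried as JSON. The sketch below shows one possible shape for the second server's dispatch step; `HANDLERS`, `handle` and `use_healing_potion` are hypothetical names, the field layout is an assumption modelled on the example messages, and the handler stands in for whatever user-supplied script that server would actually run.

```python
import json


def use_healing_potion(request):
    # Stand-in for the untrusted item script: heal, then consume the item.
    return {
        "hp": "+5",
        "action": {"name": "destroy_inventory_item",
                   "inventory_id": request["inventory_id"]},
    }


# Lookup from (function_id, action) to the code that handles it.
HANDLERS = {("healing potion", "use"): use_healing_potion}


def handle(raw_request):
    """Decode a JSON request, dispatch it, and return a JSON response."""
    request = json.loads(raw_request)
    handler = HANDLERS.get((request.get("function_id"), request.get("action")))
    if handler is None:
        return json.dumps({"error": "unknown function"})
    return json.dumps(handler(request))


response = json.loads(handle(json.dumps({
    "function_id": "healing potion", "action": "use",
    "target": "self", "inventory_id": "1234",
})))
```

The privileged server only ever sees plain data coming back, so it can validate the response before applying it to the game state.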

answered Jun 18, 2010 at 9:30

Philip Tinney

5 Comments

Blixt Over a year ago

Yeah, my game already has an RPC API, I just want certain events, when a player is playing, to be more dynamic... So scripting feels like a natural choice :) I guess that worst case scenario is that I'll have to make a simple interpreter myself.

Philip Tinney Over a year ago

You wouldn't necessarily need to create a complex API. You could do something as simple as serializing a data structure and passing it to the RPC server (running Python), which would load the structure and run the end user code (Python). The end user modifies it and sends it back. Regardless, you are going to have to create guidelines as to how to access your data.

Ali Over a year ago

This is in my opinion the best approach, since it reduces the problem to the App Engine's sandbox capability: at worst, the code can mess up the data in the dummy application that just runs the Python code. I don't even think you would need any persistent data for that app.

Glyph Over a year ago

This is really a non-answer. What does "tightly controlled" mean? You have to choose a sandboxing technology to restrict access on that server.

Philip Tinney Over a year ago

@Glyph It really depends on the OS; it could be a chroot jail. I figured I would leave it to the implementer to figure out what worked for them. I personally would be wary of using any of the offered parsing and compiling solutions due to the high chance you might miss something and leave a big hole. Take the issues with rexec and Bastion as examples. Considering wiki.python.org/moin/SandboxedPython lists chroot jails as a possibility, I would say that this is a valid answer.

Will · Accepted Answer · 2010-06-18 08:35:29Z

1

Hmm. This is a thought experiment, I don't know of it being done:

You could use the compiler package to parse the script. You can then walk this tree, prefixing all identifiers (variables, method names, etc., and also has|get|setattr invocations and so on) with a unique preamble so that they cannot possibly refer to your variables. You could also ensure that the compiler package itself was not invoked, and perhaps other blacklisted things such as opening files. You then emit the Python code for this, and compiler.compile it.

The docs note that the compiler package is not in Python 3.0, but do not mention what the 3.0 alternative is.
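In Python 3 the ast module covers the same ground, so the renaming step of this thought experiment can be sketched with `ast.NodeTransformer`. The prefix `user_`, the class `PrefixNames` and the exposed function `double` are hypothetical. Only bare names are rewritten here; attribute chains are left untouched, so this shows the mechanics of the idea and is not a working sandbox.

```python
import ast

PREFIX = "user_"  # hypothetical preamble added to every identifier


class PrefixNames(ast.NodeTransformer):
    """Rename every bare name so scripts cannot refer to host variables."""

    def visit_Name(self, node):
        node.id = PREFIX + node.id
        return node


def compile_prefixed(source):
    tree = PrefixNames().visit(ast.parse(source, mode="exec"))
    ast.fix_missing_locations(tree)
    return compile(tree, "<script>", "exec")


# Host functions must be exposed under their prefixed names.
namespace = {"__builtins__": {}, "user_double": lambda x: 2 * x}
exec(compile_prefixed("x = 4\ny = double(x) + 1"), namespace)
```

Compiling the transformed tree directly avoids having to emit source text again, which the original compiler-based description would have required.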

In general, this is parallel to how forum software and such try to whitelist 'safe' JavaScript or HTML etc. And they historically have a bad record of stomping out all the escapes. But you might have more luck with Python :)

answered Jun 18, 2010 at 8:35

Will

1 Comment

Lukas Boersma Over a year ago

Please don't do that. There are many ways of executing arbitrary code without directly using the packages you want to check for. For example, you could walk over the entries of ().__class__.__base__.__subclasses__() and search for the "code" entry, which then can be used to run code from a string. If you take normal Python code and check it for malicious things, you can never be sure that you did not forget to check for something that can be exploited.
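The first step of that walk is easy to reproduce harmlessly. In this sketch an expression evaluated with empty builtins still reaches every direct subclass of `object` through attribute access alone; it stops at listing the class names and does not build or run a code object.

```python
# No builtins are visible to the expression, yet attribute access still works.
expr = "().__class__.__base__.__subclasses__()"
subclasses = eval(expr, {"__builtins__": {}}, {})

# Names of the classes the "sandboxed" expression was able to reach.
reachable = {cls.__name__ for cls in subclasses}
has_code_type = "code" in reachable
```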

Glenjamin · Accepted Answer · 2010-06-18 08:54:57Z

1

I think your best bet is going to be a combination of the replies thus far.

You'll want to parse and sanitise the input - removing any import statements for example.

You can then use Messa's exec sample (or something similar) to allow the code execution against only the builtin variables of your choosing - most likely some sort of API defined by yourself that provides the programmer access to the functionality you deem relevant.

answered Jun 18, 2010 at 8:54

Glenjamin

3 Comments

Noufal Ibrahim Over a year ago

I totally concur. This does seem to be the right way to go. I'm sceptical about how much you can accomplish though.

Blixt Over a year ago

Hmm, which cases would I need to sanitize the input using Messa's method? I've tried to import modules or otherwise access external values, but it doesn't seem easy. Import statements etc. are already disabled since no built-in functions are available (the import statement calls the __import__ function).

Noufal Ibrahim Over a year ago

You should try to fish out the thread on Python-dev discussing this. It had everyone break the sandbox. Lots of ways there. I can't find it.

Erik Aronesty · Accepted Answer · 2024-06-28 21:16:41Z

1

You can simply disallow "dunder" access and restrict the builtins and other globals:

if "__" not in code:
    eval(code, {'__builtins__': {}}, {})

All mechanisms for evading sandboxes require dunder access. At this point you can carefully add back the globals (and even allowed imports) that you want the user to have access to.

For example:

if "__" not in code:
    eval(code, {'__builtins__': {'__import__': my_safe_importer}}, {})

Sanitization + restriction should be enough. I've read through a number of blogs and articles on evading sandboxes and 100% of the techniques use dunder access.
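One caveat, raised in the comment on this answer, is that a substring test runs on the raw text while Python applies NFKC normalization to identifiers. The sketch below checks that with the character U+FE33 mentioned there; `naive_is_safe` and `normalized_is_safe` are hypothetical helper names, and normalizing before the test closes only this one gap.

```python
import unicodedata


def naive_is_safe(code):
    """The plain substring check on the raw source text."""
    return "__" not in code


def normalized_is_safe(code):
    """Same check after NFKC normalization, as applied to identifiers."""
    return "__" not in unicodedata.normalize("NFKC", code)


# U+FE33 normalizes to "_" under NFKC, so this spells ().__class__
# without containing two consecutive underscores in the raw text.
code = "()._\ufe33class_\ufe33"

passes_naive = naive_is_safe(code)
passes_normalized = normalized_is_safe(code)
result = eval(code, {"__builtins__": {}}, {})
```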

Alternatively, this module does the hard work for you (preventing dunder access) with a more nuanced and tested approach:

https://restrictedpython.readthedocs.io/en/latest/

It's well maintained:

from RestrictedPython import compile_restricted

source_code = "1+1"

byte_code = compile_restricted(
    source_code,
    filename='<inline code>',
    mode='eval'
)
eval(byte_code, {'__builtins__': {}}, {})

edited Jun 28, 2024 at 21:16

answered Jun 28, 2024 at 20:58

Erik Aronesty

1 Comment

Nick ODell Dec 27, 2024 at 0:36

"All mechanisms for evading sandboxes require dunder access." Disallowing '__' isn't enough. For example, the character '\ufe33' normalizes to '_' under NFKC, so '_\ufe33' can be used in place of '__'. Then, normal escape techniques can be used. Proof of concept: gist.github.com/nickodell/df8a8b42f8026ffcfe824e597a8d4f09

haael · Accepted Answer · 2025-01-16 01:11:59Z

I know I'm late to the party, but there is seccomp in Linux.

from seccomp import *
# Only allow the following syscalls.
filter_ = SyscallFilter(KILL) # TODO: Report violations back instead of killing the process.
filter_.add_rule(ALLOW, 'mmap')
filter_.add_rule(ALLOW, 'munmap')
filter_.add_rule(ALLOW, 'select')
filter_.add_rule(ALLOW, 'read')
filter_.add_rule(ALLOW, 'write')
filter_.add_rule(ALLOW, 'close')
filter_.add_rule(ALLOW, 'futex')
filter_.add_rule(ALLOW, 'getrusage')
filter_.add_rule(ALLOW, 'mprotect')
filter_.add_rule(ALLOW, 'rt_sigaction')
filter_.add_rule(ALLOW, 'rt_sigreturn')
filter_.add_rule(ALLOW, 'clock_gettime')
filter_.add_rule(ALLOW, 'madvise')
filter_.add_rule(ALLOW, 'prctl')
filter_.add_rule(ALLOW, 'exit')
filter_.add_rule(ALLOW, 'exit_group')
filter_.load()
# untrusted code follows here...
