Sandbox (for) Untrusted Procedure Partitioning (in) Lua Engine - Supple
=======================================================================

# Requirements (mostly via Gitano)

* Be able to run complex hook scripts in a totally restricted sandbox.
* Be able to access appropriately filtered objects/functions/data from the
  caller in some manner
* Have the caller able to call back into the sandbox
* Have the sandbox be in a subprocess which can be ulimit()ed etc.
* Perhaps have some way to ensure the subprocess cannot access any of the
  filesystem. (chrooted?)
* Allow the caller to inject functions, values, etc into the remote sandbox
  before execution. Perhaps even modules etc.
* Be a minimal overhead system but still be 100% policy controllable at the
  caller side of the sandbox
* May as well use Luxio for the low level operations.
* Provide the code to execute over the sandbox connection after the sandbox
  has locked itself down.

# Possible implementation details

* Serialise requests and responses as simple packets which can be punted
  across a single FD pair
* Perhaps use userdata to ensure that there's no chance of rawset() etc
  happening on either end for 'remote' objects.
* Cannot use coroutines since pure lua 5.1 cannot yield across metamethod
  boundaries :-( As such, pcall()s everydamnedwhere :-)
* Types of packets passing across the link
  * Procedure call
  * Procedure reply

## Context objects

Contextual objects are anything non-integral (i.e. a table or userdata) which
is passed across the connections. They get transformed into contextual
objects which are always represented on the remote end as a userdata which
has appropriate metamethods to cope with the object on the far end. Only
metamethods which are defined by the sending end (and always __index and
__newindex if it's a table) are defined on the receiving end in order to
increase realism.

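
As a rough sketch of how the receiving end might stand in for a remote table,
the snippet below builds a proxy whose metamethods simply forward to the far
end. It is only an illustration: it uses a plain table rather than a userdata
(pure Lua cannot create userdata), so rawget()/rawset() can still bypass it,
and the names `make_proxy`, `send_call` and `fake_send` are hypothetical, not
part of Supple.

```lua
-- Hypothetical helper: build a local stand-in for an object on the far end.
-- `send_call(tag, method, ...)` is an assumed transport function which
-- performs the procedure call remotely and returns its results.
local function make_proxy(tag, methods, send_call)
    local mt = {}
    for _, name in ipairs(methods) do
        -- Only the metamethods announced by the sender are defined here.
        mt[name] = function(_, ...)
            return send_call(tag, name, ...)
        end
    end
    mt.__metatable = false  -- getmetatable(proxy) will now return false
    return setmetatable({}, mt)
end

-- A fake "far end" so the sketch can run in a single process.
local remote = { greeting = "hello" }
local function fake_send(tag, method, ...)
    local key, value = ...
    if method == "__index" then
        return remote[key]
    elseif method == "__newindex" then
        remote[key] = value
    end
end

local proxy = make_proxy("obj1", { "__index", "__newindex" }, fake_send)
proxy.answer = 42                      -- forwarded as a __newindex call
print(proxy.greeting, proxy.answer)    -- both reads are forwarded as __index
```

Because the proxy table itself stays empty, every read and write goes through
the metamethods, which is the property a userdata would give for free.
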
The *sandboxed* end of the connection has pairs() and ipairs() augmented to
cope with contextual objects, the non-sandboxed end has no such help.

Contextual objects also represent remote functions (although they always
tostring() to "function" rather than "table" or "userdata") and only support
__call.

## Notification of contextual object

Contextual objects are always transferred as tables with the following keys:

* type: "table", "userdata" or "function"
* tag: A tag for the caller to use when referring to this object in future
* methods: Optional table containing the set of metamethods defined on this
  object.

Note that the remote end will always augment the methods list with __gc so
that the notification of GC can be made. Also if the type is "table" then
__index and __newindex metamethods will be added for those normal table
operations. If the type is "function" then __call will also be forcibly
added.

At either end, the set of contextual objects is held in tables.

* my_objects -- objects sent from this side to the other which have not been
  garbage collected on the remote end. Strong key (object) strong value (tag)
* their_objects -- objects from the remote side. Strong key (tag) weak value
  (object)

Also, the objects created as part of a function call are held strongly by the
receiver of the procedure call so that they *cannot* go out of scope until
the procedure call has sent its return value to the caller.

## Procedure call

Procedure calls are quite simple: they consist of a contextual object, a
procedure name and a set of arguments. The procedure names are always of the
form of metamethod names. The __gc metamethod tells the remote end that the
local end is 'forgetting' the contextual object and the remote end can drop
it from its cache.

Thusly a procedure call (unserialised) looks approximately like this:

    {
       object = "some object tag",
       method = "__something",
       args = {
          n = 4, -- The number of arguments. Trailing nils might be lost.
          "string arg",
          1234, -- number argument
          false, -- boolean argument
          { -- Object argument of some kind
             type = "table",
             tag = "tag name",
             methods = { "__call" }
          }
       },
    }

If the call is passing a contextual object from the remote end back again
then it omits the type and methods values, only passing the tag. Thusly
whenever a contextual object is passed as an argument, it is automatically
unwrapped at the remote end into the local object. This means that we do not
end up with multiple layers of contextual objects going back and forth.

## Procedure returns

Procedure returns are always of the following form:

    {
       error = true,
       message = "some message",
       traceback = "some traceback"
    }

or

    {
       error = false,
       results = {
          n = N, -- The number of results detected
          -- Any returns as a numerically indexed table.
          -- Note that trailing nils may be collapsed.
       }
    }

If the procedure return is an error form then the caller stashes the
traceback in the right place and raises a Lua error on the caller's side.
This process might bounce back and forth until something catches the error in
full.

Otherwise, the results are returned exactly as arguments were provided, i.e.
strings, booleans and numbers are passed directly, and anything else is an
object.

If a brand new object is returned as part of the call then the lifetime of
that object will very much be limited to the caller remembering to anchor it
somewhere. Otherwise it's quite possible that during the return from the
procedure call, a __gc call will be automatically invoked to clean up.

# Serialising requests and replies

All requests and replies are structured as Lua tables once deserialised. The
serialised form of the messages is simply a string which can be loaded and
run to return the values. The limited number of formats means that the
serialisers can be written explicitly and the deserialiser can simply be a
generic loader followed by a series of asserts and measures to ensure nothing
malicious gets injected.

# New terminology

* Host -- The program which wants to run untrusted code
* Sandbox -- The subprocess which is going to run the code on the Host's
  behalf

# What happens when a host wants to run untrusted code?

1. The host starts by preparing a socketpair and forking.
2. The forked process dup2()s the socketpair onto fd 0 and force-closes
   every other FD (regardless of the likelihood of it being open).
3. Then the forked process executes a specifically compiled lua interpreter
   wrapper program which prevents LUA_PATH et al being passed to the real
   lua interpreter. It also sets the command line for the real interpreter
   to simply be: lua -lsupple -esupple.sandbox.run()
4. The real interpreter then loads the Supple modules and starts the sandbox
   process.
5. Said interpreter, if setuid(root) then
   1. makes a directory owned by root
   2. changes into that directory
   3. removes that directory
   4. chroot()s into that (now) ephemeral directory
   5. drops privileges
6. Finally the interpreter, now referred to as the Sandbox, enters a receive
   state where it waits for a procedure call.
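
To make the receive state a little more concrete, here is a minimal sketch of
handling one procedure call: load the serialised packet in an empty
environment, dispatch it under xpcall(), and serialise a reply in one of the
two return forms. Everything here is an assumption for illustration
(`load_packet`, `serialise_reply`, `handle_packet` and the `objects` table are
hypothetical names, not Supple's API), and it ignores contextual object
arguments, NaN/infinity, and reading from the socket.

```lua
local unpack = table.unpack or unpack  -- Lua 5.1 only has the global unpack

-- Load a packet in an empty environment so it cannot reach any globals.
local function load_packet(text)
    if type(text) ~= "string" or text:sub(1, 1) == "\27" then
        return nil, "refusing non-text chunk"  -- binary chunks start with ESC
    end
    local chunk, err
    if setfenv then                            -- Lua 5.1 / LuaJIT
        chunk, err = loadstring(text, "=packet")
        if chunk then setfenv(chunk, {}) end
    else                                       -- Lua 5.2 and later
        chunk, err = load(text, "=packet", "t", {})
    end
    if not chunk then return nil, err end
    local ok, packet = pcall(chunk)
    if not ok then return nil, packet end
    if type(packet) ~= "table" or type(packet.args) ~= "table"
       or type(packet.args.n) ~= "number" then
        return nil, "malformed packet"
    end
    return packet
end

-- Only plain values are handled; anything else would need a contextual object.
local function serialise_value(v)
    local t = type(v)
    if t == "string" then return string.format("%q", v) end
    if t == "number" or t == "boolean" or t == "nil" then return tostring(v) end
    error("cannot serialise a " .. t .. " without a contextual object")
end

local function serialise_reply(reply)
    if reply.error then
        return string.format("return {error=true,message=%q,traceback=%q}",
                             reply.message, reply.traceback)
    end
    local parts = { "n=" .. reply.results.n }
    for i = 1, reply.results.n do
        parts[#parts + 1] = "[" .. i .. "]=" .. serialise_value(reply.results[i])
    end
    return "return {error=false,results={" .. table.concat(parts, ",") .. "}}"
end

-- The metamethod names this sketch knows how to apply to a local object.
local handlers = {
    __index = function(obj, key) return obj[key] end,
    __newindex = function(obj, key, value) obj[key] = value end,
    __call = function(obj, ...) return obj(...) end,
}

local function on_error(msg)
    return { message = tostring(msg), traceback = debug.traceback("", 2) }
end

local function pack(...) return { n = select("#", ...), ... } end

local function handle_packet(objects, text)
    local packet, err = load_packet(text)
    if not packet then
        return serialise_reply({ error = true, message = tostring(err), traceback = "" })
    end
    local obj, handler = objects[packet.object], handlers[packet.method]
    if obj == nil or handler == nil then
        return serialise_reply({ error = true, message = "unknown object or method",
                                 traceback = "" })
    end
    local r = pack(xpcall(function()
        return handler(obj, unpack(packet.args, 1, packet.args.n))
    end, on_error))
    if not r[1] then
        local e = r[2]
        if type(e) ~= "table" then e = { message = tostring(e), traceback = "" } end
        return serialise_reply({ error = true, message = e.message, traceback = e.traceback })
    end
    local results = { n = r.n - 1 }            -- drop the leading `true`
    for i = 2, r.n do results[i - 1] = r[i] end
    return serialise_reply({ error = false, results = results })
end

-- Usage: one local function exposed under the tag "adder".
local objects = { adder = function(a, b) return a + b end }
local reply_text = handle_packet(objects,
    'return {object="adder",method="__call",args={n=2,2,3}}')
print(reply_text)
```

Note that an empty environment does not stop a hostile packet from looping
forever or exhausting memory, which is one reason the ulimit()ed subprocess
still matters.
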