author | Daniel Silverstone <dsilvers@digital-scurf.org> | 2012-07-21 16:32:07 +0100 |
---|---|---|
committer | Daniel Silverstone <dsilvers@digital-scurf.org> | 2012-07-21 16:32:07 +0100 |
commit | b5460017f1e88baf283ebfaad341cd094f5041ff (patch) | |
tree | 445386ab7dbce5dbd3e97078225bb9be3e939960 /notes | |
parent | b4d2d633cd88308487aa6e2b689a4328ce6023c1 (diff) | |
Lots of stuff
Diffstat (limited to 'notes')
-rw-r--r-- | notes/design | 149 |
1 files changed, 149 insertions, 0 deletions
diff --git a/notes/design b/notes/design
new file mode 100644
index 0000000..3cd1c90
--- /dev/null
+++ b/notes/design
@@ -0,0 +1,149 @@
+Sandbox (for) Untrusted Procedure Partitioning (in) Lua Engine - Supple
+=======================================================================
+
+# Requirements (mostly via Gitano)
+
+* Be able to run complex hook scripts in a totally restricted sandbox.
+* Be able to access appropriately filtered objects/functions/data from the
+  caller in some manner
+* Have the caller able to call back into the sandbox
+* Have the sandbox be in a subprocess which can be ulimit()ed etc.
+* Perhaps have some way to ensure the subprocess cannot access any of the
+  filesystem. (chrooted?)
+* Allow the caller to inject functions, values, etc into the remote sandbox
+  before execution. Perhaps even modules etc.
+* Be a minimal overhead system but still be 100% policy controllable at the
+  caller side of the sandbox
+* May as well use Luxio for the low level operations.
+* Provide the code to execute over the sandbox connection after the sandbox
+  has locked itself down.
+
+# Possible implementation details
+
+* Serialise requests and responses as simple packets which can be punted
+  across a single FD pair
+* Perhaps use userdata to ensure that there's no chance of rawset() etc
+  happening on either end for 'remote' objects.
+* Cannot use coroutines since pure lua 5.1 cannot yield across metamethod
+  boundaries :-( As such, pcall()s everydamnedwhere :-)
+* Types of packets passing across the link
+  * Procedure call
+  * Procedure reply
+
+## Context objects
+
+Contextual objects are anything non-integral (i.e. a table or userdata) which
+is passed across the connections. They get transformed into contextual objects
+which are always represented on the remote end as a userdata which has
+appropriate metamethods to cope with the object on the far end.
+Only metamethods which are defined by the sending end (and always __index and
+__newindex if it's a table) are defined on the receiving end in order to
+increase realism. The *sandboxed* end of the connection has pairs() and
+ipairs() augmented to cope with contextual objects, the non-sandboxed end has
+no such help.
+
+Contextual objects also represent remote functions (although they always
+tostring() to "function" rather than "table" or "userdata") and only support
+__call.
+
+## Notification of contextual object
+
+Contextual objects are always transferred as tables with the following keys:
+
+* type: "table" "userdata" "function"
+* tag: A tag for the caller to use when referring to this object in future
+* methods: Optional table containing the set of metamethods defined on this
+  object.
+
+Note that the remote end will always augment the methods list with __gc so that
+the notification of GC can be made. Also if the type is "table" then __index
+and __newindex metamethods will be added for those normal table operations. If
+the type is "function" then __call will also be forcibly added.
+
+At either end, the set of contextual objects is held in tables.
+
+* my_objects -- objects sent from this side to the other which have not been
+  garbage collected on the remote end. Strong key (object) strong value
+  (tag)
+* their_objects -- objects from the remote side strong key (tag) weak value
+  (object)
+
+Also, the objects created as part of a function call are held strongly by the
+receiver of the procedure call so that they *cannot* go out of scope until
+the procedure call has sent its return value to the caller.
+
+## Procedure call
+
+Procedure calls are quite simple, they consist of a contextual object and a
+procedure name and a set of arguments.
+
+The procedure names are always of the form of metamethod names.
+
+The __gc metamethod tells the remote end that the local end is 'forgetting' the
+contextual object and the remote end can drop it from its cache.
+
+Thusly a procedure call (unserialised) looks approximately like this:
+
+    {
+        object = "some object tag",
+        method = "__something",
+        args = {
+            n = 4, -- The number of arguments. Trailing nils might be lost.
+            "string arg",
+            1234, -- number argument
+            false, -- boolean argument
+            { -- Object argument of some kind
+                type = "table",
+                tag = "tag name",
+                methods = { "__call" }
+            }
+        },
+    }
+
+If the call is passing a contextual object from the remote end back again then
+it omits the type and methods values, only passing the tag. Thusly whenever a
+contextual object is passed as an argument, it is automatically unwrapped at
+the remote end into the local object. This means that we do not end up with
+multiple layers of contextual objects going back and forth.
+
+## Procedure returns
+
+Procedure returns are always of the following form:
+
+    {
+        error = true,
+        message = "some message",
+        traceback = "some traceback"
+    }
+
+or
+
+    {
+        error = false,
+        results = {
+            n = N, -- The number of results detected
+            -- Any returns as a numerically indexed table.
+            -- Note that trailing nils may be collapsed.
+        }
+    }
+
+If the procedure return is an error form then the caller then stashes the
+traceback in the right place and raises a Lua error on the caller's side. This
+process might bounce back and forth until something catches the error in full.
+
+Otherwise, the results are returned exactly as arguments were provided,
+i.e. strings, booleans and numbers are passed directly, and anything else is an
+object. If a brand new object is returned as part of the call then the
+lifetime of that object will very much be limited to the caller remembering to
+anchor it somewhere. Otherwise it's quite possible that during the return from
+the procedure call, a __gc call will be automatically invoked to clean up.
+
+# Serialising requests and replies
+
+All requests and replies are structured as Lua tables once deserialised. The
+serialised form of the messages is simply a string which can be loaded and run
+to return the values. The limited number of formats mean that the serialisers
+can be written explicitly and the deserialiser can simply be a generic loader
+followed by a series of asserts and measures to ensure nothing malicious gets
+injected.
+
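Reader's sketch (not part of the commit): the "Serialising requests and replies" section of the note describes messages as strings that are loaded and run to yield a table, followed by asserts. The Lua below is one way that could look. The names `serialise`, `deserialise` and `check_call` are hypothetical and are not taken from the Supple source; the empty environment and the shape checks are assumptions about what "asserts and measures" might include.

```lua
-- Hypothetical helpers; a sketch only, not Supple's implementation.

-- Turn a plain value (string, number, boolean, or a table of those) into a
-- Lua expression. NaN and infinities are not handled by this sketch.
local function serialise(v)
   local t = type(v)
   if t == "string" then
      return string.format("%q", v)
   elseif t == "number" or t == "boolean" then
      return tostring(v)
   elseif t == "table" then
      local parts = {}
      for k, val in pairs(v) do
         parts[#parts + 1] = "[" .. serialise(k) .. "]=" .. serialise(val)
      end
      return "{" .. table.concat(parts, ",") .. "}"
   end
   error("cannot serialise a " .. t)
end

-- Load the string as a chunk with an empty environment, run it under
-- pcall, and insist that the result is a table.
local function deserialise(s)
   assert(type(s) == "string", "packet must be a string")
   local chunk, err
   if setfenv then
      -- Lua 5.1 / LuaJIT
      chunk, err = loadstring("return " .. s, "packet")
      if chunk then setfenv(chunk, {}) end
   else
      -- Lua 5.2 and later: text-only chunk, empty environment
      chunk, err = load("return " .. s, "packet", "t", {})
   end
   assert(chunk, err)
   local ok, value = pcall(chunk)
   assert(ok, value)
   assert(type(value) == "table", "packet must be a table")
   return value
end

-- Shape check for a procedure call packet as laid out in the note.
local function check_call(p)
   assert(type(p.object) == "string", "object tag must be a string")
   assert(type(p.method) == "string" and p.method:match("^__%a+$"),
          "method must look like a metamethod name")
   assert(type(p.args) == "table" and type(p.args.n) == "number",
          "args.n must be a number")
   return p
end

local wire = serialise({
   object = "some object tag",
   method = "__index",
   args = { n = 2, "key", false },
})
local call = check_call(deserialise(wire))
print(call.method, call.args.n, call.args[1])
```

An empty environment only removes access to globals. A hostile packet could still spin in a loop or reach the string library through string methods, so a real loader would presumably need further measures; the note itself leaves those unspecified.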
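Reader's sketch (not part of the commit): the "Notification of contextual object" section describes two bookkeeping tables and the metamethods the receiving end adds. A rough Lua rendering follows, assuming string tags from a counter and using a plain table as a stand-in for the userdata proxy the note proposes. `describe`, `augment`, `proxy_for` and `forget` are hypothetical names.

```lua
-- Hypothetical sketch of the bookkeeping described in the note.

-- object -> tag, strong keys and values: sent objects stay alive until the
-- remote end reports __gc.
local my_objects = {}
-- tag -> proxy, weak values: a proxy vanishes once nothing local uses it.
local their_objects = setmetatable({}, { __mode = "v" })

local next_tag = 0

-- Sending side: strings, numbers, booleans and nil pass through; anything
-- else becomes a { type, tag } description and is remembered.
local function describe(obj)
   local kind = type(obj)
   if kind ~= "table" and kind ~= "userdata" and kind ~= "function" then
      return obj
   end
   local tag = my_objects[obj]
   if tag == nil then
      next_tag = next_tag + 1
      tag = "tag" .. next_tag
      my_objects[obj] = tag
   end
   return { type = kind, tag = tag }
end

-- Receiving side: the metamethod set is whatever the sender listed, plus
-- __gc always, __index/__newindex for tables, __call for functions.
local function augment(desc)
   local set = { __gc = true }
   for _, name in ipairs(desc.methods or {}) do
      set[name] = true
   end
   if desc.type == "table" then
      set.__index = true
      set.__newindex = true
   elseif desc.type == "function" then
      set.__call = true
   end
   return set
end

-- Receiving side: reuse an existing proxy for a known tag, else make one.
-- A plain table stands in for the userdata here.
local function proxy_for(desc)
   local proxy = their_objects[desc.tag]
   if proxy == nil then
      proxy = { tag = desc.tag, methods = augment(desc) }
      their_objects[desc.tag] = proxy
   end
   return proxy
end

-- Sending side: handle a __gc procedure call by dropping the object.
local function forget(tag)
   for obj, t in pairs(my_objects) do
      if t == tag then
         my_objects[obj] = nil
         return true
      end
   end
   return false
end
```

The linear scan in `forget` keeps the sketch short; a reverse tag-to-object map would avoid it at the cost of a second table to keep in step.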