author Daniel Silverstone <> 2012-07-21 16:32:07 +0100
committer Daniel Silverstone <> 2012-07-21 16:32:07 +0100
commit b5460017f1e88baf283ebfaad341cd094f5041ff
tree 445386ab7dbce5dbd3e97078225bb9be3e939960 /notes
parent b4d2d633cd88308487aa6e2b689a4328ce6023c1
Lots of stuff
Diffstat (limited to 'notes')
1 files changed, 149 insertions, 0 deletions
diff --git a/notes/design b/notes/design
new file mode 100644
index 0000000..3cd1c90
--- /dev/null
+++ b/notes/design
@@ -0,0 +1,149 @@
+Sandbox (for) Untrusted Procedure Partitioning (in) Lua Engine - Supple
+# Requirements (mostly via Gitano)
+* Be able to run complex hook scripts in a totally restricted sandbox.
+* Be able to access appropriately filtered objects/functions/data from the
+ caller in some manner
+* Have the caller able to call back into the sandbox
+* Have the sandbox be in a subprocess which can be ulimit()ed etc.
+* Perhaps have some way to ensure the subprocess cannot access any of the
+ filesystem. (chrooted?)
+* Allow the caller to inject functions, values, etc into the remote sandbox
+ before execution. Perhaps even modules etc.
+* Be a minimal overhead system but still be 100% policy controllable at the
+ caller side of the sandbox
+* May as well use Luxio for the low level operations.
+* Provide the code to execute over the sandbox connection after the sandbox
+ has locked itself down.
+# Possible implementation details
+* Serialise requests and responses as simple packets which can be punted
+ across a single FD pair
+* Perhaps use userdata to ensure that there's no chance of rawset() etc
+ happening on either end for 'remote' objects.
+* Cannot use coroutines since pure Lua 5.1 cannot yield across metamethod
+ boundaries :-( As such, pcall()s everydamnedwhere :-)
+* Types of packets passing across the link
+ * Procedure call
+ * Procedure reply
+## Context objects
+Contextual objects stand in for any value which cannot be copied across the
+connection directly (i.e. a table or userdata, rather than a string, number or
+boolean). Such a value is always represented on the remote end as a userdata
+which has appropriate metamethods to cope with the object on the far end. Only
+metamethods which are defined by the sending end (and always __index and
+__newindex if it's a table) are defined on the receiving end, so that the proxy
+behaves as much like the original as possible. The *sandboxed* end of the
+connection has pairs() and ipairs() augmented to cope with contextual objects;
+the non-sandboxed end has no such help.
+Contextual objects also represent remote functions (although they always
+tostring() to "function" rather than "table" or "userdata") and only support
+## Notification of contextual object
+Contextual objects are always transferred as tables with the following keys:
+* type: One of "table", "userdata" or "function"
+* tag: A tag for the caller to use when referring to this object in future
+* methods: Optional table containing the set of metamethods defined on this
+ object.
+Note that the remote end will always augment the methods list with __gc so that
+the notification of GC can be made. Also if the type is "table" then __index
+and __newindex metamethods will be added for those normal table operations. If
+the type is "function" then __call will also be forcibly added.
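The augmentation rules above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `augmented_methods` is hypothetical, and the result is sorted purely to make it easy to inspect.

```lua
-- Hypothetical helper: given a received descriptor, work out which
-- metamethods the receiving end would define on its proxy.
local function augmented_methods(descriptor)
   -- __gc is always present so that collection can be reported back.
   local set = { __gc = true }
   for _, name in ipairs(descriptor.methods or {}) do
      set[name] = true
   end
   if descriptor.type == "table" then
      set.__index = true
      set.__newindex = true
   elseif descriptor.type == "function" then
      set.__call = true
   end
   -- Flatten the set into a sorted list for predictable output.
   local list = {}
   for name in pairs(set) do
      list[#list + 1] = name
   end
   table.sort(list)
   return list
end
```

Using a set before flattening means a metamethod named by the sender and also forced by the receiver only appears once.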
+At either end, the set of contextual objects is held in tables.
+* my_objects -- objects sent from this side to the other which have not been
+ garbage collected on the remote end. Strong key (object), strong value
+ (tag).
+* their_objects -- objects from the remote side. Strong key (tag), weak value
+ (object).
+Also, the objects created as part of a function call are held strongly by the
+receiver of the procedure call so that they *cannot* go out of scope until
+the procedure call has sent its return value to the caller.
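A minimal sketch of how the two tables might be set up in plain Lua. Only the names `my_objects` and `their_objects` come from the notes; the helpers `tag_for` and `forget`, and the tag format, are assumptions made for illustration.

```lua
-- object -> tag, strong in both directions so that a sent object stays
-- alive until the remote end reports that it has been collected.
local my_objects = {}

-- tag -> proxy object, with weak values so that a proxy nobody references
-- locally can be collected (which is what would trigger its __gc).
local their_objects = setmetatable({}, { __mode = "v" })

local next_tag = 0

-- Hypothetical helper: return the tag for a local object, allocating a
-- fresh one the first time the object is sent.
local function tag_for(obj)
   local tag = my_objects[obj]
   if tag == nil then
      next_tag = next_tag + 1
      tag = "obj" .. next_tag
      my_objects[obj] = tag
   end
   return tag
end

-- Hypothetical helper: run when the remote end sends __gc for a tag.
local function forget(tag)
   for obj, t in pairs(my_objects) do
      if t == tag then
         my_objects[obj] = nil
         return true
      end
   end
   return false
end
```

The linear scan in `forget` keeps the sketch short; a real implementation would probably keep a reverse tag-to-object map as well.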
+## Procedure call
+Procedure calls are quite simple: they consist of a contextual object, a
+procedure name and a set of arguments.
+The procedure names are always of the form of metamethod names.
+The __gc metamethod tells the remote end that the local end is 'forgetting' the
+contextual object and the remote end can drop it from its cache.
+Thusly a procedure call (unserialised) looks approximately like this:
+    {
+        object = "some object tag",
+        method = "__something",
+        args = {
+            n = 4, -- The number of arguments. Trailing nils might be lost.
+            "string arg",
+            1234, -- number argument
+            false, -- boolean argument
+            { -- Object argument of some kind
+                type = "table",
+                tag = "tag name",
+                methods = { "__call" }
+            }
+        },
+    }
+If the call is passing a contextual object from the remote end back again then
+it omits the type and methods values, only passing the tag. Thusly whenever a
+contextual object is passed as an argument, it is automatically unwrapped at
+the remote end into the local object. This means that we do not end up with
+multiple layers of contextual objects going back and forth.
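Building such a call packet might look something like the sketch below. Everything here is an assumption for illustration: the names `describe`, `make_call` and `remote_tags`, and the tag format, do not come from the notes.

```lua
local my_objects = {}   -- local object -> tag
local remote_tags = {}  -- hypothetical: local proxy of a remote object -> tag
local counter = 0

-- Turn one argument into its on-the-wire form.
local function describe(value)
   local t = type(value)
   if t == "string" or t == "number" or t == "boolean" or t == "nil" then
      return value                        -- plain values pass directly
   end
   if remote_tags[value] then
      return { tag = remote_tags[value] } -- going home: tag only
   end
   local tag = my_objects[value]
   if not tag then
      counter = counter + 1
      tag = "tag" .. counter
      my_objects[value] = tag
   end
   -- List the metamethods the sender actually defines, if any.
   local methods
   local mt = getmetatable(value)
   if type(mt) == "table" then
      for k in pairs(mt) do
         if type(k) == "string" and k:sub(1, 2) == "__" then
            methods = methods or {}
            methods[#methods + 1] = k
         end
      end
      if methods then table.sort(methods) end
   end
   return { type = t, tag = tag, methods = methods }
end

-- Assemble the unserialised call table, recording n so that nils in the
-- argument list are not silently dropped.
local function make_call(object_tag, method, ...)
   local args = { n = select("#", ...) }
   for i = 1, args.n do
      args[i] = describe((select(i, ...)))
   end
   return { object = object_tag, method = method, args = args }
end
```

Because a proxy for a remote object is sent back as a bare tag, the receiving end can look it up in its own table of sent objects and hand the original object to the callee.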
+## Procedure returns
+Procedure returns always take one of the following two forms, the first for
+an error and the second for success:
+    {
+        error = true,
+        message = "some message",
+        traceback = "some traceback"
+    }
+
+    {
+        error = false,
+        results = {
+            n = N, -- The number of results detected
+            -- Any returns as a numerically indexed table.
+            -- Note that trailing nils may be collapsed.
+        }
+    }
+If the procedure return is the error form then the caller stashes the
+traceback in the right place and raises a Lua error on the caller's side. This
+process might bounce back and forth until something catches the error in full.
+Otherwise, the results are returned exactly as arguments were provided,
+i.e. strings, booleans and numbers are passed directly, and anything else is an
+object. If a brand new object is returned as part of the call then the
+lifetime of that object will very much be limited to the caller remembering to
+anchor it somewhere. Otherwise it's quite possible that during the return from
+the procedure call, a __gc call will be automatically invoked to clean up.
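As a rough sketch of both sides of a return, the receiver could wrap the call in xpcall() and the caller could re-raise on an error reply. The names `run_procedure` and `unpack_reply` are hypothetical, and appending the traceback to the message is just one guess at what "stashing it in the right place" could mean.

```lua
local unpack = table.unpack or unpack  -- Lua 5.1 and later

local function pack(...)
   return { n = select("#", ...), ... }
end

-- Receiver side: run fn with the decoded args and build a return packet.
local function run_procedure(fn, args)
   local function handler(msg)
      return { message = tostring(msg),
               traceback = debug.traceback(tostring(msg), 2) }
   end
   local packed = pack(xpcall(function()
      return fn(unpack(args, 1, args.n))
   end, handler))
   if not packed[1] then
      local info = packed[2]
      if type(info) ~= "table" then
         -- The handler itself failed; fall back to whatever we were given.
         info = { message = tostring(info), traceback = "" }
      end
      return { error = true, message = info.message,
               traceback = info.traceback }
   end
   local results = { n = packed.n - 1 }
   for i = 2, packed.n do
      results[i - 1] = packed[i]
   end
   return { error = false, results = results }
end

-- Caller side: raise on the error form, otherwise return the results.
local function unpack_reply(reply)
   if reply.error then
      error(reply.message .. "\n" .. reply.traceback, 0)
   end
   return unpack(reply.results, 1, reply.results.n)
end
```

Recording `n` explicitly on both sides is what allows nil results in the middle or at the end of the list to survive the round trip in this sketch.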
+# Serialising requests and replies
+All requests and replies are structured as Lua tables once deserialised. The
+serialised form of the messages is simply a string which can be loaded and run
+to return the values. The limited number of formats means that the serialisers
+can be written explicitly and the deserialiser can simply be a generic loader
+followed by a series of asserts and measures to ensure nothing malicious gets