Notes from PyCon 2013 sprints
=============================

- Cancellation.  If a task creates several subtasks, and then the
  parent task fails, should the subtasks be cancelled?  (How do we
  even establish the parent/subtask relationship?)

- Adam Sah suggests that there might be a need for scheduling
  (especially when multiple frameworks share an event loop).  He
  points to lottery scheduling but also mentions that's just one of
  the options.  However, after posting on python-tulip, it appears
  none of the other frameworks have scheduling, and nobody seems to
  miss it.

- Feedback from Bram Cohen (BitTorrent creator) about UDP.  He doesn't
  think connected UDP is worth supporting; it doesn't do anything
  except tell the kernel about the default target address for
  sendto().  Basically he says all UDP end points are servers.  He
  sent me his own UDP event loop so I might glean some tricks from it.
  He says we should treat EINTR the same as EAGAIN and friends.  (We
  should use the exceptions dedicated to errno checking, BTW.)  He
  said to make sure we use SO_REUSEADDR (I think we already do).  He
  said to set the max datagram sizes pretty large (anything larger
  than the declared limit is dropped on the floor).  He reminds us of
  the importance of being able to pick a valid, unused port by binding
  to port 0 and then using getsockname().  He has an idea where he'd
  like to be able to kill all registered callbacks (i.e. Handles)
  belonging to a certain "context".  I think this can be done at the
  application level (you'd have to wrap everything that returns a
  Handle and collect these handles in some set or other data
  structure) but if someone thinks it's interesting we could imagine
  having some kind of notion of context part of the event loop state,
  e.g. associated with a Task (see Cancellation point above).  He
  brought up uTP (Micro Transport Protocol), a reimplementation of TCP
  over UDP with more refined congestion control.

- Mumblings about UNIX domain sockets and IPv6 addresses being
  4-tuples.
  The former can be handled by passing in a socket.  There seem
  to be no real use cases for the latter that can't be dealt with by
  passing in suitably esoteric strings for the hostname.
  getaddrinfo() will produce the appropriate 4-tuple and connect()
  will accept it.

- Mumblings on the list about add vs. set.


Notes from the second Tulip/Twisted meet-up
===========================================

Rackspace, 12/11/2012
Glyph, Brian Warner, David Reid, Duncan McGreggor, others

Flow control
------------

- Pause/resume on transport manages data_received.

- There's also an API to tell the transport whom to pause when the
  write calls are overwhelming it: IConsumer.registerProducer().

- There's also something called pipes but it's built on top of the
  old interface.

- Twisted has variations on the basic flow control that I should
  ignore.

Half_close
----------

- This sends an EOF after writing some stuff.

- Can't write any more.

- Problem with TLS is known (the RFC sadly specifies this behavior).

- It must be dynamically discoverable whether the transport supports
  half_close, since the protocol may have to do something different to
  make up for its absence (e.g. use chunked encoding).  Twisted uses
  an interface check for this and also hasattr(trans, 'halfClose')
  but a flag (or flag method) is fine too.

Constructing transport and protocol
-----------------------------------

- There are good reasons for passing a function to the transport
  construction helper that creates the protocol.  (You need these
  anyway for server-side protocols.)  The sequence of events is
  something like

  . open socket
  . create transport (pass it a socket?)
  . create protocol (pass it nothing)
  . proto.make_connection(transport); this does:
    . self.transport = transport
    . self.connection_made(transport)

  But it seems okay to skip make_connection and setting .transport.
  Note that make_connection() is a concrete method on the Protocol
  implementation base class, while connection_made() is an abstract
  method on IProtocol.

Event Loop
----------

- We discussed the sequence of actions in the event loop.  I think in
  the end we're fine with what Tulip currently does.  There are two
  choices:

  Tulip:
  . run ready callbacks until there aren't any left
  . poll, adding more callbacks to the ready list
  . add now-ready delayed callbacks to the ready list
  . go to top

  Tornado:
  . run all currently ready callbacks (but not new ones added during
    this)
  . (the rest is the same)

  The difference is that in the Tulip version, CPU bound callbacks
  that keep adding more to the queue will starve I/O (and yielding to
  other tasks won't actually cause I/O to happen unless you do
  e.g. sleep(0.001)).  OTOH this may be good because it means there's
  less overhead if you frequently split operations in two.

- I think Twisted does it Tornado style (in a convoluted way :-), but
  it may not matter, and it's important to leave this vague so
  implementations can do what's best for their platform.  (E.g. if the
  event loop is built into the OS there are different trade-offs.)

System call cost
----------------

- System calls on MacOS are expensive; on Linux they are cheap.

- Optimal buffer size ~16K.

- Try joining small buffer pieces together, but expect to be tuning
  this later.

Futures
-------

- Futures are the most robust API for async stuff; you can check
  errors etc.  So let's do this.

- Just don't implement wait().

- For the basics (recv/send, mostly), however, don't use Futures but
  use basic callbacks, transport/protocol style.

- make_connection() (by any name) can return a Future; it makes it
  easier to check for errors.

- This means revisiting the Tulip proactor branch (IOCP).

- The semantics of add_done_callback() are fuzzy about in which thread
  the callback will be called.  (It may be the current thread or
  another one.)  We don't like that.
  But always inserting a call_soon() indirection may be expensive?
  Glyph suggested changing the add_done_callback() method name to
  something else to indicate the changed promise.

- Separately, I've been thinking about having two versions of
  call_soon() -- a more heavy-weight one to be called from other
  threads that also writes a byte to the self-pipe.

Signals
-------

- There was a side conversation about signals.  A signal handler is
  similar to another thread, so probably should use (the heavy-weight
  version of) call_soon() to schedule the real callback and not do
  anything else.

- Glyph vaguely recalled some trickiness with the self-pipe.  We
  should be able to fix this afterwards if necessary; it shouldn't
  affect the API design.
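
The heavy-weight call_soon() variant mentioned under Futures and
Signals could look roughly like the sketch below.  This is only an
illustration, not Tulip's code: the MiniLoop class, the name
call_soon_threadsafe(), and run_once() are all made up for the
example, and a socketpair stands in for the self-pipe.  It runs only
the callbacks that were ready at the start of each pass, which is just
one of the two orderings discussed under Event Loop.

```python
import collections
import selectors
import socket


class MiniLoop:
    """Toy event loop showing a self-pipe wakeup (hypothetical, not Tulip)."""

    def __init__(self):
        self._ready = collections.deque()
        self._selector = selectors.DefaultSelector()
        # A socketpair stands in for the self-pipe in this sketch.
        self._rsock, self._wsock = socket.socketpair()
        self._rsock.setblocking(False)
        self._wsock.setblocking(False)
        self._selector.register(self._rsock, selectors.EVENT_READ)

    def call_soon(self, callback, *args):
        # Light-weight version: meant for the loop's own thread only.
        self._ready.append((callback, args))

    def call_soon_threadsafe(self, callback, *args):
        # Heavy-weight version: queue the callback, then write one byte
        # so a loop blocked in select() wakes up.  A signal handler
        # would call this and do nothing else.
        self._ready.append((callback, args))
        try:
            self._wsock.send(b"\0")
        except (BlockingIOError, InterruptedError):
            pass  # Buffer full: a wakeup byte is already pending.

    def _drain_wakeups(self):
        # Discard any pending wakeup bytes; they carry no data.
        try:
            while self._rsock.recv(4096):
                pass
        except (BlockingIOError, InterruptedError):
            pass

    def run_once(self, timeout=None):
        # Don't block if there is already work to do.
        if self._ready:
            timeout = 0
        for _key, _events in self._selector.select(timeout):
            self._drain_wakeups()
        # Run only the callbacks that were ready when this pass began.
        for _ in range(len(self._ready)):
            callback, args = self._ready.popleft()
            callback(*args)

    def close(self):
        self._selector.close()
        self._rsock.close()
        self._wsock.close()
```

Keeping the light-weight call_soon() free of the extra write avoids a
system call on the common same-thread path; only callers outside the
loop thread pay for the wakeup byte.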