Diffstat (limited to 'doc')
-rw-r--r-- | doc/thrift.tex | 172 |
1 files changed, 165 insertions, 7 deletions
diff --git a/doc/thrift.tex b/doc/thrift.tex
index c00695f10..39b03838b 100644
--- a/doc/thrift.tex
+++ b/doc/thrift.tex
@@ -16,6 +16,7 @@
 \usepackage{amssymb}
 \usepackage{amsfonts}
 \usepackage{amsmath}
+\usepackage{url}
 
 \begin{document}
 
@@ -769,9 +770,159 @@
 reap the benefit of being able to easily debug corrupt or misunderstood data by
 looking for string contents.
 \subsection{Servers and Multithreading}
-MARC TO WRITE THIS SECTION ON THE C++ concurrency PACKAGE AND
-BASIC TThreadPoolServer PERFORMANCE ETC. (ie. 140K req/second, that kind of
-thing)
+Thrift services require basic multithreading support to handle simultaneous
+requests from multiple clients. For the Python and Java implementations of
+Thrift server logic, the multithreading support provided by those runtimes was more
+than adequate. For the C++ implementation, no standard multithreading runtime
+library support exists. Specifically, robust, lightweight, and portable
+thread manager and timer class implementations do not exist. We investigated
+existing implementations, namely {\tt boost::thread},
+{\tt boost::threadpool}, {\tt ACE\_Thread\_Manager} and {\tt ACE\_Timer}.
+
+While {\tt boost::thread}~\cite{boost.threads} provides clean, lightweight and
+robust implementations of multithreading primitives (mutexes, conditions, threads),
+it does not provide a thread manager or timer implementation.
+
+{\tt boost::threadpool}~\cite{boost.threadpool} also looked promising but was not
+far enough along for our purposes. We wanted to limit the dependency on
+third-party libraries as much as possible. Because {\tt boost::threadpool} is not
+a pure template library and requires runtime libraries, and because it is not yet
+part of the official Boost distribution, we felt it was not ready for use in Thrift.
+As {\tt boost::threadpool} evolves, and especially if it is added to the Boost
+distribution, we may reconsider our decision not to use it.
+
+ACE has both a thread manager and a timer class in addition to multithreading
+primitives. The biggest problem with ACE is that it is ACE. Unlike Boost, ACE
+API quality is poor. Everything in ACE has large numbers of dependencies on
+everything else in ACE, thus forcing developers to throw out standard classes,
+like STL collections, in favor of ACE's homebrewed implementations. In addition,
+unlike Boost, ACE implementations demonstrate little understanding of the power
+and pitfalls of C++ programming and take no advantage of modern templating
+techniques to ensure compile-time safety and reasonable compiler error messages.
+For all these reasons, ACE was rejected.
+
+\subsection{Thread Primitives}
+
+The Thrift thread libraries have three components:
+\begin{itemize}
+\item \texttt{primitives}
+\item \texttt{thread pool manager}
+\item \texttt{timer manager}
+\end{itemize}
+
+As mentioned above, we were hesitant to introduce any additional dependencies into
+Thrift. We decided to use {\tt boost::shared\_ptr} because it is so useful for
+multithreaded applications, because it requires no link-time or runtime libraries
+(i.e., it is a pure template library), and because it is due to become part of the
+C++0x standard.
+
+We implement standard {\tt Mutex} and {\tt Condition} classes, and a
+{\tt Monitor} class. The latter is simply a combination of a mutex and
+condition variable and is analogous to the monitor implementation provided for
+all objects in Java. This is also sometimes referred to as a barrier. We
+provide a {\tt Synchronized} guard class to allow Java-like synchronized blocks.
+This is just a bit of syntactic sugar, but, like its Java counterpart, clearly
+delimits critical sections of code. Unlike its Java counterpart, we still have
+the ability to programmatically lock, unlock, block, and signal monitors.
+
+\begin{verbatim}
+  void run() {
+    {Synchronized s(manager->monitor);
+      if (manager->state == TimerManager::STARTING) {
+        manager->state = TimerManager::STARTED;
+        manager->monitor.notifyAll();
+      }
+    }
+  }
+\end{verbatim}
+
+We again borrowed from Java the distinction between a thread and a runnable
+class. A {\tt facebook::thread::Thread} is the actual schedulable object. The
+{\tt facebook::thread::Runnable} is the logic to execute within the thread.
+The {\tt Thread} implementation deals with all the platform-specific thread
+creation and destruction issues, while the {\tt Runnable} implementation deals
+with the application-specific per-thread logic. The benefit of this approach
+is that developers can easily subclass the {\tt Runnable} class without pulling in
+platform-specific superclasses.
+
+\subsection{Thread, Runnable, and shared\_ptr}
+We use {\tt boost::shared\_ptr} throughout the {\tt ThreadManager} and
+{\tt TimerManager} implementations to guarantee cleanup of dead objects that can
+be accessed by multiple threads. For {\tt Thread} class implementations,
+{\tt boost::shared\_ptr} usage requires particular attention to make sure
+{\tt Thread} objects are neither leaked nor dereferenced prematurely while
+creating and shutting down threads.
+
+Thread creation requires calling into a C library (in our case the POSIX
+thread library, libpthread, but the same would be true for WIN32 threads).
+Typically, the OS makes few if any guarantees about when a C thread's
+entry-point function, {\tt ThreadMain}, will be called. Therefore, it is
+possible that our thread create call,
+{\tt facebook::thread::ThreadFactory::newThread()}, could return to the caller
+well before that time. To ensure that the returned {\tt Thread} object is not
+prematurely cleaned up if the caller gives up its reference prior to the
+{\tt ThreadMain} call, the {\tt Thread} object makes a weak reference to
+itself in its {\tt start} method.
+
+With the weak reference in hand, the {\tt ThreadMain} function can attempt to get
+a strong reference before entering the {\tt Runnable::run} method of the
+{\tt Runnable} object bound to the {\tt Thread}. If no strong references to the
+thread are held between exiting {\tt Thread::start} and entering the C helper
+function, {\tt ThreadMain}, the weak reference returns null and the function
+exits immediately.
+
+The need for the {\tt Thread} to make a weak reference to itself has a
+significant impact on the API. Since references are managed through the
+{\tt boost::shared\_ptr} templates, the {\tt Thread} object must have a reference
+to itself wrapped by the same {\tt boost::shared\_ptr} envelope that is returned
+to the caller. This necessitated use of the factory pattern.
+{\tt ThreadFactory} creates the raw {\tt Thread} object and
+{\tt boost::shared\_ptr} wrapper, and calls a private helper method of the class
+implementing the {\tt Thread} interface (in this case, {\tt PosixThread::weakRef})
+to allow it to make a weak reference to itself through the
+{\tt boost::shared\_ptr} envelope.
+
+{\tt Thread} and {\tt Runnable} objects reference each other. A {\tt Runnable}
+object may need to know which thread it is executing in, and a {\tt Thread}, obviously,
+needs to know what {\tt Runnable} object it is hosting. This interdependency is
+further complicated because the lifecycle of each object is independent of the
+other. An application may create a set of {\tt Runnable} objects to be used over
+and over in different threads, or it may create and forget a {\tt Runnable} object
+once a thread has been created and started for it.
+
+The {\tt Thread} class takes a {\tt boost::shared\_ptr} reference to the hosted
+{\tt Runnable} object in its constructor, while the {\tt Runnable} class has an
+explicit {\tt thread} method to allow explicit binding of the hosted thread.
+{\tt ThreadFactory::newThread} binds the two objects to each other.
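The weak self-reference and factory binding described in this subsection can be illustrated with a minimal sketch. It is not the Thrift implementation: it assumes C++11 {\tt std::shared\_ptr} and {\tt std::weak\_ptr} as stand-ins for the boost templates, the class names ({\tt SketchThread}, {\tt SketchRunnable}, {\tt SketchThreadFactory}, {\tt CountingRunnable}) are hypothetical, the {\tt Runnable} back-reference is assumed to be weak, and {\tt start} returns the entry-point closure instead of calling the POSIX thread library, so that the delay before the OS invokes {\tt ThreadMain} can be simulated by calling the closure later.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

class SketchThread;

// Application-specific logic. The back-reference to the hosting thread is
// weak here (an assumption) so the two objects do not keep each other alive.
class SketchRunnable {
 public:
  virtual ~SketchRunnable() {}
  virtual void run() = 0;
  // Explicit binding of the hosting thread, called by the factory.
  void thread(const std::shared_ptr<SketchThread>& t) { thread_ = t; }
  // Returns the hosting thread, or null if it has already been destroyed.
  std::shared_ptr<SketchThread> thread() const { return thread_.lock(); }

 private:
  std::weak_ptr<SketchThread> thread_;
};

class SketchThread {
 public:
  explicit SketchThread(std::shared_ptr<SketchRunnable> r)
      : runnable_(std::move(r)) {}

  // Hypothetical stand-in for thread creation: returns the closure that a
  // real implementation would hand to the OS. Only a weak reference is
  // captured, so a pending start does not keep the object alive.
  std::function<void()> start() {
    std::weak_ptr<SketchThread> weak = self_;
    return [weak]() { threadMain(weak); };
  }

 private:
  friend class SketchThreadFactory;

  // Private helper: records a weak reference taken from the same shared_ptr
  // envelope that the factory returns to the caller.
  void weakRef(const std::shared_ptr<SketchThread>& self) { self_ = self; }

  // Entry point: promote the weak reference; exit at once if the object died.
  static void threadMain(const std::weak_ptr<SketchThread>& weak) {
    std::shared_ptr<SketchThread> self = weak.lock();
    if (!self) return;
    self->runnable_->run();
  }

  std::shared_ptr<SketchRunnable> runnable_;
  std::weak_ptr<SketchThread> self_;
};

class SketchThreadFactory {
 public:
  // Creates the raw object and its envelope, then binds the two objects.
  std::shared_ptr<SketchThread> newThread(std::shared_ptr<SketchRunnable> r) {
    std::shared_ptr<SketchThread> t(new SketchThread(r));
    t->weakRef(t);
    r->thread(t);
    return t;
  }
};

// Test helper: counts how many times run() was entered.
struct CountingRunnable : SketchRunnable {
  int runs = 0;
  void run() override { ++runs; }
};
```

In this sketch, invoking the closure while the caller still holds the {\tt shared\_ptr} runs the {\tt Runnable}. If the caller drops its reference first, the closure finds a null strong reference and returns without touching the destroyed object.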
+
+\subsection{ThreadManager}
+
+{\tt facebook::thread::ThreadManager} creates a pool of worker threads and
+allows applications to schedule tasks for execution as free worker threads
+become available. The {\tt ThreadManager} does not implement dynamic
+thread pool resizing, but provides primitives so that applications can add
+and remove threads based on load. This approach was chosen because
+implementing load metrics and pool-sizing policy is very application
+specific. For example, some applications may want to adjust pool size based
+on a running average of work arrival rates that are measured via polled
+samples. Others may simply wish to react immediately to work-queue
+depth high- and low-water marks. Rather than trying to create a complex
+API that is abstract enough to capture these different approaches, we
+simply leave it up to the particular application and provide the
+primitives to enact the desired policy and sample current status.
+
+\subsection{TimerManager}
+
+{\tt facebook::thread::TimerManager} allows applications to schedule
+{\tt Runnable} object execution at some point in the future. Its specific task
+is to allow applications to sample {\tt ThreadManager} load at regular
+intervals and make changes to the thread pool size based on application policy.
+Of course, it can be used to generate any number of timer or alarm events.
+
+The default implementation of {\tt TimerManager} uses a single thread to
+execute expired {\tt Runnable} objects. Thus, if a timer operation needs to
+do a large amount of work, and especially if it needs to do blocking I/O,
+that work should be done in a separate thread.
 
 \subsection{Nonblocking Operation}
 Though the Thrift transport interfaces map more directly to a blocking I/O
@@ -879,11 +1030,18 @@
 Thrift is a successor to Pillar, a similar system developed by
 Adam D'Angelo, first while at Caltech and continued later at Facebook.
 Thrift simply would not have happened without Adam's insights.
-%\begin{thebibliography}{}
+\begin{thebibliography}{}
+
+\bibitem{boost.threads}
+Kempf, William,
+``Boost.Threads'',
+\url{http://www.boost.org/doc/html/threads.html}
 
-%\bibitem{smith02}
-%Smith, P. Q. reference text
+\bibitem{boost.threadpool}
+Henkel, Philipp,
+``threadpool'',
+\url{http://threadpool.sourceforge.net}
 
-%\end{thebibliography}
+\end{thebibliography}
 
 \end{document}