author | David Reiss <dreiss@apache.org> | 2008-02-06 22:18:40 +0000 |
---|---|---|
committer | David Reiss <dreiss@apache.org> | 2008-02-06 22:18:40 +0000 |
commit | 0c90f6f8af1d64ec9272bb2f6092336f3d0b8df8 (patch) | |
tree | 15245f459a76acc769d8fe99179176b4965bf66d /doc | |
parent | 3160971286aea0d5b28d5a7a87acaa8a12209ef8 (diff) | |
download | thrift-0c90f6f8af1d64ec9272bb2f6092336f3d0b8df8.tar.gz |
Thrift: Whitespace cleanup.
Summary:
- Expanded tabs to spaces where spaces were the norm.
- Deleted almost all trailing whitespace.
- Added newlines to the ends of a few files.
- Ran dos2unix on one file or two.
Reviewed By: mcslee
Test Plan: git diff -b
Revert Plan: ok
git-svn-id: https://svn.apache.org/repos/asf/incubator/thrift/trunk@665467 13f79535-47bb-0310-9956-ffa450edef68
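The four cleanup steps in the summary, and the `git diff -b` test plan, can be sketched roughly as follows. This is a minimal illustration, not the tooling used for this commit. The function names and the `tab_width` default are hypothetical. The comparison helper only approximates the documented behaviour of `-b`, which ignores whitespace at line end and treats runs of whitespace as equivalent.

```python
import re


def normalize_whitespace(text: str, tab_width: int = 4) -> str:
    """Apply the cleanup steps from the summary to one file's contents.

    tab_width is an assumption; the commit does not state the width used.
    """
    # dos2unix step: convert CRLF line endings to LF.
    text = text.replace("\r\n", "\n")
    # Expand tabs to spaces, then drop trailing spaces and tabs on each line.
    cleaned = [line.expandtabs(tab_width).rstrip(" \t") for line in text.split("\n")]
    result = "\n".join(cleaned)
    # Make sure a non-empty file ends with a newline.
    if result and not result.endswith("\n"):
        result += "\n"
    return result


def equal_ignoring_space_change(a: str, b: str) -> bool:
    """Rough stand-in for the `git diff -b` check named in the test plan."""

    def canon(text: str) -> list:
        # Ignore whitespace at line end; collapse other runs to one space.
        return [re.sub(r"[ \t]+", " ", line.rstrip()) for line in text.splitlines()]

    return canon(a) == canon(b)
```

Running `normalize_whitespace` over a file and then checking `equal_ignoring_space_change(before, after)` mirrors the intent of the test plan: the cleanup should change whitespace and nothing else.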
Diffstat (limited to 'doc')
-rw-r--r-- | doc/thrift.tex | 90 |
1 files changed, 45 insertions, 45 deletions
diff --git a/doc/thrift.tex b/doc/thrift.tex
index fc1e6ba1e..17766b598 100644
--- a/doc/thrift.tex
+++ b/doc/thrift.tex
@@ -20,9 +20,9 @@ \begin{document}
-% \conferenceinfo{WXYZ '05}{date, City.}
+% \conferenceinfo{WXYZ '05}{date, City.}
 % \copyrightyear{2007}
-% \copyrightdata{[to be supplied]}
+% \copyrightdata{[to be supplied]}
 % \titlebanner{banner above paper title} % These are ignored unless % \preprintfooter{short description of paper} % 'preprint' option specified.
@@ -62,7 +62,7 @@ and why. \section{Introduction} As Facebook's traffic and network structure have scaled, the resource
-demands of many operations on the site (i.e. search,
+demands of many operations on the site (i.e. search,
 ad selection and delivery, event logging) have presented technical requirements drastically outside the scope of the LAMP framework. In our implementation of these services, various programming languages have been selected to
@@ -102,7 +102,7 @@ or write their own serialization code. That is, a C++ programmer should be able to transparently exchange a strongly typed STL map for a dynamic Python dictionary. Neither programmer should be forced to write any code below the application layer
-to achieve this. Section 2 details the Thrift type system.
+to achieve this. Section 2 details the Thrift type system.
 \textit{Transport.} Each language must have a common interface to bidirectional raw data transport. The specifics of how a given
@@ -195,7 +195,7 @@ an STL \texttt{vector}, Java \texttt{ArrayList}, or native array in scripting la contain duplicates. \item \texttt{set<type>} An unordered set of unique elements. Translates into an STL \texttt{set}, Java \texttt{HashSet}, \texttt{set} in Python, or native
-dictionary in PHP/Ruby.
+dictionary in PHP/Ruby.
 \item \texttt{map<type1,type2>} A map of strictly unique keys to values Translates into an STL \texttt{map}, Java \texttt{HashMap}, PHP associative array, or Python/Ruby dictionary.
@@ -254,7 +254,7 @@ An example: service StringCache { void set(1:i32 key, 2:string value), string get(1:i32 key) throws (1:KeyNotFound knf),
- void delete(1:i32 key)
+ void delete(1:i32 key)
 } \end{verbatim}
@@ -283,7 +283,7 @@ The transport layer is used by the generated code to facilitate data transfer. A key design choice in the implementation of Thrift was to decouple the transport layer from the code generation layer. Though Thrift is typically used on top of the TCP/IP stack with streaming sockets as the base layer of
-communication, there was no compelling reason to build that constraint into
+communication, there was no compelling reason to build that constraint into
 the system. The performance tradeoff incurred by an abstracted I/O layer (roughly one virtual method lookup / function call per operation) was immaterial compared to the cost of actual I/O operations (typically invoking
@@ -509,7 +509,7 @@ allows for version-safe modification of method parameters service StringCache { void set(1:i32 key, 2:string value), string get(1:i32 key) throws (1:KeyNotFound knf),
- void delete(1:i32 key)
+ void delete(1:i32 key)
 } \end{verbatim}
@@ -540,7 +540,7 @@ class Example { number(10), bigNumber(0), decimals(0),
- name("thrifty") {}
+ name("thrifty") {}
 int32_t number; int64_t bigNumber;
@@ -560,7 +560,7 @@ class Example { } __isset; ... }
-\end{verbatim}
+\end{verbatim}
 \subsection{Case Analysis}
@@ -778,16 +778,16 @@ Thrift services require basic multithreading to handle simultaneous requests from multiple clients. For the Python and Java implementations of Thrift server logic, the standard threading libraries distributed with the languages provide adequate support. For the C++ implementation, no standard multithread runtime
-library exists. Specifically, robust, lightweight, and portable
+library exists. Specifically, robust, lightweight, and portable
 thread manager and timer class implementations do not exist. We investigated
-existing implementations, namely \texttt{boost::thread},
+existing implementations, namely \texttt{boost::thread},
 \texttt{boost::threadpool}, \texttt{ACE\_Thread\_Manager} and
-\texttt{ACE\_Timer}.
+\texttt{ACE\_Timer}.
 While \texttt{boost::threads}\cite{boost.threads} provides clean, lightweight and robust implementations of multi-thread primitives (mutexes, conditions, threads) it does not provide a thread manager or timer
-implementation.
+implementation.
 \texttt{boost::threadpool}\cite{boost.threadpool} also looked promising but was not far enough along for our purposes. We wanted to limit the dependency on
@@ -801,7 +801,7 @@ added to the Boost distribution we may reconsider our decision to not use it. ACE has both a thread manager and timer class in addition to multi-thread primitives. The biggest problem with ACE is that it is ACE. Unlike Boost, ACE API quality is poor. Everything in ACE has large numbers of dependencies on
-everything else in ACE - thus forcing developers to throw out standard
+everything else in ACE - thus forcing developers to throw out standard
 classes, such as STL collections, in favor of ACE's homebrewed implementations. In addition, unlike Boost, ACE implementations demonstrate little understanding of the power and pitfalls of C++ programming and take no advantage of modern
@@ -820,17 +820,17 @@ The Thrift thread libraries are implemented in the namespace\\ \end{itemize} As mentioned above, we were hesitant to introduce any additional dependencies
-on Thrift. We decided to use \texttt{boost::shared\_ptr} because it is so
+on Thrift. We decided to use \texttt{boost::shared\_ptr} because it is so
 useful for multithreaded application, it requires no link-time or runtime libraries (i.e. it is a pure template library) and it is due to become part of the C++0x standard. We implement standard \texttt{Mutex} and \texttt{Condition} classes, and a
- \texttt{Monitor} class. The latter is simply a combination of a mutex and
+ \texttt{Monitor} class. The latter is simply a combination of a mutex and
 condition variable and is analogous to the \texttt{Monitor} implementation provided for
-the Java \texttt{Object} class. This is also sometimes referred to as a barrier. We
+the Java \texttt{Object} class. This is also sometimes referred to as a barrier. We
 provide a \texttt{Synchronized} guard class to allow Java-like synchronized blocks.
-This is just a bit of syntactic sugar, but, like its Java counterpart, clearly
+This is just a bit of syntactic sugar, but, like its Java counterpart, clearly
 delimits critical sections of code. Unlike its Java counterpart, we still have the ability to programmatically lock, unlock, block, and signal monitors.
@@ -847,11 +847,11 @@ void run() { We again borrowed from Java the distinction between a thread and a runnable class. A \texttt{Thread} is the actual schedulable object. The
-\texttt{Runnable} is the logic to execute within the thread.
-The \texttt{Thread} implementation deals with all the platform-specific thread
+\texttt{Runnable} is the logic to execute within the thread.
+The \texttt{Thread} implementation deals with all the platform-specific thread
 creation and destruction issues, while the \texttt{Runnable} implementation deals with the application-specific per-thread logic. The benefit of this approach
-is that developers can easily subclass the Runnable class without pulling in
+is that developers can easily subclass the Runnable class without pulling in
 platform-specific super-classes. \subsection{Thread, Runnable, and shared\_ptr}
@@ -875,7 +875,7 @@ itself in its \texttt{start} method. With the weak reference in hand the \texttt{ThreadMain} function can attempt to get a strong reference before entering the \texttt{Runnable::run} method of the \texttt{Runnable} object bound to the \texttt{Thread}. If no strong references to the
-thread are obtained between exiting \texttt{Thread::start} and entering \texttt{ThreadMain}, the weak reference returns \texttt{null} and the function
+thread are obtained between exiting \texttt{Thread::start} and entering \texttt{ThreadMain}, the weak reference returns \texttt{null} and the function
 exits immediately. The need for the \texttt{Thread} to make a weak reference to itself has a
@@ -894,7 +894,7 @@ object may need to know about the thread in which it is executing, and a Thread, needs to know what \texttt{Runnable} object it is hosting. This interdependency is further complicated because the lifecycle of each object is independent of the other. An application may create a set of \texttt{Runnable} object to be reused in different threads, or it may create and forget a \texttt{Runnable} object
-once a thread has been created and started for it.
+once a thread has been created and started for it.
 The \texttt{Thread} class takes a \texttt{boost::shared\_ptr} reference to the hosted \texttt{Runnable} object in its constructor, while the \texttt{Runnable} class has an
@@ -903,30 +903,30 @@ explicit \texttt{thread} method to allow explicit binding of the hosted thread. \subsection{ThreadManager}
-\texttt{ThreadManager} creates a pool of worker threads and
+\texttt{ThreadManager} creates a pool of worker threads and
 allows applications to schedule tasks for execution as free worker threads
-become available. The \texttt{ThreadManager} does not implement dynamic
+become available. The \texttt{ThreadManager} does not implement dynamic
 thread pool resizing, but provides primitives so that applications can add
-and remove threads based on load. This approach was chosen because
-implementing load metrics and thread pool size is very application
+and remove threads based on load. This approach was chosen because
+implementing load metrics and thread pool size is very application
 specific. For example some applications may want to adjust pool size based on running-average of work arrival rates that are measured via polled samples. Others may simply wish to react immediately to work-queue depth high and low water marks. Rather than trying to create a complex
-API abstract enough to capture these different approaches, we
-simply leave it up to the particular application and provide the
+API abstract enough to capture these different approaches, we
+simply leave it up to the particular application and provide the
 primitives to enact the desired policy and sample current status. \subsection{TimerManager} \texttt{TimerManager} allows applications to schedule
- \texttt{Runnable} objects for execution at some point in the future. Its specific task
+ \texttt{Runnable} objects for execution at some point in the future. Its specific task
 is to allows applications to sample \texttt{ThreadManager} load at regular intervals and make changes to the thread pool size based on application policy. Of course, it can be used to generate any number of timer or alarm events. The default implementation of \texttt{TimerManager} uses a single thread to
-execute expired \texttt{Runnable} objects. Thus, if a timer operation needs to
+execute expired \texttt{Runnable} objects. Thus, if a timer operation needs to
 do a large amount of work and especially if it needs to do blocking I/O, that should be done in a separate thread.
@@ -962,18 +962,18 @@ each contain an instance of the other. (Since we do not allow \texttt{null} struct instances in the generated C++ code, this would actually be impossible.) \subsection{TFileTransport}
-The \texttt{TFileTransport} logs Thrift requests/structs by
-framing incoming data with its length and writing it out to disk.
-Using a framed on-disk format allows for better error checking and
+The \texttt{TFileTransport} logs Thrift requests/structs by
+framing incoming data with its length and writing it out to disk.
+Using a framed on-disk format allows for better error checking and
 helps with the processing of a finite number of discrete events. The\\
-\texttt{TFileWriterTransport} uses a system of swapping in-memory buffers
-to ensure good performance while logging large amounts of data.
+\texttt{TFileWriterTransport} uses a system of swapping in-memory buffers
+to ensure good performance while logging large amounts of data.
 A Thrift log file is split up into chunks of a specified size; logged messages
-are not allowed to cross chunk boundaries. A message that would cross a chunk
-boundary will cause padding to be added until the end of the chunk and the
+are not allowed to cross chunk boundaries. A message that would cross a chunk
+boundary will cause padding to be added until the end of the chunk and the
 first byte of the message are aligned to the beginning of the next chunk.
-Partitioning the file into chunks makes it possible to read and interpret data
-from a particular point in the file.
+Partitioning the file into chunks makes it possible to read and interpret data
+from a particular point in the file.
 \section{Facebook Thrift Services} Thrift has been employed in a large number of applications at Facebook, including
@@ -984,15 +984,15 @@ Thrift is used as the underlying protocol and transport layer for the Facebook S The multi-language code generation is well suited for search because it allows for application development in an efficient server side language (C++) and allows the Facebook PHP-based web application to make calls to the search service using Thrift PHP libraries. There is also a large
-variety of search stats, deployment and testing functionality that is built on top
+variety of search stats, deployment and testing functionality that is built on top
 of generated Python code. Additionally, the Thrift log file format is
-used as a redo log for providing real-time search index updates. Thrift has allowed the
-search team to leverage each language for its strengths and to develop code at a rapid pace.
+used as a redo log for providing real-time search index updates. Thrift has allowed the
+search team to leverage each language for its strengths and to develop code at a rapid pace.
 \subsection{Logging} The Thrift \texttt{TFileTransport} functionality is used for structured logging. Each service function definition along with its parameters can be considered to be
-a structured log entry identified by the function name. This log can then be used for
+a structured log entry identified by the function name. This log can then be used for
 a variety of purposes, including inline and offline processing, stats aggregation and as a redo log. \section{Conclusions}