Add more documents to the developers guide

author: Jelmer Vernooij <jelmer@samba.org> 2002-08-29 13:28:17 +0000
committer: Jelmer Vernooij <jelmer@samba.org> 2002-08-29 13:28:17 +0000
commit: e05bdd9eab760b5dc6a4442dc89752080ff1d2c1 (patch)
tree: 9190f5c1824adecca9813fc1521132d0f519c649 /docs
parent: 4631e1fd1039c7eed58d3738bd7310219acc9000 (diff)
7 files changed, 1743 insertions, 0 deletions
diff --git a/docs/docbook/devdoc/CodingSuggestions.sgml b/docs/docbook/devdoc/CodingSuggestions.sgml
new file mode 100644
index 00000000000..bdf6d3d17d3
--- /dev/null
+++ b/docs/docbook/devdoc/CodingSuggestions.sgml
@@ -0,0 +1,237 @@
+<chapter id="CodingSuggestions">
+<chapterinfo>
+	<author>
+		<firstname>Steve</firstname><surname>French</surname>
+	</author>
+	<author>
+		<firstname>Simo</firstname><surname>Sorce</surname>
+	</author>
+	<author>
+		<firstname>Andrew</firstname><surname>Bartlett</surname>
+	</author>
+	<author>
+		<firstname>Tim</firstname><surname>Potter</surname>
+	</author>
+	<author>
+		<firstname>Martin</firstname><surname>Pool</surname>
+	</author>
+</chapterinfo>
+
+<title>Coding Suggestions</title>
+
+<para>
+So you want to add code to Samba ...
+</para>
+
+<para>
+One of the daunting tasks facing a programmer attempting to write code for
+Samba is understanding the various coding conventions used by those most
+active in the project.  These conventions were mostly unwritten and helped
+improve either the portability, stability or consistency of the code. This
+document will attempt to document a few of the more important coding
+practices used at this time on the Samba project.  The coding practices are
+expected to change slightly over time, and even to grow as more is learned
+about obscure portability considerations.  Two existing documents
+<filename>samba/source/internals.doc</filename> and 
+<filename>samba/source/architecture.doc</filename> provide
+additional information.
+</para>
+
+<para>
+The loosely related question of coding style is very personal and this
+document does not attempt to address that subject, except to say that I
+have observed that eight character tabs seem to be preferred in Samba
+source.  If you are interested in the topic of coding style, two oft-quoted
+documents are:
+</para>
+
+<para>
+<ulink url="http://lxr.linux.no/source/Documentation/CodingStyle">http://lxr.linux.no/source/Documentation/CodingStyle</ulink>
+</para>
+
+<para>
+<ulink url="http://www.fsf.org/prep/standards_toc.html">http://www.fsf.org/prep/standards_toc.html</ulink>
+</para>
+
+<para>
+But note that coding style in Samba varies due to the many different
+programmers who have contributed.
+</para>
+
+<para>
+Following are some considerations you should use when adding new code to
+Samba.  First and foremost remember that:
+</para>
+
+<para>
+Portability is a primary consideration in adding function, as is network
+compatibility with de facto, existing, real world CIFS/SMB implementations.
+There are lots of platforms that Samba builds on so use caution when adding
+a call to a library function that is not invoked in existing Samba code.
+Also note that there are many quite different SMB/CIFS clients that Samba
+tries to support, not all of which follow the SNIA CIFS Technical Reference
+(or the earlier Microsoft reference documents or the X/Open book on the SMB
+Standard) perfectly.
+</para>
+
+<para>
+Here are some other suggestions:
+</para>
+
+<orderedlist>
+
+<listitem><para>
+	use d_printf instead of printf for display text
+	reason: enable auto-substitution of translated language text 
+</para></listitem>
+
+<listitem><para>
+	use SAFE_FREE instead of free
+	reason: reduce traps due to null pointers
+</para></listitem>
+
+<listitem><para>
+	don't use bzero; use memset, or the ZERO_STRUCT and ZERO_STRUCTP macros
+	reason: not POSIX
+</para></listitem>
+
+<listitem><para>
+	don't use strcpy and strlen (use safe_* equivalents)
+	reason: to avoid traps due to buffer overruns
+</para></listitem>
+
+<listitem><para>
+	don't use getopt_long, use popt functions instead
+	reason: portability
+</para></listitem>
+
+<listitem><para>
+	explicitly add const qualifiers on parm passing in functions where parm
+	is input only (somewhat controversial but const can be #defined away)
+</para></listitem>
+
+<listitem><para>
+	when passing a va_list as an arg, or assigning one to another
+	please use the VA_COPY() macro
+	reason: on some platforms, va_list is a struct that must be 
+	initialized in each function...can SEGV if you don't.
+</para></listitem>
+
+<listitem><para>
+	discourage use of threads
+	reason: portability (also see architecture.doc)
+</para></listitem>
+
+<listitem><para>
+	don't explicitly include new header files in C files - new h files 
+	should be included by adding them once to includes.h
+	reason: consistency
+</para></listitem>
+
+<listitem><para>
+	don't explicitly extern functions (they are autogenerated by 
+	"make proto" into proto.h)
+	reason: consistency
+</para></listitem>
+
+<listitem><para>
+	use endian safe macros when unpacking SMBs (see byteorder.h and
+	internals.doc)
+	reason: not everyone uses Intel
+</para></listitem>
+
+<listitem><para>
+	Note Unicode implications of charset handling (see internals.doc).  See
+	pull_*  and push_* and convert_string functions.
+	reason: Internationalization
+</para></listitem>
+
+<listitem><para>
+	Don't assume English only
+	reason: See above
+</para></listitem>
+
+<listitem><para>
+	Try to avoid using in/out parameters (functions that return data which
+	overwrites input parameters)
+	reason: Can cause stability problems
+</para></listitem>
+
+<listitem><para>
+	Ensure copyright notices are correct, don't append Tridge's name to code
+	that he didn't write.  If you did not write the code, make sure that it
+	can coexist with the rest of the Samba GPLed code.
+</para></listitem>
+
+<listitem><para>
+	Consider usage of DATA_BLOBs for length specified byte-data.
+	reason: stability
+</para></listitem>
+
+<listitem><para>
+	Take advantage of tdbs for database-like functionality
+	reason: consistency
+</para></listitem>
+
+<listitem><para>
+	Don't access the SAM_ACCOUNT structure directly; its members should be
+	accessed via the pdb_get...() and pdb_set...() functions.
+	reason: stability, consistency
+</para></listitem>
+
+<listitem><para>
+	Don't check a password directly against the passdb, always use the
+	check_password() interface.
+	reason: long term pluggability
+</para></listitem>
+
+<listitem><para>
+	Try to use asprintf rather than pstrings and fstrings where possible
+</para></listitem>
+
+<listitem><para>
+	Use normal C comments / * instead of C++ comments // like
+	this.  Although the C++ comment format is part of the C99
+	standard, some older vendor C compilers do not accept it.
+</para></listitem>
+
+<listitem><para>
+	Try to write documentation for API functions and structures
+	explaining the point of the code, the way it should be used, and
+	any special conditions or results.  Mark these with a double-star
+	comment start / ** so that they can be picked up by Doxygen, as in
+	this file.
+</para></listitem>
+
+<listitem><para>
+	Keep the scope narrow. This means making functions/variables
+	static whenever possible. We don't want our namespace
+	polluted. Each module should have a minimal number of externally
+	visible functions or variables.
+</para></listitem>
+
+<listitem><para>
+	Use function pointers to keep knowledge about particular pieces of
+	code isolated in one place. We don't want a particular piece of
+	functionality to be spread out across lots of places - that makes
+	for fragile, hard to maintain code. Instead, design an interface
+	and use tables containing function pointers to implement specific
+	functionality. This is particularly important for command
+	interpreters. 
+</para></listitem>
+
+<listitem><para>
+	Think carefully about what it will be like for someone else to add
+	to and maintain your code. If it would be hard for someone else to
+	maintain then do it another way. 
+</para></listitem>
+
+</orderedlist>
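+
+<para>
+Two of the suggestions above (avoiding bzero and avoiding a bare free) can
+be sketched roughly as follows.  The MY_SAFE_FREE and MY_ZERO_STRUCT macros
+and the example structure are hypothetical stand-ins written for this
+illustration; they are not copied from the Samba headers, so check the real
+SAFE_FREE and ZERO_STRUCT definitions before relying on any detail shown here.
+</para>
+
+<para><programlisting>
+#include &lt;stdlib.h&gt;
+#include &lt;string.h&gt;
+
+/* Hypothetical stand-ins for SAFE_FREE and ZERO_STRUCT (illustration only). */
+#define MY_SAFE_FREE( p ) \
+  do { if( (p) != NULL ) { free( p ); (p) = NULL; } } while( 0 )
+#define MY_ZERO_STRUCT( x ) memset( (char *)&amp;(x), 0, sizeof( x ) )
+
+struct example { int id; char name[16]; };
+
+int main( void )
+{
+  struct example e;
+  char *buf = malloc( 32 );
+
+  MY_ZERO_STRUCT( e );   /* used instead of bzero() */
+  MY_SAFE_FREE( buf );   /* frees buf (if not NULL) and sets it to NULL */
+  MY_SAFE_FREE( buf );   /* harmless: buf is already NULL */
+  return e.id;           /* 0, because the structure was zeroed */
+}
+</programlisting></para>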
+
+<para>
+The suggestions above are simply that, suggestions, but the information may
+help in reducing the routine rework done on new code.  The preceding list
+is expected to change routinely as new support routines and macros are
+added.
+</para>
+</chapter>
diff --git a/docs/docbook/devdoc/architecture.sgml b/docs/docbook/devdoc/architecture.sgml
new file mode 100644
index 00000000000..312a63af97e
--- /dev/null
+++ b/docs/docbook/devdoc/architecture.sgml
@@ -0,0 +1,184 @@
+<chapter id="architecture">
+<chapterinfo>
+	<author>
+		<firstname>Dan</firstname><surname>Shearer</surname>
+	</author>
+	<pubdate> November 1997</pubdate>
+</chapterinfo>
+
+<title>Samba Architecture</title>
+
+<sect1>
+<title>Introduction</title>
+
+<para>
+This document gives a general overview of how Samba works
+internally. The Samba Team has tried to come up with a model which is
+the best possible compromise between elegance, portability, security
+and the constraints imposed by the very messy SMB and CIFS
+protocol. 
+</para>
+
+<para>
+It also tries to answer some of the frequently asked questions such as:
+</para>
+
+<orderedlist>
+<listitem><para>
+	Is Samba secure when running on Unix? The xyz platform?
+	What about the root privileges issue?
+</para></listitem>
+
+<listitem><para>Pros and cons of multithreading in various parts of Samba</para></listitem>
+
+<listitem><para>Why not have a separate process for name resolution, WINS, and browsing?</para></listitem>
+
+</orderedlist>
+
+</sect1>
+
+<sect1>
+<title>Multithreading and Samba</title>
+
+<para>
+People sometimes tout threads as a uniformly good thing. They are very
+nice in their place but are quite inappropriate for smbd. nmbd is
+another matter, and multi-threading it would be very nice. 
+</para>
+
+<para>
+The short version is that smbd is not multithreaded, and alternative
+servers that take this approach under Unix (such as Syntax, at the
+time of writing) suffer tremendous performance penalties and are less
+robust. nmbd is not threaded either, but this is because it is not
+possible to do it while keeping code consistent and portable across 35
+or more platforms. (This drawback also applies to threading smbd.)
+</para>
+
+<para>
+The longer version is that there are very good reasons for not making
+smbd multi-threaded.  Multi-threading would actually make Samba much
+slower, less scalable, less portable and much less robust. The fact
+that we use a separate process for each connection is one of Samba's
+biggest advantages.
+</para>
+
+</sect1>
+
+<sect1>
+<title>Threading smbd</title>
+
+<para>
+A few problems that would arise from a threaded smbd are:
+</para>
+
+<orderedlist>
+<listitem><para>
+	It's not just a matter of creating threads instead of processes: you
+	must also check every variable to see whether it has to be thread
+	specific (currently they would all be global).
+</para></listitem>
+
+<listitem><para>
+	if one thread dies (eg. a seg fault) then all threads die. We can
+	immediately throw robustness out the window.
+</para></listitem>
+
+<listitem><para>
+	many of the system calls we make are blocking. Non-blocking
+	equivalents of many calls are either not available or are awkward (and
+	slow) to use. So while we block in one thread all clients are
+	waiting. Imagine if one share is a slow NFS filesystem and the others 
+	are fast, we will end up slowing all clients to the speed of NFS.
+</para></listitem>
+
+<listitem><para>
+	you can't run as a different uid in different threads. This means
+	we would have to switch uid/gid on _every_ SMB packet. It would be
+	horrendously slow.
+</para></listitem>
+
+<listitem><para>
+	the per process file descriptor limit would mean that we could only
+	support a limited number of clients.
+</para></listitem>
+
+<listitem><para>
+	we couldn't use the system locking calls as the locking context of
+	fcntl() is a process, not a thread.
+</para></listitem>
+
+</orderedlist>
+
+</sect1>
+
+<sect1>
+<title>Threading nmbd</title>
+
+<para>
+This would be ideal, but gets sunk by portability requirements.
+</para>
+
+<para>
+Andrew tried to write a test threads library for nmbd that used only
+ansi-C constructs (using setjmp and longjmp). Unfortunately some OSes
+defeat this by restricting longjmp to calling addresses that are
+shallower than the current address on the stack (apparently AIX does
+this). This makes a truly portable threads library impossible. So to
+support all our current platforms we would have to code nmbd both with
+and without threads, and as the real aim of threads is to make the
+code clearer we would not have gained anything. (It is a myth that
+threads make things faster. Threading is like recursion: it can make
+things clear, but the same thing can always be done faster by some
+other method.)
+</para>
+
+<para>
+Chris tried to spec out a general design that would abstract threading
+vs separate processes (vs other methods?) and make them accessible
+through some general API. This doesn't work because of the data
+sharing requirements of the protocol (packets in the future depending
+on packets now, etc.) At least, the code would work but would be very
+clumsy, and besides the fork() type model would never work on Unix. (Is there an OS that it would work on, for nmbd?)
+</para>
+
+<para>
+A fork() is cheap, but not nearly cheap enough to do on every UDP
+packet that arrives. Having a pool of processes is possible but is
+nasty to program cleanly due to the enormous amount of shared data (in
+complex structures) between the processes. We can't rely on each
+platform having a shared memory system.
+</para>
+
+</sect1>
+
+<sect1>
+<title>nmbd Design</title>
+
+<para>
+Originally Andrew used recursion to simulate a multi-threaded
+environment, which used the stack enormously and made for really
+confusing debugging sessions. Luke Leighton rewrote it to use a
+queuing system that keeps state information on each packet.  The
+first version used a single structure which was used by all the
+pending states.  As the initialisation of this structure was
+done by adding arguments, as the functionality developed, it got
+pretty messy.  So, it was replaced with a higher-order function
+and a pointer to a user-defined memory block.  This suddenly
+made things much simpler: large numbers of functions could be
+made static, and modularised.  This is the same principle as used
+in NT's kernel, and achieves the same effect as threads, but in
+a single process.
+</para>
+
+<para>
+Then Jeremy rewrote nmbd. The packet data in nmbd isn't what's on the
+wire. It's a nice format that is very amenable to processing but still
+keeps the idea of a distinct packet. See "struct packet_struct" in
+nameserv.h.  It has all the detail but none of the on-the-wire
+mess. This makes it ideal for using in disk or memory-based databases
+for browsing and WINS support. 
+</para>
+
+</sect1>
+</chapter>
diff --git a/docs/docbook/devdoc/debug.sgml b/docs/docbook/devdoc/debug.sgml
new file mode 100644
index 00000000000..7e81cc825db
--- /dev/null
+++ b/docs/docbook/devdoc/debug.sgml
@@ -0,0 +1,321 @@
+<chapter id="debug">
+<chapterinfo>
+	<author>
+		<firstname>Chris</firstname><surname>Hertel</surname>
+	</author>
+	<pubdate>July 1998</pubdate>
+</chapterinfo>
+
+<title>The Samba DEBUG system</title>
+
+<sect1>
+<title>New Output Syntax</title>
+
+<para>
+   The syntax of a debugging log file is represented as:
+</para>
+
+<para><programlisting>
+  &lt;debugfile&gt; :== { &lt;debugmsg&gt; }
+
+  &lt;debugmsg&gt;  :== &lt;debughdr&gt; '\n' &lt;debugtext&gt;
+
+  &lt;debughdr&gt;  :== '[' TIME ',' LEVEL ']' FILE ':' [FUNCTION] '(' LINE ')'
+
+  &lt;debugtext&gt; :== { &lt;debugline&gt; }
+
+  &lt;debugline&gt; :== TEXT '\n'
+</programlisting></para>
+
+<para>
+TEXT is a string of characters excluding the newline character.
+</para>
+
+<para>
+LEVEL is the DEBUG level of the message (an integer in the range
+		0..10).
+</para>
+
+<para>
+TIME is a timestamp.
+</para>
+
+<para>
+FILE is the name of the file from which the debug message was
+generated.
+</para>
+
+<para>
+FUNCTION is the function from which the debug message was generated.
+</para>
+
+<para>
+LINE is the line number of the debug statement that generated the
+message.
+</para>
+
+<para>Basically, what that all means is:</para>
+<orderedlist>
+<listitem><para>
+A debugging log file is made up of debug messages.
+</para></listitem>
+<listitem><para>
+Each debug message is made up of a header and text. The header is
+separated from the text by a newline.
+</para></listitem>
+<listitem><para>
+The header begins with the timestamp and debug level of the
+message enclosed in brackets. The filename, function, and line
+number at which the message was generated follow. The filename is
+terminated by a colon, and the function name is terminated by the
+parentheses which contain the line number. Depending upon the
+compiler, the function name may be missing (it is generated by the
+__FUNCTION__ macro, which is not universally implemented, dangit).
+</para></listitem>
+<listitem><para>
+The message text is made up of zero or more lines, each terminated
+by a newline.
+</para></listitem>
+</orderedlist>
+
+<para>Here's some example output:</para>
+
+<para><programlisting>
+    [1998/08/03 12:55:25, 1] nmbd.c:(659)
+      Netbios nameserver version 1.9.19-prealpha started.
+      Copyright Andrew Tridgell 1994-1997
+    [1998/08/03 12:55:25, 3] loadparm.c:(763)
+      Initializing global parameters
+</programlisting></para>
+
+<para>
+Note that in the above example the function names are not listed on
+the header line. That's because the example above was generated on an
+SGI Indy, and the SGI compiler doesn't support the __FUNCTION__ macro.
+</para>
+
+</sect1>
+
+<sect1>
+<title>The DEBUG() Macro</title>
+
+<para>
+Use of the DEBUG() macro is unchanged. DEBUG() takes two parameters.
+The first is the message level, the second is the body of a function
+call to the Debug1() function.
+</para>
+
+<para>That's confusing.</para>
+
+<para>Here's an example which may help a bit. If you would write</para>
+
+<para><programlisting>
+printf( "This is a %s message.\n", "debug" );
+</programlisting></para>
+
+<para>
+to send the output to stdout, then you would write
+</para>
+
+<para><programlisting>
+DEBUG( 0, ( "This is a %s message.\n", "debug" ) );
+</programlisting></para>
+
+<para>
+to send the output to the debug file.  All of the normal printf()
+formatting escapes work.
+</para>
+
+<para>
+Note that in the above example the DEBUG message level is set to 0.
+Messages at level 0 always print.  Basically, if the message level is
+less than or equal to the global value DEBUGLEVEL, then the DEBUG
+statement is processed.
+</para>
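+
+<para>
+As a rough illustration of that rule, the following self-contained sketch
+uses a hypothetical stand-in macro, MY_DEBUG(), and a local debug_level
+variable.  Neither is the real Samba definition, and no header is printed;
+the sketch only shows the level test and how the doubled parentheses pass
+a complete printf()-style argument list through a two-parameter macro.
+</para>
+
+<para><programlisting>
+#include &lt;stdio.h&gt;
+
+/* Hypothetical stand-ins, not the definitions used by Samba itself. */
+static int debug_level = 1;
+
+#define MY_DEBUG( level, body ) \
+  do { if( (level) &lt;= debug_level ) printf body; } while( 0 )
+
+int main( void )
+{
+  MY_DEBUG( 0, ( "This is a %s message.\n", "debug" ) );  /* printed */
+  MY_DEBUG( 3, ( "Level 3 is above debug_level.\n" ) );   /* skipped */
+  return 0;
+}
+</programlisting></para>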
+
+<para>
+The output of the above example would be something like:
+</para>
+
+<para><programlisting>
+    [1998/07/30 16:00:51, 0] file.c:function(128)
+      This is a debug message.
+</programlisting></para>
+
+<para>
+Each call to DEBUG() creates a new header *unless* the output produced
+by the previous call to DEBUG() did not end with a '\n'. Output to the
+debug file is passed through a formatting buffer which is flushed
+every time a newline is encountered. If the buffer is not empty when
+DEBUG() is called, the new input is simply appended.
+</para>
+
+<para>
+...but that's really just a Kludge. It was put in place because
+DEBUG() has been used to write partial lines. Here's a simple (dumb)
+example of the kind of thing I'm talking about:
+</para>
+
+<para><programlisting>
+    DEBUG( 0, ("The test returned " ) );
+    if( test() )
+      DEBUG(0, ("True") );
+    else
+      DEBUG(0, ("False") );
+    DEBUG(0, (".\n") );
+</programlisting></para>
+
+<para>
+Without the format buffer, the output (assuming test() returned true)
+would look like this:
+</para>
+
+<para><programlisting>
+    [1998/07/30 16:00:51, 0] file.c:function(256)
+      The test returned
+    [1998/07/30 16:00:51, 0] file.c:function(258)
+      True
+    [1998/07/30 16:00:51, 0] file.c:function(261)
+      .
+</programlisting></para>
+
+<para>Which isn't much use. The format buffer kludge fixes this problem.
+</para>
+
+</sect1>
+
+<sect1>
+<title>The DEBUGADD() Macro</title>
+
+<para>
+In addition to the kludgey solution to the broken line problem
+described above, there is a clean solution. The DEBUGADD() macro never
+generates a header. It will append new text to the current debug
+message even if the format buffer is empty. The syntax of the
+DEBUGADD() macro is the same as that of the DEBUG() macro.
+</para>
+
+<para><programlisting>
+    DEBUG( 0, ("This is the first line.\n" ) );
+    DEBUGADD( 0, ("This is the second line.\nThis is the third line.\n" ) );
+</programlisting></para>
+
+<para>Produces</para>
+
+<para><programlisting>
+    [1998/07/30 16:00:51, 0] file.c:function(512)
+      This is the first line.
+      This is the second line.
+      This is the third line.
+</programlisting></para>
+
+</sect1>
+
+<sect1>
+<title>The DEBUGLVL() Macro</title>
+
+<para>
+One of the problems with the DEBUG() macro was that DEBUG() lines
+tended to get a bit long. Consider this example from
+nmbd_sendannounce.c:
+</para>
+
+<para><programlisting>
+  DEBUG(3,("send_local_master_announcement: type %x for name %s on subnet %s for workgroup %s\n",
+            type, global_myname, subrec->subnet_name, work->work_group));
+</programlisting></para>
+
+<para>
+One solution to this is to break it down using DEBUG() and DEBUGADD(),
+as follows:
+</para>
+
+<para><programlisting>
+  DEBUG( 3, ( "send_local_master_announcement: " ) );
+  DEBUGADD( 3, ( "type %x for name %s ", type, global_myname ) );
+  DEBUGADD( 3, ( "on subnet %s ", subrec->subnet_name ) );
+  DEBUGADD( 3, ( "for workgroup %s\n", work->work_group ) );
+</programlisting></para>
+
+<para>
+A similar, but arguably nicer approach is to use the DEBUGLVL() macro.
+This macro returns True if the message level is less than or equal to
+the global DEBUGLEVEL value, so:
+</para>
+
+<para><programlisting>
+  if( DEBUGLVL( 3 ) )
+    {
+    dbgtext( "send_local_master_announcement: " );
+    dbgtext( "type %x for name %s ", type, global_myname );
+    dbgtext( "on subnet %s ", subrec->subnet_name );
+    dbgtext( "for workgroup %s\n", work->work_group );
+    }
+</programlisting></para>
+
+<para>(The dbgtext() function is explained below.)</para>
+
+<para>There are a few advantages to this scheme:</para>
+<orderedlist>
+<listitem><para>
+The test is performed only once.
+</para></listitem>
+<listitem><para>
+You can allocate variables off of the stack that will only be used
+within the DEBUGLVL() block.
+</para></listitem>
+<listitem><para>
+Processing that is only relevant to debug output can be contained
+within the DEBUGLVL() block.
+</para></listitem>
+</orderedlist>
+
+</sect1>
+
+<sect1>
+<title>New Functions</title>
+
+<sect2>
+<title>dbgtext()</title>
+<para>
+This function prints debug message text to the debug file (and
+possibly to syslog) via the format buffer. The function uses a
+variable argument list just like printf() or Debug1(). The
+input is printed into a buffer using the vslprintf() function,
+and then passed to format_debug_text().
+
+If you use DEBUGLVL() you will probably print the body of the
+message using dbgtext(). 
+</para>
+</sect2>
+
+<sect2>
+<title>dbghdr()</title>
+<para>
+This is the function that writes a debug message header.
+Headers are not processed via the format buffer. Also note that
+if the format buffer is not empty, a call to dbghdr() will not
+produce any output. See the comments in dbghdr() for more info.
+</para>
+
+<para>
+It is not likely that this function will be called directly. It
+is used by DEBUG() and DEBUGADD().
+</para>
+</sect2>
+
+<sect2>
+<title>format_debug_text()</title>
+<para>
+This is a static function in debug.c. It stores the output text
+for the body of the message in a buffer until it encounters a
+newline. When the newline character is found, the buffer is
+written to the debug file via the Debug1() function, and the
+buffer is reset. This allows us to add the indentation at the
+beginning of each line of the message body, and also ensures
+that the output is written a line at a time (which cleans up
+syslog output).
+</para>
+</sect2>
+</sect1>
+</chapter>
diff --git a/docs/docbook/devdoc/dev-doc.sgml b/docs/docbook/devdoc/dev-doc.sgml
index f84c129f00f..76ad512add7 100644
--- a/docs/docbook/devdoc/dev-doc.sgml
+++ b/docs/docbook/devdoc/dev-doc.sgml
@@ -1,5 +1,11 @@
 <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
 <!ENTITY NetBIOS SYSTEM "NetBIOS.sgml">
+<!ENTITY Architecture SYSTEM "architecture.sgml">
+<!ENTITY debug SYSTEM "debug.sgml">
+<!ENTITY internals SYSTEM "internals.sgml">
+<!ENTITY parsing SYSTEM "parsing.sgml">
+<!ENTITY unix-smb SYSTEM "unix-smb.sgml">
+<!ENTITY CodingSuggestions SYSTEM "CodingSuggestions.sgml">
 ]>
 
 <book id="Samba-Developer-Documentation">
@@ -40,5 +46,11 @@ url="http://www.fsf.org/licenses/gpl.txt">http://www.fsf.org/licenses/gpl.txt</u
 
 <!-- Chapters -->
 &NetBIOS;
+&Architecture;
+&debug;
+&CodingSuggestions;
+&internals;
+&parsing;
+&unix-smb;
 
 </book>
diff --git a/docs/docbook/devdoc/internals.sgml b/docs/docbook/devdoc/internals.sgml
new file mode 100644
index 00000000000..79524347b63
--- /dev/null
+++ b/docs/docbook/devdoc/internals.sgml
@@ -0,0 +1,439 @@
+<chapter id="internals">
+<chapterinfo>
+	<author>
+		<firstname>David</firstname><surname>Chappell</surname>
+		<affiliation>
+			<address><email>David.Chappell@mail.trincoll.edu</email></address>
+		</affiliation>
+	</author>
+	<pubdate>8 May 1996</pubdate>
+</chapterinfo>
+
+<title>Samba Internals</title>
+
+<sect1>
+<title>Character Handling</title>
+<para>
+This section describes character set handling in Samba, as implemented in
+Samba 3.0 and above.
+</para>
+
+<para>
+In the past Samba had very ad-hoc character set handling. Scattered
+throughout the code were numerous calls which converted particular
+strings to/from DOS codepages. The problem is that there was no way of
+telling if a particular char* is in dos codepage or unix
+codepage. This led to a nightmare of code that tried to cope with
+particular cases without handling the general case.
+</para>
+
+</sect1>
+
+<sect1>
+<title>The new functions</title>
+
+<para>
+The new system works like this:
+</para>
+
+<orderedlist>
+<listitem><para>
+	all char* strings inside Samba are "unix" strings. These are
+	multi-byte strings that are in the charset defined by the "unix
+	charset" option in smb.conf. 
+</para></listitem>
+
+<listitem><para>
+	there is no single fixed character set for unix strings, but any
+	character set that is used does need the following properties:
+	</para>
+	<orderedlist>
+	
+	<listitem><para>
+		must not contain NULLs except for termination
+	</para></listitem>
+
+	<listitem><para>
+		must be 7-bit compatible with C strings, so that a constant
+		string or character in C will be byte-for-byte identical to the
+		equivalent string in the chosen character set. 
+	</para></listitem>
+	
+	<listitem><para>
+		when you uppercase or lowercase a string it does not become
+		longer than the original string
+	</para></listitem>
+
+	<listitem><para>
+		must be able to correctly hold all characters that your client
+		will throw at it
+	</para></listitem>
+	</orderedlist>
+	
+	<para>
+	For example, UTF-8 is fine, and most multi-byte asian character sets
+	are fine, but UCS2 could not be used for unix strings as they
+	contain nulls.
+	</para>
+</listitem>
+
+<listitem><para>
+	when you need to put a string into a buffer that will be sent on the
+	wire, or you need a string in a character set format that is
+	compatible with the client's character set, then you need to use a
+	pull_ or push_ function. The pull_ functions pull a string from a
+	wire buffer into a (multi-byte) unix string. The push_ functions
+	push a string out to a wire buffer. 
+</para></listitem>
+
+<listitem><para>
+	the two main pull_ and push_ functions you need to understand are
+	pull_string and push_string. These functions take a base pointer
+	that should point at the start of the SMB packet that the string is
+	in. The functions will check the flags field in this packet to
+	automatically determine if the packet is marked as a unicode packet,
+	and they will choose whether to use unicode for this string based on
+	that flag. You may also force this decision using the STR_UNICODE or
+	STR_ASCII flags. For use in smbd/ and libsmb/ there are wrapper
+	functions clistr_ and srvstr_ that call the pull_/push_ functions
+	with the appropriate first argument.
+	</para>
+	
+	<para>
+	You may also call the pull_ascii/pull_ucs2 or push_ascii/push_ucs2
+	functions if you know that a particular string is ascii or
+	unicode. There are also a number of other convenience functions in
+	charcnv.c that call the pull_/push_ functions with particularly
+	common arguments, such as pull_ascii_pstring()
+	</para>
+</listitem>
+
+<listitem><para>
+	The biggest thing to remember is that internal (unix) strings in Samba
+	may now contain multi-byte characters. This means you cannot assume
+	that characters are always 1 byte long. Often this means that you will
+	have to convert strings to ucs2 and back again in order to do some
+	(seemingly) simple task. For examples of how to do this see functions
+	like strchr_m(). I know this is very slow, and we will eventually
+	speed it up but right now we want this stuff correct not fast.
+</para></listitem>
+
+<listitem><para>
+	all lp_ functions now return unix strings. The magic "DOS" flag on
+	parameters is gone.
+</para></listitem>
+
+<listitem><para>
+	all vfs functions take unix strings. Don't convert when passing to them
+</para></listitem>
+
+</orderedlist>
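+
+<para>
+The point about multi-byte characters can be seen in a small sketch.  It
+assumes the unix charset is UTF-8, and utf8_chars() is a hypothetical helper
+written for this example, not a Samba function; real code should use the
+Samba string routines such as strchr_m() rather than anything like this.
+</para>
+
+<para><programlisting>
+#include &lt;stdio.h&gt;
+#include &lt;string.h&gt;
+
+/* Count UTF-8 characters by skipping continuation bytes (10xxxxxx). */
+static size_t utf8_chars( const char *s )
+{
+  size_t n = 0;
+
+  for( ; *s; s++ )
+    if( ( (unsigned char)*s &amp; 0xC0 ) != 0x80 )
+      n++;
+  return n;
+}
+
+int main( void )
+{
+  const char *s = "caf\xc3\xa9";   /* four characters, the last one two bytes */
+
+  /* strlen() counts bytes, so the two numbers differ. */
+  printf( "%lu bytes, %lu characters\n",
+          (unsigned long)strlen( s ), (unsigned long)utf8_chars( s ) );
+  return 0;
+}
+</programlisting></para>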
+
+</sect1>
+
+<sect1>
+<title>Macros in byteorder.h</title>
+
+<para>
+This section describes the macros defined in byteorder.h.  These macros 
+are used extensively in the Samba code.
+</para>
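+
+<para>
+To make the little-endian accessors described in this section concrete,
+here is a minimal sketch.  The MY_* macros are hypothetical stand-ins
+written for this example and are not copied from byteorder.h; they
+assemble values byte by byte, so the result does not depend on the byte
+order of the host.
+</para>
+
+<para><programlisting>
+#include &lt;stdio.h&gt;
+
+/* Hypothetical equivalents of CVAL, SVAL and IVAL (illustration only). */
+#define MY_CVAL( buf, pos ) ( ( (const unsigned char *)(buf) )[pos] )
+#define MY_SVAL( buf, pos ) \
+  ( (unsigned)MY_CVAL( buf, pos ) | \
+    ( (unsigned)MY_CVAL( buf, (pos) + 1 ) &lt;&lt; 8 ) )
+#define MY_IVAL( buf, pos ) \
+  ( (unsigned long)MY_SVAL( buf, pos ) | \
+    ( (unsigned long)MY_SVAL( buf, (pos) + 2 ) &lt;&lt; 16 ) )
+
+int main( void )
+{
+  const unsigned char buf[] = { 0x34, 0x12, 0x78, 0x56 };
+
+  printf( "%#x\n", MY_SVAL( buf, 0 ) );   /* prints 0x1234 */
+  printf( "%#lx\n", MY_IVAL( buf, 0 ) );  /* prints 0x56781234 */
+  return 0;
+}
+</programlisting></para>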
+
+<sect2>
+<title>CVAL(buf,pos)</title>
+
+<para>
+returns the byte at offset pos within buffer buf as an unsigned character.
+</para>
+</sect2>
+
+<sect2>
+<title>PVAL(buf,pos)</title>
+<para>returns the value of CVAL(buf,pos) cast to type unsigned integer.</para>
+</sect2>
+
+<sect2>
+<title>SCVAL(buf,pos,val)</title>
+<para>sets the byte at offset pos within buffer buf to value val.</para>
+</sect2>
+
+<sect2>
+<title>SVAL(buf,pos)</title>
+<para>
+	returns the value of the unsigned short (16 bit) little-endian integer at 
+	offset pos within buffer buf.  An integer of this type is sometimes
+	referred to as "USHORT".
+</para>
+</sect2>
+
+<sect2>
+<title>IVAL(buf,pos)</title>
+<para>returns the value of the unsigned 32 bit little-endian integer at offset 
+pos within buffer buf.</para>
+</sect2>
+
+<sect2>
+<title>SVALS(buf,pos)</title>
+<para>returns the value of the signed short (16 bit) little-endian integer at 
+offset pos within buffer buf.</para>
+</sect2>
+
+<sect2>
+<title>IVALS(buf,pos)</title>
+<para>returns the value of the signed 32 bit little-endian integer at offset pos
+within buffer buf.</para>
+</sect2>
+
+<sect2>
+<title>SSVAL(buf,pos,val)</title>
+<para>sets the unsigned short (16 bit) little-endian integer at offset pos within 
+buffer buf to value val.</para>
+</sect2>
+
+<sect2>
+<title>SIVAL(buf,pos,val)</title>
+<para>sets the unsigned 32 bit little-endian integer at offset pos within buffer 
+buf to the value val.</para>
+</sect2>
+
+<sect2>
+<title>SSVALS(buf,pos,val)</title>
+<para>sets the short (16 bit) signed little-endian integer at offset pos within 
+buffer buf to the value val.</para>
+</sect2>
+
+<sect2>
+<title>SIVALS(buf,pos,val)</title>
+<para>sets the signed 32 bit little-endian integer at offset pos within buffer
+buf to the value val.</para>
+</sect2>
+
+<sect2>
+<title>RSVAL(buf,pos)</title>
+<para>returns the value of the unsigned short (16 bit) big-endian integer at 
+offset pos within buffer buf.</para>
+</sect2>
+
+<sect2>
+<title>RIVAL(buf,pos)</title>
+<para>returns the value of the unsigned 32 bit big-endian integer at offset 
+pos within buffer buf.</para>
+</sect2>
+
+<sect2>
+<title>RSSVAL(buf,pos,val)</title>
+<para>sets the value of the unsigned short (16 bit) big-endian integer at 
+offset pos within buffer buf to value val.</para>
+</sect2>
+
+<sect2>
+<title>RSIVAL(buf,pos,val)</title>
+<para>sets the value of the unsigned 32 bit big-endian integer at offset 
+pos within buffer buf to value val.</para>
+</sect2>
+
+</sect1>
+
+
+<sect1>
+<title>LAN Manager Samba API</title>
+
+<para>
+This section describes the functions needed to make a LAN Manager RPC call.
+This information has been obtained by examining the Samba code and the LAN
+Manager 2.0 API documentation.  It should not be considered entirely
+reliable.
+</para>
+
+<para>
+<programlisting>
+call_api(int prcnt, int drcnt, int mprcnt, int mdrcnt, 
+	char *param, char *data, char **rparam, char **rdata);
+</programlisting>
+</para>
+
+<para>
+This function is defined in client.c.  It uses an SMB transaction to call a
+remote API.
+</para>
+
+<sect2>
+<title>Parameters</title>
+
+<para>The parameters are as follows:</para>
+
+<orderedlist>
+<listitem><para>
+	prcnt:   the number of bytes of parameters being sent.
+</para></listitem>
+<listitem><para>
+	drcnt:   the number of bytes of data being sent.
+</para></listitem>
+<listitem><para>
+	mprcnt:  the maximum number of bytes of parameters which should be returned.
+</para></listitem>
+<listitem><para>
+	mdrcnt:  the maximum number of bytes of data which should be returned.
+</para></listitem>
+<listitem><para>
+	param:   a pointer to the parameters to be sent.
+</para></listitem>
+<listitem><para>
+	data:    a pointer to the data to be sent.
+</para></listitem>
+<listitem><para>
+	rparam:  a pointer to a pointer which will be set to point to the returned
+	parameters.  The caller of call_api() must deallocate this memory.
+</para></listitem>
+<listitem><para>
+	rdata:   a pointer to a pointer which will be set to point to the returned 
+	data.  The caller of call_api() must deallocate this memory.
+</para></listitem>
+</orderedlist>
+
+<para>
+These are the parameters which you ought to send, in the order of their
+appearance in the parameter block:
+</para>
+
+<orderedlist>
+
+<listitem><para>
+An unsigned 16 bit integer API number.  You should set this value with
+SSVAL().  I do not know where these numbers are described.
+</para></listitem>
+
+<listitem><para>
+An ASCIIZ string describing the parameters to the API function as defined
+in the LAN Manager documentation.  The first parameter, which is the server
+name, is omitted.  This string is based upon the API function as described
+in the manual, not the data which is actually passed.
+</para></listitem>
+
+<listitem><para>
+An ASCIIZ string describing the data structure which ought to be returned.
+</para></listitem>
+
+<listitem><para>
+Any parameters which appear in the function call, as defined in the LAN
+Manager API documentation, after the "Server" and up to and including the
+"uLevel" parameters.
+</para></listitem>
+
+<listitem><para>
+An unsigned 16 bit integer which gives the size in bytes of the buffer we
+will use to receive the returned array of data structures.  Presumably this
+should be the same as mdrcnt.  This value should be set with SSVAL().
+</para></listitem>
+
+<listitem><para>
+An ASCIIZ string describing substructures which should be returned.  If no 
+substructures apply, this string is of zero length.
+</para></listitem>
+
+</orderedlist>
+
+<para>
+The code in client.c always calls call_api() with no data.  It is unclear
+when a non-zero length data buffer would be sent.
+</para>
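+
+<para>
+The layout of the parameter block listed above can be sketched as
+follows, for the simple case of a call whose only argument after the
+server name is "uLevel".  This is a sketch under assumptions, not the code
+in client.c: build_param_block() and put_le16() are hypothetical names, and
+the API number and descriptor strings are whatever the caller supplies.
+</para>
+
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for SSVAL(): store 16 bits, little-endian. */
void put_le16(unsigned char *buf, size_t pos, uint16_t val)
{
    buf[pos]     = (unsigned char)(val & 0xff);
    buf[pos + 1] = (unsigned char)(val >> 8);
}

/*
 * Fill 'out' with a parameter block for a call whose only argument is
 * uLevel.  Returns the number of bytes written (a candidate for prcnt),
 * or 0 if the block does not fit in 'outlen' bytes.
 */
size_t build_param_block(unsigned char *out, size_t outlen,
                         uint16_t api_number,
                         const char *param_desc,  /* ASCIIZ parameter descriptor */
                         const char *data_desc,   /* ASCIIZ return data descriptor */
                         uint16_t level,          /* the uLevel argument */
                         uint16_t recv_buf_len,   /* presumably equal to mdrcnt */
                         const char *aux_desc)    /* ASCIIZ substructure descriptor */
{
    size_t plen = strlen(param_desc) + 1;   /* include the terminating NUL */
    size_t dlen = strlen(data_desc) + 1;
    size_t alen = strlen(aux_desc) + 1;
    size_t pos = 0;

    if (2 + plen + dlen + 2 + 2 + alen > outlen)
        return 0;

    put_le16(out, pos, api_number);          pos += 2;
    memcpy(out + pos, param_desc, plen);     pos += plen;
    memcpy(out + pos, data_desc, dlen);      pos += dlen;
    put_le16(out, pos, level);               pos += 2;
    put_le16(out, pos, recv_buf_len);        pos += 2;
    memcpy(out + pos, aux_desc, alen);       pos += alen;
    return pos;
}
```
+
+<para>
+When no substructures apply, passing an empty string for the last
+argument yields the zero length string described above (a single NUL
+byte).
+</para>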
+
+</sect2>
+
+<sect2>
+<title>Return value</title>
+
+<para>
+The returned parameters (pointed to by rparam), in their order of appearance
+are:</para>
+
+<orderedlist>
+
+<listitem><para>
+An unsigned 16 bit integer which contains the API function's return code. 
+This value should be read with SVAL().
+</para></listitem>
+
+<listitem><para>
+An adjustment which tells the amount by which pointers in the returned
+data should be adjusted.  This value should be read with SVAL().  Basically, 
+the address of the start of the returned data buffer should have the returned
+pointer value added to it and then have this value subtracted from it in
+order to obtain the correct offset into the returned data buffer.
+</para></listitem>
+
+<listitem><para>
+A count of the number of elements in the array of structures returned. 
+It is also possible that this may sometimes be the number of bytes returned.
+</para></listitem>
+</orderedlist>
+
+<para>
+When call_api() returns, rparam points to the returned parameters.  The
+first of these is the result code.  It will be zero if the API call
+succeeded.  This value may be read with "SVAL(rparam,0)".
+</para>
+
+<para>
+The second parameter may be read as "SVAL(rparam,2)".  It is a 16 bit offset
+which indicates what the base address of the returned data buffer was when
+it was built on the server.  It should be used to correct pointers before
+use.
+</para>
+
+<para>
+The returned data buffer contains the array of returned data structures. 
+Note that all pointers must be adjusted before use.  The function
+fix_char_ptr() in client.c can be used for this purpose.
+</para>
+
+<para>
+The third parameter (which may be read as "SVAL(rparam,4)") has something to
+do with indicating the amount of data returned or possibly the amount of
+data which can be returned if enough buffer space is allowed.
+</para>
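+
+<para>
+Reading the three returned parameters and adjusting a pointer can be
+sketched as below.  The structure and function names are hypothetical and
+this is not the fix_char_ptr() code from client.c; it only follows the
+description above, in which the adjustment value is subtracted from a
+returned pointer to give an offset into the returned data buffer.
+</para>
+
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three values found in rparam. */
struct api_result {
    uint16_t status;     /* zero if the API call succeeded */
    uint16_t converter;  /* adjustment for pointers in the returned data */
    uint16_t count;      /* element count (or possibly a byte count) */
};

/* Hypothetical stand-in for SVAL(): read 16 bits, little-endian. */
uint16_t get_le16(const unsigned char *buf, size_t pos)
{
    return (uint16_t)(buf[pos] | (buf[pos + 1] << 8));
}

/* Decode rparam; returns 0 on success, -1 if fewer than 6 bytes came back. */
int parse_rparam(const unsigned char *rparam, size_t len,
                 struct api_result *res)
{
    if (len < 6)
        return -1;
    res->status    = get_le16(rparam, 0);
    res->converter = get_le16(rparam, 2);
    res->count     = get_le16(rparam, 4);
    return 0;
}

/*
 * Turn a pointer value taken from the returned data into an offset
 * into that data.  Returns -1 if the result falls outside the buffer.
 */
long fix_offset(uint32_t raw_ptr, uint16_t converter, size_t rdata_len)
{
    if (raw_ptr < converter)
        return -1;
    if ((size_t)(raw_ptr - converter) >= rdata_len)
        return -1;
    return (long)(raw_ptr - converter);
}
```
+
+<para>
+Checking the adjusted offset against the length of the returned data
+before using it guards against a malformed reply.
+</para>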
+
+</sect2>
+</sect1>
+
+<sect1>
+<title>Code character table</title>
+<para>
+Certain data structures are described by means of ASCIIZ strings containing
+code characters.  These are the code characters:
+</para>
+
+<orderedlist>
+<listitem><para>
+W	a two byte little-endian unsigned integer
+</para></listitem>
+<listitem><para>
+N	a count of substructures which follow
+</para></listitem>
+<listitem><para>
+D	a four byte little-endian unsigned integer
+</para></listitem>
+<listitem><para>
+B	a byte (with optional count expressed as trailing ASCII digits)
+</para></listitem>
+<listitem><para>
+z	a four byte offset to a NULL terminated string
+</para></listitem>
+<listitem><para>
+l	a four byte offset to non-string user data
+</para></listitem>
+<listitem><para>
+b	an offset to data (with count expressed as trailing ASCII digits)
+</para></listitem>
+<listitem><para>
+r	pointer to returned data buffer???
+</para></listitem>
+<listitem><para>
+L	length in bytes of returned data buffer???
+</para></listitem>
+<listitem><para>
+h	number of bytes of information available???
+</para></listitem>
+</orderedlist>
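+
+<para>
+As an illustration of how such a descriptor string might be interpreted,
+the sketch below adds up the size of the fixed part of a structure.  It is
+an assumption-laden example rather than Samba code: desc_fixed_size() is a
+hypothetical name, W is taken to be two bytes wide, and only the codes
+whose width is stated in the table (W, D, B, z and l) are handled.  Any
+other code makes the function give up.
+</para>
+
```c
#include <ctype.h>

/*
 * Return the number of bytes occupied by the fixed-size fields named in
 * 'desc', or -1 if the string contains a code this sketch does not cover.
 */
long desc_fixed_size(const char *desc)
{
    long total = 0;

    while (*desc != '\0') {
        char code = *desc++;
        long count = 1;

        /* Only B is handled with an optional trailing decimal count. */
        if (code == 'B' && isdigit((unsigned char)*desc)) {
            count = 0;
            while (isdigit((unsigned char)*desc)) {
                count = count * 10 + (*desc - '0');
                desc++;
            }
        }

        switch (code) {
        case 'B': total += count; break;  /* byte or byte array */
        case 'W': total += 2;     break;  /* assumed two byte integer */
        case 'D': total += 4;     break;  /* four byte integer */
        case 'z': total += 4;     break;  /* offset to a string */
        case 'l': total += 4;     break;  /* offset to user data */
        default:  return -1;              /* width not stated in the table */
        }
    }
    return total;
}
```
+
+<para>
+With these assumptions, a descriptor such as "B13BWz" would describe 13
+bytes, one byte, a two byte integer and a four byte string offset, for a
+total of 20 bytes.
+</para>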
+
+</sect1>
+</chapter>
diff --git a/docs/docbook/devdoc/parsing.sgml b/docs/docbook/devdoc/parsing.sgml
new file mode 100644
index 00000000000..0121935d26d
--- /dev/null
+++ b/docs/docbook/devdoc/parsing.sgml
@@ -0,0 +1,239 @@
+<chapter id="parsing">
+<chapterinfo>
+	<author>
+		<firstname>Chris</firstname><surname>Hertel</surname>
+	</author>
+	<pubdate>November 1997</pubdate>
+</chapterinfo>
+
+<title>The smb.conf file</title>
+
+<sect1>
+<title>Lexical Analysis</title>
+
+<para>
+Basically, the file is processed on a line by line basis.  There are
+four types of lines that are recognized by the lexical analyzer
+(params.c):
+</para>
+
+<orderedlist>
+<listitem><para>
+Blank lines - Lines containing only whitespace.
+</para></listitem>
+<listitem><para>
+Comment lines - Lines beginning with either a semi-colon or a
+pound sign (';' or '#').
+</para></listitem>
+<listitem><para>
+Section header lines - Lines beginning with an open square bracket ('[').
+</para></listitem>
+<listitem><para>
+Parameter lines - Lines beginning with any other character.
+(The default line type.)
+</para></listitem>
+</orderedlist>
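+
+<para>
+The four line types can be told apart by looking at the first
+non-whitespace character.  The following is a minimal sketch of that
+idea; classify_line() and the enum names are hypothetical and do not
+appear in params.c.
+</para>
+
```c
#include <ctype.h>

enum line_type { LINE_BLANK, LINE_COMMENT, LINE_SECTION, LINE_PARAM };

/* Classify one line by its first non-whitespace character. */
enum line_type classify_line(const char *line)
{
    /* Skip leading whitespace; a newline ends the line, so stop there. */
    while (*line != '\n' && isspace((unsigned char)*line))
        line++;

    switch (*line) {
    case '\0':
    case '\n':
        return LINE_BLANK;
    case ';':
    case '#':
        return LINE_COMMENT;
    case '[':
        return LINE_SECTION;
    default:
        return LINE_PARAM;   /* the default line type */
    }
}
```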
+
+<para>
+The first two are handled exclusively by the lexical analyzer, which
+ignores them.  The latter two line types are scanned for:
+</para>
+
+<orderedlist>
+<listitem><para>
+  Section names
+</para></listitem>
+<listitem><para>
+  Parameter names
+</para></listitem>
+<listitem><para>
+  Parameter values
+</para></listitem>
+</orderedlist>
+
+<para>
+These are the only tokens passed to the parameter loader
+(loadparm.c).  Parameter names and values are divided from one
+another by an equal sign: '='.
+</para>
+
+<sect2>
+<title>Handling of Whitespace</title>
+
+<para>
+Whitespace is defined as all characters recognized by the isspace()
+function (see ctype(3C)) except for the newline character ('\n').
+The newline is excluded because it identifies the end of the line.
+</para>
+
+<orderedlist>
+<listitem><para>
+The lexical analyzer scans past whitespace at the beginning of a line.
+</para></listitem>
+
+<listitem><para>
+Section and parameter names may contain internal whitespace.  All
+whitespace within a name is compressed to a single space character. 
+</para></listitem>
+
+<listitem><para>
+Internal whitespace within a parameter value is kept verbatim with 
+the exception of carriage return characters ('\r'), all of which
+are removed.
+</para></listitem>
+
+<listitem><para>
+Leading and trailing whitespace is removed from names and values.
+</para></listitem>
+
+</orderedlist>
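+
+<para>
+The whitespace rules above can be sketched as two small in-place
+routines, one for names and one for values.  The function names are
+hypothetical and the code is only an illustration of the rules, not the
+implementation in params.c.
+</para>
+
```c
#include <ctype.h>

/* Whitespace as defined above: isspace() characters except newline. */
static int is_ws(int c)
{
    return c != '\n' && isspace(c);
}

/*
 * Normalise a section or parameter name in place: drop leading and
 * trailing whitespace, compress each internal run to a single space.
 */
void normalise_name(char *s)
{
    char *out = s;
    int pending = 0;   /* whitespace seen since the last kept character */

    for (const char *in = s; *in != '\0'; in++) {
        if (is_ws((unsigned char)*in)) {
            if (out != s)
                pending = 1;
        } else {
            if (pending) {
                *out++ = ' ';
                pending = 0;
            }
            *out++ = *in;
        }
    }
    *out = '\0';
}

/*
 * Normalise a parameter value in place: drop leading and trailing
 * whitespace and all carriage returns, keep other whitespace verbatim.
 */
void normalise_value(char *s)
{
    char *out = s;
    const char *in = s;

    while (is_ws((unsigned char)*in))
        in++;
    for (; *in != '\0'; in++) {
        if (*in != '\r')
            *out++ = *in;
    }
    while (out > s && is_ws((unsigned char)out[-1]))
        out--;
    *out = '\0';
}
```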
+
+</sect2>
+
+<sect2>
+<title>Handling of Line Continuation</title>
+
+<para>
+Long section header and parameter lines may be extended across
+multiple lines by use of the backslash character ('\\').  Line
+continuation is ignored for blank and comment lines.
+</para>
+
+<para>
+If the last (non-whitespace) character within a section header or on
+a parameter line is a backslash, then the next line will be
+(logically) concatenated with the current line by the lexical
+analyzer.  For example:
+</para>
+
+<para><programlisting>
+    param name = parameter value string \
+    with line continuation.
+</programlisting></para>
+
+<para>Would be read as</para>
+
+<para><programlisting>
+    param name = parameter value string     with line continuation.
+</programlisting></para>
+
+<para>
+Note that there are five spaces following the word 'string',
+representing the one space between 'string' and '\\' in the top
+line, plus the four preceding the word 'with' in the second line.
+(Yes, I'm counting the indentation.)
+</para>
+
+<para>
+Line continuation characters are ignored on blank lines and at the end
+of comments.  They are *only* recognized within section and parameter
+lines.
+</para>
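+
+<para>
+A minimal sketch of this joining rule follows.  It is an illustration
+only: join_continuations() is a hypothetical name, and the routine works
+on a whole buffer rather than reading a file as params.c does.  A trailing
+backslash (and any whitespace after it) is dropped and the next physical
+line is appended verbatim, unless the logical line is blank or a comment.
+</para>
+
```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_ws(int c)
{
    return c != '\n' && isspace(c);
}

/*
 * Copy 'in' to 'out', joining continued lines.  Every logical line in
 * 'out' ends with '\n'.  Returns the length written, or -1 if 'out'
 * (of size 'outlen') is too small.
 */
long join_continuations(const char *in, char *out, size_t outlen)
{
    size_t o = 0;
    int at_start = 1;      /* at the start of a logical line */
    int continuable = 0;   /* logical line is a section or parameter line */

    while (*in != '\0') {
        const char *eol = strchr(in, '\n');
        size_t len = eol ? (size_t)(eol - in) : strlen(in);
        size_t end = len;
        size_t keep;
        int cont;

        if (at_start) {
            size_t i = 0;
            while (i < len && is_ws((unsigned char)in[i]))
                i++;
            /* Blank and comment lines never continue. */
            continuable = (i < len && in[i] != ';' && in[i] != '#');
        }

        /* Find the last non-whitespace character on this physical line. */
        while (end > 0 && is_ws((unsigned char)in[end - 1]))
            end--;
        cont = continuable && eol != NULL && end > 0 && in[end - 1] == '\\';
        keep = cont ? end - 1 : len;

        if (o + keep + 2 > outlen)
            return -1;
        memcpy(out + o, in, keep);
        o += keep;
        if (!cont)
            out[o++] = '\n';

        at_start = !cont;
        in += len + (eol ? 1 : 0);
    }
    out[o] = '\0';
    return (long)o;
}
```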
+
+</sect2>
+
+<sect2>
+<title>Line Continuation Quirks</title>
+
+<para>Note the following example:</para>
+
+<para><programlisting>
+	param name = parameter value string \
+    \
+    with line continuation.
+</programlisting></para>
+
+<para>
+The middle line is *not* parsed as a blank line because it is first
+concatenated with the top line.  The result is
+</para>
+
+<para><programlisting>
+param name = parameter value string         with line continuation.
+</programlisting></para>
+
+<para>The same is true for comment lines.</para>
+
+<para><programlisting>
+    param name = parameter value string \
+    ; comment \
+    with a comment.
+</programlisting></para>
+
+<para>This becomes:</para>
+
+<para><programlisting>
+param name = parameter value string     ; comment     with a comment.
+</programlisting></para>
+
+<para>
+On a section header line, the closing bracket (']') is considered a
+terminating character, and the rest of the line is ignored.  The lines
+</para>
+
+<para><programlisting>
+	[ section   name ] garbage \
+    param  name  = value
+</programlisting></para>
+
+<para>are read as</para>
+
+<para><programlisting>
+	[section name]
+    param name = value
+</programlisting></para>
+
+</sect2>
+</sect1>
+
+<sect1>
+<title>Syntax</title>
+
+<para>The syntax of the smb.conf file is as follows:</para>
+
+<para><programlisting>
+  &lt;file&gt;            :==  { &lt;section&gt; } EOF
+  &lt;section&gt;         :==  &lt;section header&gt; { &lt;parameter line&gt; }
+  &lt;section header&gt;  :==  '[' NAME ']'
+  &lt;parameter line&gt;  :==  NAME '=' VALUE NL
+</programlisting></para>
+
+<para>Basically, this means that</para>
+
+<orderedlist>
+<listitem><para>
+	A file is made up of zero or more sections, and is terminated by
+	an EOF (we knew that).
+</para></listitem>
+
+<listitem><para>
+	A section is made up of a section header followed by zero or more
+	parameter lines.
+</para></listitem>
+
+<listitem><para>
+	A section header is identified by an opening bracket and
+	terminated by the closing bracket.  The enclosed NAME identifies
+	the section.
+</para></listitem>
+
+<listitem><para>
+	A parameter line is divided into a NAME and a VALUE.  The *first*
+	equal sign on the line separates the NAME from the VALUE.  The
+	VALUE is terminated by a newline character (NL = '\n').
+</para></listitem>
+
+</orderedlist>
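+
+<para>
+The last rule, splitting a parameter line at the *first* equal sign, can
+be sketched as follows.  split_param() is a hypothetical helper written
+for this example; it does not trim whitespace, which is covered by the
+whitespace rules of the lexical analyzer.
+</para>
+
```c
#include <stddef.h>
#include <string.h>

/*
 * Split "NAME = VALUE\n" in place at the first '='.  On success the
 * line is cut into two strings, *name and *value point into it, and 0
 * is returned.  Returns -1 if the line contains no '='.
 */
int split_param(char *line, char **name, char **value)
{
    char *eq = strchr(line, '=');
    char *nl;

    if (eq == NULL)
        return -1;

    *eq = '\0';            /* NAME ends at the first equal sign */
    *name = line;
    *value = eq + 1;

    nl = strchr(*value, '\n');
    if (nl != NULL)
        *nl = '\0';        /* VALUE ends at the newline */
    return 0;
}
```
+
+<para>
+Because only the first equal sign is significant, a value may itself
+contain further equal signs.
+</para>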
+
+<sect2>
+<title>About params.c</title>
+
+<para>
+The parsing of the config file is a bit unusual if you are used to
+lex, yacc, bison, etc.  Both lexical analysis (scanning) and parsing
+are performed by params.c.  Values are loaded via callbacks to
+loadparm.c.
+</para>
+</sect2>
+</sect1>
+</chapter>
diff --git a/docs/docbook/devdoc/unix-smb.sgml b/docs/docbook/devdoc/unix-smb.sgml
new file mode 100644
index 00000000000..be796988572
--- /dev/null
+++ b/docs/docbook/devdoc/unix-smb.sgml
@@ -0,0 +1,311 @@
+<chapter id="unix-smb">
+<chapterinfo>
+	<author>
+		<firstname>Andrew</firstname><surname>Tridgell</surname>
+	</author>
+	<pubdate>April 1995</pubdate>
+</chapterinfo>
+
+<title>NetBIOS in a Unix World</title>
+
+<sect1>
+<title>Introduction</title>
+<para>
+This is a short document that describes some of the issues that
+confront an SMB implementation on unix, and how Samba copes with
+them. It may help people who are looking at unix&lt;-&gt;PC
+interoperability.
+</para>
+
+<para>
+It was written to help out a person who was writing a paper on unix to
+PC connectivity.
+</para>
+
+</sect1>
+
+<sect1>
+<title>Usernames</title>
+<para>
+The SMB protocol has only a loose username concept. Early SMB
+protocols (such as CORE and COREPLUS) have no username concept at
+all. Even in later protocols clients often attempt operations
+(particularly printer operations) without first validating a username
+on the server.
+</para>
+
+<para>
+Unix security is based around username/password pairs. A unix box
+should not allow clients to do any substantive operation without some
+sort of validation. 
+</para>
+
+<para>
+The problem mostly manifests itself when the unix server is in "share
+level" security mode. This is the default mode as the alternative
+"user level" security mode usually forces a client to connect to the
+server as the same user for each connected share, which is
+inconvenient in many sites.
+</para>
+
+<para>
+In "share level" security the client normally gives a username in the
+"session setup" protocol, but does not supply an accompanying
+password. The client then connects to resources using the "tree
+connect" protocol, and supplies a password. The problem is that the
+user on the PC types the username and the password in different
+contexts, unaware that they need to go together to give access to the
+server. The username is normally the one the user typed in when they
+"logged onto" the PC (this assumes Windows for Workgroups). The
+password is the one they chose when connecting to the disk or printer.
+</para>
+
+<para>
+The user often chooses a totally different username for their login than
+for the drive connection. Often they also want to access different
+drives as different usernames. The unix server needs some way of
+divining the correct username to combine with each password.
+</para>
+
+<para>
+Samba tries to avoid this problem using several methods. These succeed
+in the vast majority of cases. The methods include username maps, the
+service%user syntax, the saving of session setup usernames for later
+validation and the derivation of the username from the service name
+(either directly or via the user= option).
+</para>
+
+</sect1>
+
+<sect1>
+<title>File Ownership</title>
+
+<para>
+The commonly used SMB protocols have no way of saying "you can't do
+that because you don't own the file". They have, in fact, no concept
+of file ownership at all.
+</para>
+
+<para>
+This brings up all sorts of interesting problems. For example, when
+you copy a file to a unix drive, and the file is world writeable but
+owned by another user, the file will transfer correctly but will
+receive the wrong date. This is because the utime() call under unix
+only succeeds for the owner of the file, or root, even if the file is
+world writeable. For security reasons Samba does all file operations
+as the validated user, not root, so the utime() fails. This can stuff
+up shared development directories as programs like "make" will not get
+file time comparisons right.
+</para>
+
+<para>
+There are several possible solutions to this problem, including
+username mapping, and forcing a specific username for particular
+shares.
+</para>
+
+</sect1>
+
+<sect1>
+<title>Passwords</title>
+
+<para>
+Many SMB clients uppercase passwords before sending them. I have no
+idea why they do this. Interestingly WfWg uppercases the password only
+if the server is running a protocol greater than COREPLUS, so
+obviously it isn't just the data entry routines that are to blame.
+</para>
+
+<para>
+Unix passwords are case sensitive. So if users use mixed case
+passwords they are in trouble.
+</para>
+
+<para>
+Samba can try to cope with this by either using the "password level"
+option which causes Samba to try the offered password with up to the
+specified number of case changes, or by using the "password server"
+option which allows Samba to do its validation via another machine
+(typically a WinNT server).
+</para>
+
+<para>
+Samba supports the password encryption method used by SMB
+clients. Note that the use of password encryption in Microsoft
+networking leads to password hashes that are "plain text equivalent".
+This means that it is *VERY* important to ensure that the Samba
+smbpasswd file containing these password hashes is only readable
+by the root user. See the documentation ENCRYPTION.txt for more
+details.
+</para>
+
+</sect1>
+
+<sect1>
+<title>Locking</title>
+<para>
+The locking calls available under a DOS/Windows environment are much
+richer than those available in unix. This means a unix server (like
+Samba) choosing to use the standard fcntl() based unix locking calls
+to implement SMB locking has to improvise a bit.
+</para>
+
+<para>
+One major problem is that dos locks can be in a 32 bit (unsigned)
+range. Unix locking calls are 32 bits, but are signed, giving only a 31
+bit range. Unfortunately OLE2 clients use the top bit to select a
+locking range used for OLE semaphores.
+</para>
+
+<para>
+To work around this problem Samba compresses the 32 bit range into 31
+bits by appropriate bit shifting. This seems to work but is not
+ideal. In a future version a separate SMB lockd may be added to cope
+with the problem.
+</para>
+
+<para>
+It also doesn't help that many unix lockd daemons are very buggy and
+crash at the slightest provocation. They normally go mostly unused in
+a unix environment because few unix programs use byte range
+locking. The stress of huge numbers of lock requests from dos/windows
+clients can kill the daemon on some systems.
+</para>
+
+<para>
+The second major problem is the "opportunistic locking" requested by
+some clients. If a client requests opportunistic locking then it is
+asking the server to notify it if anyone else tries to do something on
+the same file, at which time the client will say if it is willing to
+give up its lock. Unix has no simple way of implementing
+opportunistic locking, and currently Samba has no support for it.
+</para>
+
+</sect1>
+
+<sect1>
+<title>Deny Modes</title>
+
+<para>
+When an SMB client opens a file it asks for a particular "deny mode" to
+be placed on the file. These modes (DENY_NONE, DENY_READ, DENY_WRITE,
+DENY_ALL, DENY_FCB and DENY_DOS) specify what actions should be
+allowed by anyone else who tries to use the file at the same time. If
+DENY_READ is placed on the file, for example, then any attempt to open
+the file for reading should fail.
+</para>
+
+<para>
+Unix has no equivalent notion. To implement this Samba uses either lock
+files based on the file's inode and placed in a separate lock
+directory or a shared memory implementation. The lock file method 
+is clumsy and consumes processing and file resources;
+the shared memory implementation is vastly preferred and is turned on
+by default for those systems that support it.
+</para>
+
+</sect1>
+
+<sect1>
+<title>Trapdoor UIDs</title>
+<para>
+An SMB session can run with several uids on the one socket. This
+happens when a user connects to two shares with different
+usernames. To cope with this the unix server needs to switch uids
+within the one process. On some unixes (such as SCO) this is not
+possible. This means that on those unixes the client is restricted to
+a single uid.
+</para>
+
+<para>
+Note that you can also get the "trapdoor uid" message for other
+reasons. Please see the FAQ for details.
+</para>
+
+</sect1>
+
+<sect1>
+<title>Port numbers</title>
+<para>
+There is a convention that clients on sockets use high "unprivileged"
+port numbers (1024 and above) and connect to servers on low "privileged"
+port numbers. This is enforced in Unix as non-root users can't open a
+socket for listening on port numbers less than 1024.
+</para>
+
+<para>
+Most PC based SMB clients (such as WfWg and WinNT) don't follow this
+convention completely. The main culprit is the netbios nameserving on
+udp port 137. Name query requests come from a source port of 137. This
+is a problem when you combine it with the common firewalling technique
+of not allowing incoming packets on low port numbers. This means that
+these clients can't query a netbios nameserver on the other side of a
+low port based firewall.
+</para>
+
+<para>
+The problem is more severe with netbios node status queries. I've
+found that WfWg, Win95 and WinNT3.5 all respond to netbios node status
+queries on port 137 no matter what the source port was in the
+request. This works between machines that are both using port 137, but
+it means it's not possible for a unix user to do a node status request
+to any of these OSes unless they are running as root. The answer comes
+back, but it goes to port 137 which the unix user can't listen
+on. Interestingly WinNT3.1 got this right - it sends node status
+responses back to the source port in the request.
+</para>
+
+</sect1>
+
+<sect1>
+<title>Protocol Complexity</title>
+<para>
+There are many "protocol levels" in the SMB protocol. It seems that
+each time new functionality was added to a Microsoft operating system,
+they added the equivalent functions in a new protocol level of the SMB
+protocol to "externalise" the new capabilities.
+</para>
+
+<para>
+This means the protocol is very "rich", offering many ways of doing
+each file operation. This means SMB servers need to be complex and
+large. It also means it is very difficult to make them bug free. It is
+not just Samba that suffers from this problem, other servers such as
+WinNT don't support every variation of every call and it has almost
+certainly been a headache for MS developers to support the myriad of
+SMB calls that are available.
+</para>
+
+<para>
+There are about 65 "top level" operations in the SMB protocol (things
+like SMBread and SMBwrite). Some of these include hundreds of
+sub-functions (SMBtrans has at least 120 sub-functions, like
+DosPrintQAdd and NetSessionEnum). All of them take several options
+that can change the way they work. Many take dozens of possible
+"information levels" that change the structures that need to be
+returned. Samba supports all but 2 of the "top level" functions. It
+supports only 8 (so far) of the SMBtrans sub-functions. Even NT
+doesn't support them all.
+</para>
+
+<para>
+Samba currently supports up to the "NT LM 0.12" protocol, which is the
+one preferred by Win95 and WinNT3.5. Luckily this protocol level has a
+"capabilities" field which specifies which super-duper new-fangled
+options the server supports. This helps to make the implementation of
+this protocol level much easier.
+</para>
+
+<para>
+There is also a problem with the SMB specifications. SMB is an X/Open
+spec, but the X/Open book is far from ideal, and fails to cover many
+important issues, leaving much to the imagination. Microsoft recently
+renamed the SMB protocol CIFS (Common Internet File System) and have 
+published new specifications. These are far superior to the old 
+X/Open documents but there are still undocumented calls and features. 
+This specification is actively being worked on by a CIFS developers 
+mailing list hosted by Microsoft.
+</para>
+</sect1>
+</chapter>
+