From 05095851346f52c8e918176e8e2abdf0b21de5ec Mon Sep 17 00:00:00 2001 From: dwheeler Date: Fri, 7 Jul 2006 13:36:27 +0000 Subject: Initial import (sloccount 2.26) git-svn-id: svn://svn.code.sf.net/p/sloccount/code/trunk@1 d762cc98-fd17-0410-9a0d-d09172385bc5 --- sloccount.html.orig | 2440 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 2440 insertions(+) create mode 100644 sloccount.html.orig diff --git a/sloccount.html.orig b/sloccount.html.orig new file mode 100644 index 0000000..dd0ad54 --- /dev/null +++ b/sloccount.html.orig @@ -0,0 +1,2440 @@ + + + +SLOCCount User's Guide + + +
+SLOCCount User's Guide +
+by David A. Wheeler (dwheeler, at, dwheeler.com) +
+December 2, 2002 +
+Version 2.20 +
+

+

Introduction

+

+SLOCCount (pronounced "sloc-count") is a suite of programs for counting +physical source lines of code (SLOC) in potentially large software systems. +Thus, SLOCCount is a "software metrics tool" or "software measurement tool". +SLOCCount was developed by David A. Wheeler, +originally to count SLOC in a GNU/Linux distribution, but it can be +used for counting the SLOC of arbitrary software systems. +

+SLOCCount is known to work on Linux systems, and has been tested +on Red Hat Linux versions 6.2, 7, and 7.1. +SLOCCount should run on many other Unix-like systems (if Perl is installed), +in particular, I would expect a *BSD system to work well. +Windows users can run sloccount by first installing +Cygwin. +SLOCCount is much slower on Windows/Cygwin, and it's not as easy to install +or use on Windows, but it works. +Of course, feel free to upgrade to an open source Unix-like system +(such as Linux or *BSD) instead :-). +

+SLOCCount can count physical SLOC for a large number of languages. +Listed alphabetically, they are +Ada, Assembly (for many machines and assemblers), +awk (including gawk and nawk), +Bourne shell (and relatives such as bash, ksh, zsh, and pdksh), +C, C++, C# (also called C-sharp or cs), C shell (including tcsh), +COBOL, Expect, Fortran, Haskell, +Java, lex (including flex), +LISP (including Scheme), +makefiles (though they aren't usually shown in final reports), +Modula3, Objective-C, Pascal, Perl, PHP, Python, Ruby, sed, +SQL (normally not shown), +TCL, and Yacc. +It can gracefully handle awkward situations in many languages, +for example, it can determine the +syntax used in different assembly language files and adjust appropriately, +it knows about Python's use of string constants as comments, and it +can handle various Perl oddities (e.g., perlpods, here documents, +and Perl's _ _END_ _ marker). +It even has a "generic" SLOC counter that you may be able to use to count the +SLOC of other languages (depending on the language's syntax).

+SLOCCount can also take a large list of files and automatically categorize +them using a number of different heuristics. +The heuristics automatically determine if a file +is a source code file or not, and if so, which language it's written in. +For example, +it knows that ".pc" is usually a C source file for an Oracle preprocessor, +but it can detect many circumstances where it's actually a file about +a "PC" (personal computer). +For another example, it knows that ".m" is the standard extension for +Objective-C, but it will check the file contents to +see if it really is Objective-C. +It will even examine file headers to attempt to accurately determine +the file's true type. +As a result, you can analyze large systems completely automatically.

+Finally, SLOCCount has some report-generating tools +to collect the data generated, +and then present it in several different formats and sorted different ways. +The report-generating tool can also generate simple tab-separated files +so data can be passed on to other analysis tools (such as spreadsheets +and database systems). +

+SLOCCount will try to quickly estimate development time and effort given only +the lines of code it computes, using the original Basic COCOMO model. +See the +discussion below about COCOMO, including intermediate COCOMO, +if you want to improve these estimates by giving additional information about +the project.

+SLOCCount is open source software/free software (OSS/FS), +released under the GNU General Public License (GPL), version 2; +see the license below. +The master web site for SLOCCount is +http://www.dwheeler.com/sloccount. +You can learn a lot about SLOCCount by reading the paper that caused its +creation, available at +http://www.dwheeler.com/sloc. +Feel free to see my master web site at +http://www.dwheeler.com, which has +other material such as the +Secure Programming +for Linux and Unix HOWTO, +my list of +OSS/FS references, and my paper +Why OSS/FS? Look at +the Numbers! +Please send improvements by email +to dwheeler, at, dwheeler.com (DO NOT SEND SPAM - please remove the +commas, remove the spaces, and change the word "at" into the at symbol). +

+The following sections first give a "quick start" +(discussing how to use SLOCCount once it's installed), +then discuss basic SLOCCount concepts, +how to install it, how to set your PATH, +how to install source code on RPM-based systems if you wish, and +more information on how to use the "sloccount" front-end. +This is followed by material for advanced users: +how to use SLOCCount tools individually (for when you want more control +than the "sloccount" tool gives you), designer's notes, +the definition of SLOC, and miscellaneous notes. +The last sections state the license used (GPL) and give +hints on how to submit changes to SLOCCount (if you decide to make changes +to the program). + +

+

Quick Start

+

+Once you've installed SLOCCount (discussed below), +you can measure an arbitrary program by typing everything +after the dollar sign into a terminal session: +

+  $  sloccount topmost-source-code-directory
+
+

+The directory listed and all its descendants will be examined. +You'll see output while it calculates, +culminating with physical SLOC totals and +estimates of development time, schedule, and cost. +If the directory contains a set of directories, each of which is +a different project developed independently, +use the "--multiproject" option so the effort estimations +can correctly take this into account. +

+You can redisplay the data different ways by using the "--cached" +option, which skips the calculation stage and re-prints previously +computed information. +You can use other options to control what's displayed: +"--filecount" shows counts of files instead of SLOC, and +"--details" shows the detailed information about every source code file. +So, to display all the details of every file once you've previously +calculated the results, just type: +

+  sloccount --cached --details
+
+

+You'll notice that the default output ends with a request. +If you use this data (e.g., in a report), please +credit that data as being "generated using 'SLOCCount' by David A. Wheeler." +I make no money from this program, so at least please give me some credit. +

+SLOCCount tries to ignore all automatically generated files, but its +heuristics to detect this are necessarily imperfect (after all, even humans +sometimes have trouble determining if a file was automatically generated). +If possible, try to clean out automatically generated files from +the source directories -- +in many situations "make clean" does this.

+There's more to SLOCCount than this, but first we'll need to +explain some basic concepts, then we'll discuss other options +and advanced uses of SLOCCount. + +

+

Basic Concepts

+

+SLOCCount counts physical SLOC, also called "non-blank, non-comment lines". +More formally, physical SLOC is defined as follows: +``a physical source line of code (SLOC) is a line ending +in a newline or end-of-file marker, +and which contains at least one non-whitespace non-comment character.'' +Comment delimiters (characters other than newlines starting and ending +a comment) are considered comment characters. +Lines containing only whitespace +(e.g., lines with only tabs and spaces in multiline strings) are not counted.
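As an illustration of this definition, here is a rough sketch of a physical SLOC counter for a comment-to-end-of-line language such as shell. This is not SLOCCount's actual counter (it would, for example, miscount "#" inside string constants), and the sample file name is made up; treat it only as an approximation of the rule above:

```shell
# Build a tiny sample file: a shebang, a comment, a blank line, and one
# line of code (a trailing comment does not disqualify a code line).
sample=$(mktemp)
cat > "$sample" <<'EOF'
#!/bin/sh
# a comment

echo "hello"   # code plus trailing comment
EOF

# Strip comments, then count lines that still contain at least one
# non-whitespace character -- the physical SLOC definition above.
sloc=$(sed 's/#.*//' "$sample" | grep -c '[^[:space:]]')
echo "$sloc"
```

Here only the `echo` line survives: the shebang and comment lines become blank once comments are stripped, so the sample counts as 1 physical SLOC.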

+In SLOCCount, there are 3 different directories: +

    +
  1. The "source code directory", a directory containing the source code + being measured + (possibly in recursive subdirectories). The directories immediately + contained in the source code directory will normally be counted separately, + so it helps if your system is designed so that this top set of directories + roughly represents the system's major components. + If it doesn't, there are various tricks you can use to group source + code into components, but it's more work. + You don't need write access to the source code directory, but + you do need read access to all files, and read and search (execute) access + to all subdirectories. +
  2. The "bin directory", the directory containing the SLOCCount executables. + By default, installing the program creates a subdirectory + named "sloccount-VERSION" which is the bin directory. + The bin directory must be part of your PATH. +
  3. The "data directory", which stores the analysis results. + When measuring programs using "sloccount", by default + this is the directory ".slocdata" inside your home directory. + When you use the advanced SLOCCount tools directly, + in many cases this must be your "current" directory. + Inside the data directory are "data directory children" - these are + subdirectories that contain a file named "filelist", and each child + is used to represent a different project or a different + major component of a project. +
+
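To make the data directory layout concrete, here is a small sketch (the child names "projA" and "projB" are hypothetical) mirroring the structure sloccount creates under ~/.slocdata: each child is simply a subdirectory holding a file named "filelist":

```shell
# Sketch a data directory with two hypothetical children, "projA" and
# "projB", each marked as a child by containing a "filelist" file.
datadir=$(mktemp -d)
mkdir -p "$datadir/projA" "$datadir/projB"
: > "$datadir/projA/filelist"
: > "$datadir/projB/filelist"

# Every subdirectory containing a "filelist" is a data directory child.
children=$(find "$datadir" -name filelist | wc -l)
echo "$children"
```

This prints 2, one per data directory child found.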

+SLOCCount can handle many different programming languages, and separate +them by type (so you can compare the use of each). +Here is the set of languages, sorted alphabetically; +common filename extensions are in +parentheses, with SLOCCount's ``standard name'' for the language +listed in brackets: +

    +
  1. Ada (.ada, .ads, .adb, .pad) [ada] +
  2. Assembly for many machines and assemblers (.s, .S, .asm) [asm] +
  3. awk (.awk) [awk] +
  4. Bourne shell and relatives such as bash, ksh, zsh, and pdksh (.sh) [sh] +
  5. C (.c, .pc, .ec, .ecp) [ansic] +
  6. C++ (.C, .cpp, .cxx, .cc, .pcc) [cpp] +
  7. C# (.cs) [cs] +
  8. C shell including tcsh (.csh) [csh] +
  9. COBOL (.cob, .cbl, .COB, .CBL) [cobol] +
  10. Expect (.exp) [exp] +
  11. Fortran (.f, .f77, .F) [fortran] +
  12. Haskell (.hs) [haskell]; please preprocess .lhs files. +
  13. Java (.java) [java] +
  14. lex (.l) [lex] +
  15. LISP including Scheme (.el, .scm, .lsp, .jl) [lisp] +
  16. makefiles (makefile) [makefile] +
  17. ML (.ml, .ml3) [ml] +
  18. Modula3 (.m3, .i3) [modula3] +
  19. Objective-C (.m) [objc] +
  20. Pascal (.p, .pas) [pascal] +
  21. Perl (.pl, .pm, .perl) [perl] +
  22. PHP (.php, .php[3456], .inc) [php] +
  23. Python (.py) [python] +
  24. Ruby (.rb) [ruby] +
  25. sed (.sed) [sed] +
  26. sql (.sql) [sql] +
  27. TCL (.tcl, .tk, .itk) [tcl] +
  28. Yacc (.y) [yacc] +
+ +

+

Installing SLOCCount

+

+Obviously, before using SLOCCount you'll need to install it. +SLOCCount depends on other programs, in particular perl, bash, +a C compiler (gcc will do), and md5sum +(you can get a useful md5sum program in the ``textutils'' package +on many Unix-like systems), so you'll need to get them installed +if they aren't already. +

+If your system uses RPM version 4 or greater to install software +(e.g., Red Hat Linux 7 or later), just download the SLOCCount RPM +and install it using a normal installation command; from the command line +you can use:

+  rpm -Uvh sloccount*.rpm
+
+

+Everyone else will need to install from a tar file, and Windows users will +have to install Cygwin before installing sloccount. +

+If you're using Windows, you'll first need to install +Cygwin. +By installing Cygwin, you'll install an environment and a set of +open source Unix-like tools. +Cygwin essentially creates a Unix-like environment in which sloccount can run. +You may be able to run parts of sloccount without Cygwin, in particular, +the perl programs should run in the Windows port of Perl, but you're +on your own - many of the sloccount components expect a Unix-like environment. +If you want to install Cygwin, go to the +Cygwin main page +and install it. +If you're using Cygwin, install it to use Unix newlines, not +DOS newlines - DOS newlines will cause odd errors in SLOCCount +(and probably other programs, too). +I have only tested a "full" Cygwin installation, so I suggest installing +everything. +If you're short on disk space, at least install +binutils, bash, fileutils, findutils, +gcc, grep, gzip, make, man, perl, readline, +sed, sh-utils, tar, textutils, unzip, and zlib; +you should probably install vim as well, +and there may be other dependencies. +By default Cygwin will create a directory C:\cygwin\home\NAME, +and will set up the ability to run Unix programs +(which will think that the same directory is called /home/NAME). +Now double-click on the Cygwin icon, or select +Programs / Cygnus Solutions / Cygwin Bash shell from the Start menu; +you'll see a terminal screen with a Unix-like interface. +Now follow the instructions (next) for tar file users.

+If you're installing from the tar file, download the file +(into your home directory is fine). +Unpacking the file will create a subdirectory, so if you want the +unpacked subdirectory to go somewhere special, "cd" to where you +want it to go. +Most likely, your home directory is just fine. +Now gunzip and untar SLOCCount (the * replaces the version #) by typing +this at a terminal session: +

+  gunzip -c sloccount*.tar.gz | tar xvf -
+
+Replace "sloccount*.tar.gz" shown above +with the full path of the downloaded file, wherever that is. +You've now created the "bin directory", which is simply the +"sloccount-VERSION" subdirectory created by the tar command +(where VERSION is the version number). +

+Now you need to compile the few compiled programs in the "bin directory" so +SLOCCount will be ready to go. +First, cd into the newly-created bin directory, by typing: +

+  cd sloccount*
+
+

+You may then need to override some installation settings. +You can do this by editing the supplied makefile, or alternatively, +by providing options to "make" whenever you run make. +The supplied makefile assumes your C compiler is named "gcc", which +is true for most Linux systems, *BSD systems, and Windows systems using Cygwin. +If this isn't true, you'll need to set +the "CC" variable to the correct value (e.g., "cc"). +You can also modify where the files are stored; this variable is +called PREFIX and its default is /usr/local +(older versions of sloccount defaulted to /usr).

+If you're using Windows and Cygwin, you +must override one of the installation +settings, EXE_SUFFIX, for installation to work correctly. +One way to set this value is to edit the "makefile" file so that +the line beginning with "EXE_SUFFIX" reads as follows: +

+  EXE_SUFFIX=.exe
+
+If you're using Cygwin and you choose to modify the "makefile", you +can use any text editor on the Cygwin side, or you can use a +Windows text editor if it can read and write Unix-formatted text files. +Cygwin users are free to use vim, for example. +If you're installing into your home directory and using the default locations, +Windows text editors will see the makefile as file +C:\cygwin\home\NAME\sloccount-VERSION\makefile. +Note that the Windows "Notepad" application doesn't work well, because it's not +able to handle Unix text files correctly. +Since this can be quite a pain, Cygwin users may instead decide to override +the makefile values during installation by providing options to make.

+Finally, compile the few compiled programs in it by typing "make": +

+  make
+
+If you didn't edit the makefile in the previous step, you +need to provide options to make invocations to set the correct values. +This is done by simply saying (after "make") the name of the variable, +an equal sign, and its correct value. +Thus, to compile the program on a Windows system using Cygwin, you can +skip modifying the makefile file by typing this instead of just "make":
+  make EXE_SUFFIX=.exe
+
+

+If you want, you can install sloccount for system-wide use without +using the RPM version. +Windows users using Cygwin should probably do this, particularly +if they chose a "local" installation. +To do this, first log in as root (Cygwin users don't need to do this +for local installation). +Edit the makefile to match your system's conventions, if necessary, +and then type "make install": +

+  make install
+
+If you need to set some make options, remember to do that here too. +If you use "make install", you can uninstall it later using +"make uninstall". +Installing sloccount for system-wide use is optional; +SLOCCount works without a system-wide installation. +However, if you don't install sloccount system-wide, you'll need to +set up your PATH variable; see the section on +setting your path. +

+A note for Cygwin users (and some others): some systems, including Cygwin, +don't set up the environment quite right and thus can't display the manual +pages as installed. +The problem is that they forget to search /usr/local/share/man for +manual pages. +If you want to read the installed manual pages, type this +into a Bourne-like shell: +

+  MANPATH=/usr/local/share/man:/usr/share/man:/usr/man
+  export MANPATH
+
+Or, if you use a C shell: +
+  setenv MANPATH "/usr/local/share/man:/usr/share/man:/usr/man"
+
+From then on, you'll be able to view the reference manual pages +by typing "man sloccount" (or by using whatever manual page display system +you prefer). +

+ +

+

Installing The Source Code To Measure

+

+Obviously, you must install the software source code you're counting, +so somehow you must create the "source directory" +with the source code to measure. +You must also make sure that permissions are set so the software can +read these directories and files. +

+For example, if you're trying to count the SLOC for an RPM-based Linux system, +install the software source code by doing the following as root +(which will place all source code into the source directory +/usr/src/redhat/BUILD): +

    +
  1. Install all source rpm's: +
    +    mount /mnt/cdrom
    +    cd /mnt/cdrom/SRPMS
    +    rpm -ivh *.src.rpm
    +
    +
  2. Remove RPM spec files you don't want to count: +
    +    cd ../SPECS
    +    (look in contents of spec files, removing what you don't want)
    +
    +
  3. build/prep all spec files: +
    +    rpm -bp *.spec
    +
    +
  4. Set permissions so the source files can be read by all: +
    +    chmod -R a+rX /usr/src/redhat/BUILD
    +
    +
+

+Here's an example of how to download source code from an +anonymous CVS server. +Let's say you want to examine the source code in GNOME's "gnome-core" +directory, as stored at the CVS server "anoncvs.gnome.org". +Here's how you'd do that: +

    +
  1. Set up site and login parameters: +
    +  export CVSROOT=':pserver:anonymous@anoncvs.gnome.org:/cvs/gnome'
    +
    +
  2. Log in: +
    +  cvs login
    +
    +
  3. Check out the software (copy it to your local directory), using +mild compression to save on bandwidth: +
    +  cvs -z3 checkout gnome-core
    +
    +
+

+Of course, if you have a non-anonymous account, you'd set CVSROOT +to reflect this. For example, to log in using the "pserver" +protocol as ACCOUNT_NAME, do: +

+  export CVSROOT=':pserver:ACCOUNT_NAME@cvs.gnome.org:/cvs/gnome'
+
+

+You may need root privileges to install the source code and to give +another user permission to read it, but please avoid running the +sloccount program as root. +Although I know of no specific reason this would be a problem, +running any program as root turns off helpful safeguards. +

+Although SLOCCount tries to detect (and ignore) many cases where +programs are automatically generated, these heuristics are necessarily +imperfect. +So, please don't run any programs that generate other programs - just +do enough to get the source code prepared for counting. +In general you shouldn't run "make" on the source code, and if you have, +consider running "make clean" or "make really_clean" on the source code first. +It often doesn't make any difference, but identifying those circumstances +is difficult. +

+SLOCCount will not automatically uncompress files that are +compressed/archive files (such as .zip, .tar, or .tgz files). +Often such files are just "left over" old versions or files +that you're already counting. +If you want to count the contents of compressed files, uncompress them first. +
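One way to spot such files before counting is a quick find over common archive extensions (extend the pattern list as needed); the source tree and file names below are hypothetical:

```shell
# Set up a hypothetical source tree containing one leftover archive.
srcdir=$(mktemp -d)
: > "$srcdir/main.c"
: > "$srcdir/old-release.tar.gz"

# List archive files that SLOCCount will skip rather than unpack.
archives=$(find "$srcdir" -type f \( -name '*.zip' -o -name '*.tar' \
    -o -name '*.tar.gz' -o -name '*.tgz' \) | wc -l)
echo "$archives"
```

Once you've seen the list, either unpack the archives you want counted or remove the leftovers, then run sloccount.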

+SLOCCount also doesn't delve into files using "literate programming" +techniques, in part because there are too many incompatible formats +that implement it. +Thus, run the tools to extract the code from the literate programming files +before running SLOCCount. +For example, if you have many literate Haskell files (.lhs), please +extract them. + + +

Setting your PATH

+Before you can run SLOCCount, you'll need to make sure +the SLOCCount "bin directory" is in your PATH. +If you've installed SLOCCount in a system-wide location +such as /usr/bin, then you needn't do more; the RPMs and "make install" +commands essentially do this. +

+Otherwise, in Bourne-shell variants, type: +

+    PATH="$PATH:the directory with SLOCCount's executable files"
+    export PATH
+
+Csh users should instead type: +
+    setenv PATH "$PATH:the directory with SLOCCount's executable files"
+
+ +

Using SLOCCount: The Basics

+ +Normal use of SLOCCount is very simple. +In a terminal window just type "sloccount", followed by a +list of the source code directories to count. +If you give it only a single directory, SLOCCount tries to be +a little clever and break the source code into +subdirectories for purposes of reporting: +
    +
  1. If the directory has at least +two subdirectories, then those subdirectories will be used as the +breakdown (see the example below).
  2. If the single directory contains files as well as directories +(or if you give sloccount some files as parameters), those files will +be assigned to the directory "top_dir" so you can tell them apart +from other directories. +
  3. If there's a subdirectory named "src", then that subdirectory is again +broken down, with all the further subdirectories prefixed with "src_". +So if directory "X" has a subdirectory "src", which contains subdirectory +"modules", the program will report a separate count from "src_modules". +
+In the terminology discussed above, each of these directories would become +"data directory children." +

+You can also give "sloccount" a list of directories, in which case the +report will be broken down by these directories +(make sure that the basenames of these directories differ). +SLOCCount normally considers all descendants of these directories, +though unless told otherwise it ignores symbolic links. +

+This is all easier to explain by example. +Let's say that we want to measure Apache 1.3.12 as installed using an RPM. +Once it's installed, we just type: +

+ sloccount /usr/src/redhat/BUILD/apache_1.3.12
+
+The output we'll see shows status reports while it analyzes things, +and then it prints out: + +
+SLOC	Directory	SLOC-by-Language (Sorted)
+24728   src_modules     ansic=24728
+19067   src_main        ansic=19067
+8011    src_lib         ansic=8011
+5501    src_os          ansic=5340,sh=106,cpp=55
+3886    src_support     ansic=2046,perl=1712,sh=128
+3823    src_top_dir     sh=3812,ansic=11
+3788    src_include     ansic=3788
+3469    src_regex       ansic=3407,sh=62
+2783    src_ap          ansic=2783
+1378    src_helpers     sh=1345,perl=23,ansic=10
+1304    top_dir         sh=1304
+104     htdocs          perl=104
+31      cgi-bin         sh=24,perl=7
+0       icons           (none)
+0       conf            (none)
+0       logs            (none)
+
+
+ansic:       69191 (88.85%)
+sh:           6781 (8.71%)
+perl:         1846 (2.37%)
+cpp:            55 (0.07%)
+
+
+Total Physical Source Lines of Code (SLOC)                   = 77873
+Estimated Development Effort in Person-Years (Person-Months) = 19.36 (232.36)
+ (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
+Estimated Schedule in Years (Months)                         = 1.65 (19.82)
+ (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
+Estimated Average Number of Developers  (Effort/Schedule)    = 11.72
+Total Estimated Cost to Develop                              = $ 2615760
+ (average salary = $56286/year, overhead = 2.4).
+
+Please credit this data as "generated using 'SLOCCount' by David A. Wheeler."
+
+

+Interpreting this should be straightforward. +The Apache directory has several subdirectories, including "htdocs", "cgi-bin", +and "src". +The "src" directory has many subdirectories in it +("modules", "main", and so on). +Code files directly +contained in the main directory /usr/src/redhat/BUILD/apache_1.3.12 +are labelled "top_dir", while +code directly contained in the src subdirectory is labelled "src_top_dir". +Code in the "src/modules" directory is labelled "src_modules" here. +The output shows each major directory broken +out, sorted from largest to smallest. +Thus, the "src/modules" directory had the most code of the directories, +24728 physical SLOC, all of it in C. +The "src/helpers" directory had a mix of shell, perl, and C; note that +when multiple languages are shown, the list of languages in that child +is also sorted from largest to smallest.

+Below the per-component set is a list of all languages used, +with their total SLOC shown, sorted from most to least. +After this is the total physical SLOC (77,873 physical SLOC in this case). +

+Next is an estimation of the effort and schedule (calendar time) +it would take to develop this code. +For effort, the units shown are person-years (with person-months +shown in parentheses); for schedule, total years are shown first +(with months in parentheses). +When invoked through "sloccount", the default assumption is that all code is +part of a single program; the "--multiproject" option changes this +to assume that all top-level components are independently developed +programs. +When "--multiproject" is invoked, each project's efforts are estimated +separately (and then summed), and the schedule estimate presented +is the largest estimated schedule of any single component. +
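The "--multiproject" arithmetic can be sketched with made-up per-component estimates (effort in person-months, schedule in months): the efforts are summed, while the schedule reported is the largest single one:

```shell
# Hypothetical per-component estimates, one "effort schedule" pair per line.
estimates='12.0 6.1
30.5 9.8
4.2 3.0'

# Sum the efforts; keep the maximum schedule.
summary=$(printf '%s\n' "$estimates" | awk '
    { effort += $1; if ($2 > sched) sched = $2 }
    END { printf "%.1f %.1f", effort, sched }')
echo "$summary"
```

This prints "46.7 9.8": 46.7 person-months of total effort, with a 9.8-month schedule set by the slowest component.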

+By default the "Basic COCOMO" model is used for estimating +effort and schedule; this model +includes design, code, test, and documentation time (both +user/admin documentation and development documentation). +See below for more information on COCOMO +as it's used in this program. +

+Next are several numbers that attempt to estimate what it would have cost +to develop this program. +This is simply the amount of effort, multiplied by the average annual +salary and by the "overhead multiplier". +The default annual salary is +$56,286; this is the average U.S. programmer/analyst salary in the year 2000, +as reported in ComputerWorld's September 4, 2000 Salary Survey. +You might consider using other numbers +(ComputerWorld's September 3, 2001 Salary Survey found +average U.S. programmer/analysts making $55,100, senior +systems programmers averaging $68,900, and senior systems analysts averaging +$72,300). +

+Overhead is much harder to estimate; I did not find a definitive source +for information on overheads. +After informal discussions with several cost analysts, +I determined that an overhead of 2.4 +would be representative of the overhead sustained by +a typical software development company. +As discussed in the next section, you can change these numbers too. + +

+You may be surprised by the high cost estimates, but remember, +these include design, coding, testing, documentation (both for users +and for programmers), and a wrap rate for corporate overhead +(to cover facilities, equipment, accounting, and so on). +Many programmers forget these other costs and are shocked by the high figures. +If you only wanted to know the costs of the coding, you'd need to get +those figures. + + +
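The figures in the Apache report above can be reproduced, to within rounding, directly from the Basic COCOMO formulas together with the default salary and overhead:

```shell
# Recompute the report's estimates from its 77873 physical SLOC using
# Basic COCOMO (organic mode), a $56286 salary, and a 2.4 overhead.
est=$(awk 'BEGIN {
    ksloc  = 77873 / 1000
    pm     = 2.4 * ksloc^1.05          # effort in person-months
    months = 2.5 * pm^0.38             # schedule in months
    devs   = pm / months               # average staffing
    cost   = (pm / 12) * 56286 * 2.4   # person-years * salary * overhead
    printf "%.2f %.2f %.2f %.0f", pm, months, devs, cost
}')
echo "$est"
```

This prints roughly 232 person-months of effort, about a 19.8-month schedule, about 11.7 developers, and a cost near $2.6 million, matching the sample output above up to rounding.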

+Note that if any top-level directory has a file named PROGRAM_LICENSE, +that file is assumed to contain the name of the license +(e.g., "GPL", "LGPL", "MIT", "BSD", "MPL", and so on). +If there is at least one such file, sloccount will also report statistics +on licenses. + + +

Options

+The program "sloccount" has a large number of options +so you can control what is selected for counting and how the +results are displayed. +

+There are several options that control which files are selected +for counting: +

+ --duplicates   Count all duplicate files as normal files
+ --crossdups    Count duplicate files if they're in different data directory
+                children.
+ --autogen      Count automatically generated files
+ --follow       Follow symbolic links (normally they're ignored)
+ --addlang      Add languages to be counted that normally aren't shown.
+ --append       Add more files to the data directory
+
+Normally, files which have exactly the same content are counted only once +(data directory children are counted alphabetically, so the child +"first" in the alphabet will be considered the owner of the master copy). +If you want them all counted, use "--duplicates". +Sometimes when you use sloccount, each directory represents a different +project, in which case you might want to specify "--crossdups". +The program tries to reject files that are automatically generated +(e.g., a C file generated by bison), but you can disable this as well. +You can use "--addlang" to show makefiles and SQL files, which aren't +usually counted. +
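Duplicate detection of this kind can be sketched with md5sum (the same hashing tool SLOCCount requires); the two file names here are hypothetical, and share byte-identical content:

```shell
# Two files with identical content in a scratch directory.
workdir=$(mktemp -d)
echo 'int main(void) { return 0; }' > "$workdir/a.c"
echo 'int main(void) { return 0; }' > "$workdir/copy.c"

# A hash that appears more than once marks duplicate content; by default
# only one copy of each such file would be counted.
dups=$(md5sum "$workdir"/*.c | awk '{ print $1 }' | sort | uniq -d | wc -l)
echo "$dups"
```

This prints 1: one hash value is shared, so one of the two files would be treated as a duplicate.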

+Possibly the most important option is "--cached". +Normally, when sloccount runs, it computes a lot of information and +stores this data in a "data directory" (by default, "~/.slocdata"). +The "--cached" option tells sloccount to use data previously computed, +greatly speeding up later runs once the computation has been done. +The "--cached" option can't be used along with the options used to +select what files should be counted. +You can also select a different data directory by using the +"--datadir" option.

+There are many options for controlling the output: +

+ --filecount     Show counts of files instead of SLOC.
+ --details       Present details: present one line per source code file.
+ --wide          Show "wide" format.  Ignored if "--details" selected
+ --multiproject  Assume each directory is for a different project
+                 (this modifies the effort estimation calculations)
+ --effort F E    Change the effort estimation model, so that it uses
+                 F as the factor and E as the exponent.
+ --schedule F E  Change the schedule estimation model, so that it uses
+                 F as the factor and E as the exponent.
+ --personcost P  Change the average annual salary to P.
+ --overhead O    Change the annual overhead to O.
+ --              End of options
+
+

+Basically, the first time you use sloccount, if you're measuring +a set of projects (not a single project) you might consider +using "--crossdups" instead of the defaults. +Then, you can redisplay data quickly by using "--cached", +combining it with options such as "--filecount". +If you want to send the data to another tool, use "--details". +

+If you're measuring a set of projects, you probably ought to pass +the option "--multiproject". +When "--multiproject" is used, efforts are computed for each component +separately and summed, and the time estimate used is the maximum +single estimated time. +

+The "--details" option dumps the available data in 4 columns, +tab-separated, where each line +represents a source code file in the data directory children identified. +The first column is the SLOC, the second column is the language type, +the third column is the name of the data directory child +(as it was given to get_sloc_details), +and the last column is the absolute pathname of the source code file. +You can then pipe this output to "sort" or some other tool for further +analysis (such as a spreadsheet or RDBMS). +

+You can change the parameters used to estimate effort using "--effort". +For example, if you believe that in the environment being used +you can produce 2 KSLOC/month scaling linearly, then +that means that the factor for effort you should use is 1/2 = 0.5 month/KSLOC, +and the exponent for effort is 1 (linear). +Thus, you can use "--effort 0.5 1". +
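+As a cross-check of that arithmetic (a plain awk sketch, not part of
+SLOCCount): with factor 0.5 and exponent 1, a hypothetical 10 KSLOC
+project comes out to 0.5 * 10^1 = 5 person-months.

```shell
# Effort = factor * KSLOC^exponent; here factor=0.5, exponent=1, KSLOC=10.
awk 'BEGIN { printf "%.1f\n", 0.5 * (10 ^ 1) }'   # prints 5.0
```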

+You can also set the annual salary and overhead used to compute
+the estimated development cost.
+While "$" is shown, there's no reason you have to use dollars;
+the development cost is reported in the same unit as the value
+given to "--personcost".
+

More about COCOMO

+ +

+By default, SLOCCount uses a very simple estimation model for effort and
+schedule: the basic COCOMO model in the "organic" mode (modes are discussed
+more fully below).
+This model estimates effort and schedule, including design, code, test,
+and documentation time (both user/admin documentation and development
+documentation).
+Basic COCOMO is a nice simple model, and it's used as the default because
+it doesn't require any information about the code other than the SLOC count
+already computed.

+However, basic COCOMO's accuracy is limited for the same reason -
+basic COCOMO doesn't take a number of important factors into account.
+At the least, you can quickly check whether the right "mode" is being
+used. If you have the necessary information, you can also
+use the "Intermediate COCOMO" and "Detailed COCOMO" models, which take more
+factors into account and are likely to produce more accurate estimates as
+a result. Take these estimates as just that - estimates - they're not
+grand truths.
+Once you've determined the corrected parameters,
+pass them to sloccount using its
+"--effort" and "--schedule" options (as discussed in
+options).

+To use the COCOMO model, you first need to determine your application's
+mode, which can be "organic", "semidetached", or "embedded".
+Most software is "organic" (which is why it's the default).
+Here are simple definitions of these modes:
+
+ organic       A relatively small team develops the software in a familiar,
+               in-house environment, with relatively flexible requirements.
+ semidetached  An intermediate mode, between organic and embedded.
+ embedded      The project must operate within tight (hard-to-change)
+               hardware, software, and operational constraints.

+By default, SLOCCount uses the basic COCOMO model in the organic mode.
+For the basic COCOMO model, here are the critical factors for
+--effort and --schedule:
+
+ Organic:      effort factor 2.4, exponent 1.05; schedule factor 2.5, exponent 0.38
+ Semidetached: effort factor 3.0, exponent 1.12; schedule factor 2.5, exponent 0.35
+ Embedded:     effort factor 3.6, exponent 1.20; schedule factor 2.5, exponent 0.32
+
+Thus, if you want to use SLOCCount but the project is actually semidetached,
+you can use the options "--effort 3.0 1.12 --schedule 2.5 0.35"
+to get a more accurate estimate.
+For more accurate estimates, you can use the intermediate COCOMO models.
+For intermediate COCOMO in the organic mode, the starting effort factor
+is 2.3 (with the same exponent, 1.05); Boehm's book gives the starting
+values for the other modes.
+The intermediate COCOMO values for schedule are exactly the same as the basic
+COCOMO model; the starting effort values are not quite the same, as noted
+in Boehm's book. However, in the intermediate COCOMO model, you don't
+normally use the effort factors as-is; you apply various corrective factors
+(called cost drivers). To use these corrections, you consider
+all the cost drivers, determine which rating best describes your project
+on each, and multiply their corrective values by the effort base factor.
+The result is the final effort factor.
+Here are the cost drivers (from Boehm's book, tables 8-2 and 8-3):
Cost Drivers + Ratings +
ID + Driver Name + Very Low + Low + Nominal + High + Very High + Extra High +
RELY + Required software reliability + 0.75 (effect is slight inconvenience) + 0.88 (easily recovered losses) + 1.00 (recoverable losses) + 1.15 (high financial loss) + 1.40 (risk to human life) +   +
DATA + Database size +   + 0.94 (database bytes/SLOC < 10) + 1.00 (D/S between 10 and 100) + 1.08 (D/S between 100 and 1000) + 1.16 (D/S > 1000) +   +
CPLX + Product complexity + 0.70 (mostly straightline code, simple arrays, simple expressions) + 0.85 + 1.00 + 1.15 + 1.30 + 1.65 (microcode, multiple resource scheduling, device timing dependent coding) +
TIME + Execution time constraint +   +   + 1.00 (<50% use of available execution time) + 1.11 (70% use) + 1.30 (85% use) + 1.66 (95% use) +
STOR + Main storage constraint +   +   + 1.00 (<50% use of available storage) + 1.06 (70% use) + 1.21 (85% use) + 1.56 (95% use) +
VIRT + Virtual machine (HW and OS) volatility +   + 0.87 (major change every 12 months, minor every month) + 1.00 (major change every 6 months, minor every 2 weeks) + 1.15 (major change every 2 months, minor changes every week) + 1.30 (major changes every 2 weeks, minor changes every 2 days) +   +
TURN + Computer turnaround time +   + 0.87 (interactive) + 1.00 (average turnaround < 4 hours) + 1.07 + 1.15 +   +
ACAP + Analyst capability + 1.46 (15th percentile) + 1.19 (35th percentile) + 1.00 (55th percentile) + 0.86 (75th percentile) + 0.71 (90th percentile) +   +
AEXP + Applications experience + 1.29 (<= 4 months experience) + 1.13 (1 year) + 1.00 (3 years) + 0.91 (6 years) + 0.82 (12 years) +   +
PCAP + Programmer capability + 1.42 (15th percentile) + 1.17 (35th percentile) + 1.00 (55th percentile) + 0.86 (75th percentile) + 0.70 (90th percentile) +   +
VEXP + Virtual machine experience + 1.21 (<= 1 month experience) + 1.10 (4 months) + 1.00 (1 year) + 0.90 (3 years) +   +   +
LEXP + Programming language experience + 1.14 (<= 1 month experience) + 1.07 (4 months) + 1.00 (1 year) + 0.95 (3 years) +   +   +
MODP + Use of "modern" programming practices (e.g. structured programming) + 1.24 (No use) + 1.10 + 1.00 (some use) + 0.91 + 0.82 (routine use) +   +
TOOL + Use of software tools + 1.24 + 1.10 + 1.00 (basic tools) + 0.91 (test tools) + 0.83 (requirements, design, management, documentation tools) +   +
SCED + Required development schedule + 1.23 (75% of nominal) + 1.08 (85% of nominal) + 1.00 (nominal) + 1.04 (130% of nominal) + 1.10 (160% of nominal) +   +
+
+
+
+So, once all of the factors have been multiplied together, you can
+then use the "--effort" flag to set a more accurate factor and exponent.
+Note that some factors will probably not be "nominal" simply because
+times have changed since COCOMO was originally developed; a few ratings
+that were once rare have become common today.
+For example,
+for many software projects of today, virtual machine volatility tends to
+be low, and the
+use of "modern" programming practices (structured programming,
+object-oriented programming, abstract data types, etc.) tends to be high.
+The cost drivers let the intermediate model account for these differences.

+For example, imagine that you're examining a fairly simple application that
+meets the "organic" requirements. Organic projects have a base factor
+of 2.3 and an exponent of 1.05, as noted above.
+We then examine all the factors to determine a corrected base factor.
+For this example, imagine
+that we determine the values of these cost drivers are as follows:
+
+ ID    Driver Name                                            Rating                                                 Multiplier
+ RELY  Required software reliability                          Low (easily recovered losses)                          0.88
+ DATA  Database size                                          Low                                                    0.94
+ CPLX  Product complexity                                     Nominal                                                1.00
+ TIME  Execution time constraint                              Nominal                                                1.00
+ STOR  Main storage constraint                                Nominal                                                1.00
+ VIRT  Virtual machine (HW and OS) volatility                 Low (major change every 12 months, minor every month)  0.87
+ TURN  Computer turnaround time                               Nominal                                                1.00
+ ACAP  Analyst capability                                     Nominal (55th percentile)                              1.00
+ AEXP  Applications experience                                Nominal (3 years)                                      1.00
+ PCAP  Programmer capability                                  Nominal (55th percentile)                              1.00
+ VEXP  Virtual machine experience                             High (3 years)                                         0.90
+ LEXP  Programming language experience                        High (3 years)                                         0.95
+ MODP  Use of "modern" programming practices                  High (routine use)                                     0.82
+ TOOL  Use of software tools                                  Nominal (basic tools)                                  1.00
+ SCED  Required development schedule                          Nominal                                                1.00
+

+So, starting with the base factor (2.3 in this case) and multiplying
+by each of the driver values, we compute:
+

2.3*0.88*0.94*1*1*1*0.87*1.00*1*1*1*0.90*0.95*0.82*1*1
+For this
+example, the final factor for the effort calculation is 1.1605. You would then
+invoke sloccount with "--effort 1.1605 1.05" to pass in the corrected factor
+and exponent for the effort estimation.
+You don't need to use "--schedule" to set the factors when you're using
+the organic model, because in SLOCCount
+the default values are the values for the organic model.
+You can set the scheduling parameters explicitly
+anyway by giving "--schedule 2.5 0.38".
+You do need to use the --schedule option for
+embedded and semidetached projects, because those modes have different
+schedule parameters. The final command would be:
+
+sloccount --effort 1.1605 1.05 --schedule 2.5 0.38 my_project
+
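+You can reproduce the multiplication above with any calculator; a plain
+awk one-liner (illustrative only, not part of SLOCCount) confirms the
+corrected effort factor:

```shell
# Base factor 2.3 times the fifteen cost-driver multipliers:
awk 'BEGIN { printf "%.4f\n", 2.3*0.88*0.94*1*1*1*0.87*1.00*1*1*1*0.90*0.95*0.82*1*1 }'
# prints 1.1605
```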

+The detailed COCOMO model requires breaking information down further. +

+For more information about the original COCOMO model, including the detailed +COCOMO model, see the book +Software Engineering Economics by Barry Boehm. +

+You may be surprised by the high cost estimates, but remember, +these include design, coding, testing (including +integration and testing), documentation (both for users +and for programmers), and a wrap rate for corporate overhead +(to cover facilities, equipment, accounting, and so on). +Many programmers forget these other costs and are shocked by the high cost +estimates. +

+If you want to know a subset of this cost, you'll need to isolate +just those figures that you're trying to measure. +For example, let's say you want to find the money a programmer would receive +to do just the coding of the units of the program +(ignoring wrap rate, design, testing, integration, and so on). +According to Boehm's book (page 65, table 5-2), +the percentage varies by product size. +For effort, code and unit test takes 42% for small (2 KSLOC), 40% for +intermediate (8 KSLOC), 38% for medium (32 KSLOC), and 36% for large +(128 KSLOC). +Sadly, Boehm doesn't separate coding from unit test; perhaps +50% of the time is spent in unit test in traditional proprietary +development (including fixing bugs found from unit test). +If you want to know the income to the programmer (instead of cost to +the company), you'll also want to remove the wrap rate. +Thus, a programmer's income to only write the code for a +small program (circa 2 KSLOC) would be 8.75% (42% x 50% x (1/2.4)) +of the default figure computed by SLOCCount. +

+In other words, less than one-tenth of the cost as computed by SLOCCount +is what actually would be made by a programmer for a small program for +just the coding task. +Note that a proprietary commercial company that bid using +this lower figure would rapidly go out of business, since this figure +ignores the many other costs they have to incur to actually develop +working products. +Programs don't arrive out of thin air; someone needs to determine what +the requirements are, how to design it, and perform at least +some testing of it. +
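+The 8.75% figure can be checked the same way (a plain awk sketch,
+illustrative only): 42% for code and unit test, times 50% for the coding
+half, divided by a wrap rate of 2.4.

```shell
awk 'BEGIN { printf "%.2f%%\n", 100 * 0.42 * 0.50 / 2.4 }'   # prints 8.75%
```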

+There's a more recent estimation model for effort and schedule
+called "COCOMO II", but COCOMO II requires logical SLOC instead
+of physical SLOC.
+SLOCCount doesn't currently measure logical SLOC, so
+SLOCCount doesn't currently use COCOMO II.
+Contributions of code to compute logical SLOC (and to then optionally
+use COCOMO II) will be gratefully accepted.
+

Counting Specific Files

+

+If you want to count a specific subset, you can use the "--details" +option to list individual files, pipe this into "grep" to select the +files you're interested in, and pipe the result to +my tool "print_sum" (which reads lines beginning with numbers, and +returns the total of those numbers). +If you've already done the analysis, an example would be: +

+  sloccount --cached --details | grep "/some/subdirectory/" | print_sum
+
+
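+If print_sum isn't available, the same sum can be sketched with awk.
+Here the "--details" output is simulated with printf (the SLOC values and
+paths are made up) so the pipeline is self-contained; in real use you'd
+feed it from the sloccount command above instead.

```shell
# Simulated "--details" lines: SLOC, language, child, path (tab-separated).
printf '120\tansic\tmyproj\t/src/myproj/a.c\n80\tansic\tmyproj\t/src/myproj/b.c\n' \
  | awk -F'\t' '{ sum += $1 } END { print sum }'   # prints 200
```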

+If you just want to count specific files, and you know what language
+they're in, you
+can invoke the basic SLOC counters directly.
+By convention the simple counters are named "LANGUAGE_count",
+and they take a list of the
+source files to count on the command line.
+Here are some examples:

+  c_count *.c *.cpp *.h  # Count C and C++ in current directory.
+  asm_count *.S          # Count assembly.
+
+All the counter (*_count) programs accept a "-f FILENAME" option, where FILENAME
+is a file containing the names of all the source files to count
+(one file per text line). If FILENAME is "-", the
+list of file names is taken from the standard input.
+The "c_count" program handles both C and C++ (but not Objective-C;
+for that, use objc_count).
+The available counters are
+ada_count,
+asm_count,
+awk_count,
+c_count,
+csh_count,
+exp_count,
+fortran_count,
+java_count,
+lex_count,
+lisp_count,
+ml_count,
+modula3_count,
+objc_count,
+pascal_count,
+perl_count,
+python_count,
+sed_count,
+sh_count,
+sql_count, and
+tcl_count.

+There is also "generic_count", which takes as its first parameter
+the "comment string", followed by a list of files.
+The comment string begins a comment that ends at the end of the line.
+Sometimes, if you have source for a language not listed, generic_count
+will be sufficient.

+The basic SLOC counters send their output to standard output, one line per
+file (showing the SLOC count and filename).
+The assembly counter shows some additional information about each file.
+The basic SLOC counters always complete their output with a line
+saying "Total:", followed by a line with the
+total SLOC count.
+

Countering Problems and Handling Errors

+ +If you're analyzing unfamiliar code, there's always the possibility +that it uses languages not processed by SLOCCount. +To counter this, after running SLOCCount, run the following program: +
+ count_unknown_ext
+
+This will look at the resulting data (in its default data directory +location, ~/.slocdata) and report a sorted list of the file extensions +for uncategorized ("unknown") files. +The list will show every file extension and how many files had that +extension, and is sorted by most common first. +It's not a problem if an "unknown" type isn't a source code file, but +if there are a significant number of source files in this category, +you'll need to change SLOCCount to get an accurate result. + +
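+The core idea of count_unknown_ext - tally extensions, most common
+first - can be sketched in ordinary shell (an illustration on a made-up
+file list, not the program's actual code):

```shell
# Tally the extensions of a sample file list, most common first:
printf '%s\n' main.foo util.foo notes.bar \
  | sed 's/.*\(\.[^.]*\)$/\1/' \
  | sort | uniq -c | sort -rn
```

In this sample, ".foo" (2 files) sorts ahead of ".bar" (1 file).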

+One error report that you may see is: +

+  c_count ERROR - terminated in string in (filename)
+
+
+The cause of this is that c_count (the counter for C-like languages)
+keeps track of whether or not it's in a string; the error means that
+when the counter reached the end of the file, it still thought it was
+in a string.
+

+Note that c_count really does have to keep track of whether or
+not it's in a string.
+For example, this is three lines of code, not two, because the
+"comment" is actually inside string data:
+

+ a = "hello
+ /* this is not a comment */
+ bye";
+
+

+Usually this error means you have code that won't compile
+given certain #define settings. For example, XFree86 has a line of code that's
+actually wrong (it has a string that's not terminated), but people
+don't notice because the #define needed to enable it is not usually set.
+Legitimate code can trigger this message, but code that triggers
+this message is horrendously formatted and is begging for problems.
+

+In either case, the best way to handle the situation
+is to modify the source code (slightly) so that the code's intent is clear
+(by making sure that double-quotes balance).
+If it's your own code, you should definitely fix this anyway.
+You need to look at the double-quote (") characters. One approach is to
+just grep for double-quote and look at every line for a string that isn't
+terminated, e.g., printf("hello %s, myname);
+
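+One rough way to narrow the search (a heuristic sketch, not part of
+SLOCCount; escaped quotes and deliberate multiline strings will still
+need manual review) is to flag lines containing an odd number of
+double-quote characters:

```shell
# Sample file where line 2 has an unterminated string:
printf 'ok = "done";\nbad = "oops;\n' > /tmp/quote_demo.c
# With `"` as the field separator, NF = quotes + 1, so a line with an
# odd number of quotes shows up as an even NF:
awk -F'"' 'NF % 2 == 0 { print FNR ": " $0 }' /tmp/quote_demo.c
```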

+SLOCCount reports a warning when an unusually
+large number of duplicate files is found.
+A large number of duplicates may suggest that you're counting
+two different versions of the same program as though they were
+independently developed.
+You may want to cd into the data directory (usually ~/.slocdata), cd into
+the child directories corresponding to each component, and then look
+at their dup_list.dat files, which list the filenames that appeared
+to be duplicates (and what they duplicate).
+

Adding Support for New Languages

+SLOCCount handles many languages, but if it doesn't support one you need,
+you'll need to give the language a standard (lowercase ASCII) name,
+then modify SLOCCount to (1) detect and (2) count code in that language.
+
    +
  1. +To detect a new language, you'll need to modify the program break_filelist. +If the filename extension is reliable, you can modify the array +%file_extensions, which maps various filename extensions into languages. +If your needs are more complex, you'll need to modify the code +(typically in functions get_file_type or file_type_from_contents) +so that the correct file type is determined. +For example, if a file with a given filename extension is only +sometimes that type, you'll need to write code to examine the +file contents. +
  2. +You'll need to create a SLOC counter for that language type. +It must have the name XYZ_count, where XYZ is the standard name for the +language. +

+For some languages, you may be able to use the "generic_count" program
+to implement your counter - generic_count takes as its first argument
+the pattern which
+identifies the start of a comment (which continues until the end of the
+line); the other arguments are the files to count.
+Thus, the LISP counter looks like this:

    + #!/bin/sh
+ generic_count ';' "$@"
    +
+The generic_count program won't work correctly if there are multiline comments
+(e.g., C) or multiline string constants.
+If your language's syntax is identical to C/C++'s in terms of
+string constant definitions and comments
+(using // or /* .. */), then you can use the c_count program - in this case,
+modify compute_sloc_lang so that the c_count program is used.

+Otherwise, you'll have to devise your own counting program.
+The program must generate output in the same format, e.g.,
+for every filename passed as an argument, it must print a separate line
+presenting the SLOC
+for that file, a space, and the filename.
+(Note: the assembly language counter produces a slightly different format.)
+After that, it must print "Total:" on its own line, and the actual SLOC total
+on the following (last) line.
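+As an illustration of the required output format, here is a hypothetical
+counter skeleton ("plain_count", written for this guide, not part of
+SLOCCount); it counts nonblank lines only, so a real counter would also
+need to strip the language's comment lines:

```shell
# plain_count: print "SLOC filename" for each argument, then "Total:"
# and the grand total, matching the format SLOCCount expects.
plain_count() {
  total=0
  for f in "$@"; do
    n=$(grep -c -v '^[[:space:]]*$' "$f")   # SLOC approximated as nonblank lines
    echo "$n $f"
    total=$((total + n))
  done
  echo "Total:"
  echo "$total"
}
```

+Saved as an executable named for the language (XYZ_count) and given real
+comment handling, a script like this would produce output the rest of the
+suite can read.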

+ +

Advanced SLOCCount Use

+For most people, the previous information is enough. +However, if you're measuring a large set of programs, or have unusual needs, +those steps may not give you enough control. +In that case, you may need to create your own "data directory" +by hand and separately run the SLOCCount tools. +Basically, "sloccount" (note the lower case) is the name for +a high-level tool which invokes many other tools; this entire +suite is named SLOCCount (note the mixed case). +The next section will describe how to invoke the various tools "manually" +so you can gain explicit control over the measuring process when +the defaults are not to your liking, along with various suggestions +for how to handle truly huge sets of data. +

+Here's how to manually create a "data directory" to hold +intermediate results, and how to invoke each tool in sequence +(with discussion of options): +

    +
  1. Set your PATH to include the SLOCCount "bin directory", as discussed above. +
  2. Make an empty "data directory" +(where all intermediate results will be stored); +you can pick any name and location you like for this directory. +Here, I'll use the name "data": +
    +    mkdir ~/data
    +
    +
  3. Change your current directory to this "data directory": +
    +    cd ~/data
    +
    +The rest of these instructions assume that your current directory +is the data directory. +You can set up many different data directories if you wish, to analyze +different source programs or analyze the programs in different ways; +just "cd" to the one you want to work with. +
  4. (Optional) Some of the later steps will produce
+a lot of output while they're running.
+If you want to capture this information into a file, use the standard
+"script" command to do so.
+For example, "script run1" will save the output of everything you do into
+file "run1" (until you type control-D to stop saving the information).
+Don't forget that you're creating such a file, or it will become VERY large,
+and in particular don't type any passwords into such a session.
+You can store the script in the data directory, or create a subdirectory
+for such results - any data directory subdirectory that doesn't have the
+special file "filelist" is not a "data directory child" and is thus
+ignored by the later SLOCCount analysis routines.
  5. Now initialize the "data directory". + In particular, initialization will create the "data directory children", + a set of subdirectories equivalent to the source code directory's + top directories. Each of these data directory children (subdirectories) + will contain a file named "filelist", which + lists all filenames in the corresponding source code directory. + These data directory children + will also eventually contain intermediate results + of analysis, which you can check for validity + (also, having a cache of these values speeds later analysis steps). +

    + You use the "make_filelists" command to initialize a data directory. + For example, if your source code is in /usr/src/redhat/BUILD, run: +

    +   make_filelists /usr/src/redhat/BUILD/*
    +
    +

    + Internally, make_filelists uses "find" to create the list of files, and + by default it ignores all symbolic links. However, you may need to + follow symbolic links; if you do, give make_filelists the + "--follow" option (which will use find's "-follow" option). + Here are make_filelists' options: +

    + --follow         Follow symbolic links
    + --datadir D      Use this data directory
    + --skip S         Skip basenames named S
    + --prefix P       When creating children, prepend P to their name.
    + --               No more options
    +
    +

    + Although you don't normally need to do so, if you want certain files to + not be counted at all in your analysis, you can remove + data directory children or edit the "filelist" files to do so. + There's no need to remove files which aren't source code files normally; + this is handled automatically by the next step. +

    + If you don't have a single source code directory where the subdirectories + represent the major components you want to count separately, you can + still use the tool but it's more work. + One solution is to create a "shadow" directory with the structure + you wish the program had, using symbolic links (you must use "--follow" + for this to work). + You can also just invoke make_filelists multiple times, with parameters + listing the various top-level directories you wish to include. + Note that the basenames of the directories must be unique. +

    + If there are so many directories (e.g., a massive number of projects) + that the command line is too long, + you can run make_filelists multiple times in the same + directory with different arguments to create them. + You may find "find" and/or "xargs" helpful in doing this automatically. + For example, here's how to do the same thing using "find": +

    + find /usr/src/redhat/BUILD -maxdepth 1 -mindepth 1 -type d \
    +        -exec make_filelists {} \;
    +
    +
  6. Categorize each file.
+This means determining which
+files contain source code (eliminating auto-generated and duplicate files),
+and, of those, which language each file contains.
+The result will be a set of files in each subdirectory of the data directory,
+where each file represents a category (e.g., a language).
    +   break_filelist *
    +
    + At this point you might want to examine the data directory subdirectories + to ensure that "break_filelist" has correctly determined the types of + the various files. + In particular, the "unknown" category may have source files in a language + SLOCCount doesn't know about. + If the heuristics got some categorization wrong, you can modify the + break_filelist program and re-run break_filelist. +

    + By default break_filelist removes duplicates, doesn't count + automatically generated files as normal source code files, and + only gives some feedback. You can change these defaults with the + following options: +

    + --duplicates   Count all duplicate files as normal files
    + --crossdups    Count duplicate files if they're in different data directory
    +                children (i.e., in different "filelists")
    + --autogen      Count automatically generated files
    + --verbose      Present more verbose status information while processing.
    +
    +

    + Duplicate control in particular is an issue; you probably don't want + duplicates counted, so that's the default. + Duplicate files are detected by determining if their MD5 checksums + are identical; the "first" duplicate encountered is the only one kept. + Normally, since shells sort directory names, this means that the + file in the alphabetically first child directory is the one counted. + You can change this around by listing directories in the sort order you + wish followed by "*"; if the same data directory child + is requested for analysis more + than once in a given execution, it's skipped after the first time. + So, if you want any duplicate files with child directory "glibc" to + count as part of "glibc", then you should provide the data directory children + list as "glibc *". +
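+You can reproduce the identical-contents test by hand with md5sum (a
+sketch using throwaway files for illustration; SLOCCount performs the
+same comparison internally):

```shell
# Two files with identical contents share an MD5 sum; `uniq -d` surfaces
# any checksum that appears more than once, i.e. a duplicate pair.
printf 'int main(void) { return 0; }\n' > /tmp/dup_a.c
printf 'int main(void) { return 0; }\n' > /tmp/dup_b.c
md5sum /tmp/dup_a.c /tmp/dup_b.c | awk '{ print $1 }' | sort | uniq -d
```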

+ Beware of choosing something other than "*" as the parameter here,
+ unless you use the "--duplicates" or "--crossdups" options.
+ The "*" represents the list of data directory children to examine.
+ Since break_filelist skips duplicate files identified
+ in a particular run, if you run break_filelist
+ on only certain children, some duplicate files won't be detected.
+ If you're allowing duplicates (via "--duplicates" or
+ "--crossdups"), then this isn't a problem.
+ Or, you can use the "--duplistfile" option to store and retrieve
+ hashes of files, so that additional files can be handled.

    + If there are so many directories that the command line is too long, + you can run break_filelist multiple times and give it + a subset of the directories each time. + You'll need to use one of the duplicate control options to do this. + I would suggest using "--crossdups", which + means that duplicates inside a child will only be counted once, + eliminating at least some of the problems of duplicates. + Here's the equivalent of "break_filelist *" when there are a large + number of subdirectories: +

    + find . -maxdepth 1 -mindepth 1 -type d -exec break_filelist --crossdups {} \;
    +
    + Indeed, for all of the later commands where "*" is listed as the parameter + in these instructions + (for the list of data directory children), just run the above "find" + command and replace "break_filelist --crossdups" with the command shown. +
  7. (Optional) +If you're not very familiar with the program you're analyzing, you +might not be sure that "break_filelist" has correctly identified +all of the files. +In particular, the system might be using an unexpected +programming language or extension not handled by SLOCCount. +If this is your circumstance, you can just run the command: +
    + count_unknown_ext
    +
    +(note that this command is unusual - it doesn't take any arguments, +since it's hard to imagine a case where you wouldn't want every +directory examined). +Unlike the other commands discussed, this one specifically looks at +${HOME}/.slocdata. +This command presents a list of extensions which are unknown to break_filelist, +with the most common ones listed first. +The output format is a name, followed by the number of instances; +the name begins with a "." if it's an extension, or, if there's no +extension, it begins with "/" followed by the base name of the file. +break_filelist already knows about common extensions such as ".gif" and ".png", +as well as common filenames like "README". +You can also view the contents of each of the data directory children's +files to see if break_filelist has correctly categorized the files. +
  8. Now compute SLOC and filecounts for each language; you can compute for all + languages at once by calling: +
    +   compute_all *
    +
    +If you only want to compute SLOC for a specific language, +you can invoke compute_sloc_lang, which takes as its first parameter +the SLOCCount name of the language ("ansic" for C, "cpp" for C++, +"ada" for Ada, "asm" for assembly), followed by the list +of data directory children. +Note that these names are a change from version 1.0, which +called the master program "compute_all", +and had "compute_*" programs for each language. +

    +Notice the "*"; you can replace the "*" with just the list of +data directory children (subdirectories) to compute, if you wish. +Indeed, you'll notice that nearly all of the following commands take a +list of data directory children as arguments; when you want all of them, use +"*" (as shown in these instructions), otherwise, list the ones you want. +

+When you run compute_all or compute_sloc_lang, each data directory
+child (subdirectory)
+is consulted in turn for a list of the relevant files, and the
+SLOC results are placed in that data directory child.
+In each child,
+the file "LANGUAGE-outfile.dat" lists the information from the
+basic SLOC counters.
+That is, the outfile lists the SLOC and filename
+(the assembly outfile has additional information), and ends with
+a line saying "Total:" followed by a line showing the total SLOC of
+that language in that data directory child.
+The file "all-physical.sloc" has the final total SLOC for every language
+in that child directory (i.e., it's the last line of the outfile).

  9. (Optional) If you want, you can also use USC's CodeCount.
+I've had trouble with these programs, so I don't do this normally.
+However, you're welcome to try - they support logical SLOC measures
+as well as physical ones (though not for most of the languages
+supported by SLOCCount).
+Sadly, they don't seem to compile in gcc without a lot of help, they
+use fixed-width buffers that make me nervous, and I found a
+number of bugs (e.g., they couldn't handle "/* text1 *//* text2 */" in
+C code, a format that's legal and used often in the Linux kernel).
+If you want to do this,
+modify the files compute_c_usc and compute_java_usc so they point to the
+right directories, and type:
    + compute_c_usc *
    +
    +
  10. Now you can analyze the results. The main tool for
+presenting SLOCCount results is "get_sloc", e.g.:
    +  get_sloc * | less
    +
    +The get_sloc program takes many options, including: +
    + --filecount    Display number of files instead of SLOC (SLOC is the default)
    + --wide         Use "wide" format instead (tab-separated columns)
    + --nobreak      Don't insert breaks in long lines
    + --sort  X      Sort by "X", where "X" is the name of a language
    +                ("ansic", "cpp", "fortran", etc.), or "total".
    +                By default, get_sloc sorts by "total".
    + --nosort       Don't sort - just present results in order of directory
    +                listing given.
    + --showother    Show non-language totals (e.g., # duplicate files).
    + --oneprogram   When computing effort, assume that all files are part of
    +                a single program.  By default, each subdirectory specified
    +                is assumed to be a separate, independently-developed program.
    + --noheader     Don't show the header
    + --nofooter     Don't show the footer (the per-language values and totals)
    +
    +

    +Note that unlike the "sloccount" tool, get_sloc requires the current +directory to be the data directory. +

    +If you're displaying SLOC, get_sloc will also estimate the time it +would take to develop the software using COCOMO (using its "basic" model). +By default, this figure assumes that each of the major subdirectories was +developed independently of the others; +you can use "--oneprogram" to make the assumption that all files are +part of the same program. +The COCOMO model makes many other assumptions; see the paper at +http://www.dwheeler.com/sloc +for more information. +
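    +The basic COCOMO model mentioned above is a simple power law.
+For illustration, here is a minimal Python sketch using Boehm's published
+coefficients for the basic model (get_sloc's defaults correspond to the
+"organic" project class; this sketch is not get_sloc's actual code):
+
```python
# Basic COCOMO effort/schedule estimates from a physical SLOC count.
# Coefficients are Boehm's published values for the basic model.
COEFFS = {
    # mode:           (a,    b,    c,   d)
    "organic":        (2.4, 1.05, 2.5, 0.38),
    "semi-detached":  (3.0, 1.12, 2.5, 0.35),
    "embedded":       (3.6, 1.20, 2.5, 0.32),
}

def basic_cocomo(sloc, mode="organic"):
    """Return (effort in person-months, schedule in calendar months)."""
    a, b, c, d = COEFFS[mode]
    ksloc = sloc / 1000.0
    effort = a * ksloc ** b        # person-months
    schedule = c * effort ** d     # calendar months
    return effort, schedule

effort, months = basic_cocomo(10000)   # a 10 KSLOC "organic" project
print(effort, months)
```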

    +If you need to do more analysis, you might want to use the "--wide" +option and send the data to another tool such as a spreadsheet +(e.g., gnumeric) or RDBMS (e.g., PostgreSQL). +Using the "--wide" option creates tab-separated data, which is easier to +import. +You may also want to use the "--noheader" and/or "--nofooter" options to +simplify porting the data to another tool. +
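    +If a spreadsheet or RDBMS is overkill, tab-separated "--wide" output is
+also trivial to post-process in a few lines of Python. The sample rows
+below (and the two-column layout, directory name then total SLOC) are
+invented for illustration - inspect your own
+"get_sloc --wide --noheader --nofooter" output to see the real column
+layout before adapting this:
+
```python
# Sketch: summing tab-separated get_sloc output without a spreadsheet.
# The sample data and its column layout are hypothetical; check the
# actual --wide output of your get_sloc version first.
sample = "linux\t104817\ngnupg\t4821\n"

rows = [line.split("\t") for line in sample.splitlines()]
grand_total = sum(int(cols[-1]) for cols in rows)   # last column = total SLOC
print(grand_total)
```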

    +Note that in version 1.0, "get_sloc" was called "get_data". +

    +If you have so many data directory children that you can't use "*"
+on the command line, get_sloc won't be as helpful.
+Feel free to patch get_sloc to add this capability (as another option),
+or use get_sloc_details (discussed next) to feed the data into another tool.

  11. (Optional) If you just can't get the information you need from get_sloc, +then you can get the raw results of everything and process the data +yourself. +I have a little tool to do this, called get_sloc_details. +You invoke it in a similar manner: +
    +get_sloc_details *
    +
    +
+ +

+

Designer's Notes

+

+Here are some ``designer's notes'' on how SLOCCount works, +including what it can handle. +

+The program break_filelist +has categories for each programming language it knows about, +plus the special categories ``not'' (not a source code file), +``auto'' (an automatically-generated file and thus not to be counted), +``zero'' (a zero-length file), +``dup'' (a duplicate of another file as determined by an md5 checksum), +and +``unknown'' (a file which doesn't seem to be a source code file +nor any of these other categories). +It's a good idea to examine +the ``unknown'' items later, checking the common extensions +to ensure you have not missed any common types of code. +

+The program break_filelist uses lots of heuristics to correctly
+categorize files.
+Here are a few notes about its heuristics:

    +
  1. +break_filelist first checks for well-known extensions (such as .gif) that +cannot be program files, and for a number of common generated filenames. +
  2. +It then peeks at the first few lines for "#!" followed by a legal script +name. +Sometimes it looks further, for example, many Python programs +invoke "env" and then use it to invoke python. +
  3. +If that doesn't work, it uses the extension to try to determine the category. +For a number of languages, the extension is not reliable, so for those +languages it examines the file contents and uses a set of heuristics +to determine if the file actually belongs to that category. +
  4. +Detecting automatically generated files is not easy, and it's +quite conceivable that it won't detect some automatically generated files. +The first 15 lines are examined, to determine if any of them +include at the beginning of the line (after spaces and +possible comment markers) one of the following phrases (ignoring +upper and lower case distinctions): +``generated automatically'', +``automatically generated'', +``this is a generated file'', +``generated with the (something) utility'', +or ``do not edit''. +
  5. A number of filename conventions are used, too. +For example, +any ``configure'' file is presumed to be automatically generated if +there's a ``configure.in'' file in the same directory. +
  6. +To eliminate duplicates, +the program keeps md5 checksums of each program file. +Any given md5 checksum is only counted once. +Build directories are processed alphabetically, so +if the same file content is in both directories ``a'' and ``b'', +it will be counted only once as being part of ``a'' unless you make +other arrangements. +Thus, some data directory children with names later in the alphabet may appear +smaller than would make sense at first glance. +It is very difficult to eliminate ``almost identical'' files +(e.g., an older and newer version of the same code, included in two +separate packages), because +it is difficult to determine when two ``similar'' files are essentially +the same file. +Changes such as the use of pretty-printers and massive renaming of variables +could make small changes seem large, while the small files +might easily appear to be the ``same''. +Thus, files with different contents are simply considered different. +
  7. +If all else fails, the file is placed in the ``unknown'' category for +later analysis. +
+
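+The duplicate-elimination step described in item 6 boils down to hashing
+each file's contents and counting a given checksum only the first time it
+is seen. SLOCCount does this in Perl; the following Python sketch just
+illustrates the idea (the file list here is invented for the example):
+
```python
import hashlib

def unique_files(paths_and_contents):
    """Yield only the first file seen for each distinct content checksum.

    Takes (path, bytes) pairs.  In SLOCCount the input order is the
    alphabetical build-directory order, so earlier directories "win"
    and later duplicates are dropped.
    """
    seen = set()
    for path, data in paths_and_contents:
        digest = hashlib.md5(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield path

files = [("a/util.c", b"int x;\n"),
         ("b/util.c", b"int x;\n"),       # duplicate content - skipped
         ("b/main.c", b"int main(){}\n")]
result = list(unique_files(files))
print(result)
```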

+One complicating factor is that I wished to separate C, C++, and
+Objective-C code, but a header file ending with
+``.h'' or ``.hpp'' could be any of these languages.
+In theory, ``.hpp'' is only C++, but I found that in practice this isn't true.
+I developed a number of heuristics to determine, for each file,
+what language a given header belonged to.
+For example, if a given directory has exactly one of these languages
+(ignoring header files),
+the header is assumed to belong to that category as well.
+Similarly, if there is a body file (e.g., ".c") that has the same name
+as the header file, then presumably the header file is of the same language.
+Finally, a header file with the keyword ``class'' is almost certainly not a
+C header file, but a C++ header file; otherwise it's assumed to
+be a C file.

+None of the SLOC counters fully parse the source code; they just examine +the code using simple text processing patterns to count the SLOC. +In practice, by handling a number of special cases this seems to be fine. +Here are some notes on some of the language counters; +the language name is followed by common extensions in parentheses +and the SLOCCount name of the language in brackets: +

    +
  1. Ada (.ada, .ads, .adb) [ada]: Comments begin with "--". +
  2. Assembly (.s, .S, .asm) [asm]: +Assembly languages vary greatly in the comment character they use, +so my counter had to handle this variance. +The assembly language counter (asm_count) +first examines the file to determine if +C-style ``/*'' comments and C preprocessor commands +(e.g., ``#include'') are used. +If both ``/*'' and ``*/'' are in the file, it's assumed that +C-style comments are being used +(since it is unlikely that both would be used +as something else, say as string data, in the same assembly language file). +Determining if a file used the C preprocessor was trickier, since +many assembly files do use ``#'' as a comment character and some +preprocessor directives are ordinary words that might be included +in a human comment. +The heuristic used is as follows: if #ifdef, #endif, or #include are used, the +C preprocessor is used; or if at least three lines have either #define or #else, +then the C preprocessor is used. +No doubt other heuristics are possible, but this at least seems to produce +reasonable results. +The program then determines what the comment character is by identifying +which punctuation mark (from a set of possible marks) +is the most common non-space initial character on a line +(ignoring ``/'' and ``#'' if C comments or preprocessor commands, +respectively, are used). +Once the comment character has been determined, and it's been determined +if C-style comments are allowed, the lines of code +are counted in the file. +
  3. awk (.awk) [awk]: Comments begin with "#". +
  4. C (.c) [ansic]: Both traditional C comments (/* .. */) and C++
+(//) comments are supported.
+Technically, C doesn't support "//", but in practice many C programs use them.
+The C counter understands multi-line strings, so
+comment characters (/* .. */ and //) are treated as data inside strings.
+Conversely, the counter knows that any double-quote characters inside a
+comment do not begin a C/C++ string.
  5. C++ (.C, .cpp, .cxx, .cc) [cpp]: The same counter is used for
+both C and C++.
+Note that break_filelist does try to separate C files from C++ files,
+so that each is counted under its own language.
  6. C# (.cs): The same counter is used as for C and C++. +Note that there are no "header" filetypes in C#. +
  7. C shell (.csh) [csh]: Comments begin with "#". +
  8. COBOL (.cob, .cbl) [cobol]: SLOCCount
+detects if a "freeform" command has been given; until such a command is
+given, fixed format is assumed.
+In fixed format, comments have a "*" or "/" in column 7 or column 1;
+any line that's not a comment, and has a nonwhitespace character after column 7
+(the indicator area) is counted as a source line of code.
+In a freeform style, any line beginning with optional whitespace and
+then "*" or "/" is considered a comment; any noncomment line
+with a nonwhitespace character is counted as SLOC.
  9. Expect (.exp) [exp]: Comments begin with "#". +
  10. Fortran (.f) [fortran]: Comment-only lines are lines
+where the character in column 1 is C, c, *, or !.
+Note that this is really only a Fortran-77 SLOC counter.
  Haskell (.hs) [haskell]:
+This counter handles block comments {- .. -} and single line comments (--);
+pragmas {-# .. -} are counted as SLOC.
+This is a simplistic counter,
+and can be fooled by certain unlikely combinations of block comments
+and other syntax (line-ending comments or strings).
+In particular, "Hello {-" will be incorrectly interpreted as a
+comment block begin, and "{- -- -}" will be incorrectly interpreted as a
+comment block begin without an end.
+Note that .lhs (literate Haskell) is not supported; please
+preprocess .lhs files into .hs files before counting.
+See the
+Haskell 98
+report section on literate Haskell for more information.
  11. Java (.java) [java]: Java is counted using the same counter as C and C++. +
  12. lex (.l) [lex]: Uses traditional C /* .. */ comments.
+Note that this does not use the same counter as C/C++ internally, since
+it's quite legal in lex to have "//" (where it is NOT a comment).
  13. LISP (.el, .scm, .lsp, .jl) [lisp]: Comments begin with ";". +
  14. ML (.ml, .mli) [ml]: Comments are enclosed in (* .. *). +
  15. Modula3 (.m3, .i3) [modula3]: Comments are enclosed in (* .. *). +
  16. Objective-C (.m) [objc]: Comments are old C-style /* .. */ comments. +
  17. Pascal (.p, .pas) [pascal]: Comments are enclosed in curly braces {} +or (*..*). This counter has known weaknesses; see the BUGS section of +the manual page for more information. +
  18. Perl (.pl, .pm, .perl) [perl]: +Comments begin with "#". +Perl permits in-line ``perlpod'' documents, ``here'' documents, and an +__END__ marker that complicate code-counting. +Perlpod documents are essentially comments, but a ``here'' document +may include text to generate them (in which case the perlpod document +is data and should be counted). +The __END__ marker indicates the end of the file from Perl's +viewpoint, even if there's more text afterwards. +
  19. PHP (.php, .php[3456], .inc) [php]: +Code is counted as PHP code if it has a .php file extension; +it's also counted if it has an .inc extension and looks like PHP code. +SLOCCount does not count PHP code embedded in HTML files normally, +though its lower-level routines can do so if you want to +(use php_count to do this). +Any of the various ways to begin PHP code can be used +(<? .. ?>, +<?php .. ?>, +<script language="php"> .. </script>, +or even <% .. %>). +Any of the PHP comment formats (C, C++, and shell) can be used, and +any string constant formats ("here document", double quote, and single +quote) can be used as well. +
  20. Python (.py) [python]:
+Comments begin with "#".
+Python has a convention that, at the beginning of a definition
+(e.g., of a function, method, or class), an unassigned string can be
+placed to describe what's being defined. Since this is essentially
+a comment (though it doesn't syntactically look like one), the counter
+avoids counting such strings, which may span multiple lines.
+To handle this, strings that start at the beginning of a line are not counted.
+Python also has the ``triple quote'' operator, permitting multiline
+strings; these need to be handled specially.
+Triple-quoted strings are normally considered data, regardless of
+content, unless they are used as a comment about a definition.
  21. Ruby (.rb) [ruby]: Comments begin with "#". +
  22. sed (.sed) [sed]: Comments begin with "#".
+Note that these are "sed-only" files; many uses of sed are embedded in
+shell scripts (and are categorized as shell scripts in those cases).
  23. shell (.sh) [sh]: Comments begin with "#".
+Note that I classify ksh, bash, and the original Bourne shell sh together,
+because they have very similar syntaxes.
+For example, in all of these shells,
+setting a variable is expressed as "varname=value",
+while C shells use "set varname=value".
  24. TCL (.tcl, .tk, .itk) [tcl]: Comments begin with "#". +
  25. Yacc (.y) [yacc]: Yacc is counted using the same counter as C and C++. +
+
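+As a concrete example of these heuristics, the comment-character guess
+that asm_count makes (item 2 above) amounts to a frequency count over a
+set of candidate punctuation marks. Here is a simplified Python sketch,
+assuming C-style comments and preprocessor use have already been ruled
+out (the candidate set below is illustrative; asm_count's real set may
+differ):
+
```python
from collections import Counter

# Illustrative candidate set; the real asm_count set may differ.
CANDIDATES = set(";!|*#@")

def guess_comment_char(lines):
    """Pick the candidate mark that most often starts a nonblank line."""
    counts = Counter()
    for line in lines:
        stripped = line.lstrip()
        if stripped and stripped[0] in CANDIDATES:
            counts[stripped[0]] += 1
    return counts.most_common(1)[0][0] if counts else None

src = ["; save registers",
       "  mov r0, r1",
       "; restore them afterwards",
       "  add r2, r3",
       "; done"]
guess = guess_comment_char(src)
print(guess)
```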

+Much of the code is written in Perl, since it's primarily a text processing +problem and Perl is good at that. +Many short scripts are Bourne shell scripts (it's good at +short scripts for calling other programs), and the +basic C/C++ SLOC counter is written in C for speed. +

+I originally named it "SLOC-Count", but I found that some web search +engines (notably Google) treated that as two words. +By naming it "SLOCCount", it's easier to find by those who know +the name of the program. +

+SLOCCount only counts physical SLOC, not logical SLOC. +Logical SLOC counting requires much more code to implement, +and I needed to cover a large number of programming languages. + + +

+

Definition of SLOC

+

+This tool measures ``physical SLOC.''
+Physical SLOC is defined as follows:
+``a physical source line of code (SLOC) is a line ending
+in a newline or end-of-file marker,
+and which contains at least one non-whitespace non-comment character.''
+Comment delimiters (characters other than newlines starting and ending
+a comment) are considered comment characters.
+Data lines containing only whitespace
+(e.g., lines with only tabs and spaces in multiline strings) are not counted.

+To make this concrete, here's an example of a simple C program +(it strips ANSI C comments out). +On the left side is the running SLOC total, where "-" indicates a line +that is not considered a physical "source line of code": +

+ 1    #include <stdio.h>
+ -    
+ -    /* peek at the next character in stdin, but don't get it */
+ 2    int peek() {
+ 3     int c = getchar();
+ 4     ungetc(c, stdin);
+ 5     return c;
+ 6    }
+ -    
+ 7    main() {
+ 8     int c;
+ 9     int incomment = 0;  /* 1 = we are inside a comment */
+ -    
+10     while ( (c = getchar()) != EOF) {
+11        if (!incomment) {
+12          if ((c == '/') && (peek() == '*')) {incomment=1;}
+13        } else {
+14          if ((c == '*') && (peek() == '/')) {
+15               c= getchar(); c=getchar(); incomment=0;
+16          }
+17        }
+18        if ((c != EOF) && !incomment) {putchar(c);}
+19     }
+20    }
+
+
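+The count on the left can be reproduced mechanically. Below is a
+simplified physical-SLOC counter for C-style /* .. */ comments, written
+in Python for illustration; it ignores strings and "//" comments, which
+happens to be enough for this particular example (the real C counter
+handles those cases too):
+
```python
def physical_sloc(source):
    """Count lines with at least one non-whitespace, non-comment char.

    Handles only /* .. */ comments (possibly spanning lines); strings
    and // comments are NOT handled, which suffices for the example.
    """
    sloc, in_comment = 0, False
    for line in source.splitlines():
        code_chars, i = "", 0
        while i < len(line):
            if in_comment:
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)              # comment runs past line end
                else:
                    in_comment, i = False, end + 2
            else:
                start = line.find("/*", i)
                if start == -1:
                    code_chars += line[i:]
                    i = len(line)
                else:
                    code_chars += line[i:start]
                    in_comment, i = True, start + 2
        if code_chars.strip():                 # any non-whitespace code left?
            sloc += 1
    return sloc

# The example program from the text above.
program = r'''#include <stdio.h>

/* peek at the next character in stdin, but don't get it */
int peek() {
 int c = getchar();
 ungetc(c, stdin);
 return c;
}

main() {
 int c;
 int incomment = 0;  /* 1 = we are inside a comment */

 while ( (c = getchar()) != EOF) {
    if (!incomment) {
      if ((c == '/') && (peek() == '*')) {incomment=1;}
    } else {
      if ((c == '*') && (peek() == '/')) {
           c= getchar(); c=getchar(); incomment=0;
      }
    }
    if ((c != EOF) && !incomment) {putchar(c);}
 }
}
'''
count = physical_sloc(program)
print(count)
```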

+Robert E. Park et al.'s +Software Size Measurement: +A Framework for Counting Source Statements +(Technical Report CMU/SEI-92-TR-20) +presents a set of issues to be decided when trying to count code. +The paper's abstract states: +

+This report presents guidelines for defining, recording, and reporting
+two frequently used measures of software size -- physical source lines
+and logical source statements.
+We propose a general framework for constructing size
+definitions and use it to derive operational methods for
+reducing misunderstandings in measurement results.
+

+Using Park's framework, here is how physical lines of code are counted: +

    +
  1. Statement Type: I used a physical line-of-code as my basis. +I included executable statements, declarations +(e.g., data structure definitions), and compiler directives +(e.g., preprocessor commands such as #define). +I excluded all comments and blank lines. +
  2. How Produced: +I included all programmed code, including any files that had been modified. +I excluded code generated with source code generators, converted with +automatic translators, and those copied or reused without change. +If a file was in the source package, I included it; if the file had +been removed from a source package (including via a patch), I did +not include it. +
  3. Origin: You select the files (and thus their origin). +
  4. Usage: You select the files (and thus their usage), e.g.,
+you decide if you're going to
+include additional applications able to run on the system but not
+included with the system.
  5. Delivery: You'll decide what code to include, but of course, +if you don't have the code you can't count it. +
  6. Functionality: This tool will include both operative and inoperative code +if they're mixed together. +An example of intentionally ``inoperative'' code is +code turned off by #ifdef commands; since it could be +turned on for special purposes, it made sense to count it. +An example of unintentionally ``inoperative'' code is dead or unused code. +
  7. Replications: +Normally, duplicate files are ignored, unless you use +the "--duplicates" or "--crossdups" option. +The tool will count +``physical replicates of master statements stored in +the master code''. +This is simply code cut and pasted from one place to another to reuse code; +it's hard to tell where this happens, and since it has to be maintained +separately, it's fair to include this in the measure. +I excluded copies inserted, instantiated, or expanded when compiling +or linking, and I excluded postproduction replicates +(e.g., reparameterized systems). +
  8. Development Status: You'll decide what code +should be included (and thus the development status of the code that +you'll accept). +
  9. Languages: You can see the language list above. +
  10. Clarifications: I included all statement types. +This included nulls, continues, no-ops, lone semicolons, +statements that instantiate generics, +lone curly braces ({ and }), and labels by themselves. +
+

+Thus, SLOCCount generally follows Park's ``basic definition'', +but with the following exceptions depending on how you use it: +

    +
  1. How Produced: +By default, this tool excludes duplicate files and +code generated with source code generators. +After all, the COCOMO model states that the +only code that should be counted is code +``produced by project personnel'', whereas these kinds of files are +instead the output of ``preprocessors and compilers.'' +If code is always maintained as the input to a code generator, and then +the code generator is re-run, it's only the code generator input's size that +validly measures the size of what is maintained. +Note that while I attempted to exclude generated code, this exclusion +is based on heuristics which may have missed some cases. +If you want to count duplicate files, use the +"--duplicates" and/or "--crossdups" options. +If you want to count automatically generated files, use +the "--autogen" option.
  2. Origin: +You can choose what source code you'll measure. +Normally physical SLOC doesn't include an unmodified +``vendor-supplied language support library'' nor a +``vendor-supplied system or utility''. +However, if this is what you are measuring, then you need to include it. +If you include such code, your set will be different +than the usual ``basic definition.'' +
  3. Functionality: I included counts of unintentionally inoperative code +(e.g., dead or unused code). +It is very difficult to automatically detect such code +in general for many languages. +For example, a program not directly invoked by anything else nor +installed by the installer is much more likely to be a test program, +which you may want to include in the count (you often would include it +if you're estimating effort). +Clearly, discerning human ``intent'' is hard to automate. +
+

+Otherwise, this counter follows Park's +``basic definition'' of a physical line of code, even down to Park's +language-specific definitions where Park defined them for a language. + + +

+

Miscellaneous Notes

+

+There are other undocumented analysis tools in the original tar file. +Most of them are specialized scripts for my circumstances, but feel +free to use them as you wish. +

+If you're packaging this program, don't just copy every executable
+into the system "bin" directory - many of the files are those
+specialized scripts.
+Just put in the bin directory every executable documented here, plus the
+files they depend on (there aren't that many).
+See the RPM specification file to see what's actually installed.

+You have to take any measure of SLOC (including this one) with a +large grain of salt. +Physical SLOC is sensitive to the format of source code. +There's a correlation between SLOC and development effort, and some +correlation between SLOC and functionality, +but there's absolutely no correlation between SLOC +and either "quality" or "value". +

+A problem of physical SLOC is that it's sensitive to formatting, +and that's a legitimate (and known) problem with the measure. +However, to be fair, logical SLOC is influenced by coding style too. +For example, the following two phrases are semantically identical, +but will have different logical SLOC values: +

+   int i, j;  /* 1 logical SLOC */
+
+   int i;     /* 2 logical SLOC, but it does the same thing */
+   int j;
+
+

+If you discover other information that can be divided up by +data directory children (e.g., the license used), it's probably best +to add that to each subdirectory (e.g., as a "license" file in the +subdirectory). +Then you can modify tools like get_sloc +to add them to their display. +

+I developed SLOCCount for my own use, not originally as +a community tool, so it's certainly not beautiful code. +However, I think it's serviceable - I hope you find it useful. +Please send me patches for any improvements you make! +

+You can't use this tool as-is with some estimation models, such as COCOMO II, +because this tool doesn't compute logical SLOC. +I certainly would accept code contributions to add the ability to +measure logical SLOC (or related measures such as +Cyclomatic Complexity and Cyclomatic density); +selecting them could be a compile-time option. +However, measuring logical SLOC takes more development effort, so I +haven't done so; see USC's "CodeCount" for a set of code that +measures logical SLOC for some languages +(though I've had trouble with CodeCount - in particular, its C counter +doesn't correctly handle large programs like the Linux kernel). + + +

+

SLOCCount License

+

+Here is the SLOCCount License; the file COPYING contains the standard +GPL version 2 license: +

+=====================================================================
+SLOCCount
+Copyright (C) 2000-2001 David A. Wheeler (dwheeler, at, dwheeler.com)
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+=====================================================================
+
+

+While it's not formally required by the license, please give credit +to me and this software in any report that uses results generated by it. +

+This document was written by David A. Wheeler (dwheeler, at, dwheeler.com), +and is +(C) Copyright 2001 David A. Wheeler. +This document is covered by the license (GPL) listed above. +

+The license does give you the right to +use SLOCCount to analyze proprietary programs. + +

+

Related Tools

+

+One available toolset is
+CodeCount.
+I tried using this toolset, but I eventually gave up.
+It had too many problems handling the code I was trying to analyze, and it
+does a poor job automatically categorizing code.
+It also has no support for many of today's languages (such as Python,
+Perl, Ruby, PHP, and so on).
+However, it does a lot of analysis and measurements that SLOCCount
+doesn't do, so it all depends on your needs.
+Its license appeared to be open source, but it's quite unusual and
+I'm not enough of a lawyer to be able to confirm that.

+Another tool that's available is LOCC. +It's available under the GPL. +It can count Java code, and there's experimental support for C++. +LOCC is really intended for more deeply analyzing each Java file; +what's particularly interesting about it is that it can measure +"diffs" (how much has changed). +See + +A comparative review of LOCC and CodeCount. +

+
+CCCC is a tool which analyzes C++ and Java files
+and generates a report on various metrics of the code.
+Metrics supported include lines of code, McCabe's complexity,
+and metrics proposed by Chidamber & Kemerer and Henry & Kafura.
+(You can see
+Tim Littlefair's comments).
+CCCC is in the public domain.
+It reports on metrics that sloccount doesn't, but sloccount can handle
+far more computer languages.
+

+

Submitting Changes

+

+The GPL license doesn't require you to submit changes you make back to
+its maintainer (currently me),
+but it's highly recommended and wise to do so.
+Because others will send changes to me, a version you make on your
+own will slowly become obsolete and incompatible.
+Rather than allowing this to happen, it's better to send changes in to me
+so that the latest version of SLOCCount also has the
+features you're looking for.
+If you're submitting support for new languages, be sure that your
+change correctly ignores files that aren't in that new language
+(some filename extensions have multiple meanings).
+You might want to look at the TODO file first.

+When you send changes to me, send them as "diff" results so that I can
+use the "patch" program to install them.
+If you can, please send ``unified diffs'' -- GNU's diff can create these
+using the "-u" option.