2004-08-01 David A. Wheeler
    * Released version 2.26.
    * Modified driver.h to clearly state the GPL license. This doesn't change anything, but it makes the Savannah people happy.

2004-07-31 David A. Wheeler
    * Released version 2.25. Changes are:
    * Per request from Savannah, added the more detailed licensing text to every source file.
    * Modified the assembly language counting code, based on useful feedback and a test case from Purnendu Ghosh, so that the heuristics work better at guessing the right comment character and they perform well. In particular, the comment character '*' is far better supported.
    * Added support for Delphi project files (.dpr files, which are essentially in Pascal syntax), thanks to Christian Iversen.
    * Some versions of Perl are apparently causing trouble, but I have not yet found the solution for them (other than using a different version of Perl). The troublesome line of code is in break_filelist, and currently says:
          open(FH, "-|", "md5sum", $filename) or return undef;
      This could be changed to:
          open(FH, "-|", "md5sum $filename") or return undef;
      But I dare not fix it that way, because that would create a security problem. Imagine downloading source code from someone you don't know and running sloccount on it, when that person has created in their source tree a file named like this:
          "; rm -fr /*"
      or its variations. I'd rather have the program fail in specific circumstances (users will know when it won't work!) than insert a known dangerous security vulnerability. I can't reproduce this problem; it's my hope that those who CAN will help me find a good solution. For the moment, I'm documenting the problem here and in the TODO list, so that people will realize WHY it hasn't just been "fixed" with the "obvious solution". The answer: I care about security.

2004-05-10 David A. Wheeler
    * Released version 2.24 - a few minor bugfixes and improvements.
      Automatically tries to use several different MD5 programs, until it finds one that works - this is more flexible, and as a result, it now works out-of-the-box on Apple Mac OS X. SLOCCount now accepts "." as the directory to analyze, it correctly identifies wrapper scripts left by libtool as automatically generated code, and correctly identifies debian/rules files as makefiles. Also, installation documentation has improved. My thanks to Jesus M. Gonzalez-Barahona for telling me about the Debian bug reports and testing of candidate versions. My thanks to Koryn Grant, who told me what needed to be done to get SLOCCount running on Mac OS X (and for testing my change). This version resolves Debian Bug reports #173699, #159609, and #200348.

2004-04-27 David A. Wheeler
    * Automatically try several different MD5 programs, looking for a working one. Originally this program REQUIRED md5sum. This new version tries md5sum, then md5, then openssl. The good news - the program should now 'just work' on Apple Mac OS X. The bad news - if md5sum doesn't exist, sloccount still has a good chance of working, but will display odd error messages while it searches for a working MD5 program. There doesn't seem to be an easy way in Perl to suppress such messages while still permitting "trouble reading data file" messages. However, doing the test at run-time is much more robust, and this way it at least has a chance of working on systems it didn't work on at all before.
    * Removed the "debian" subdirectory. There was no need for it; it's best for the Debian package maintainers to control that information on their own.

2004-04-25 David A. Wheeler
    * Allow "." and ".." as specifications for directories even when they have no subdirectories. This resolves Debian bug report log #200348 ("Sloccount . fails").
    * Correctly identify wrapper scripts left by libtool as automatically generated code.
      When linking against a libtool library, libtool leaves a wrapper script in the source tree (so that the binary can be executed in-place, without installing it), which includes this:
          (line) # foo - temporary wrapper script for .libs/foo
          (line) # Generated by ltmain.sh - GNU libtool 1.4.3 (1.922.2.111 2002/10/23 02:54:36)
      I fixed this by saying that any comment beginning with "Generated by" in the first few lines must be auto-generated code. This should correctly catch other auto-generated code too. There is a risk that code NOT automatically generated will be incorrectly labelled, but that's unlikely. This resolves Debian Bug report logs #173699, "sloccount should ignore libtool-generated wrapper scripts".
    * Now identifies "debian/rules" files as makefiles. This resolves Debian Bug report logs #159609, "sloccount Does not consider debian/rules to be a makefile".
    * Minor fix to sloccount makefile, so that man page installs correctly in some situations that didn't before. My thanks to Jesus M. Gonzalez-Barahona.

2003-11-01 David A. Wheeler
    * Version 2.23 - a few minor bugfixes and improvements.

2003-11-01 David A. Wheeler
    * Fixed incorrect UTF-8 warnings. Perl 5.8.0 creates warnings when the LANG value includes ".UTF-8" but the text files read aren't UTF-8. This causes problems on Red Hat Linux 9 and others, which set LANG to include ".UTF-8" by default. This version quietly removes ".UTF-8" from the LANG value for purposes of sloccount, to eliminate the problem.

2003-11-01 David A. Wheeler
    * Fixed bad link to "options" in sloccount.html; my thanks to Barak Zalstein.
    * Fixed a bug in perl_count that prevented it from properly skipping POD.

2003-10-30 Julian Squires
    * Added simple literate Haskell support.
    * Added test cases for literate Haskell support.
    * Updated Common LISP and Modula 3 extensions.

2003-03-08 David A. Wheeler
    * Version 2.22 - improved OCAML support, thanks to Michal Moskal. Other minor improvements.

2003-02-15 Jay A. St. Pierre
    * Fixed uninstalling documents to always remove DOC_DIR.

2003-02-15 Michal Moskal
    * Significantly improved OCAML support - complete rewrite of ML handling.

2003-01-28 David A. Wheeler
    * Version 2.21 - improved Fortran support (inc. Fortran 90); my thanks to Erik Schnetter for implementing this!

2002-12-17 Erik Schnetter
    * Added support for Fortran 90. Extensions are ".f90" and ".F90".
    * Changed handling of Fortran 77 to include HPF and Open MP statements, and to accept uppercase ".F77" as extension.

2002-12-04 David A. Wheeler
    * Version 2.20 - minor portability and documentation improvements.
    * Documentation improvements - more discussion on Intermediate COCOMO.

2002-12-04 Linh Luong
    * Modified SLOCCount so that it would run on Solaris 2.7 (once Perl is installed and the PATH is set correctly to include the directory where SLOCCount is installed). This required modifying file sloccount to eliminate the test ("[") option "-e", replacing it with the "-r" option ("test -e" is apparently not supported by Solaris 2.7). Since "-r" should be available on any implementation of "test", this is a nice portable change.

2002-11-16 David A. Wheeler
    * Version 2.19, documentation improvement.
    * Documented the "Improved COCOMO" model from Boehm, so that users who want more accurate estimates can do at least a little bit straight from the documentation. For more, as always, see Boehm's book. If anyone wants to implement logical SLOC counting, please be my guest! Then, COCOMO II could be implemented too.
    * Modified this ChangeLog to document more fully the SGI MIPS problem.

2002-11-16 David A. Wheeler
    * Version 2.18, minor bugfix release.
    * Updated the "wc -l" check; it would cause problems for users who had never used sloccount before (because datadir had not been created yet). Also, the "wc -l" check itself would not reliably identify SGI systems that had horribly buggy "wc" programs; it's believed this is a better check. Thanks to Randal P. Andress for helping with this.
    * Fixed this ChangeLog. It was Randal P. Andress who identified the "wc -l" bug, not Bob Brown. Sorry for the misattribution, and thanks for the bugfixing help!
    * Changed rpm building command to work with rpm version 4 (as shipped with Red Hat Linux 8.0). As of Red Hat Linux 8, the "rpm" command only loads files, while there is now a separate "rpmbuild" command for creating rpm files. Those rebuilding with Red Hat Linux 7.X or less (rpm < version 4) will need to edit the makefile slightly, as documented in the makefile, to modify the variable RPMBUILD.
    * "make rpm" now automatically uninstalls sloccount first if it can, to eliminate unnecessary errors when building new versions of sloccount RPMs. This only affects people modifying and redistributing the code of sloccount (mainly, me).

2002-11-16 Randal P. Andress
    * Fixed get_sloc so that it accepts --filecounts as well as --filecount.

2002-11-05 David A. Wheeler
    * Released version 2.17, which adds support for Java Server Pages (.jsp), eliminates some warnings in newer Perl implementations, and has a few minor fixes/improvements.

2002-11-18 Randal P. Andress
    * Randal provided the following additional information about this really nasty problem on SGI MIPS machines. It causes gcc to not work properly, and thus "wc" won't work properly either. SLOCCount now detects that there's a problem and will refuse to run if things are screwed up this badly. For those unfortunate few who have to deal with this case, here's additional information from Randal Andress:

      When gcc is installed on SGI MIPS from source, sgi-mips-sgi-irix6.x, an option specification in the 'specs' file is set incorrectly for n32. The offending line is:
          %{!mno-long64:-D__LONG_MAX__=9223372036854775807LL}
      Which (unless option '-mno-long64' is specified), means that LONG_MAX is 64 bits. The trouble is twofold: 1. This should not be the default, since for n32, normally, long is only 32 bits; and 2.
      The option did not carry into the compiler past the pre-processor - so it did not work.

      The simplest fix for gcc (it seems that it can be done locally by editing the specs file) is to have the following line replace the offending line in the specs file:
          %{long64:-D__LONG_MAX__=9223372036854775807LL}
      This makes the default 32 and only sets it to 64 if you specify '-long64' which *does* work all the way through the compiler.

      I had the binary for gcc 3 on the sgi freeware site installed here and looked at its specs file and found no problem (they have the '-long64' option). So it seems that when they build gcc for their freeware distribution, they fix it. The problem comes when someone downloads and builds gcc for themselves on sgi. Then the installation is faulty and any n32 code that they build is subject to this flaw if the source makes use of LONG_MAX or any of the values derived from it.

      The real problem turned out to be quite general for sgi n32 gcc. The 'specs' file and mips.h are not consistent, resulting in 'LONG_MAX' being given an incorrect value. The following 'c' program shows inconsistent values for macros for mips-irix n32:
          __LONG_MAX__ (LONG_MAX) and _MIPS_SZLONG
      This seems to stem from an improper default option in the specs file forcing -D__LONG_MAX__=0x7fffffffffffffff to be passed to each compile.
      Here is the test case, compile command, and output:

          # include <limits.h>
          #define LONG_MAX_32_BITS 2147483647
          #include <stdio.h>
          int main ()
          {
          #if LONG_MAX <= LONG_MAX_32_BITS
            printf ("LONG_MAX <= LONG_MAX_32_BITS = 0x%lx\n",LONG_MAX);
          #else
            printf ("LONG_MAX > LONG_MAX_32_BITS = 0x%llx\n",LONG_MAX);
          #endif
            printf ("_MIPS_SZLONG = 0x%x\n",_MIPS_SZLONG);
            printf ("__LONG_MAX__ = 0x%llx (size:%d)\n",__LONG_MAX__, sizeof (__LONG_MAX__));
          #if LONG_MAX <= LONG_MAX_32_BITS
            printf ("LONG_MAX = 0x%lx (size:%d) \n",LONG_MAX,sizeof(LONG_MAX));
          #else
            printf ("LONG_MAX = 0x%llx (size:%d) \n",LONG_MAX,sizeof(LONG_MAX));
          #endif
            printf ("LONG_MAX_32_BITS = 0x%x (size:%d) \n",LONG_MAX_32_BITS,sizeof(LONG_MAX_32_BITS));
            return 0;
          }
          ============ end test case source.

          >gcc -n32 -v -o test_limits -O0 -v -g test_limits.c
          defines include:....-D__LONG_MAX__=9223372036854775807LL....

          =========== test output:
          >test_limits
          LONG_MAX > LONG_MAX_32_BITS = 0x7fffffffffffffff
          _MIPS_SZLONG = 0x20
          __LONG_MAX__ = 0x7fffffffffffffff (size:8)
          LONG_MAX = 0x7fffffffffffffff (size:8)
          LONG_MAX_32_BITS = 0x7fffffff (size:4)
          ======== end test case output

      By changing the specs entry:
          %{!mno-long64:-D__LONG_MAX__=9223372036854775807LL}
      to
          %{long64:-D__LONG_MAX__=9223372036854775807LL}
      as is discussed in one of the internet reports I sent earlier, the output, after recompiling and running, is:
          LONG_MAX <= LONG_MAX_32_BITS = 0x7fffffff
          _MIPS_SZLONG = 0x20
          __LONG_MAX__ = 0x7fffffff (size:4)
          LONG_MAX = 0x7fffffff (size:4)
          LONG_MAX_32_BITS = 0x7fffffff (size:4)

      Although I have not studied it well enough to know exactly why, the problem has to do with the size of (long int) and the attempt of the 'memchr' code to determine whether or not it can use 64 bit words rather than 32 bit words in chunking through the string looking for the specified character, "\n"(0x0a) in the case of 'wc'.

2002-11-03 David A. Wheeler
    * Fixed makefile install/uninstall scripts to properly handle documentation.
    * Added simple check at beginning of sloccount's execution to make sure "wc -l" actually works. Randal P. Andress has found that on certain SGI machines, "wc -l" produces the wrong answers. He reports, "You may already know this, but just in case you do not, there is an apparent bug in textutils-1.19 function 'wc' (at least as built on SGI-n32) which is caused by an apparent bug in memchr (*s, c, n). The bug is only evident when counting 'lines only' or 'lines and characters' (i.e., when NOT counting words). The result is that the filecount is short... I replaced the memchr with very simple code and it corrected the problem. I then installed textutils-2.1 which does not seem to have the problem." I thought about adding this information just to the documentation, but no one would notice it. By adding a check to the code, most people will neither know nor care about the problem, and the few people it DOES affect will know about the problem right away (instead of reporting wrong answers). Yes, a failing "wc -l" is a pretty horrific bug, but rather than ignore the problem, it's better to detect and address it.
    * Modified documentation everywhere so that it consistently documents "--filecount" as the correct option for filecounts, not "--filecounts". That way, the documentation is consistent.
    * However, in an effort to "do the right thing", the program sloccount will accept "--filecounts" as an alternative way to specify --filecount.

2002-11-02 Bob Brown
    * Contributed code changes to count Java Server Page (.jsp) files. The code does not pull comments out of embedded javascript. We don't consider that a serious limitation at all, since no one should be sending embedded javascript comments to client browsers anyhow. They're extremely rare. David A.
      Wheeler notes that you could argue that if you _DO_ include such comments, they're not really functioning as comments (since they DO have an effect on the result - they're more like print statements in an older language instead of a traditional language's comments).

2002-11-02 David A. Wheeler
    * Eliminated more Perl warnings by adding more defined() wrappers to while() loops in Perl code (based on Randal's suggestion). The problem is that Perl handles the last line of a file oddly if it doesn't end with a newline indicator, and it consists solely of "0".

2002-11-02 Randal P Andress
    * Eliminated some Perl warnings by adding defined() wrappers to while() loops in Perl code.

2002-8-24 David A. Wheeler
    * Released version 2.16, fixed limitations of old Pascal counter.

2002-8-24 David A. Wheeler
    * Re-implemented Pascal counter (in flex). This fixes some problems the old counter had - it handles nested comments with different formats, and strings as well.
    * Removed the BUGS information that described the Pascal counter weaknesses.. since now they're gone!
    * Added an additional detector of automatically generated files - it's an auto-generated file if it starts with "A lexical scanner generated by flex", since flex adds this. Generally, this isn't a problem, since we already detect the filename and matching .c files, but it seems worth doing.

2002-8-22 David A. Wheeler
    * Released version 2.15, a bugfix + small feature improvement. My sincere thanks to Jesus M. Gonzalez-Barahona, who provided patches with lots of useful improvements.

2002-8-22 Jesus M. Gonzalez-Barahona
    * Added support for Standard ML (as language "ml").
    * A patch suggested to the Debian BTS; .hh is also a C++ extension.
    * Some ".inc" files are actually Pascal, not PHP; now ".inc" files are examined and binned to either Pascal or PHP depending on their content.
    * Improved detection of Pascal files (particularly for Debian package fpc-1.0.4).
    * php_count was not closing open files before opening a new one, and therefore sloccount could fail to count PHP code given a VERY LONG list of PHP files in one package.
    * break_filelist had problems with files that have weird characters at the end of the filename. Now fixed.

2002-7-24 David A. Wheeler
    * Released version 2.14. Improved Pascal detection, improved Pascal counting, added a reference to CCCC.

2002-7-24 David A. Wheeler
    * Modified Pascal counting; the older (*..*) commenting structure is now supported. Note that the Pascal counter is still imperfect; it doesn't handle the prioritization between these two commenting systems, and can be fooled by strings that include a comment start indicator. Rewrites are welcome; however, for most people the current code is sufficient. This really needs to be rewritten in flex; languages with strings and multiline comment structures aren't handled correctly with naive Perl code.
    * Documented the weaknesses in the Pascal counter as BUGS.

2002-7-24 Ian West <IWest, at, aethersystems, dot com>
    * Improved heuristic for detecting Pascal programs in break_filelist. Sloccount will now categorize files as Pascal if they have the file type ".pas" as well as ".p", though it still checks the contents to make sure it's really Pascal. The heuristic was modified so that it's also considered Pascal if it contains "module" and "end.", or "program", "begin", and "end." in addition to the existing cases. (Ian West used sloccount to analyze a system containing about 1.2 million lines of code in almost 10,000 files; ninety percent of it is Ada, and the bulk of the remainder is split between Pascal and SQL. The following is Ian's more detailed explanation for the change):

      VAX Pascal uses "module" instead of "program" for files that have no program block and therefore no "begin". There is also no requirement for a Pascal file to have procedures or functions, which is the case for files that are equivalents of C headers.
      So I modified the function to allow files to be accepted that only contain either: "module" and "end."; or "program", "begin", and "end.". I considered adding checks for "const", "type", and "var" but decided they were not necessary. I have added the extra cases without changing the existing logic so as not to upset any cases for "unit". It is possible to optimize the logic somewhat, but I felt clarity was better than efficiency.

      I found that some of my Pascal files were getting through only because the word "unit" appeared in certain comments. So I moved the line for filtering out comments above the lines that look for the keywords. Pascal in general allows comments in the form (*...*) as well as {...}, so I added a line to remove these.

      After making these changes, all my files were correctly categorized. I also verified that the sample Pascal files from p2c still had the same counts.

      Thank you for developing SLOCCount. It is a very useful tool.

2002-7-15 David A. Wheeler
    * Added a reference to CCCC; http://cccc.sourceforge.net/

2002-5-31 David A. Wheeler
    * Released version 2.13.
    * Code cleanups. Turned on gcc warnings ("-Wall" option) and cleaned up all code that set off a warning. This should make the code more portable as well as cleaner. Made a minor speed optimization on an error branch.

2002-3-30 David A. Wheeler
    * Released version 2.12.
    * Added a "testcode" directory with some sample source code files for testing. It's small now, but growth is expected. Contributions for this test directory (especially for edge/oddball cases) are welcome.

2002-3-25 David A. Wheeler
    * Changed first-line recognizers so that the first line (#!) will be matched ignoring case. On most Unix/Linux systems uppercase script statements won't work, but they may matter to Windows users.
    * Now recognize SpeedyCGI, a persistent CGI interface for Perl.
      SpeedyCGI has most of the speed advantages of FastCGI, but has the security advantages of CGI and has the CGI interface (from the application writer's point of view). SpeedyCGI Perl scripts have #!/usr/bin/speedy lines instead of #!/usr/bin/perl. More information about SpeedyCGI can be found at http://daemoninc.com/speedycgi/ Thanks to Priyadi Iman Nurcahyo for noticing this.

2002-3-15 David A. Wheeler
    * Added filter to remove calls to sudo, so "#!/usr/bin/sudo /usr/bin/python" etc as the first line are correctly identified.

2002-3-7 David A. Wheeler
    * Added cross-references to LOCC and CodeCount. They don't do what I want.. which is why I wrote my own! .. but others may find them useful.

2002-2-28 David A. Wheeler
    * Released version 2.11.
    * Added support for C#. Any ".cs" file is presumed to be a C# file. The C SLOC counter is used to count SLOC. Note that C# doesn't have a "header" type (Java doesn't either), so disambiguating headers isn't needed.
    * Added support for regular Haskell source files (.hs). Their syntax is sufficiently similar that just the regular C SLOC counter works. Note that literate Haskell files (.lhs) are _not_ supported, so be sure to process .lhs files into .hs files before counting. There are two different .lhs conventions; for more info, see: http://www.haskell.org/onlinereport/literate.html
    * Tweaked COBOL counter slightly. Added support in fixed (default) format for "*" and "/" as comment markers in column 1.
    * Modified list of file extensions known not to be source code, based on suffixes(7). This speeds things very slightly, but the main goal is to make the "unknown" list smaller. That way, it's much easier to see if many source code files were incorrectly ignored. In particular, compressed formats (e.g., ".tgz") and multimedia formats (".wav") were added.
    * Modified documentation to make things clear: If you want source in a compressed file to be counted (e.g. .zip, .tar, .tgz), you need to uncompress the file first!!
    * Modified documentation to clarify that literate programming files must be expanded first.
    * Now recognize ".ph" as Perl (it's "Perl header" code). Please let me know if this creates many false positives (i.e., if there are programs using ".ph" in other ways).
    * File count_unknown_ext modified slightly so that it now examines ~/.slocdata. Modified documentation so that its use is recommended and explained. It's been there for a while, but with poor documentation I bet few understand its value.
    * Modified output to clearly say that it's Open Source Software / Free Software, licensed under the GPL. It was already stated that way in the documentation and code, but clearly stating this on every run makes it even harder to miss.

2002-2-27 David A. Wheeler
    * Released version 2.10.
    * COBOL support added! Now ".cbl" and ".cob" are recognized as COBOL extensions, as well as their uppercase ".CBL" and ".COB". The COBOL counter works as follows: it detects if a "freeform" command has been given. Unless a freeform command's given, a comment has "*" or "/" in column 7, and a SLOC is a non-comment line with at least one non-whitespace in column 8 or later (including columns 72 or greater; it's arguable if a line that's empty before column 72 is really a line or a comment, but I've decided to count such odd things as lines). If we've gone free-format, a comment is a line that has optional whitespace and then "*".. otherwise, a line with nonwhitespace is a SLOC. Is this good enough? I think so, but I'm not a major COBOL user. Feedback from real COBOL users would be welcome. A source for COBOL test programs is:
          http://www.csis.ul.ie/cobol/examples/default.htm
      Information on COBOL syntax gathered from various locations, inc.:
          http://cs.hofstra.edu/~vmaffea1/cobol.html
          http://support.merant.com/websupport/docs/microfocus/books/nx31books/lrintr.htm
    * Modified handling of uppercase filename extensions so they'll be recognized as well as the more typical lowercase extensions.
      If a filename has one or more uppercase letters - and NO lowercase letters - it's assumed that it may be a refugee from an old OS that supported only uppercase filenames. In that circumstance, if the filename extension doesn't match the set of known extensions, it's made into lowercase and recompared against the set of extensions for source code files. This heuristic should improve recognition of source file types for "old" programs using upper-case-only characters. I do have concern that this may be "too greedy" an algorithm, i.e., it might claim that some files that aren't really source code are now source code. I don't think it will be a problem, though; few people create filename extensions that differ only by case; the ".c" vs. ".C" thing is an exception, and since Windows folds case it's not a very portable practice. This is a pretty conservative heuristic; I found Cobol programs with lowercase filenames and uppercase extensions ("x.CBL"), which wouldn't be matched by this heuristic. For Cobol and Fortran I put in special ".F", ".CBL", and ".COB" patterns to catch them. With those two actions, the program should manage to correctly identify more source files without incorrectly matching non-source files.
    * ".f77" is now also accepted as a Fortran77 extension. Thanks to http://www.webopedia.com/quick_ref/fileextensionsfull.html which has lots of extension information.
    * Fixed a bug in handling top-level directories where there were NO source files at all; in certain cases this would create spurious error messages. (Fix in compute_all).

2002-1-7 David A. Wheeler
    * Released version 2.09.

2002-1-9 David A. Wheeler
    * Added support for the Ruby programming language, thanks to patches from Josef Spillner.
    * Documentation change: added more discussion about COCOMO, in particular why its cost estimates appeared so large. Some programmers think of just the coding part, and only what they'd get paid directly.. but that's less than 10% of the costs.
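
The uppercase-extension heuristic described in the version 2.10 entry above can be sketched roughly as follows. This is only an illustration in Python, not the actual break_filelist code (which is Perl). The function name guess_language, the abbreviated KNOWN_EXTENSIONS table, and the language labels in it are all assumptions made for the example.

```python
import os

# Hypothetical, heavily abbreviated extension table (an assumption for this
# sketch). It includes the special uppercase ".F", ".CBL" and ".COB" patterns
# the entry mentions, plus ".C", which differs from ".c" only by case.
KNOWN_EXTENSIONS = {
    "c": "ansic", "h": "ansic", "C": "cpp",
    "cbl": "cobol", "cob": "cobol", "CBL": "cobol", "COB": "cobol",
    "f": "fortran", "F": "fortran", "f77": "fortran",
    "pas": "pascal", "rb": "ruby",
}


def guess_language(filename):
    """Return a language label for filename, or None if it is unknown."""
    base = os.path.basename(filename)
    _, dot, ext = base.rpartition(".")
    if not dot or not ext:
        return None  # no extension at all
    # 1. An exact, case-sensitive match always wins (so ".C" stays distinct).
    if ext in KNOWN_EXTENSIONS:
        return KNOWN_EXTENSIONS[ext]
    # 2. Fall back to lowercase only when the whole name has at least one
    #    uppercase letter and NO lowercase letters.
    has_upper = any(ch.isupper() for ch in base)
    has_lower = any(ch.islower() for ch in base)
    if has_upper and not has_lower:
        return KNOWN_EXTENSIONS.get(ext.lower())
    return None


if __name__ == "__main__":
    for name in ("MAIN.PAS", "x.CBL", "x.PAS", "FOO.C", "README"):
        print(name, "->", guess_language(name))
```

In this sketch "x.PAS" is deliberately left unrecognized, because its name contains a lowercase letter; that mirrors the conservative behavior the entry describes, while "x.CBL" is caught only through the explicit uppercase pattern in the table.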

2002-1-7 David A. Wheeler
    * Minor documentation fix - the example for --effort in sloccount.html wasn't quite right (the base documentation for --effort was right, it was just the example that was wrong). My thanks to Kevin the Blue for pointing this out.

2002-1-3 David A. Wheeler
    * Released version 2.08.

2002-1-3 David A. Wheeler
    * Based on suggestions by Greg Sjaardema:
        * Modified c_count.c, function count_file, to close the stream after the file is analyzed. Otherwise, this can cause problems with too many open files on some systems, particularly on operating systems with small limits (e.g., Solaris).
        * Added '.F' as a Fortran extension.

2002-1-2 David A. Wheeler
    * Released version 2.07.

2002-1-2 Vaclav Slavik
    * Modified the RPM .spec file in the following ways:
        * By default the RPM package now installs into /usr (so binaries go into /usr/bin). Note that those who use the makefile directly ("make install"), including tarball users, will still default to /usr/local instead. You can still make the RPM install to /usr/local by using the prefix option, e.g.:
              rpm -Uvh --prefix=/usr/local sloccount*.rpm
        * Made it use the %{_prefix} variable, i.e. changing it to install in /usr/local or /usr is a matter of changing one line.
        * Use wildcards in the %files section, so that you don't have to modify the specfile when you add a new executable.
        * Mods to make it possible to build the RPM as non-root (i.e. BuildRoot support, %defattr in %files, PREFIX passed to make install).

2002-1-2 Jesus M. Gonzalez Barahona
    * Added support for Modula-3 (.m3, .i3).
    * ".sc" files are counted as Lisp.
    * Modified sloccount to handle EVEN LARGER systems (i.e., so sloccount will scale even more). In a few cases, parameters were passed on the command line, and large systems (e.g., Debian GNU/Linux) could be so large that the command line was too long. This caused a large number of changes to different files to remove these scalability limitations.
    * All *_count programs now accept "-f filename" and "-f -" options, where 'filename' is a file with a list of filenames to count. Internally the "-f" option with a filename is always used, so that an arbitrarily long list of files can be measured and so that "ps" will show more status information.
    * compute_sloc_lang modified accordingly.
    * get_sloc now has a "--stdin" option.
    * Some small fixes here and there.
    * This closes Debian bug #126503.

2001-12-28 David A. Wheeler
    * Released sloccount 2.06.

2001-12-27 David A. Wheeler
    * Fixed a minor bug in break_filelist, which caused (in extremely unusual circumstances) a problem when disambiguating C from C++ files in complicated situations where this difference was hard to tell. The symptom: When analyzing some packages (for instance, afterstep-1.6.10 as packaged in Debian 2.2) you would get the following error:
          Use of uninitialized value in pattern match (m//) at /usr/bin/break_filelist line 962.
      This could only happen after many other disambiguating rules failed to determine if a file was C or C++ code, so the problem was quite rare. My thanks to Jesus M. Gonzalez-Barahona (in Mostoles, Spain) for the patch that fixes this problem.
    * Modified man page, explaining the problems of filenames with newlines, and also noting the problems with directories beginning with "-" (they might be confused as options).
    * Minor improvements to Changelog text, so that the changes over time were documented more clearly.
    * Note that CEPIS "Upgrade" includes a paper that depends on sloccount. This is "Counting Potatoes: the Size of Debian 2.2" which counts the size of Debian 2.2 (instead of Red Hat Linux, which is what I counted). The original release is at: . I understand that they'll make some tweaks and release a revision of the paper on the Debian website. It's interesting; Debian 2.2 (released in 2000, and which did NOT have KDE), has 56 million physical SLOC and would have cost $1.8 billion USD to develop traditionally.
      That's more than Red Hat; see . Top languages: C (71.12%), C++ (9.79%), LISP, Shell, Perl, Fortran, Tcl, Objective-C, Assembler, Ada, and Python in that order. My thanks to the authors!

2001-10-25 David A. Wheeler
    * Released sloccount 2.05.
    * Added support for detecting and counting PHP code. This was slightly tricky, because PHP's syntax has a few "gotchas" like "here document" strings, the closing "?>" working even in C++ or sh style comments, and so on. Note - HTML files (.html, .htm, etc) are not examined for PHP code. You really shouldn't put a lot of PHP code in HTML documents, because it's a maintenance problem later anyway. The tool assigns every file a single type.. which is a problem, because HTML files could have multiple simultaneous embedded types (PHP, javascript, and HTML text). If the tool were modified to assign multiple languages to a single file, I'm not sure how to handle the file counts (counts of files for each language). For the moment, I just assign HTML to "html".
    * Modified output so that it adds a header before the language list.

2001-10-23 David A. Wheeler
    * Released sloccount 2.01 - a minor modification to support Cygwin users.
    * Modified compute_all to make it more portable (== became =); in particular this should help users using Cygwin.
    * Modified documentation to note that, if you install Cygwin, you HAVE to use Unix newlines (not DOS newlines) for the Cygwin install. Thanks to Mark Ericson for the bug report & for helping me track that down.
    * Minor cleanups to the ChangeLog.

2001-08-26 David A. Wheeler
    * Released sloccount 2.0 - it's getting a new version number because its internal data format changed. You'll have to re-analyze your system for the new sloccount to work.
    * Improved the heuristics to identify files (esp. .h files) as C, C++, or objective-C. The code now recognizes ".H" (as well as ".h") as header files.
    The code realizes that ".cpp" files that begin with .\" or ,\" aren't really C++ files - XFree86 stores many man pages with these extensions (ugh).
  * Added the ability to "--append" analyses. This means that you can analyze some projects, and then repeatedly add new projects. sloccount even stores and recovers md5 checksums, so it even detects duplicates across the projects (the "first" project gets the duplicate).
  * Added the ability to mark a data directory so that it's not erased (just create a file named "sloc_noerase" in the data directory). From then on, sloccount won't erase it until you remove the file.
  * Many changes made aren't user-visible. Completely re-organized break_filelist, which was getting incredibly baroque. I've improved the sloccount code so that adding new languages is much simpler; before, it required a number of changes in different places, which was bad.
  * SLOCCount now creates far fewer files, which is important for analyzing big systems (I was starting to run out of inodes when analyzing entire GNU/Linux distributions). Previous versions created stub files in every child directory for every possible language, even those that weren't used; since most projects only use a few languages, this was costly in terms of inodes. Also, the totals for each language for a given child directory are now in a single file (all-physical.sloc) instead of being in separate files; this not only reduces inode counts, but it also greatly simplifies later processing & eliminated a bug (now, to process all physical SLOC counts in a given child directory, just process that one file).

2001-06-22 David A. Wheeler

  * Per Prabhu Ramachandran's suggestion, recognize ".H" files as ".h"/".hpp" files (note the upper case).

2001-06-20 David A. Wheeler

  * Released version 1.9.
    This eliminates installation errors with "sql_count" and "makefile_count", detects PostgreSQL embedded C (in addition to Oracle and Informix), improves detection of Pascal code, and includes support for analyzing licenses (if a directory has the file PROGRAM_LICENSE, the file's contents are assumed to have the license name for that top-level program). It eliminates a portability problem, so hopefully it'll be easier to run it on Unix-like systems. It _still_ requires the "md5sum" program to run.

2001-06-14 David A. Wheeler

  * Changed the logic in make_filelists. This version doesn't require a "-L" option to test, which GNU programs supported but which others (e.g., Solaris) didn't. It still doesn't normally follow symlinks. Not following subordinate symlinks is important for handling oddities such as pine's build directory /usr/src/redhat/BUILD/pine4.33/ldap in Red Hat 7.1, which includes symlinks to directories not actually inside the package at all (/usr/include and /usr/lib).
  * Added display of licenses in the summary form, if license information is available.
  * Added undocumented programs rpm_unpacker and extract_license. These are not installed at this time; they're just provided as a useful starting point if someone wants them.

2001-06-12 David A. Wheeler

  * Added support for license counting. If the top directory of a program has a file named "PROGRAM_LICENSE", it's copied to the .slocdata entry, and it's reported as part of a licensing total. Note that the file LICENSE is ignored; that's often more complex.

2001-06-08 David A. Wheeler

  * Fixed RPM spec file - it accidentally didn't install makefile_count and sql_count. This would produce spurious errors and inhibited the option of counting makefiles and SQL. Also fixed the makefile to include sql_count in the executable list.

2001-05-16 David A.
Wheeler

  * Added support for auto-detecting ".pgc" files, which are embedded PostgreSQL - they are assumed to be C files (they COULD be C++ instead; while this will affect categorization it won't affect final SLOC counts). Also, if there's a ".c" with a corresponding ".pgc" file, the ".c" file is assumed to be auto-generated.
  * Thus, SLOCCount now supports embedded database commands for Oracle, Informix, and PostgreSQL. MySQL doesn't use an "embedded" approach, but uses a library approach that SLOCCount could already handle.
  * Fixed documentation: HTML reserved characters misused, sql_count undocumented.

2001-05-14 David A. Wheeler

  * Added modifications from Gordon Hart to improve detection of Pascal source code files. Pascal files which only have a "unit" in them (not a full program), or have "interface" or "implementation", are now detected as Pascal programs. The original Pascal specification didn't support units, but there are Pascal programs which use them. This should result in more accurate counts of Pascal software that uses units. He also reminded me that Pascal is case-insensitive, spurring a modification in the detection routines (for those who insist on uppercase keywords.. a truly UGLY format, but we need to support it to correctly identify such source code as Pascal).
  * Modified the documentation to note that I prefer unified diffs. I also added a reference to the TODO file, and from here on I'll post the TODO file separately on my web site.

2001-05-02 David A. Wheeler

  * Released version 1.8. Added several features to support measuring programs with embedded database commands. This includes supporting many Oracle & Informix embedded file types (.pc, .pcc, .pad, .ec, .ecp). It also optionally counts SQL files (.sql) and makefiles (makefile, Makefile, etc.), though by default they are NOT included in lines-of-code counts. See the (new) TODO file for limitations on makefile identification.

2001-04-30 David A.
Wheeler

  * Per suggestion from Gary Myer, added optional "--addlang" option to add languages not NORMALLY counted. Currently it only supports "makefile" and "sql". The scheme for detecting automatically generated makefiles could use improvement. Normally, makefiles and sql won't be counted in the final reports, but the front-end will make the calculations and if requested their values will be provided.
  * Added an "SQL" counter and a "makefile" counter.
  * Per suggestions from Gary Myer, added detection for files where database commands (Oracle and Informix) are embedded in the code:
      .pc  -> Oracle preprocessed C code
      .pcc -> Oracle preprocessed C++ code
      .pad -> Oracle preprocessed Ada code
      .ec  -> Informix preprocessed C code
      .ecp -> Informix preprocessed C code which calls the C preprocessor before calling the Informix preprocessor.
    Handling ".pc" has heuristics, since many use ".pc" to mean "stuff about PCs". Certain filenames are not counted as C files (e.g., "makefile.pc" and "README.pc") if they end in ".pc". Note that if you stick C++ code into .pc files, it's counted as C. These embedded files are normal source files of the respective language, with database commands stuck into them, e.g.,
      EXEC SQL select FIELD into :variable from TABLE;
    which performs a select statement and puts the result into the variable. The database preprocessor simply reads this file, and converts all "EXEC SQL" statements into the appropriate calls and outputs a normal program. Currently the "automatically generated" detectors don't detect this case. For the moment, just make sure the generated files aren't around while running SLOCCount. Currently the following are not handled (future release?):
      .pco -> Oracle preprocessed Cobol code
      .pfo -> Oracle preprocessed Fortran code
    I don't have a Cobol counter. The Fortran counter only works for f77, and I doubt .pfo is limited to that.

2001-04-27 David A.
Wheeler

  * Per suggestions from Gary Myer, added ".a" and ".so" to the "not" list, since these are libraries not source, and added the filename "Root" to the "not" file list ("Root" has special meaning to CVS).
  * Added a note about needing "md5sum" (Gary Myer).
  * Added a TODO file. If something's on the TODO list that you'd like, please write the code and send it in.
  * Noted that running on Cygwin is MUCH slower than when running on Linux. Truth in advertising is only fair.

2001-04-26 David A. Wheeler

  * Released version 1.6: the big change is support for running on Windows. Windows users must install Cygwin first.
  * Modified makefile so that SLOCCount can run on Windows systems if "Cygwin" is installed. The basic modifications to do this were developed by John Clezy -- Thanks!!! I spent time merging his makefile and mine so that a single makefile could be used on both Windows and Unix.
  * Documented how to install and run SLOCCount on Windows using Cygwin.
  * Changed default prefix to /usr/local; you can set PREFIX to change this, e.g., "make PREFIX=/usr".
  * When counting a single project, sloccount now also reports "Estimated average number of developers", which is simply the person-months divided by months. As with all estimates, take it with an ocean of salt. This isn't reported for multiproject queries; properly doing this would require "packing" to compensate for the fact that small projects complete before large ones if started simultaneously.
  * Improved man page (fixed a typo, etc.).

2001-01-10 David A. Wheeler

  * Released version 1.4. This is an "ease of use" release, greatly simplifying the installation and use of SLOCCount. The new front-end tool "sloccount" does all the work in one step - now just type "sloccount DIRECTORY" and it's all counted. An RPM makes installation trivial for RPM-based systems. A man page is now available. There are now rules for "make install" and "make uninstall" too.
    Other improvements include a schedule estimator and options to control the effort and schedule estimators.

2001-01-07 David A. Wheeler

  * Added an estimator of schedule as well as effort.
  * Added various options to control the effort and cost estimation: "--effort", "--personcost", "--overhead", and "--schedule". Now people can (through options) control the assumptions made in the effort and cost estimations from the command line. The output now shows the effort estimation model used.
  * Changed the output slightly to pretty it up and note that it's development EFFORT not TIME that is shown.
  * Added a note at bottom asking for credit. I don't ask for any money, but I'd like some credit if you refer to the data the tool generates; a gentle reminder in the output seemed like the easiest way to ask for this credit.
  * Created an RPM package; now RPM-based systems can EASILY install it. It's a relocatable package, so hopefully "alien" can easily translate it to other formats (such as Debian's .deb format).
  * Created a "man" page for sloccount.

2001-01-06 David A. Wheeler

  * Added front-end tool "sloccount", GREATLY improving ease-of-use. The tool "sloccount" invokes all the other SLOCCount tools in the right order, performing a count of a typical project or set of projects. From now on, this is expected to be the "usual" interface, though the pieces will still be documented to help those with more unusual needs. From now on, "SLOCCount" is the entire package, and "sloccount" is this front-end tool.
  * Added "--datadir" option to make_filelists (to support "sloccount").
  * get_sloc: No longer displays languages with 0 counts.
  * Documentation: documented "sloccount"; this caused major changes, since "sloccount" is now the recommended interface for all but those with complicated requirements.
  * compute_filecount: minor optimization/simplification.

2001-01-05 David A. Wheeler

  * Released version 1.2.
  * Changed the name of many programs, as part of a general clean-up.
    I changed "compute_all" to "compute_sloc", and eliminated most of the other "compute_*" files (replacing them with "compute_sloc_lang"). I also changed "get_data" to "get_sloc". This is part of a general clean-up, so that if someone wants to package this program for installation they don't have a thousand tiny programs polluting the namespace. Adding "sloc" to the names makes namespace collisions less likely. I also worked to make the program simpler.
  * Made a number of documentation fixes - my thanks to Clyde Roby for giving me feedback.
  * Changed all "*_count" programs to consistently print at the end "Total:" on a line by itself, followed on the next line by the total lines of code all by itself. This makes the new program get_sloc_detail simpler to implement, and also enables get_sloc_detail to perform some error detection.
  * Changed name of compressed file to ".tar.gz" and modified docs appropriately. The problem is a bug in Netscape 4.7 clients running on Windows; it appears that ".tgz" files don't get fully downloaded from my hosting webserver because no type information is provided. Originally, I tried to change the website to fix this by creating ".htaccess" files, but that didn't work with either:
      AddEncoding x-gzip gz tgz
      AddType application/x-tar .tgz
    or:
      AddEncoding application/octet-stream tgz
    So, we'll switch to .tar.gz, which works. My thanks to Christopher Lott for this feedback.
  * Removed a few garbage files.
  * Added information to documentation on how to handle HUGE sets of data directory children, i.e., where you can't even use "*" to list the data directory children. I don't have a directory of that kind of scale, so I can't test it directly, but I can at least discuss how to do it; it SHOULD work.
  * Changed makefile so that "ChangeLog" is now visible on the web.

2001-01-04 David A. Wheeler

  * Minor fixes to documentation.
  * Added "--crossdups" option to break_filelist.
  * Documented count_unknown_ext.
  * Created new tool, "get_sloc_detail", and documented it. Now you can get a complete report of all the SLOC data in one big file (e.g., for exporting to another tool for analysis).

2001-01-03 David A. Wheeler

  * First public release, version "1.0", of "SLOCCount". Main website: http://www.dwheeler.com/sloccount