<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>SLOCCount User's Guide</title>
</head>
<body bgcolor="#FFFFFF">
<center>
<font size="+3"><b><span class="title">SLOCCount User's Guide</span></b></font>
<br>
<font size="+2"><span class="author">by David A. Wheeler (dwheeler, at, dwheeler.com)</span></font>
<br>
<font size="+2"><span class="pubdate">August 1, 2004</span></font>
<br>
<font size="+2"><span class="version">Version 2.26</span></font>
</center>
<p>
<h1><a name="introduction">Introduction</a></h1>
<p>
SLOCCount (pronounced "sloc-count") is a suite of programs for counting
physical source lines of code (SLOC) in potentially large software systems.
Thus, SLOCCount is a "software metrics tool" or "software measurement tool".
SLOCCount was developed by David A. Wheeler,
originally to count SLOC in a GNU/Linux distribution, but it can be
used for counting the SLOC of arbitrary software systems.
<p>
SLOCCount is known to work on Linux systems, and has been tested
on Red Hat Linux versions 6.2, 7, and 7.1.
SLOCCount should run on many other Unix-like systems (if Perl is installed),
in particular, I would expect a *BSD system to work well.
Windows users can run sloccount by first installing
<a href="http://sources.redhat.com/cygwin">Cygwin</a>.
SLOCCount is much slower on Windows/Cygwin, and it's not as easy to install
or use on Windows, but it works.
Of course, feel free to upgrade to an open source Unix-like system
(such as Linux or *BSD) instead :-).
<p>
SLOCCount can count physical SLOC for a wide variety of languages.
Listed alphabetically, they are
Ada, Assembly (for many machines and assemblers),
awk (including gawk and nawk),
Bourne shell (and relatives such as bash, ksh, zsh, and pdksh),
C, C++, C# (also called C-sharp or cs), C shell (including tcsh),
COBOL, Expect, Fortran (including Fortran 90), Haskell,
Java, lex (including flex),
LISP (including Scheme),
makefiles (though they aren't usually shown in final reports),
ML, Modula3, Objective-C, Pascal, Perl, PHP, Python, Ruby, sed,
SQL (normally not shown),
TCL, and Yacc.
It can gracefully handle awkward situations in many languages,
for example, it can determine the
syntax used in different assembly language files and adjust appropriately,
it knows about Python's use of string constants as comments, and it
can handle various Perl oddities (e.g., perlpods, here documents,
and Perl's __END__ marker).
It even has a "generic" SLOC counter that you may be able to use count the
SLOC of other languages (depending on the language's syntax).
<p>
SLOCCount can also take a large list of files and automatically categorize
them using a number of different heuristics.
The heuristics automatically determine if a file
is a source code file or not, and if so, which language it's written in.
For example,
it knows that ".pc" is usually a C source file for an Oracle preprocessor,
but it can detect many circumstances where it's actually a file about
a "PC" (personal computer).
For another example, it knows that ".m" is the standard extension for
Objective-C, but it will check the file contents to
see if it really is Objective-C.
It will even examine file headers to attempt to accurately determine
the file's true type.
As a result, you can analyze large systems completely automatically.
<p>
Finally, SLOCCount has some report-generating tools
to collect the data generated,
and then present it in several different formats and sorted different ways.
The report-generating tool can also generate simple tab-separated files
so data can be passed on to other analysis tools (such as spreadsheets
and database systems).
<p>
SLOCCount will try to quickly estimate development time and effort given only
the lines of code it computes, using the original Basic COCOMO model.
This estimate can be improved if you can give more information about the project.
See the
<a href="#cocomo">discussion below about COCOMO, including intermediate COCOMO</a>,
if you want to improve the estimates by giving additional information about
the project.
<p>
SLOCCount is open source software/free software (OSS/FS),
released under the GNU General Public License (GPL), version 2;
see the <a href="#license">license below</a>.
The master web site for SLOCCount is
<a href="http://www.dwheeler.com/sloccount">http://www.dwheeler.com/sloccount</a>.
You can learn a lot about SLOCCount by reading the paper that caused its
creation, available at
<a href="http://www.dwheeler.com/sloc">http://www.dwheeler.com/sloc</a>.
Feel free to see my master web site at
<a href="http://www.dwheeler.com">http://www.dwheeler.com</a>, which has
other material such as the
<a href="http://www.dwheeler.com/secure-programs"><i>Secure Programming
for Linux and Unix HOWTO</i></a>,
my <a href="http://www.dwheeler.com/oss_fs_refs.html">list of
OSS/FS references</a>, and my paper
<a href="http://www.dwheeler.com/oss_fs_why.html"><i>Why OSS/FS? Look at
the Numbers!</i></a>
Please send improvements by email
to dwheeler, at, dwheeler.com (DO NOT SEND SPAM - please remove the
commas, remove the spaces, and change the word "at" into the at symbol).
<p>
The following sections first give a "quick start"
(discussing how to use SLOCCount once it's installed),
discuss basic SLOCCount concepts,
how to install it, how to set your PATH,
how to install source code on RPM-based systems if you wish, and
more information on how to use the "sloccount" front-end.
This is followed by material for advanced users:
how to use SLOCCount tools individually (for when you want more control
than the "sloccount" tool gives you), designer's notes,
the definition of SLOC, and miscellaneous notes.
The last sections state the license used (GPL) and give
hints on how to submit changes to SLOCCount (if you decide to make changes
to the program).


<p>
<h1><a name="quick-start">Quick Start</a></h1>
<p>
Once you've installed SLOCCount (discussed below),
you can measure an arbitrary program by typing everything
after the dollar sign into a terminal session:
<pre>
  $  sloccount <i>topmost-source-code-directory</i>
</pre>
<p>
The directory listed and all its descendants will be examined.
You'll see output while it calculates,
culminating with physical SLOC totals and
estimates of development time, schedule, and cost.
If the directory contains a set of directories, each of which is
a different project developed independently,
use the "--multiproject" option so the effort estimations
can correctly take this into account.
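<p>
For example, if ~/projects contains several independently developed
programs (a hypothetical layout), you would type:
<pre>
  sloccount --multiproject ~/projects
</pre>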
<p>
You can redisplay the data different ways by using the "--cached"
option, which skips the calculation stage and re-prints previously
computed information.
You can use other options to control what's displayed:
"--filecount" shows counts of files instead of SLOC, and
"--details" shows the detailed information about every source code file.
So, to display all the details of every file once you've previously
calculated the results, just type:
<pre>
  sloccount --cached --details
</pre>
<p>
You'll notice that the default output ends with a request.
If you use this data (e.g., in a report), please
credit that data as being "generated using 'SLOCCount' by David A. Wheeler."
I make no money from this program, so at least please give me some credit.
<p>
SLOCCount tries to ignore all automatically generated files, but its
heuristics to detect this are necessarily imperfect (after all, even humans
sometimes have trouble determining if a file was automatically generated).
If possible, try to clean out automatically generated files from
the source directories -- 
in many situations "make clean" does this.
<p>
There's more to SLOCCount than this, but first we'll need to
explain some basic concepts, then we'll discuss other options
and advanced uses of SLOCCount.

<p>
<h1><a name="concepts">Basic Concepts</a></h1>
<p>
SLOCCount counts physical SLOC, also called "non-blank, non-comment lines".
More formally, physical SLOC is defined as follows:
``a physical source line of code (SLOC) is a line ending
in a newline or end-of-file marker,
and which contains at least one non-whitespace non-comment character.''
Comment delimiters (characters other than newlines starting and ending
a comment) are considered comment characters.
Data lines only including whitespace
(e.g., lines with only tabs and spaces in multiline strings) are not included.
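<p>
As a small illustration, consider this made-up shell script:
<pre>
  #!/bin/sh
  # Print a greeting.

  echo "hello, world"
</pre>
By the definition above, only the "echo" line is a physical SLOC:
the two "#" lines contain nothing but comment characters, and the
blank line contains no non-whitespace character at all.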
<p>
In SLOCCount, there are 3 different directories:
<ol>
<li>The "source code directory", a directory containing the source code
   being measured
   (possibly in recursive subdirectories).  The directories immediately
   contained in the source code directory will normally be counted separately,
   so it helps if your system is designed so that this top set of directories
   roughly represents the system's major components.
   If it doesn't, there are various tricks you can use to group source
   code into components, but it's more work.
   You don't need write access to the source code directory, but
   you do need read access to all files, and read and search (execute) access
   to all subdirectories.
<li>The "bin directory", the directory containing the SLOCCount executables.
   By default, installing the program creates a subdirectory
   named "sloccount-VERSION" which is the bin directory.
   The bin directory must be part of your PATH.
<li>The "data directory", which stores the analysis results.
   When measuring programs using "sloccount", by default
   this is the directory ".slocdata" inside your home directory.
   When you use the advanced SLOCCount tools directly,
   in many cases this must be your "current" directory.
   Inside the data directory are "data directory children" - these are
   subdirectories that contain a file named "filelist", and each child
   is used to represent a different project or a different
   major component of a project (see the illustrative layout just
   after this list).
</ol>
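<p>
For instance, after measuring a project, the data directory might look
something like this (an illustrative sketch; the child names depend on
what you measured):
<pre>
  ~/.slocdata/
    src_main/
      filelist
      <i>(other computed data files)</i>
    src_modules/
      filelist
      <i>(other computed data files)</i>
</pre>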
<p>
SLOCCount can handle many different programming languages, and separate
them by type (so you can compare the use of each).
Here is the set of languages, sorted alphabetically;
common filename extensions are in
parentheses, with SLOCCount's ``standard name'' for the language
listed in brackets:
<ol>
<li>Ada (.ada, .ads, .adb, .pad) [ada]
<li>Assembly for many machines and assemblers (.s, .S, .asm) [asm]
<li>awk (.awk) [awk]
<li>Bourne shell and relatives such as bash, ksh, zsh, and pdksh (.sh) [sh]
<li>C (.c, .pc, .ec, .ecp) [ansic]
<li>C++  (.C, .cpp, .cxx, .cc, .pcc) [cpp]
<li>C# (.cs) [cs]
<li>C shell including tcsh (.csh) [csh]
<li>COBOL (.cob, .cbl, .COB, .CBL) [cobol]
<li>Expect (.exp) [exp]
<li>Fortran 77 (.f, .f77, .F, .F77) [fortran]
<li>Fortran 90 (.f90, .F90) [f90]
<li>Haskell (.hs, .lhs) [haskell]; deals with both types of literate files.
<li>Java (.java) [java]
<li>lex (.l) [lex]
<li>LISP including Scheme (.cl, .el, .scm, .lsp, .jl) [lisp]
<li>makefiles (makefile) [makefile]
<li>ML (.ml, .ml3) [ml]
<li>Modula3 (.m3, .mg, .i3, .ig) [modula3]
<li>Objective-C (.m) [objc]
<li>Pascal (.p, .pas) [pascal]
<li>Perl (.pl, .pm, .perl) [perl]
<li>PHP (.php, .php[3456], .inc) [php]
<li>Python (.py) [python]
<li>Ruby (.rb) [ruby]
<li>sed (.sed) [sed]
<li>sql (.sql) [sql]
<li>TCL (.tcl, .tk, .itk) [tcl]
<li>Yacc (.y) [yacc]
</ol>

<p>
<h1><a name="installing">Installing SLOCCount</a></h1>
<p>
Obviously, before using SLOCCount you'll need to install it.
SLOCCount depends on other programs, in particular perl, bash,
a C compiler (gcc will do), and md5sum
(you can get a useful md5sum program in the ``textutils'' package
on many Unix-like systems), so you'll need to get them installed
if they aren't already.
<p>
If your system uses RPM version 4 or greater to install software
(e.g., Red Hat Linux 7 or later), just download the SLOCCount RPM
and install it using a normal installation command; from the command line
you can use:
<pre>
  rpm -Uvh sloccount*.rpm
</pre>
<p>
Everyone else will need to install from a tar file, and Windows users will
have to install Cygwin before installing sloccount.
<p>
If you're using Windows, you'll need to first install
<a href="http://sources.redhat.com/cygwin">Cygwin</a>.
By installing Cygwin, you'll install an environment and a set of
open source Unix-like tools.
Cygwin essentially creates a Unix-like environment in which sloccount can run.
You may be able to run parts of sloccount without Cygwin, in particular,
the perl programs should run in the Windows port of Perl, but you're
on your own - many of the sloccount components expect a Unix-like environment.
If you want to install Cygwin, go to the
<a href="http://sources.redhat.com/cygwin">Cygwin main page</a>
and install it.
If you're using Cygwin, <b>install it to use Unix newlines, not
DOS newlines</b> - DOS newlines will cause odd errors in SLOCCount
(and probably other programs, too).
I have only tested a "full" Cygwin installation, so I suggest installing
everything.
If you're short on disk space,  at least install
binutils, bash, fileutils, findutils,
gcc, grep, gzip, make, man, perl, readline,
sed, sh-utils, tar, textutils, unzip, and zlib;
you should probably install vim as well,
and there may be other dependencies.
By default Cygwin will create a directory C:\cygwin\home\NAME,
and will set up the ability to run Unix programs
(which will think that the same directory is called /home/NAME).
Now double-click on the Cygwin icon, or select from the Start menu
the selection Programs / Cygnus Solutions / Cygwin Bash shell;
you'll see a terminal screen with a Unix-like interface.
Now follow the instructions (next) for tar file users.
<p>
If you're installing from the tar file, download the file
(into your home directory is fine).
Unpacking the file will create a subdirectory, so if you want the
unpacked subdirectory to go somewhere special, "cd" to where you
want it to go.
Most likely, your home directory is just fine.
Now gunzip and untar SLOCCount (the * replaces the version #) by typing
this at a terminal session:
<pre>
  gunzip -c sloccount*.tar.gz | tar xvf -
</pre>
Replace "sloccount*.tar.gz" shown above
with the full path of the downloaded file, wherever that is.
You've now created the "bin directory", which is simply the
"sloccount-VERSION" subdirectory created by the tar command
(where VERSION is the version number).
<p>
Now you need to compile the few compiled programs in the "bin directory" so
SLOCCount will be ready to go.
First, cd into the newly-created bin directory, by typing:
<pre>
  cd sloccount*
</pre>
<p>
You may then need to override some installation settings.
You can do this by editing the supplied makefile, or alternatively,
by providing options to "make" whenever you run make.
The supplied makefile assumes your C compiler is named "gcc", which
is true for most Linux systems, *BSD systems, and Windows systems using Cygwin.
If this isn't true, you'll need to set
the "CC" variable to the correct value (e.g., "cc").
You can also modify where the files are stored; this variable is
called PREFIX and its default is /usr/local
(older versions of sloccount defaulted to /usr).
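<p>
For example, instead of editing the makefile, you could pass both
values on the command line whenever you run make (the values here
are only illustrative):
<pre>
  make CC=cc PREFIX=/opt
</pre>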
<p>
If you're using Windows and Cygwin, you
<b>must</b> override one of the installation
settings, EXE_SUFFIX, for installation to work correctly.
One way to set this value is to edit the "makefile" file so that
the line beginning with "EXE_SUFFIX" reads as follows:
<pre>
  EXE_SUFFIX=.exe
</pre>
If you're using Cygwin and you choose to modify the "makefile", you
can use any text editor on the Cygwin side, or you can use a
Windows text editor if it can read and write Unix-formatted text files.
Cygwin users are free to use vim, for example.
If you're installing into your home directory and using the default locations,
Windows text editors will see the makefile as file
C:\cygwin\home\NAME\sloccount-VERSION\makefile.
Note that the Windows "Notepad" application doesn't work well, because it's not
able to handle Unix text files correctly.
Since this can be quite a pain, Cygwin users may instead decide to override
the makefile values when invoking make during installation.
<p>
Finally, compile the programs by typing "make":
<pre>
  make
</pre>
If you didn't edit the makefile in the previous step, you
need to provide options to the make invocation to set the correct values.
This is done by simply saying (after "make") the name of the variable,
an equal sign, and its correct value.
Thus, to compile the program on a Windows system using Cygwin, you can
skip modifying the makefile file by typing this instead of just "make":
<pre>
  make EXE_SUFFIX=.exe
</pre>
<p>
If you want, you can install sloccount for system-wide use without
using the RPM version.
Windows users using Cygwin should probably do this, particularly
if they chose a "local" installation.
To do this, first log in as root (Cygwin users don't need to do this
for local installation).
Edit the makefile to match your system's conventions, if necessary,
and then type "make install":
<pre>
  make install
</pre>
If you need to set some make options, remember to do that here too.
If you use "make install", you can uninstall it later using
"make uninstall".
Installing sloccount for system-wide use is optional;
SLOCCount works without a system-wide installation.
However, if you don't install sloccount system-wide, you'll need to
set up your PATH variable; see the section on
<a href="#path">setting your path</a>.
<p>
A note for Cygwin users (and some others): some systems, including Cygwin,
don't set up the environment quite right and thus can't display the manual
pages as installed.
The problem is that they forget to search /usr/local/share/man for
manual pages.
If you want to read the installed manual pages, type this
into a Bourne-like shell:
<pre>
  MANPATH=/usr/local/share/man:/usr/share/man:/usr/man
  export MANPATH
</pre>
Or, if you use a C shell:
<pre>
  setenv MANPATH "/usr/local/share/man:/usr/share/man:/usr/man"
</pre>
From then on, you'll be able to view the reference manual pages
by typing "man sloccount" (or by using whatever manual page display system
you prefer).
<p>

<p>
<h1><a name="installing-source">Installing The Source Code To Measure</a></h1>
<p>
Obviously, you must install the software source code you're counting,
so somehow you must create the "source directory"
with the source code to measure.
You must also make sure that permissions are set so the software can
read these directories and files.
<p>
For example, if you're trying to count the SLOC for an RPM-based Linux system,
install the software source code by doing the following as root
(which will place all source code into the source directory
/usr/src/redhat/BUILD):
<ol>
<li>Install all source rpm's:
<pre>
    mount /mnt/cdrom
    cd /mnt/cdrom/SRPMS
    rpm -ivh *.src.rpm
</pre>
<li>Remove RPM spec files you don't want to count:
<pre>
    cd ../SPECS
    (look at the contents of the spec files, removing those you don't want)
</pre>
<li>Build/prep all spec files:
<pre>
    rpm -bp *.spec
</pre>
<li>Set permissions so the source files can be read by all:
<pre>
    chmod -R a+rX /usr/src/redhat/BUILD
</pre>
</ol>
<p>
Here's an example of how to download source code from an
anonymous CVS server.
Let's say you want to examine the source code in GNOME's "gnome-core"
directory, as stored at the CVS server "anoncvs.gnome.org".
Here's how you'd do that:
<ol>
<li>Set up site and login parameters:
<pre>
  export CVSROOT=':pserver:anonymous@anoncvs.gnome.org:/cvs/gnome'
</pre>
<li>Log in:
<pre>
  cvs login
</pre>
<li>Check out the software (copy it to your local directory), using
mild compression to save on bandwidth:
<pre>
  cvs -z3 checkout gnome-core
</pre>
</ol>
<p>
Of course, if you have a non-anonymous account, you'd set CVSROOT
to reflect this.  For example, to log in using the "pserver"
protocol as ACCOUNT_NAME, do:
<pre>
  export CVSROOT=':pserver:ACCOUNT_NAME@cvs.gnome.org:/cvs/gnome'
</pre>
<p>
You may need root privileges to install the source code and to give
another user permission to read it, but please avoid running the
sloccount program as root.
Although I know of no specific reason this would be a problem,
running any program as root turns off helpful safeguards.
<p>
Although SLOCCount tries to detect (and ignore) many cases where
programs are automatically generated, these heuristics are necessarily
imperfect.
So, please don't run any programs that generate other programs - just
do enough to get the source code prepared for counting.
In general you shouldn't run "make" on the source code, and if you have,
consider running "make clean" or "make really_clean" on the source code first.
It often doesn't make any difference, but identifying those circumstances
is difficult.
<p>
SLOCCount will <b>not</b> automatically uncompress files that are
compressed/archive files (such as .zip, .tar, or .tgz files).
Often such files are just "left over" old versions or files
that you're already counting.
If you want to count the contents of compressed files, uncompress them first.
<p>
SLOCCount also doesn't delve into files using "literate programming"
techniques, in part because there are too many incompatible formats
that implement it.
Thus, run the tools to extract the code from the literate programming files
before running SLOCCount.  Currently, the only exception to this rule is
Haskell.


<h1><a name="path">Setting your PATH</a></h1>
Before you can run SLOCCount, you'll need to make sure
the SLOCCount "bin directory" is in your PATH.
If you've installed SLOCCount in a system-wide location
such as /usr/bin, then you needn't do more; the RPMs and "make install"
commands essentially do this.
<p>
Otherwise, in Bourne-shell variants, type:
<pre>
    PATH="$PATH:<i>the directory with SLOCCount's executable files</i>"
    export PATH
</pre>
Csh users should instead type:
<pre>
    setenv PATH "$PATH:<i>the directory with SLOCCount's executable files</i>"
</pre>

<h1><a name="using-basics">Using SLOCCount: The Basics</a></h1>

Normal use of SLOCCount is very simple.
In a terminal window just type "sloccount", followed by a
list of the source code directories to count.
If you give it only a single directory, SLOCCount tries to be
a little clever and break the source code into
subdirectories for purposes of reporting:
<ol>
<li>if the directory has at least
two subdirectories, then those subdirectories will be used as the
breakdown (see the example below).
<li>If the single directory contains files as well as directories
(or if you give sloccount some files as parameters), those files will
be assigned to the directory "top_dir" so you can tell them apart
from other directories.
<li>If there's a subdirectory named "src", then that subdirectory is again
broken down, with all the further subdirectories prefixed with "src_".
So if directory "X" has a subdirectory "src", which contains subdirectory
"modules", the program will report a separate count from "src_modules".
</ol>
In the terminology discussed above, each of these directories would become
"data directory children."
<p>
You can also give "sloccount" a list of directories, in which case the
report will be broken down by these directories
(make sure that the basenames of these directories differ).
SLOCCount normally considers all descendants of these directories,
though unless told otherwise it ignores symbolic links.
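<p>
For example, to get one report broken down by two separately stored
projects (the paths are hypothetical):
<pre>
  sloccount /usr/src/project-one /home/me/project-two
</pre>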
<p>
This is all easier to explain by example.
Let's say that we want to measure Apache 1.3.12 as installed using an RPM.
Once it's installed, we just type:
<pre>
 sloccount /usr/src/redhat/BUILD/apache_1.3.12
</pre>
The output we'll see shows status reports while it analyzes things,
and then it prints out:

<pre>
SLOC	Directory	SLOC-by-Language (Sorted)
24728   src_modules     ansic=24728
19067   src_main        ansic=19067
8011    src_lib         ansic=8011
5501    src_os          ansic=5340,sh=106,cpp=55
3886    src_support     ansic=2046,perl=1712,sh=128
3823    src_top_dir     sh=3812,ansic=11
3788    src_include     ansic=3788
3469    src_regex       ansic=3407,sh=62
2783    src_ap          ansic=2783
1378    src_helpers     sh=1345,perl=23,ansic=10
1304    top_dir         sh=1304
104     htdocs          perl=104
31      cgi-bin         sh=24,perl=7
0       icons           (none)
0       conf            (none)
0       logs            (none)


ansic:       69191 (88.85%)
sh:           6781 (8.71%)
perl:         1846 (2.37%)
cpp:            55 (0.07%)


Total Physical Source Lines of Code (SLOC)                   = 77873
Estimated Development Effort in Person-Years (Person-Months) = 19.36 (232.36)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Estimated Schedule in Years (Months)                         = 1.65 (19.82)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers  (Effort/Schedule)    = 11.72
Total Estimated Cost to Develop                              = $ 2615760
 (average salary = $56286/year, overhead = 2.4).

Please credit this data as "generated using 'SLOCCount' by David A. Wheeler."
</pre>
<p>
Interpreting this should be straightforward.
The Apache directory has several subdirectories, including "htdocs", "cgi-bin",
and "src".
The "src" directory has many subdirectories in it
("modules", "main", and so on).
Code files directly
contained in the main directory /usr/src/redhat/BUILD/apache_1.3.12
are labelled "top_dir", while
code directly contained in the src subdirectory is labelled "src_top_dir".
Code in the "src/modules" directory is labelled "src_modules" here.
The output shows each major directory broken
out, sorted from largest to smallest.
Thus, the "src/modules" directory had the most code of the directories,
24728 physical SLOC, all of it in C.
The "src/helpers" directory had a mix of shell, perl, and C; note that
when multiple languages are shown, the list of languages in that child
is also sorted from largest to smallest.
<p>
Below the per-component set is a list of all languages used,
with their total SLOC shown, sorted from most to least.
After this is the total physical SLOC (77,873 physical SLOC in this case).
<p>
Next is an estimation of the effort and schedule (calendar time)
it would take to develop this code.
For effort, the units shown are person-years (with person-months
shown in parentheses); for schedule, total years are shown first
(with months in parentheses).
When invoked through "sloccount", the default assumption is that all code is
part of a single program; the "--multiproject" option changes this
to assume that all top-level components are independently developed
programs.
When "--multiproject" is invoked, each project's efforts are estimated
separately (and then summed), and the schedule estimate presented
is the largest estimated schedule of any single component.
<p>
By default the "Basic COCOMO" model is used for estimating
effort and schedule; this model
includes design, code, test, and documentation time (both
user/admin documentation and development documentation).
<a href="#cocomo">See below for more information on COCOMO</a>
as it's used in this program.
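<p>
You can reproduce the estimates above directly from the formulas shown
in the output. For example, using bc (values from the Apache run;
small differences are rounding):
<pre>
  # Effort: Person-Months = 2.4 * (KSLOC**1.05), with KSLOC = 77.873
  echo "2.4 * e(1.05 * l(77.873))" | bc -l    # about 232.36
  # Schedule: Months = 2.5 * (person-months**0.38)
  echo "2.5 * e(0.38 * l(232.36))" | bc -l    # about 19.82
</pre>
(bc's -l library provides l() for natural logarithm and e() for
exponential, so x**y is computed as e(y*l(x)).)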
<p>
Next are several numbers that attempt to estimate what it would have cost
to develop this program.
This is simply the amount of effort, multiplied by the average annual
salary and by the "overhead multiplier".
The default annual salary is
$56,286 per year; this value was from the
<i>ComputerWorld</i>, September 4, 2000's Salary Survey
of an average U.S. programmer/analyst salary in the year 2000.
You might consider using other numbers
(<i>ComputerWorld</i>'s September 3, 2001 Salary Survey found
the average U.S. programmer/analyst salary to be $55,100, with senior
systems programmers averaging $68,900 and senior systems analysts averaging
$72,300).

<p>
Overhead is much harder to estimate; I did not find a definitive source
for information on overheads.
After informal discussions with several cost analysts,
I determined that an overhead of 2.4
would be representative of the overhead sustained by
a typical software development company.
As discussed in the next section, you can change these numbers too.
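<p>
The cost figure above can be checked the same way: effort in
person-years, times the average annual salary, times the overhead
multiplier (again, small differences from the printed total are
rounding):
<pre>
  echo "(232.36 / 12) * 56286 * 2.4" | bc -l    # about 2615700
</pre>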

<p>
You may be surprised by the high cost estimates, but remember,
these include design, coding, testing, documentation (both for users
and for programmers), and a wrap rate for corporate overhead
(to cover facilities, equipment, accounting, and so on).
Many programmers forget these other costs and are shocked by the high figures.
If you only want to know the cost of the coding itself, you'll need to
determine that figure separately.


<p>
Note that if any top-level directory has a file named PROGRAM_LICENSE,
that file is assumed to contain the name of the license
(e.g., "GPL", "LGPL", "MIT", "BSD", "MPL", and so on).
If there is at least one such file, sloccount will also report statistics
on licenses.
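<p>
For example, if you were about to measure /usr/src/myproject (a
hypothetical path), you could mark one of its top-level directories
like this:
<pre>
  echo "GPL" &gt; /usr/src/myproject/mycomponent/PROGRAM_LICENSE
</pre>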

<p>
Note: sloccount internally uses MD5 hashes to detect duplicate files,
and thus needs some program that can compute MD5 hashes.
Normally it will use "md5sum" (available, for example, as a GNU utility).
If that doesn't work, it will try to use "md5" and "openssl", and you may
see error messages in this format:
<pre>
 Can't exec "md5sum": No such file or directory at
     /usr/local/bin/break_filelist line 678, &lt;CODE_FILE&gt; line 15.
 Can't exec "md5": No such file or directory at
     /usr/local/bin/break_filelist line 678, &lt;CODE_FILE&gt; line 15.
</pre>
You can safely ignore these error messages; these simply show that
SLOCCount is probing for a working program to compute MD5 hashes.
For example, Mac OS X users normally don't have md5sum installed, but
do have md5 installed, so they will probably see the first error
message (because md5sum isn't available), followed by a note that a
working MD5 program was found.


<h1><a name="options">Options</a></h1>
The program "sloccount" has a large number of options
so you can control what is selected for counting and how the
results are displayed.
<p>
There are several options that control which files are selected
for counting:
<pre>
 --duplicates   Count all duplicate files as normal files
 --crossdups    Count duplicate files if they're in different data directory
                children.
 --autogen      Count automatically generated files
 --follow       Follow symbolic links (normally they're ignored)
 --addlang      Add languages to be counted that normally aren't shown.
 --append       Add more files to the data directory
</pre>
Normally, files which have exactly the same content are counted only once
(data directory children are counted alphabetically, so the child that
comes first in the alphabet will be considered the owner of the master copy).
If you want them all counted, use "--duplicates".
Sometimes when you use sloccount, each directory represents a different
project, in which case you might want to specify "--crossdups".
The program tries to reject files that are automatically generated
(e.g., a C file generated by bison), but you can disable this as well.
You can use "--addlang" to show makefiles and SQL files, which aren't
usually counted.
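<p>
For example, to include makefiles and SQL in the counts (a sketch;
see the reference manual page for the exact argument form):
<pre>
  sloccount --addlang makefile --addlang sql <i>directory</i>
</pre>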
<p>
Possibly the most important option is "--cached".
Normally, when sloccount runs, it computes a lot of information and
stores this data in a "data directory" (by default, "~/.slocdata").
The "--cached" option tells sloccount to use data previously computed,
greatly speeding up use once you've done the computation once.
The "--cached" option can't be used along with the options used to
select what files should be counted.
You can also select a different data directory by using the
"--datadir" option.
<p>
There are many options for controlling the output:
<pre>
 --filecount     Show counts of files instead of SLOC.
 --details       Present details: present one line per source code file.
 --wide          Show "wide" format.  Ignored if "--details" selected
 --multiproject  Assume each directory is for a different project
                 (this modifies the effort estimation calculations)
 --effort F E    Change the effort estimation model, so that it uses
                 F as the factor and E as the exponent.
 --schedule F E  Change the schedule estimation model, so that it uses
                 F as the factor and E as the exponent.
 --personcost P  Change the average annual salary to P.
 --overhead O    Change the overhead multiplier to O.
 --              End of options
</pre>
<p>
Basically, the first time you use sloccount, if you're measuring
a set of projects (not a single project) you might consider
using "--crossdups" instead of the defaults.
Then, you can redisplay data quickly by using "--cached",
combining it with options such as "--filecount".
If you want to send the data to another tool, use "--details".
<p>
If you're measuring a set of projects, you probably ought to pass
the option "--multiproject".
When "--multiproject" is used, efforts are computed for each component
separately and summed, and the time estimate used is the maximum
single estimated time.
<p>
The "--details" option dumps the available data in 4 columns,
tab-separated, where each line
represents a source code file in the data directory children identified.
The first column is the SLOC, the second column is the language type,
the third column is the name of the data directory child
(as it was given to get_sloc_details),
and the last column is the absolute pathname of the source code file.
You can then pipe this output to "sort" or some other tool for further
analysis (such as a spreadsheet or RDBMS).
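<p>
For example, this pipeline (using only standard tools) lists the ten
largest source code files, since SLOC is the first column:
<pre>
  sloccount --cached --details | sort -rn | head
</pre>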
<p>
You can change the parameters used to estimate effort using "--effort".
For example, if you believe that in the environment being used
you can produce 2 KSLOC/month scaling linearly, then
that means that the factor for effort you should use is 1/2 = 0.5 month/KSLOC,
and the exponent for effort is 1 (linear).
Thus, you can use "--effort 0.5 1".
<p>
You can also set the annual salary and overheads used to compute
estimated development cost.
While "$" is shown, there's no reason you have to use dollars;
the unit of development cost is the same unit as the unit used for
"--personcost".

<h1><a name="cocomo">More about COCOMO</a></h1>

<p>
By default SLOCCount uses a very simple estimating model for effort and schedule:
the basic COCOMO model in the "organic" mode (modes are more fully discussed below).
This model estimates effort and schedule, including design, code, test,
and documentation time (both user/admin documentation and development documentation).
Basic COCOMO is a nice simple model, and it's used as the default because
it doesn't require any information about the code other than the SLOC count
already computed.
<p>
However, basic COCOMO's accuracy is limited for the same reason -
basic COCOMO doesn't take a number of important factors into account.
At the very least, you can quickly check that the right "mode" is being
used. If you have more information, you can use the "Intermediate COCOMO"
and "Detailed COCOMO" models, which take more factors into account and
are likely to produce more accurate estimates as a result;
you pass this additional information to sloccount using its
"--effort" and "--schedule" options (as discussed in
<a href="#options">options</a>).
Take these estimates as just that - estimates - they're not grand truths.
<p>
To use the COCOMO model, you first need to determine your application's
mode, which can be "organic", "semidetached", or "embedded".
Most software is "organic" (which is why it's the default).
Here are simple definitions of these modes:
<ul>
<li>Organic: Relatively small software teams develop software in a highly
familiar, in-house environment. &nbsp;It has a generally stable development
environment, minimal need for innovative algorithms, and requirements can
be relaxed to avoid extensive rework.</li>
<li>Semidetached: This is an intermediate
step between organic and embedded. This is generally characterized by reduced
flexibility in the requirements.</li>
<li>Embedded: The project must operate
within tight (hard-to-meet) constraints, and requirements
and interface specifications are often non-negotiable.
The software will be embedded in a complex environment that the
software must deal with as-is.</li>
</ul>
By default, SLOCCount uses the basic COCOMO model in the organic mode.
For the basic COCOMO model, here are the critical factors for --effort and --schedule:<br>
<ul>
<li>Organic: effort factor = 2.4, exponent = 1.05; schedule factor = 2.5, exponent = 0.38</li>
<li>Semidetached:  effort factor = 3.0, exponent = 1.12; schedule factor = 2.5, exponent = 0.35</li>
<li>Embedded:  effort factor = 3.6, exponent = 1.20; schedule factor = 2.5, exponent = 0.32</li>
</ul>
Thus, if you want to use SLOCCount but the project is actually semidetached,
you can use the options "--effort 3.0 1.12 --schedule 2.5 0.35"
to get a more accurate estimate.
<br>
For more accurate estimates, you can use the intermediate COCOMO models.
For intermediate COCOMO, use the following figures:<br>
<ul>
  <li>Organic: effort base factor = 2.3, exponent = 1.05; schedule factor = 2.5, exponent = 0.38</li>
  <li>Semidetached: effort base factor = 3.0, exponent = 1.12; schedule factor = 2.5, exponent = 0.35</li>
  <li>Embedded: effort base factor = 2.8, exponent = 1.20; schedule factor = 2.5, exponent = 0.32</li>
</ul>
The intermediate COCOMO values for schedule are exactly the same as the basic
COCOMO model; the starting effort values are not quite the same, as noted
in Boehm's book. However, in the intermediate COCOMO model, you don't
normally use the effort factors as-is; instead, you apply various
corrective factors (called cost drivers). To use these corrections,
you consider each cost driver, determine the rating that best describes
your project, and multiply the corrective values by the effort base factor.
The result is the final effort factor.
Here are the cost drivers (from Boehm's book, tables 8-2 and 8-3):

<table cellpadding="2" cellspacing="2" border="1" width="100%">
  <tbody>
    <tr>
      <th rowspan="1" colspan="2">Cost Drivers
      </th>
      <th rowspan="1" colspan="6">Ratings
      </th>
    </tr>
    <tr>
      <th>ID
      </th>
      <th>Driver Name
      </th>
      <th>Very Low
      </th>
      <th>Low
      </th>
      <th>Nominal
      </th>
      <th>High
      </th>
      <th>Very High
      </th>
      <th>Extra High
      </th>
    </tr>
    <tr>
      <td>RELY
      </td>
      <td>Required software reliability
      </td>
      <td>0.75 (effect is slight inconvenience)
      </td>
      <td>0.88 (easily recovered losses)
      </td>
      <td>1.00 (recoverable losses)
      </td>
      <td>1.15 (high financial loss)
      </td>
      <td>1.40 (risk to human life)
      </td>
      <td>&nbsp;
      </td>
    </tr>
    <tr>
      <td>DATA
      </td>
      <td>Database size
      </td>
      <td>&nbsp;
      </td>
      <td>0.94 (database bytes/SLOC &lt; 10)
      </td>
      <td>1.00 (D/S between 10 and 100)
      </td>
      <td>1.08 (D/S between 100 and 1000)
      </td>
      <td>1.16 (D/S &gt; 1000)
      </td>
      <td>&nbsp;
      </td>
    </tr>
    <tr>
      <td>CPLX
      </td>
      <td>Product complexity
      </td>
      <td>0.70 (mostly straightline code, simple arrays, simple expressions)
      </td>
      <td>0.85
      </td>
      <td>1.00
      </td>
      <td>1.15
      </td>
      <td>1.30
      </td>
      <td>1.65 (microcode, multiple resource scheduling, device timing dependent coding)
      </td>
    </tr>
    <tr>
      <td>TIME
      </td>
      <td>Execution time constraint
      </td>
      <td>&nbsp;
      </td>
      <td>&nbsp;
      </td>
      <td>1.00 (&lt;50% use of available execution time)
      </td>
      <td>1.11 (70% use)
      </td>
      <td>1.30 (85% use)
      </td>
      <td>1.66 (95% use)
      </td>
    </tr>
    <tr>
      <td>STOR
      </td>
      <td>Main storage constraint
      </td>
      <td>&nbsp;
      </td>
      <td>&nbsp;
      </td>
      <td>1.00&nbsp;(&lt;50% use of available storage)</td>
      <td>1.06  (70% use)
      </td>
      <td>1.21 (85% use)
      </td>
      <td>1.56 (95% use)
      </td>
    </tr>
    <tr>
      <td>VIRT
      </td>
      <td>Virtual machine (HW and OS) volatility
      </td>
      <td>&nbsp;
      </td>
      <td>0.87 (major change every 12 months, minor every month)
      </td>
      <td>1.00 (major change every 6 months, minor every 2 weeks)</td>
      <td>1.15 (major change every 2 months, minor changes every week)
      </td>
      <td>1.30 (major changes every 2 weeks, minor changes every 2 days)
      </td>
      <td>&nbsp;
      </td>
    </tr>
    <tr>
      <td>TURN
      </td>
      <td>Computer turnaround time
      </td>
      <td>&nbsp;
      </td>
      <td>0.87 (interactive)
      </td>
      <td>1.00 (average turnaround &lt; 4 hours)
      </td>
      <td>1.07
      </td>
      <td>1.15
      </td>
      <td>&nbsp;
      </td>
    </tr>
    <tr>
      <td>ACAP
      </td>
      <td>Analyst capability
      </td>
      <td>1.46 (15th percentile)
      </td>
      <td>1.19 (35th percentile)
      </td>
      <td>1.00 (55th percentile)
      </td>
      <td>0.86 (75th percentile)
      </td>
      <td>0.71 (90th percentile)
      </td>
      <td>&nbsp;
      </td>
    </tr>
    <tr>
      <td>AEXP
      </td>
      <td>Applications experience
      </td>
      <td>1.29 (&lt;= 4 months experience)
      </td>
      <td>1.13 (1 year)
      </td>
      <td>1.00 (3 years)
      </td>
      <td>0.91 (6 years)
      </td>
      <td>0.82 (12 years)
      </td>
      <td>&nbsp;
      </td>
    </tr>
    <tr>
      <td>PCAP
      </td>
      <td>Programmer capability
      </td>
      <td>1.42 (15th percentile)
      </td>
      <td>1.17  (35th percentile)
      </td>
      <td>1.00 (55th percentile)
      </td>
      <td>0.86  (75th percentile)
      </td>
      <td>0.70 (90th percentile)
      </td>
      <td>&nbsp;
      </td>
    </tr>
    <tr>
      <td>VEXP
      </td>
      <td>Virtual machine experience
      </td>
      <td>1.21 (&lt;= 1 month experience)
      </td>
      <td>1.10 (4 months)
      </td>
      <td>1.00 (1 year)
      </td>
      <td>0.90 (3 years)
      </td>
      <td>&nbsp;
      </td>
      <td>&nbsp;
      </td>
    </tr>
    <tr>
      <td>LEXP
      </td>
      <td>Programming language experience
      </td>
      <td>1.14  (&lt;= 1 month experience)
      </td>
      <td>1.07 (4 months)
      </td>
      <td>1.00 (1 year)
      </td>
      <td>0.95 (3 years)
      </td>
      <td>&nbsp;
      </td>
      <td>&nbsp;
      </td>
    </tr>
    <tr>
      <td>MODP
      </td>
      <td>Use of "modern" programming practices (e.g. structured programming)
      </td>
      <td>1.24 (No use)
      </td>
      <td>1.10
      </td>
      <td>1.00 (some use)
      </td>
      <td>0.91
      </td>
      <td>0.82 (routine use)
      </td>
      <td>&nbsp;
      </td>
    </tr>
    <tr>
      <td>TOOL
      </td>
      <td>Use of software tools
      </td>
      <td>1.24
      </td>
      <td>1.10
      </td>
      <td>1.00 (basic tools)
      </td>
      <td>0.91 (test tools)
      </td>
      <td>0.83 (requirements, design, management, documentation tools)
      </td>
      <td>&nbsp;
      </td>
    </tr>
    <tr>
      <td>SCED
      </td>
      <td>Required development schedule
      </td>
      <td>1.23 (75% of nominal)
      </td>
      <td>1.08 (85% of nominal)
      </td>
      <td>1.00 (nominal)
      </td>
      <td>1.04 (130% of nominal)
      </td>
      <td>1.10 (160% of nominal)
      </td>
      <td>&nbsp;
      </td>
    </tr>
  </tbody>
</table>
<br>
<br>
<br>
So, once all of the factors have been multiplied together, you can
then use the "--effort" flag to set more accurate factors and exponents.
Note that some factors will probably not be "nominal" simply because
times have changed since COCOMO was originally developed; some driver
ratings that were once rare or merely desirable are commonplace today.
For example,
for many software projects of today, virtual machine volatility tends to
be low, and the
use of "modern" programming practices (structured programming,
object-oriented programming, abstract data types, etc.) tends to be high.
The COCOMO cost drivers take these differences into account.
<p>
For example, imagine that you're examining a fairly simple application that
meets the "organic" requirements. Organic projects have a base factor
of 2.3 and an effort exponent of 1.05, as noted above.
We then examine all the factors to determine a corrected base factor.
For this example, imagine
that we determine the values of these cost drivers are as follows:<br>
<br>
<table cellpadding="2" cellspacing="2" border="1" width="100%">

  <tbody>
    <tr>
      <td rowspan="1" colspan="2">Cost Drivers<br>
      </td>
      <td rowspan="1" colspan="2">Ratings<br>
      </td>
    </tr>
    <tr>
      <td>ID<br>
      </td>
      <td>Driver Name<br>
      </td>
      <td>Rating<br>
      </td>
      <td>Multiplier<br>
      </td>
    </tr>
    <tr>
      <td>RELY<br>
      </td>
      <td>Required software reliability<br>
      </td>
      <td>Low - easily recovered losses<br>
      </td>
      <td>0.88<br>
      </td>
    </tr>
    <tr>
      <td>DATA<br>
      </td>
      <td>Database size<br>
      </td>
      <td>Low<br>
      </td>
      <td>0.94<br>
      </td>
    </tr>
    <tr>
      <td>CPLX<br>
      </td>
      <td>Product complexity<br>
      </td>
      <td>Nominal<br>
      </td>
      <td>1.00<br>
      </td>
    </tr>
    <tr>
      <td>TIME<br>
      </td>
      <td>Execution time constraint<br>
      </td>
      <td>Nominal<br>
      </td>
      <td>1.00<br>
      </td>
    </tr>
    <tr>
      <td>STOR<br>
      </td>
      <td>Main storage constraint<br>
      </td>
      <td>Nominal<br>
      </td>
      <td>1.00<br>
      </td>
    </tr>
    <tr>
      <td>VIRT<br>
      </td>
      <td>Virtual machine (HW and OS) volatility<br>
      </td>
      <td>Low  (major change every 12 months, minor every month)<br>
      </td>
      <td>0.87<br>
      </td>
    </tr>
    <tr>
      <td>TURN<br>
      </td>
      <td>Computer turnaround time<br>
      </td>
      <td>Nominal<br>
      </td>
      <td>1.00<br>
      </td>
    </tr>
    <tr>
      <td>ACAP<br>
      </td>
      <td>Analyst capability<br>
      </td>
      <td>Nominal  (55th percentile)<br>
      </td>
      <td>1.00<br>
      </td>
    </tr>
    <tr>
      <td>AEXP<br>
      </td>
      <td>Applications experience<br>
      </td>
      <td>Nominal (3 years)<br>
      </td>
      <td>1.00<br>
      </td>
    </tr>
    <tr>
      <td>PCAP<br>
      </td>
      <td>Programmer capability<br>
      </td>
      <td>Nominal  (55th percentile)<br>
      </td>
      <td>1.00<br>
      </td>
    </tr>
    <tr>
      <td>VEXP<br>
      </td>
      <td>Virtual machine experience<br>
      </td>
      <td>High (3 years)<br>
      </td>
      <td>0.90<br>
      </td>
    </tr>
    <tr>
      <td>LEXP<br>
      </td>
      <td>Programming language experience<br>
      </td>
      <td>High (3 years)<br>
      </td>
      <td>0.95<br>
      </td>
    </tr>
    <tr>
      <td>MODP<br>
      </td>
      <td>Use of "modern" programming practices (e.g. structured programming)<br>
      </td>
      <td>High (Routine use)<br>
      </td>
      <td>0.82<br>
      </td>
    </tr>
    <tr>
      <td>TOOL<br>
      </td>
      <td>Use of software tools<br>
      </td>
      <td>Nominal (basic tools)<br>
      </td>
      <td>1.00<br>
      </td>
    </tr>
    <tr>
      <td>SCED<br>
      </td>
      <td>Required development schedule<br>
      </td>
      <td>Nominal<br>
      </td>
      <td>1.00<br>
      </td>
    </tr>
    
    
    
    
  </tbody>
</table>
<p>
So, starting with the base factor (2.3 in this case) and multiplying
by each of the driver values, we compute the final factor:<br>
<pre>2.3*0.88*0.94*1*1*1*0.87*1.00*1*1*1*0.90*0.95*0.82*1*1</pre>
For this
example, the final factor for the effort calculation is 1.1605. You would then
invoke sloccount with "--effort 1.1605 1.05" to pass in the corrected factor
and exponent for the effort estimation.
You don't need to use "--schedule" to set the factors when you're using
the organic model, because in SLOCCount
the default values are the values for the organic model.
You can set scheduling parameters manually
anyway by setting "--schedule 2.5 0.38".
You <i>do</i> need to use the --schedule option for
embedded and semidetached projects, because those modes have different
schedule parameters. The final command would be:<br>
<br>
sloccount --effort 1.1605 1.05 --schedule 2.5 0.38 my_project<br>
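<p>
As a quick sanity check, here's what those parameters would yield for a
hypothetical program of 2,000 physical SLOC (2 KSLOC), using the basic
COCOMO equations (effort = factor * KSLOC^exponent, and
schedule = factor * effort^exponent):
<pre>
  Effort   = 1.1605 * (2 ^ 1.05) = about 2.4 person-months
  Schedule = 2.5 * (2.4 ^ 0.38)  = about 3.5 months
</pre>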
<p>
The detailed COCOMO model requires breaking information down further.
<p>
For more information about the original COCOMO model, including the detailed
COCOMO model, see the book
<i>Software Engineering Economics</i> by Barry Boehm.
<p>
You may be surprised by the high cost estimates, but remember,
these include design, coding, testing (including
integration testing), documentation (both for users
and for programmers), and a wrap rate for corporate overhead
(to cover facilities, equipment, accounting, and so on).
Many programmers forget these other costs and are shocked by the high cost
estimates.
<p>
If you want to know a subset of this cost, you'll need to isolate
just those figures that you're trying to measure.
For example, let's say you want to find the money a programmer would receive
to do just the coding of the units of the program
(ignoring wrap rate, design, testing, integration, and so on).
According to Boehm's book (page 65, table 5-2),
the percentage varies by product size.
For effort, code and unit test takes 42% for small (2 KSLOC), 40% for
intermediate (8 KSLOC), 38% for medium (32 KSLOC), and 36% for large
(128 KSLOC).
Sadly, Boehm doesn't separate coding from unit test; perhaps
50% of the time is spent in unit test in traditional proprietary
development (including fixing bugs found from unit test).
If you want to know the income to the programmer (instead of cost to
the company), you'll also want to remove the wrap rate.
Thus, a programmer's income to <i>only</i> write the code for a
small program (circa 2 KSLOC) would be 8.75% (42% x 50% x (1/2.4)) 
of the default figure computed by SLOCCount.
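<p>
For instance, if SLOCCount reported a total development cost of $100,000
for such a small program (a purely hypothetical figure), the income for
just the coding would be roughly 8.75% of that, or about $8,750.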
<p>
In other words, for a small program, a programmer would actually receive
less than one-tenth of the cost computed by SLOCCount for
just the coding task.
Note that a proprietary commercial company that bid using
this lower figure would rapidly go out of business, since this figure
ignores the many other costs they have to incur to actually develop
working products.
Programs don't arrive out of thin air; someone needs to determine what
the requirements are, how to design it, and perform at least
some testing of it.
<p>
There's another later estimation model for effort and schedule
called "COCOMO II", but COCOMO II requires logical SLOC instead
of physical SLOC.
SLOCCount doesn't currently measure logical SLOC, so
SLOCCount doesn't currently use COCOMO II.
Contributions of code to compute logical SLOC and then optionally
use COCOMO II will be gratefully accepted.

<h1><a name="specific-files">Counting Specific Files</a></h1>
<p>
If you want to count a specific subset, you can use the "--details"
option to list individual files, pipe this into "grep" to select the
files you're interested in, and pipe the result to
my tool "print_sum" (which reads lines beginning with numbers, and
returns the total of those numbers).
If you've already done the analysis, an example would be:
<pre>
  sloccount --cached --details | grep "/some/subdirectory/" | print_sum
</pre>
<p>
If you just want to count specific files, and you know what language
they're in, you
can just invoke the basic SLOC counters directly.
By convention the simple counters are named "LANGUAGE_count",
and they take on the command line a list of the
source files to count.
Here are some examples:
<pre>
  c_count *.c *.cpp *.h  # Count C and C++ in current directory.
  asm_count *.S          # Count assembly.
</pre>
All the counter programs (*_count) accept a &quot;-f FILENAME&quot; option, where FILENAME
is a file containing the names of all the source files to count
(one file per text line). If FILENAME is &quot;-&quot;, the
list of file names is taken from the standard input.
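For example, the "-f" option combines well with "find"
(using a hypothetical directory named "myproject"):
<pre>
  find myproject -name '*.c' -print > c_files.txt
  c_count -f c_files.txt                           # count the listed files
  find myproject -name '*.py' | python_count -f -  # list given via stdin
</pre>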
The &quot;c_count&quot; program handles both C and C++ (but not objective-C;
for that use objc_count).
The available counters are
ada_count,
asm_count,
awk_count,
c_count,
csh_count,
exp_count,
fortran_count,
f90_count,
java_count,
lex_count,
lisp_count,
ml_count,
modula3_count,
objc_count,
pascal_count,
perl_count,
python_count,
sed_count,
sh_count,
sql_count, and
tcl_count.
<p>
There is also "generic_count", which takes as its first parameter
the ``comment string'', followed by a list of files.
The comment string begins a comment that ends at the end of the line.
Sometimes, if you have source for a language not listed, generic_count
will be sufficient.
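For example, if you had files in a hypothetical language whose comments
run from "--" to the end of the line, you could count them with:
<pre>
  generic_count '--' *.mylang
</pre>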
<p>
The basic SLOC counters will send output to standard out, one line per file
(showing the SLOC count and filename).
The assembly counter shows some additional information about each file.
The basic SLOC counters always complete their output with a line
saying "Total:", followe by a line with the
total SLOC count.

<h1><a name="errors">Countering Problems and Handling Errors</a></h1>

If you're analyzing unfamiliar code, there's always the possibility
that it uses languages not processed by SLOCCount.
To counter this, after running SLOCCount, run the following program:
<pre>
 count_unknown_ext
</pre>
This will look at the resulting data (in its default data directory
location, ~/.slocdata) and report a sorted list of the file extensions
for uncategorized ("unknown") files.
The list will show every file extension and how many files had that
extension, and is sorted by most common first.
It's not a problem if an "unknown" type isn't a source code file, but
if there are a significant number of source files in this category,
you'll need to change SLOCCount to get an accurate result.

<p>
One error report that you may see is:
<pre>
  c_count ERROR - terminated in string in (filename)
</pre>

The cause of this is that c_count (the counter for C-like languages)
keeps track of whether or not it's in a string; the error means that
when the counter reached the end of the file, it still thought it was
in a string.

<p>
Note that c_count really does have to keep track of whether or
not it's in a string.
For example, this is three lines of code, not two, because the
``comment'' is actually in string data:

<pre>
 a = "hello
 /* this is not a comment */
 bye";
</pre>
<p>
Usually this error means you have code that won't compile
given certain #define settings.  For example, XFree86 has a line of code that's
actually wrong (it has a string that's not terminated), but people
don't notice because the #define to enable it is not usually set.
Legitimate code can trigger this message, but code that triggers
this message is horrendously formatted and is begging for problems.

<p>
In either case, the best way to handle the situation
is to modify the source code (slightly) so that the code's intent is clear
(by making sure that double-quotes balance).
If it's your own code, you definitely should fix this anyway.
You need to look at the double-quote (") characters.  One approach is to
just grep for double-quote, and look at every line for text that isn't
terminated, e.g., printf("hello %s, myname);
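<p>
A rough way to mechanize that search (using a hypothetical file named
"suspect.c"; escaped quotes and quotes inside comments will produce
false positives) is to flag lines with an odd number of double-quotes:
<pre>
  awk '{ n = gsub(/"/, "\""); if (n % 2 == 1) print FNR ": " $0 }' suspect.c
</pre>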

<p>
SLOCCount reports a warning when an unusually
large number of duplicate files is found.
A large number of duplicates <i>may</i> suggest that you're counting
two different versions of the same program as though they were
independently developed.
You may want to cd into the data directory (usually ~/.slocdata), cd into
the child directories corresponding to each component, and then look
at their dup_list.dat files, which list the filenames that appeared
to be duplicated (and what they duplicate with).
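<p>
One way to review all of the duplicate lists at once might be
(a sketch, assuming the default data directory):
<pre>
  cd ~/.slocdata
  for d in */ ; do
    if [ -s "$d/dup_list.dat" ] ; then    # show only nonempty lists
      echo "== $d" ; cat "$d/dup_list.dat"
    fi
  done
</pre>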


<h1><a name="adding">Adding Support for New Languages</a></h1>
SLOCCount handles many languages, but if it doesn't support one you need,
you'll need to give the language a standard (lowercase ASCII) name,
then modify SLOCcount to (1) detect and (2) count code in that language.

<ol>
<li>
To detect a new language, you'll need to modify the program break_filelist.
If the filename extension is reliable, you can modify the array
%file_extensions, which maps various filename extensions into languages.
If your needs are more complex, you'll need to modify the code
(typically in functions get_file_type or file_type_from_contents)
so that the correct file type is determined.
For example, if a file with a given filename extension is only
<i>sometimes</i> that type, you'll need to write code to examine the
file contents.
<li>
You'll need to create a SLOC counter for that language type.
It must have the name XYZ_count, where XYZ is the standard name for the
language.
<p>
For some languages, you may be able to use the ``generic_count'' program
to implement your counter - generic_count takes as its first argument
the string which
identifies the beginning of a comment (such comments continue until the
end of the line);
the other arguments are the files to count.
Thus, the LISP counter looks like this:
<pre>
 #!/bin/sh
 generic_count ';' $@
</pre>
The generic_count program won't work correctly if there are multiline comments
(e.g., C) or multiline string constants.
If your language is identical to C/C++'s syntax in terms of
string constant definitions and commenting syntax
(using // or /* .. */), then you can use the c_count program - in this case,
modify compute_sloc_lang so that the c_count program is used.
<p>
Otherwise, you'll have to devise your own counting program.
The program must generate output in the same format as the other
counters: for every filename passed as an argument, print one line
containing the SLOC count for that file, a space, and the filename.
(Note: the assembly language counter produces a slightly different format.)
After that, print "Total:" on its own line, followed by the actual SLOC total
on the following (last) line.
A sample of the expected output appears just after this list.
</ol>
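<p>
For example, a counter invoked on two hypothetical files should emit
output shaped like this (the names and counts here are made up):
<pre>
 102 src/foo.xyz
 31 src/bar.xyz
 Total:
 133
</pre>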

<h1><a name="advanced">Advanced SLOCCount Use</a></h1>
For most people, the previous information is enough.
However, if you're measuring a large set of programs, or have unusual needs,
those steps may not give you enough control.
In that case, you may need to create your own "data directory"
by hand and separately run the SLOCCount tools.
Basically, "sloccount" (note the lower case) is the name for
a high-level tool which invokes many other tools; this entire
suite is named SLOCCount (note the mixed case).
The next section will describe how to invoke the various tools "manually"
so you can gain explicit control over the measuring process when
the defaults are not to your liking, along with various suggestions
for how to handle truly huge sets of data.
<p>
Here's how to manually create a "data directory" to hold
intermediate results, and how to invoke each tool in sequence
(with discussion of options):
<ol>
<li>Set your PATH to include the SLOCCount "bin directory", as discussed above.
<li>Make an empty "data directory"
(where all intermediate results will be stored);
you can pick any name and location you like for this directory.
Here, I'll use the name "data":
<pre>
    mkdir ~/data
</pre>
<li>Change your current directory to this "data directory":
<pre>
    cd ~/data
</pre>
The rest of these instructions assume that your current directory
is the data directory.
You can set up many different data directories if you wish, to analyze
different source programs or analyze the programs in different ways;
just "cd" to the one you want to work with.
<li>(Optional) Some of the later steps will produce
a lot of output while they're running.
If you want to capture this information into a file, use the standard
"script" command do to so.
For example, "script run1" will save the output of everything you do into
file "run1" (until you type control-D to stop saving the information).
Don't forget that you're creating such a file, or it will become VERY large,
and in particular don't type any passwords into such a session.
You can store the script in the data directory, or create a subdirectory
for such results - any data directory subdirectory that doesn't have the
special file "filelist" is not a "data directory child" and is thus
ignored by the later SLOCCount analysis routines.
<li>Now initialize the "data directory".
 In particular, initialization will create the "data directory children",
 a set of subdirectories equivalent to the source code directory's
 top directories.  Each of these data directory children (subdirectories)
 will contain a file named "filelist", which
 lists all filenames in the corresponding source code directory.
 These data directory children
 will also eventually contain intermediate results
 of analysis, which you can check for validity
 (also, having a cache of these values speeds later analysis steps).
 <p>
 You use the "make_filelists" command to initialize a data directory.
 For example, if your source code is in /usr/src/redhat/BUILD, run:
<pre>
   make_filelists /usr/src/redhat/BUILD/*
</pre>
<p>
 Internally, make_filelists uses "find" to create the list of files, and
 by default it ignores all symbolic links.  However, you may need to
 follow symbolic links; if you do, give make_filelists the
 "--follow" option (which will use find's "-follow" option).
 Here are make_filelists' options:
<pre>
 --follow         Follow symbolic links
 --datadir D      Use this data directory
 --skip S         Skip basenames named S
 --prefix P       When creating children, prepend P to their name.
 --               No more options
</pre>
<p>
 Although you don't normally need to do so, if you want certain files to
 not be counted at all in your analysis, you can remove
 data directory children or edit the "filelist" files to do so.
 There's no need to remove files which aren't source code files normally;
 this is handled automatically by the next step.
<p>
 If you don't have a single source code directory where the subdirectories
 represent the major components you want to count separately, you can
 still use the tool but it's more work.
 One solution is to create a "shadow" directory with the structure
 you wish the program had, using symbolic links (you must use "--follow"
 for this to work).
 You can also just invoke make_filelists multiple times, with parameters
 listing the various top-level directories you wish to include.
 Note that the basenames of the directories must be unique.
<p>
 If there are so many directories (e.g., a massive number of projects)
 that the command line is too long,
 you can run make_filelists multiple times in the same
 directory with different arguments to create them.
 You may find "find" and/or "xargs" helpful in doing this automatically.
 For example, here's how to do the same thing using "find":
<pre>
 find /usr/src/redhat/BUILD -maxdepth 1 -mindepth 1 -type d \
        -exec make_filelists {} \;
</pre>
<li>Categorize each file.
This means that we must determine which
files contain source code (eliminating auto-generated and duplicate files),
and of those files which language each file contains.
The result will be a set of files in each subdirectory of the data directory,
where each file represents a category (e.g., a language).
<pre>
   break_filelist *
</pre>
 At this point you might want to examine the data directory subdirectories
 to ensure that "break_filelist" has correctly determined the types of
 the various files.
 In particular, the "unknown" category may have source files in a language
 SLOCCount doesn't know about.
 If the heuristics got some categorization wrong, you can modify the
 break_filelist program and re-run break_filelist.
<p>
 By default break_filelist removes duplicates, doesn't count
 automatically generated files as normal source code files, and
 only gives some feedback.  You can change these defaults with the
 following options:
<pre>
 --duplicates   Count all duplicate files as normal files
 --crossdups    Count duplicate files if they're in different data directory
                children (i.e., in different "filelists")
 --autogen      Count automatically generated files
 --verbose      Present more verbose status information while processing.
</pre>
<p>
 Duplicate control in particular is an issue; you probably don't want
 duplicates counted, so that's the default.
 Duplicate files are detected by determining if their MD5 checksums
 are identical; the "first" duplicate encountered is the only one kept.
 Normally, since shells sort directory names, this means that the
 file in the alphabetically first child directory is the one counted.
 You can change this around by listing directories in the sort order you
 wish followed by "*"; if the same data directory child
 is requested for analysis more
 than once in a given execution, it's skipped after the first time.
 So, if you want any duplicate files with child directory "glibc" to 
 count as part of "glibc", then you should provide the data directory children
 list as "glibc *".
<p>
 Beware of choosing something other than "*" as the parameter here,
 unless you use the "--duplicates" or "--crossdups" options.
 The "*" represents the list of data directory children to examine.
 Since break_filelist skips duplicate files identified
 in a particular run, if you run break_filelist
 on only certain children, some duplicate files won't be detected.
 If you're allowing duplicates (via "--duplicates" or
 "--crossdups"), then this isn't a problem.
 Or, you can use the ``--duplistfile'' option to store and retrieve
 hashes of files, so that additional files can be handled.
<p>
 If there are so many directories that the command line is too long,
 you can run break_filelist multiple times and give it
 a subset of the directories each time.
 You'll need to use one of the duplicate control options to do this.
 I would suggest using "--crossdups", which
 means that duplicates inside a child will only be counted once,
 eliminating at least some of the problems of duplicates.
 Here's the equivalent of "break_filelist *" when there are a large
 number of subdirectories:
<pre>
 find . -maxdepth 1 -mindepth 1 -type d -exec break_filelist --crossdups {} \;
</pre>
 Indeed, for all of the later commands where "*" is listed as the parameter
 in these instructions
 (for the list of data directory children), just run the above "find"
 command and replace "break_filelist --crossdups" with the command shown.
<li>(Optional)
If you're not very familiar with the program you're analyzing, you
might not be sure that "break_filelist" has correctly identified
all of the files.
In particular, the system might be using an unexpected
programming language or extension not handled by SLOCCount.
If this is your circumstance, you can just run the command:
<pre>
 count_unknown_ext
</pre>
(note that this command is unusual - it doesn't take any arguments,
since it's hard to imagine a case where you wouldn't want every
directory examined).
Unlike the other commands discussed, this one specifically looks at
${HOME}/.slocdata.
This command presents a list of extensions which are unknown to break_filelist,
with the most common ones listed first.
The output format is a name, followed by the number of instances;
the name begins with a "." if it's an extension, or, if there's no
extension, it begins with "/" followed by the base name of the file.
break_filelist already knows about common extensions such as ".gif" and ".png",
as well as common filenames like "README".
You can also view the contents of each of the data directory children's
files to see if break_filelist has correctly categorized the files.
<li>Now compute SLOC and filecounts for each language; you can compute for all
 languages at once by calling:
<pre>
   compute_all *
</pre>
If you only want to compute SLOC for a specific language,
you can invoke compute_sloc_lang, which takes as its first parameter
the SLOCCount name of the language ("ansic" for C, "cpp" for C++,
"ada" for Ada, "asm" for assembly), followed by the list
of data directory children.
Note that compute_sloc_lang is a change from version 1.0, which
had a separate "compute_*" program for each language
(the master program is still called "compute_all").
<p>
Notice the "*"; you can replace the "*" with just the list of
data directory children (subdirectories) to compute, if you wish.
Indeed, you'll notice that nearly all of the following commands take a
list of data directory children as arguments; when you want all of them, use
"*" (as shown in these instructions), otherwise, list the ones you want.
<p>
When you run compute_all or compute_sloc_lang, each data directory
child (subdirectory)
is consulted in turn for a list of the relevant files, and the
SLOC results are placed in that data directory child.
In each child,
the file "LANGUAGE-outfile.dat" lists the information from the
basic SLOC counters.
That is, the outfile lists the SLOC and filename
(the assembly outfile has additional information), and ends with
a line saying "Total:" followed by a line showing the total SLOC of
that language in that data directory child.
The file "all-physical.sloc" has the final total SLOC for every language
in that child directory (i.e., it's the last line of the outfile).
<li>(Optional) If you want, you can also use USC's CodeCount.
I've had trouble with these programs, so I don't do this normally.
However, you're welcome to try - they support logical SLOC measures
as well as physical ones (though not for most of the languages
supported by SLOCCount).
Sadly, they don't seem to compile in gcc without a lot of help, they
use fixed-width buffers that make me nervous, and I found a
number of bugs (e.g., it couldn't handle "/* text1 *//* text2 */" in
C code, a format that's legal and used often in the Linux kernel).
If you want to do this,
modify the files compute_c_usc and compute_java_usc so they point to the
right directories, and type:
<pre>
 compute_c_usc *
</pre>
<li>Now you can analyze the results. The main tool for
presenting SLOCCount results is "get_sloc", e.g.:
<pre>
  get_sloc * | less
</pre>
The get_sloc program takes many options, including:
<pre>
 --filecount    Display number of files instead of SLOC (SLOC is the default)
 --wide         Use "wide" format instead (tab-separated columns)
 --nobreak      Don't insert breaks in long lines
 --sort  X      Sort by "X", where "X" is the name of a language
                ("ansic", "cpp", "fortran", etc.), or "total".
                By default, get_sloc sorts by "total".
 --nosort       Don't sort - just present results in order of directory
                listing given.
 --showother    Show non-language totals (e.g., # duplicate files).
 --oneprogram   When computing effort, assume that all files are part of
                a single program.  By default, each subdirectory specified
                is assumed to be a separate, independently-developed program.
 --noheader     Don't show the header
 --nofooter     Don't show the footer (the per-language values and totals)
</pre>
<p>
Note that unlike the "sloccount" tool, get_sloc requires the current
directory to be the data directory.
<p>
If you're displaying SLOC, get_sloc will also estimate the time it
would take to develop the software using COCOMO (using its "basic" model).
By default, this figure assumes that each of the major subdirectories was
developed independently of the others;
you can use "--oneprogram" to make the assumption that all files are
part of the same program.
The COCOMO model makes many other assumptions; see the paper at
<a href="http://www.dwheeler.com/sloc">http://www.dwheeler.com/sloc</a>
for more information.
<p>
If you need to do more analysis, you might want to use the "--wide"
option and send the data to another tool such as a spreadsheet
(e.g., gnumeric) or RDBMS (e.g., PostgreSQL).
Using the "--wide" option creates tab-separated data, which is easier to
import.
You may also want to use the "--noheader" and/or "--nofooter" options to
simplify porting the data to another tool; an example appears just after
this list.
<p>
Note that in version 1.0, "get_sloc" was called "get_data".
<p>
If you have so many data directory children that you can't use "*"
on the command line, get_sloc won't be as helpful.
Feel free to patch get_sloc to add this capability (as another option),
or use get_sloc_details (discussed next) to feed the data into another tool.
<li>(Optional) If you just can't get the information you need from get_sloc,
then you can get the raw results of everything and process the data
yourself.
I have a little tool to do this, called get_sloc_details.
You invoke it in a similar manner:
<pre>
get_sloc_details *
</pre>
</ol>
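<p>
For example, here's one way to export the results as tab-separated data
for a spreadsheet (run from inside the data directory; the output
filename is arbitrary):
<pre>
  get_sloc --wide --noheader --nofooter * > sloc_data.tsv
</pre>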

<p>
<h1><a name="designer-notes">Designer's Notes</a></h1>
<p>
Here are some ``designer's notes'' on how SLOCCount works,
including what it can handle.
<p>
The program break_filelist
has categories for each programming language it knows about,
plus the special categories ``not'' (not a source code file),
``auto'' (an automatically-generated file and thus not to be counted),
``zero'' (a zero-length file),
``dup'' (a duplicate of another file as determined by an md5 checksum),
and
``unknown'' (a file which doesn't seem to be a source code file
nor any of these other categories).
It's a good idea to examine
the ``unknown'' items later, checking the common extensions
to ensure you have not missed any common types of code.
<p>
The program break_filelist uses lots of heuristics to correctly
categorize files.
Here are a few notes about its heuristics:
<ol>
<li>
break_filelist first checks for well-known extensions (such as .gif) that
cannot be program files, and for a number of common generated filenames.
<li>
It then peeks at the first few lines for "#!" followed by a legal script
name.
Sometimes it looks further, for example, many Python programs
invoke "env" and then use it to invoke python.
<li>
If that doesn't work, it uses the extension to try to determine the category.
For a number of languages, the extension is not reliable, so for those
languages it examines the file contents and uses a set of heuristics
to determine if the file actually belongs to that category.
<li>
Detecting automatically generated files is not easy, and it's
quite conceivable that it won't detect some automatically generated files.
The first 15 lines are examined to determine if any of them
include at the beginning of the line (after spaces and
possible comment markers) one of the following phrases (ignoring
upper and lower case distinctions):
``generated automatically'',
``automatically generated'',
``this is a generated file'',
``generated with the (something) utility'',
or ``do not edit''.
(A rough shell approximation of this check appears just after this list.)
<li>A number of filename conventions are used, too.
For example,
any ``configure'' file is presumed to be automatically generated if
there's a ``configure.in'' file in the same directory.
<li>
To eliminate duplicates,
the program keeps md5 checksums of each program file.
Any given md5 checksum is only counted once.
Build directories are processed alphabetically, so
if the same file content is in both directories ``a'' and ``b'',
it will be counted only once as being part of ``a'' unless you make
other arrangements.
Thus, some data directory children with names later in the alphabet may appear
smaller than would make sense at first glance.
It is very difficult to eliminate ``almost identical'' files
(e.g., an older and newer version of the same code, included in two
separate packages), because
it is difficult to determine when two ``similar'' files are essentially
the same file.
Changes such as the use of pretty-printers and massive renaming of variables
could make small changes seem large, while the small files
might easily appear to be the ``same''.
Thus, files with different contents are simply considered different.
<li>
If all else fails, the file is placed in the ``unknown'' category for
later analysis.
</ol>
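<p>
Here is a rough shell approximation of that auto-generation check
(for a hypothetical file "somefile.c"; break_filelist's real test is
more careful about where the phrase appears on the line, and checks
more phrases):
<pre>
 head -15 somefile.c |
   grep -i -E 'generated automatically|automatically generated|do not edit'
</pre>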
<p>
One complicating factor is that I wished to separate C, C++, and
Objective-C code, but a header file ending with
``.h'' or ``.hpp'' file could be any of these languages.
In theory, ``.hpp'' is only C++, but I found that in practice this isn't true.
I developed a number of heuristics to determine, for each file,
what language a given header belonged to.
For example, if a given directory has exactly one of these languages
(ignoring header files),
the header is assumed to belong to that category as well.
Similarly, if there is a body file (e.g., ".c") that has the same name
as the header file, then presumably the header file is of the same language.
Finally, a header file with the keyword ``class'' is almost certainly not a
C header file, but a C++ header file; otherwise it's assumed to
be a C file.
<p>
None of the SLOC counters fully parse the source code; they just examine
the code using simple text processing patterns to count the SLOC.
In practice, by handling a number of special cases this seems to be fine.
Here are some notes on some of the language counters;
the language name is followed by common extensions in parentheses
and the SLOCCount name of the language in brackets:
<ol>
<li>Ada (.ada, .ads, .adb) [ada]: Comments begin with "--".
<li>Assembly (.s, .S, .asm) [asm]:
Assembly languages vary greatly in the comment character they use,
so my counter had to handle this variance.
The assembly language counter (asm_count)
first examines the file to determine if
C-style ``/*'' comments and C preprocessor commands
(e.g., ``#include'') are used.
If both ``/*'' and ``*/'' are in the file, it's assumed that
C-style comments are being used
(since it is unlikely that <i>both</i> would be used
as something else, say as string data, in the same assembly language file).
Determining if a file used the C preprocessor was trickier, since
many assembly files do use ``#'' as a comment character and some
preprocessor directives are ordinary words that might be included
in a human comment.
The heuristic used is as follows: if #ifdef, #endif, or #include are used, the
C preprocessor is used; or if at least three lines have either #define or #else,
then the C preprocessor is used.
No doubt other heuristics are possible, but this at least seems to produce
reasonable results.
The program then determines what the comment character is by identifying
which punctuation mark (from a set of possible marks)
is the most common non-space initial character on a line
(ignoring ``/'' and ``#'' if C comments or preprocessor commands,
respectively, are used).
Once the comment character has been determined, and it's been determined
if C-style comments are allowed, the lines of code
are counted in the file.
<li>awk (.awk) [awk]: Comments begin with "#".
<li>C (.c) [ansic]: Both traditional C comments (/* .. */) and C++
(//) style comments are supported.
Although the older ANSI and ISO C standards didn't support // style
comments, in practice many C programs have used them for some time, and
the C99 standard includes them.
The C counter understands multi-line strings, so
comment characters (/* .. */ and //) are treated as data inside strings.
Conversely, the counter knows that a double-quote character inside a
comment does not begin a C/C++ string.
<li>C++  (.C, .cpp, .cxx, .cc) [cpp]: The same counter is used for
both C and C++.
Note that break_filelist does try to separate C from C++ for purposes
of accounting between them.
<li>C# (.cs) [cs]: The same counter is used as for C and C++.
Note that there are no "header" filetypes in C#.
<li>C shell (.csh) [csh]: Comments begin with "#".
<li>COBOL (.cob, .cbl) [cobol]: SLOCCount
detects if a "freeform" command has been given; until such a command is
given, fixed format is assumed.
In fixed format, comments have a "*" or "/" in column 7 or column 1;
any line that's not a comment, and has a nonwhitespace character after column 7
(the indicator area) is counted as a source line of code.
In a freeform style, any line beginning with optional whitespace and
then "*" or "/" is considered a comment; any noncomment line
with a nonwhitespace character is counted as SLOC.
<li>Expect (.exp) [exp]: Comments begin with "#".
<li>Fortran 77 (.f, .f77, .F, .F77) [fortran]: Comment-only lines are lines
where column 1 character = C, c, *, or !, or
where ! is preceded only by white space.
<li>Fortran 90 (.f90, .F90) [f90]: Comment-only lines are lines
where ! is preceded only by white space.
<li>Haskell (.hs) [haskell]:
This counter handles block comments {- .. -} and single line comments (--);
pragmas {-# .. -} are counted as SLOC.
This is a simplistic counter,
and can be fooled by certain unlikely combinations of block comments
and other syntax (line-ending comments or strings).
In particular,  "Hello {-"  will be incorrectly interpreted as a
comment block begin, and "{- -- -}" will be incorrectly interpreted as a
comment block begin without an end.  Literate files are detected by
their extension, and the style (TeX or plain text) is determined by
searching for a \begin{code} or "&gt;" at the beginning of lines.
See the <a
    href="http://www.haskell.org/onlinereport/literate.html">Haskell 98
    report section on literate Haskell</a> for more information.
<li>Java (.java) [java]: Java is counted using the same counter as C and C++.
<li>lex (.l) [lex]: Uses traditional C /* .. */ comments.
Note that this does not use the same counter as C/C++ internally, since
it's quite legal in lex to have "//" (where it is NOT a comment).
<li>LISP (.cl, .el, .scm, .lsp, .jl) [lisp]: Comments begin with ";".
<li>ML (.ml, .mli, .mll, .mly) [ml]: Comments nest and are enclosed in (* .. *).
<li>Modula3 (.m3, .mg, .i3, .ig) [modula3]: Comments are enclosed in (* .. *).
<li>Objective-C (.m) [objc]: Comments are old C-style /* .. */ comments.
<li>Pascal (.p, .pas) [pascal]: Comments are enclosed in curly braces {}
or (*..*).  This counter has known weaknesses; see the BUGS section of
the manual page for more information.
<li>Perl (.pl, .pm, .perl) [perl]:
Comments begin with "#".
Perl permits in-line ``perlpod'' documents, ``here'' documents, and an
__END__ marker that complicate code-counting.
Perlpod documents are essentially comments, but a ``here'' document
may include text to generate them (in which case the perlpod document
is data and should be counted).
The __END__ marker indicates the end of the file from Perl's
viewpoint, even if there's more text afterwards.
<li>PHP (.php, .php[3456], .inc) [php]:
Code is counted as PHP code if it has a .php file extension;
it's also counted if it has an .inc extension and looks like PHP code.
SLOCCount does <b>not</b> count PHP code embedded in HTML files normally,
though its lower-level routines can do so if you want to
(use php_count to do this).
Any of the various ways to begin PHP code can be used
(&lt;? .. ?&gt;,
&lt;?php .. ?&gt;,
&lt;script language="php"&gt; .. &lt;/script&gt;,
or even &lt;% .. %&gt;).
Any of the PHP comment formats (C, C++, and shell) can be used, and
any string constant formats ("here document", double quote, and single
quote) can be used as well.
<li>Python (.py) [python]:
Comments begin with "#".
Python has a convention that, at the beginning of a definition
(e.g., of a function, method, or class), an unassigned string can be
placed to describe what's being defined. Since this is essentially
a comment (though it doesn't syntactically look like one), the counter
avoids counting such strings, which may have multiple lines.
To handle this,
strings which start at the beginning of a line are not counted.
Python also has the ``triple quote'' operator, permitting multiline
strings; these need to be handled specially.
Triple-quoted strings are normally considered data, regardless of
content, unless they are used as a comment about a definition.
<li>Ruby (.rb) [ruby]: Comments begin with "#".
<li>sed (.sed) [sed]: Comments begin with "#".
Note that these are "sed-only" files; many uses of sed are embeded in
shell scripts (and are categorized as shell scripts in those cases).
<li>shell (.sh) [sh]: Comments begin with "#".
Note that I classify ksh, bash, and the original Bourne shell sh together,
because they have very similar syntaxes.
For example, in all of these shells,
setting a variable is expressed as "varname=value",
while C shells use "set varname=value".
<li>TCL (.tcl, .tk, .itk) [tcl]: Comments begin with "#".
<li>Yacc (.y) [yacc]: Yacc is counted using the same counter as C and C++.
</ol>
<p>
Much of the code is written in Perl, since it's primarily a text processing
problem and Perl is good at that.
Many short scripts are Bourne shell scripts (it's good at
short scripts for calling other programs), and the
basic C/C++ SLOC counter is written in C for speed.
<p>
I originally named it "SLOC-Count", but I found that some web search
engines (notably Google) treated that as two words.
By naming it "SLOCCount", it's easier to find by those who know
the name of the program.
<p>
SLOCCount only counts physical SLOC, not logical SLOC.
Logical SLOC counting requires much more code to implement,
and I needed to cover a large number of programming languages.


<p>
<h1><a name="sloc-definition">Definition of SLOC</a></h1>
<p>
This tool measures ``physical SLOC.''
Physical SLOC is defined as follows:
``a physical source line of code (SLOC) is a line ending
in a newline or end-of-file marker,
and which contains at least one non-whitespace non-comment character.''
Comment delimiters (characters other than newlines starting and ending
a comment) are considered comment characters.
Data lines only including whitespace
(e.g., lines with only tabs and spaces in multiline strings) are not included.
<p>
To make this concrete, here's an example of a simple C program
(it strips ANSI C comments out).
On the left side is the running SLOC total, where "-" indicates a line
that is not considered a physical "source line of code":
<pre>
 1    #include &lt;stdio.h&gt;
 -    
 -    /* peek at the next character in stdin, but don't get it */
 2    int peek() {
 3     int c = getchar();
 4     ungetc(c, stdin);
 5     return c;
 6    }
 -    
 7    main() {
 8     int c;
 9     int incomment = 0;  /* 1 = we are inside a comment */
 -    
10     while ( (c = getchar()) != EOF) {
11        if (!incomment) {
12          if ((c == '/') &amp;&amp; (peek() == '*')) {incomment=1;}
13        } else {
14          if ((c == '*') &amp;&amp; (peek() == '/')) {
15               c= getchar(); c=getchar(); incomment=0;
16          }
17        }
18        if ((c != EOF) &amp;&amp; !incomment) {putchar(c);}
19     }
20    }
</pre>
<p>
<a href="http://www.sei.cmu.edu/publications/documents/92.reports/92.tr.020.html">Robert E. Park et al.'s
<i>Software Size Measurement:
A Framework for Counting Source Statements</i></a>
(Technical Report CMU/SEI-92-TR-20)
presents a set of issues to be decided when trying to count code.
The paper's abstract states:
<blockquote><i>
This report presents guidelines for defining, recording, and reporting
two frequently used measures of software size -- physical source lines
and logical source statements.
We propose a general framework for constructing size
definitions and use it to derive operational methods for
reducing misunderstandings in measurement results.
</i></blockquote>
<p>
Using Park's framework, here is how physical lines of code are counted:
<ol>
<li>Statement Type: I used a physical line-of-code as my basis.
I included executable statements, declarations
(e.g., data structure definitions), and compiler directives
(e.g., preprocessor commands such as #define).
I excluded all comments and blank lines.
<li>How Produced:
I included all programmed code, including any files that had been modified.
I excluded code generated with source code generators, converted with
automatic translators, and those copied or reused without change.
If a file was in the source package, I included it; if the file had
been removed from a source package (including via a patch), I did
not include it.
<li>Origin: You select the files (and thus their origin).
<li>Usage: You select the files (and thus their usage), e.g.,
you decide if you're going to
include additional applications able to run on the system but not
included with the system.
<li>Delivery: You'll decide what code to include, but of course,
if you don't have the code you can't count it.
<li>Functionality: This tool will include both operative and inoperative code
if they're mixed together.
An example of intentionally ``inoperative'' code is
code turned off by #ifdef commands; since it could be
turned on for special purposes, it made sense to count it.
An example of unintentionally ``inoperative'' code is dead or unused code.
<li>Replications:
Normally, duplicate files are ignored, unless you use
the "--duplicates" or "--crossdups" option.
The tool will count
``physical replicates of master statements stored in
the master code''.
This is simply code cut and pasted from one place to another to reuse code;
it's hard to tell where this happens, and since it has to be maintained
separately, it's fair to include this in the measure.
I excluded copies inserted, instantiated, or expanded when compiling
or linking, and I excluded postproduction replicates
(e.g., reparameterized systems).
<li>Development Status: You'll decide what code
should be included (and thus the development status of the code that
you'll accept).
<li>Languages: You can see the language list above.
<li>Clarifications: I included all statement types.
This included nulls, continues, no-ops, lone semicolons,
statements that instantiate generics,
lone curly braces ({ and }), and labels by themselves.
</ol>
<p>
Thus, SLOCCount generally follows Park's ``basic definition'',
but with the following exceptions depending on how you use it:
<ol>
<li>How Produced:
By default, this tool excludes duplicate files and
code generated with source code generators.
After all, the COCOMO model states that the
only code that should be counted is code
``produced by project personnel'', whereas these kinds of files are
instead the output of ``preprocessors and compilers.''
If code is always maintained as the input to a code generator, and then
the code generator is re-run, it's only the code generator input's size that
validly measures the size of what is maintained.
Note that while I attempted to exclude generated code, this exclusion
is based on heuristics which may have missed some cases.
If you want to count duplicates, use the
"--autogen", "--duplicates", and/or "--crossdups" options.
If you want to count automatically generated files, pass
the "--autogen" option mentioned above.
<li>Origin:
You can choose what source code you'll measure.
Normally physical SLOC doesn't include an unmodified
``vendor-supplied language support library'' nor a
``vendor-supplied system or utility''.
However, if this is what you are measuring, then you need to include it.
If you include such code, your set will be different
than the usual ``basic definition.''
<li>Functionality: I included counts of unintentionally inoperative code
(e.g., dead or unused code).
It is very difficult to automatically detect such code
in general for many languages.
For example, a program not directly invoked by anything else nor
installed by the installer is much more likely to be a test program,
which you may want to include in the count (you often would include it
if you're estimating effort).
Clearly, discerning human ``intent'' is hard to automate.
</ol>
<p>
Otherwise, this counter follows Park's
``basic definition'' of a physical line of code, even down to Park's
language-specific definitions where Park defined them for a language.


<p>
<h1><a name="miscellaneous">Miscellaneous Notes</a></h1>
<p>
There are other undocumented analysis tools in the original tar file.
Most of them are specialized scripts for my circumstances, but feel
free to use them as you wish.
<p>
If you're packaging this program, don't just copy every executable
into the system "bin" directory - many of the files are those
specialized scripts.
Just put in the bin directory every executable documented here, plus the
files they depend on (there aren't that many).
See the RPM specification file to see what's actually installed.
<p>
You have to take any measure of SLOC (including this one) with a
large grain of salt.
Physical SLOC is sensitive to the format of source code.
There's a correlation between SLOC and development effort, and some
correlation between SLOC and functionality,
but there's essentially no correlation between SLOC
and either "quality" or "value".
<p>
A problem of physical SLOC is that it's sensitive to formatting,
and that's a legitimate (and known) problem with the measure.
However, to be fair, logical SLOC is influenced by coding style too.
For example, the following two phrases are semantically identical,
but will have different logical SLOC values:
<pre>
   int i, j;  /* 1 logical SLOC */

   int i;     /* 2 logical SLOC, but it does the same thing */
   int j;
</pre>
<p>
If you discover other information that can be divided up by
data directory children (e.g., the license used), it's probably best
to add that to each subdirectory (e.g., as a "license" file in the
subdirectory).
Then you can modify tools like get_sloc
to add them to their display.
<p>
I developed SLOCCount for my own use, not originally as
a community tool, so it's certainly not beautiful code.
However, I think it's serviceable - I hope you find it useful.
Please send me patches for any improvements you make!
<p>
You can't use this tool as-is with some estimation models, such as COCOMO II,
because this tool doesn't compute logical SLOC.
I certainly would accept code contributions to add the ability to
measure logical SLOC (or related measures such as
Cyclomatic Complexity and Cyclomatic density);
selecting them could be a compile-time option.
However, measuring logical SLOC takes more development effort, so I
haven't done so; see USC's "CodeCount" for a set of code that
measures logical SLOC for some languages
(though I've had trouble with CodeCount - in particular, its C counter
doesn't correctly handle large programs like the Linux kernel).


<p>
<h1><a name="license">SLOCCount License</a></h1>
<p>
Here is the SLOCCount License; the file COPYING contains the standard
GPL version 2 license:
<pre>
=====================================================================
SLOCCount
Copyright (C) 2000-2001 David A. Wheeler (dwheeler, at, dwheeler.com)

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
=====================================================================
</pre>
<p>
While it's not formally required by the license, please give credit
to me and this software in any report that uses results generated by it.
<p>
This document was written by David A. Wheeler (dwheeler, at, dwheeler.com),
and is
(C) Copyright 2001 David A. Wheeler.
This document is covered by the license (GPL) listed above.
<p>
The license <i>does</i> give you the right to
use SLOCCount to analyze proprietary programs.

<p>
<h1><a name="related-tools">Related Tools</a></h1>
<p>
One available toolset is
<a href="http://sunset.usc.edu/research/CODECOUNT">CodeCount</a>.
I tried using this toolset, but I eventually gave up.
It had too many problems handling the code I was trying to analyze, and it
does a poor job automatically categorizing code.
It also has no support for many of today's languages (such as Python,
Perl, Ruby, PHP, and so on).
However, it does a lot of analysis and measurements that SLOCCount
doesn't do, so it all depends on your need.
Its license appeared to be open source, but it's quite unusual and
I'm not enough of a lawyer to be able to confirm that.
<p>
Another tool that's available is <a href="http://csdl.ics.hawaii.edu/Research/LOCC/LOCC.html">LOCC</a>.
It's available under the GPL.
It can count Java code, and there's experimental support for C++.
LOCC is really intended for more deeply analyzing each Java file;
what's particularly interesting about it is that it can measure
"diffs" (how much has changed).
See
<a href="http://csdl.ics.hawaii.edu/Publications/MasterList.html#csdl2-00-10">
A comparative review of LOCC and CodeCount</a>.
<p>
<a href="http://sourceforge.net/projects/cccc">
CCCC</a> is a tool which analyzes C++ and Java files
and generates a report on various metrics of the code.
Metrics supported include lines of code, McCabe's complexity,
and metrics proposed by Chidamber &amp; Kemerer and Henry &amp; Kafura.
(You can see
<a href="http://cccc.sourceforge.net/">Time Littlefair's comments</a>).
CCCC is in the public domain.
It reports on metrics that sloccount doesn't, but sloccount can handle
far more computer languages.

<p>
<h1><a name="submitting-changes">Submitting Changes</a></h1>
<p>
The GPL license doesn't require you to submit changes you make back to
its maintainer (currently me),
but it's highly recommended and wise to do so.
Because others <i>will</i> send changes to me, a version you make on your
own will slowly become obsolete and incompatible.
Rather than allowing this to happen, it's better to send changes in to me
so that the latest version of SLOCCount also has the
features you're looking for.
If you're submitting support for new languages, be sure that your
change correctly ignores files that aren't in that new language
(some filename extensions have multiple meanings).
You might want to look at the <a href="TODO">TODO</a> file first.
<p>
When you send changes to me, send them as "diff" results so that I can
use the "patch" program to install them.
If you can, please send ``unified diffs'' -- GNU's diff can create these
using the "-u" option.
</body>