update documentation to changed license & modularized implementation

* also replace the separately maintained man page with one auto-generated from POD
* the documentation of the "ninka" script is maintained as POD directly inside the script itself (bin/ninka)
* at build time, "make" generates the man page blib/man1/ninka.1p
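A minimal sketch of that build flow. It assumes the commands are run from the top of a Ninka checkout with ExtUtils::MakeMaker available, and a man implementation (such as man-db) that opens a page given by file path. The function name build_and_view_ninka_man is hypothetical:

```shell
# Hypothetical helper; run it from the root of a Ninka checkout.
build_and_view_ninka_man() {
    # 1. generate the Makefile from Makefile.PL,
    # 2. build blib/, including the man page generated from the POD in bin/ninka,
    # 3. open that page by path (assumes man accepts a local file path).
    perl Makefile.PL && make && man ./blib/man1/ninka.1p
}
```

To read the same documentation without building anything, perldoc can render the POD straight from the script: perldoc bin/ninka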
author: René Scheibe <rene.scheibe@gmail.com> 2015-06-02 00:03:09 +0200
committer: René Scheibe <rene.scheibe@gmail.com> 2015-06-04 17:44:53 +0200
commit: 9f3023e62659702d85a1fccbc5d49a4bb8392ba1 (patch)
tree: d7fa3f232308c52c16f3e7437e91983935c80b6d
parent: 4419dba89e92b471d1b869de8ece10ddb567be4f (diff)
download: ninka-9f3023e62659702d85a1fccbc5d49a4bb8392ba1.tar.gz
6 files changed, 109 insertions, 172 deletions
diff --git a/Changes b/Changes
index ad02822..04ec33e 100644
--- a/Changes
+++ b/Changes
@@ -22,7 +22,7 @@
 
 	* ninka.pl: fixed bug in finding the path of where ninka was being executed from (reported by Ryan Biesemeyer)
 
-	* Fixed quotes in perl (René bScheibe)
+	* Fixed quotes in perl (René Scheibe)
 
 2015-01-05  dmg  <dmg@uvic.ca>
 
diff --git a/Makefile.PL b/Makefile.PL
index 95f568b..b29cf02 100644
--- a/Makefile.PL
+++ b/Makefile.PL
@@ -7,7 +7,7 @@ WriteMakefile(
     NAME => 'Ninka',
     VERSION_FROM => 'lib/Ninka.pm',
     ABSTRACT_FROM => 'lib/Ninka.pm',
-    LICENSE => 'agpl_3',
+    LICENSE => 'gpl_2',
     AUTHOR => [
         'Daniel M. German <dmg@uvic.ca>',
         'Yuki Manabe <y-manabe@ist.osaka-u.ac.jp>',
@@ -38,7 +38,7 @@ WriteMakefile(
         resources => {
             homepage => 'http://ninka.turingmachine.org/',
             repository => 'https://github.com/dmgerman/ninka',
-            license => 'http://www.gnu.org/licenses/agpl-3.0.html',
+            license => 'http://www.gnu.org/licenses/gpl-2.0.html',
        },
     },
 );
diff --git a/README b/README
index bd67b2c..dbbe6f1 100644
--- a/README
+++ b/README
@@ -11,16 +11,13 @@ under which a source file is made available.
 This tool uses a source file as input and outputs the licenses
 identified within that file.
 
-If you need to know the detail of Ninka, please see the following
-paper:
+If you need to know the details of Ninka, please see the following paper:
 
 Daniel M. German, Yuki Manabe and Katsuro Inoue. A sentence-matching
 method for automatic license identification of source code files. In
 25nd IEEE/ACM International Conference on Automated Software
 Engineering (ASE 2010). You can email me (dmg@uvic.ca) for a copy or
-download it from
-
-http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf
+download it from http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf.
 
 If you use Ninka for research purposes, we would appreciate you cite
 the above paper.
@@ -28,13 +25,13 @@ the above paper.
 * Contributors
 
 - Paul Clough for his code to split sentences
-- Anthony Kohan for writing the excel and sqlite backends.
-- Armijn Hemel from Tjaldur Software Governance Solutions  for multiple bug reports and suggestions
+- Anthony Kohan for writing the excel and sqlite backends
+- Armijn Hemel from Tjaldur Software Governance Solutions for multiple bug reports and suggestions
+- René Scheibe for modularizing the code
 
 * License
 
-  Except for the directories comments and splitter, Ninka is licensed
-  under the GPLv2+
+  Ninka is licensed under the GPLv2+:
 
     Copyright (C) 2009-2014  Yuki Manabe and Daniel M. German
 
@@ -51,13 +48,10 @@ the above paper.
     You should have received a copy of the GNU General Public License
     along with this program.  If not, see <http://www.gnu.org/licenses/>.
 
-  - splitter.pl is a derivative work of the Rule-based sentence
-    splitter script by Paul Paul Clough. Please see splitter/README
-    for details.
+  Ninka::SentenceExtractor is a derivative work of the rule-based sentence
+  splitter script by Paul Clough.
 
-  - comments is based on a program to remove comments by Jon Newman,
-    it is released under the GNU General Public License Version 2 or
-    (at your option) any later version.
+  The comments program is based on a program to remove comments by Jon Newman.
 
 * Requirements
 
@@ -70,40 +64,25 @@ the above paper.
 * How to install
 
   1. Unpack the distribution in a directory.
-  2. Optional: Build and install comments (make sure it is somwehere in the
-     path) (see directory comments)
-
+  2. Optional: Build and install comments (see directory comments) and make sure it is in the path
 
-* Usage:
+* Usage
 
-Ninka uses a pipe model (see below). Each step of the "pipe" creates a
-file, but
+ninka [options] filename
 
-ninka.pl [options] [filename]
+Available options:
 
-Available options
+  -i create intermediary files
   -v verbose
-  -d delete intermediate files
-  -C force creation of comments file
-  -c stop after creation of comments
-  -S force creation of sentences file
-  -s stop after creation of sentences
-  -G force creation of goodsent file
-  -g stop after creation of goodsent
-  -T force creation of senttok file
-  -t stop after creation of senttok
-  -L force creation of license file
-  -f force all processing
-
 
 Example:
 
-   ninka.pl foo.c
+  ninka -i foo.c
 
 It will create five files:
 
-  1. foo.c.comments: extracted the first two comments blocks, where
-     the license is usually
+  1. foo.c.comments: extracts the first comment blocks, where
+     the license is usually included
   2. foo.c.sentences: creates the list of sentences in the license
      statement
   3. foo.c.goodsent: contains sentences that are likely to be part of
@@ -117,69 +96,60 @@ It will create five files:
      - Licenses
      - Unmatched sentences in *.senttok that were not matched
 
-
-
+These files are not required for Ninka to work, but they can help
+to debug license detection issues.
 
 * Ninka model
 
 Ninka uses a pipe-model. Each stage of the pipe does something very specific:
 
- 1. Comment extractor.
+1. Comment extractor
 
-    - directory: extComments
+    - Module: Ninka::CommentExtractor
 
-    - command: extComments.pl, might use comments (included in distribution)
+    - Purpose: Extracts top comments of source code.
+               If no comment extractor is known for the language,
+               then extracts top lines from source (currently 700)
 
-    - Purpose: Extracts top comments of source code. If no
-          comment extractor is known for the language, then extracts top lines from source (currently 700)
-
-    - Creates <filename>.comments file
+    - Output: <filename>.comments
 
 2. Split sentences in comments
 
-     - directory: splitter
-
-     - command: splitter.pl
-
-     - Purpose: Ninka works by matching sentences of licenses, hence
-       it needs to properly break text into sentences.
-
-     - Outputs <filename>.sentences
-
-3. Filter "good" sentences.
+     - Module: Ninka::SentenceExtractor
 
-     - directory filter
+     - Purpose: Ninka works by matching sentences of licenses,
+                hence it needs to properly break text into sentences.
 
-     - command: filter.pl
+     - Output: <filename>.sentences
 
-     - Purpose: some sentences are related to a license, some are
-       not. It is valuable to know if a file contains lines that look
-       like a license or not (e.g. to know that a file has no license)
+3. Filter "good" sentences
 
-     - Outputs: <filename>.goodsent, and <filename>.badsent (not used)
+     - Module: Ninka::SentenceFilter
 
-4. Tokenizes sentences
+     - Purpose: Some sentences are related to a license, some are not.
+                It is valuable to know if a file contains lines that look like
+                a license or not (e.g. to know that a file has no license).
 
-     - Directory senttok
+     - Output: <filename>.goodsent and <filename>.badsent
 
-     - command: senttok.pl
+4. Tokenize sentences
 
-     - Purpose: It creates a file that corresponds to the recognized
-       sentence tokens. For each sentence, it outputs its sentence token, or unknown otherwise.
+     - Module: Ninka::SentenceTokenizer
 
-     - Outputs: <filename>.senttok
+     - Purpose: It creates a file that corresponds to the recognized sentence tokens.
+                For each sentence, it outputs its sentence token, or unknown otherwise.
 
-5. Matches sentences to licenses
+     - Output: <filename>.senttok
 
-     - Directory matcher
+5. Match sentences to licenses
 
-     - Command: matcher.pl
+     - Module: Ninka::LicenseMatcher
 
-     - Purpose: looks at the sequence of sentence tokens and outputs the licenses found
+     - Purpose: It looks at the sentence tokens and outputs the licenses found.
 
      - Output: <filename>.license
 
-The script ninka.pl takes care of all these steps, and optionally removes
+The script ninka takes care of all these steps, and optionally creates
 intermediary files, and writes to the stdout the licenses found.
 
 ------
diff --git a/bin/ninka b/bin/ninka
index 4732cbe..9cfd6aa 100755
--- a/bin/ninka
+++ b/bin/ninka
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/perl
 
 use strict;
 use warnings;
@@ -19,7 +19,7 @@ sub parse_cmdline_parameters {
     if (!getopts('iv', \%opts) || scalar(@ARGV) == 0) {
         print STDERR "Ninka v${Ninka::VERSION}
 
-Usage: $0 [options] <filename>
+Usage: ninka [options] <filename>
 
 Options:
   -i create intermediary files
@@ -32,29 +32,79 @@ Options:
 
 __END__
 
+=encoding utf8
+
 =head1 NAME
 
-ninka
+ninka - source file license identification tool
+
+=head1 SYNOPSIS
+
+B<ninka> [options] F<filename>
 
 =head1 DESCRIPTION
 
-Scans a file and returns the found licenses.
+Scans a source file and returns the found licenses.
+
+=head1 OPTIONS
+
+=over
+
+=item B<-i>
+
+create intermediary files (for debugging)
+
+=item B<-v>
+
+verbose
+
+=back
+
+=head1 EXAMPLES
+
+=over
+
+=item B<ninka> F<foo.c>
+
+Determine the licenses in file F<foo.c>.
+
+=item B<ninka -i> F<foo.c>
+
+Determine the licenses in file F<foo.c> and create intermediary files (for debugging).
+
+=item find * | xargs -n1 -I@ B<ninka> '@'
+
+Determine the licenses of files in a directory.
+
+=back
+
+=head1 AUTHOR
+
+B<ninka> was written by Daniel M. German <dmg@uvic.ca> and Yuki Manabe <y-manabe@ist.osaka-u.ac.jp>.
+
+=head1 SEE ALSO
+
+Daniel M. German, Yuki Manabe and Katsuro Inoue. A sentence-matching method
+for automatic license identification of source code files. In 25th IEEE/ACM
+International Conference on Automated Software Engineering (ASE 2010).
+
+You can download it from http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf.
 
 =head1 COPYRIGHT AND LICENSE
 
-Copyright (C) 2009-2014  Yuki Manabe and Daniel M. German
+Copyright (C) 2009-2014  Yuki Manabe and Daniel M. German, 2015 René Scheibe
 
-This program is free software; you can redistribute it and/or modify
-it under the terms of the GNU Affero General Public License as
-published by the Free Software Foundation, either version 3 of the
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as
+published by the Free Software Foundation; either version 2 of the
 License, or (at your option) any later version.
 
 This program is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU Affero General Public License for more details.
+GNU General Public License for more details.
 
-You should have received a copy of the GNU Affero General Public License
+You should have received a copy of the GNU General Public License
 along with this program.  If not, see <http://www.gnu.org/licenses/>.
 
 =cut
diff --git a/lib/Ninka.pm b/lib/Ninka.pm
index 526aeab..8f454cd 100644
--- a/lib/Ninka.pm
+++ b/lib/Ninka.pm
@@ -68,7 +68,7 @@ __END__
 
 =head1 NAME
 
-Ninka - Find licenses in source files.
+Ninka - source file license identification tool
 
 =head1 SYNOPSIS
 
@@ -82,7 +82,7 @@ Ninka - Find licenses in source files.
 
 =head1 DESCRIPTION
 
-Scans a file and returns the found licenses.
+Scans a source file and returns the found licenses.
 
 =head1 COPYRIGHT AND LICENSE
 
diff --git a/man/ninka.1 b/man/ninka.1
deleted file mode 100644
index 9cd2d57..0000000
--- a/man/ninka.1
+++ /dev/null
@@ -1,83 +0,0 @@
-.TH NINKA 1.3 "May 2015" ninka
-.SH NAME
-ninka \- source file license identification tool
-.SH SYNOPSYS
-.SY ninka
-.OP \-vfCcSsGgTtLd
-.OP \-\-
-.RI [ file ]
-.YS 
-
-.SH DESCRIPTION
-
-Analyses source files to determine the license they fall under. Takes a source
-file as input and outputs the file's license.
-
-.SH OPTIONS
-
-.IP \-v
-verbose
-
-.IP \-f
-force all processing
-
-.IP \-C
-force creation of comments
-.IP \-c
-stop after creation of comments
-
-.IP \-S
-force creation of sentences
-.IP \-s
-stop after creation of sentences
-
-.IP \-G
-force creation of goodsent
-.IP \-g
-stop after creation of goodsent
-
-.IP \-T
-force creation of senttok
-.IP \-t
-stop after creation of senttok
-
-.IP \-L
-force creation of matching
-
-.IP \-d
-delete intermediate files
-
-.IP \-\-
-Stop processing options
-
-.SH EXAMPLES
-
-.TP
-\fBninka\fR \fIfoo.c\fR
-Determine the licenses in file foo.c
-
-.TP
-.BI ninka\ \-d \ foo.c
-Determine the license in file foo.c and delete intermediary files
-
-.TP
-find * | xargs \-n1 \-I@ \fBninka\fR '@'
-Determine the licenses of files in a directory.
-
-
-.SH AUTHOR
-
-\fBninka\fR was written by Daniel M. German <dmg@uvic.ca> and Yuki Manabe
-<y-manabe@ist.osaka-u.ac.jp>. ninka itself is licensed under the AGPLv3+. This
-manpage was written by Ryan Kavanagh <ryanakca@kubuntu.org> for the Debian
-project and is also licensed under the AGPLv3+.
-
-.SH SEE ALSO
-
-Daniel M. German, Yuki Manabe and Katsuro Inoue. A sentence-matching method
-for automatic license identification of source code files. In 25nd IEEE/ACM
-International Conference on Automated Software Engineering (ASE 2010).
-
-You can email Daniel M. German <dmg@uvic.ca> for a copy or download it from
-.UR http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf
-.UE
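Both the deleted man page and the new POD suggest find * | xargs -n1 -I@ ninka '@' for scanning a directory. That pipeline also hands directories to ninka, skips dot-files at the top level, and may mangle file names containing newlines or quote characters. A minimal POSIX sketch of a sturdier variant, assuming ninka is on the PATH and takes one file per invocation as its usage line states; ninka_scan_dir is a hypothetical name:

```shell
# Hypothetical helper: run ninka once on every regular file below a directory.
# find passes each path to ninka as a single argument, so spaces, quotes and
# newlines in file names need no extra handling; -type f skips directories.
ninka_scan_dir() {
    # "${1:-.}": the directory to scan, defaulting to the current directory
    find "${1:-.}" -type f -exec ninka {} \;
}
```

For example, ninka_scan_dir src scans every file under src; with no argument it scans the current directory.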
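The README in this commit describes -i as creating intermediary files named after the pipeline stages (<filename>.comments, .sentences, .goodsent and .badsent, .senttok, .license) to help debug license detection. A small sketch for listing whichever of them exist after a run; the helper name ninka_debug_files is hypothetical, and it only inspects the file system, not Ninka itself:

```shell
# Hypothetical helper: list the intermediary files that `ninka -i FILE`
# may leave next to FILE, in pipeline order (extensions taken from the README).
ninka_debug_files() {
    for ext in comments sentences goodsent badsent senttok license; do
        # Only print the stage outputs that actually exist on disk.
        if [ -e "$1.$ext" ]; then
            printf '%s\n' "$1.$ext"
        fi
    done
}
```

For example: ninka -i foo.c; ninka_debug_files foo.c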