author     René Scheibe <rene.scheibe@gmail.com>  2015-06-02 00:03:09 +0200
committer  René Scheibe <rene.scheibe@gmail.com>  2015-06-04 17:44:53 +0200
commit     9f3023e62659702d85a1fccbc5d49a4bb8392ba1
tree       d7fa3f232308c52c16f3e7437e91983935c80b6d
parent     4419dba89e92b471d1b869de8ece10ddb567be4f
update documentation to changed license & modularized implementation

* also replace the separately maintained man page with one auto-generated from pod
* documentation of the "ninka" script is maintained as pod directly inside the script itself (bin/ninka)
* at build time via "make" a man page is generated under blib/man1/ninka.1p
-rw-r--r--  Changes          2
-rw-r--r--  Makefile.PL      4
-rw-r--r--  README         118
-rwxr-xr-x  bin/ninka       70
-rw-r--r--  lib/Ninka.pm     4
-rw-r--r--  man/ninka.1     83
6 files changed, 109 insertions, 172 deletions
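The commit message says the man page is no longer maintained by hand. It is now generated at build time from the POD embedded in bin/ninka. The Perl sketch below renders a trimmed copy of that POD with the core Pod::Man module, to show roughly what that conversion produces.

It is only an illustration. The name, section and center values are assumptions chosen to resemble blib/man1/ninka.1p. They are not read from the generated Makefile, which may invoke the converter with different settings.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Pod::Man;

# Trimmed, illustrative copy of the POD this commit adds to bin/ninka.
# Built from a list so that no line of this script itself starts with "=".
my $pod = join "\n\n",
    '=head1 NAME',
    'ninka - source file license identification tool',
    '=head1 SYNOPSIS',
    'B<ninka> [options] F<filename>',
    '=head1 OPTIONS',
    '=over',
    '=item B<-i>',
    'create intermediary files (for debugging)',
    '=item B<-v>',
    'verbose',
    '=back',
    '=cut';
$pod .= "\n";

# Render the POD to roff. The name/section/center values are assumptions
# picked to resemble the generated ninka.1p page, not taken from the build.
my $parser = Pod::Man->new(
    name    => 'NINKA',
    section => '1p',
    center  => 'User Contributed Perl Documentation',
);
my $roff = '';
$parser->output_string(\$roff);
$parser->parse_string_document($pod);

print $roff;
```

With the repository checked out, running "perldoc bin/ninka" should display the full embedded POD without building anything.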
diff --git a/Changes b/Changes
index ad02822..04ec33e 100644
--- a/Changes
+++ b/Changes
@@ -22,7 +22,7 @@
* ninka.pl: fixed bug in finding the path of where ninka was being executed from (reported by Ryan Biesemeyer)
- * Fixed quotes in perl (René bScheibe)
+ * Fixed quotes in perl (René Scheibe)
2015-01-05 dmg <dmg@uvic.ca>
diff --git a/Makefile.PL b/Makefile.PL
index 95f568b..b29cf02 100644
--- a/Makefile.PL
+++ b/Makefile.PL
@@ -7,7 +7,7 @@ WriteMakefile(
NAME => 'Ninka',
VERSION_FROM => 'lib/Ninka.pm',
ABSTRACT_FROM => 'lib/Ninka.pm',
- LICENSE => 'agpl_3',
+ LICENSE => 'gpl_2',
AUTHOR => [
'Daniel M. German <dmg@uvic.ca>',
'Yuki Manabe <y-manabe@ist.osaka-u.ac.jp>',
@@ -38,7 +38,7 @@ WriteMakefile(
resources => {
homepage => 'http://ninka.turingmachine.org/',
repository => 'https://github.com/dmgerman/ninka',
- license => 'http://www.gnu.org/licenses/agpl-3.0.html',
+ license => 'http://www.gnu.org/licenses/gpl-2.0.html',
},
},
);
diff --git a/README b/README
index bd67b2c..dbbe6f1 100644
--- a/README
+++ b/README
@@ -11,16 +11,13 @@ under which a source file is made available.
This tool uses a source file as input and outputs the licenses
identified within that file.
-If you need to know the detail of Ninka, please see the following
-paper:
+If you need to know the detail of Ninka, please see the following paper:
Daniel M. German, Yuki Manabe and Katsuro Inoue. A sentence-matching
method for automatic license identification of source code files. In
25nd IEEE/ACM International Conference on Automated Software
Engineering (ASE 2010). You can email me (dmg@uvic.ca) for a copy or
-download it from
-
-http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf
+download it from http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf.
If you use Ninka for research purposes, we would appreciate you cite
the above paper.
@@ -28,13 +25,13 @@ the above paper.
* Contributors
- Paul Clough for his code to split sentences
-- Anthony Kohan for writing the excel and sqlite backends.
-- Armijn Hemel from Tjaldur Software Governance Solutions for multiple bug reports and suggestions
+- Anthony Kohan for writing the excel and sqlite backends
+- Armijn Hemel from Tjaldur Software Governance Solutions for multiple bug reports and suggestions
+- René Scheibe for modularizing the code
* License
- Except for the directories comments and splitter, Ninka is licensed
- under the GPLv2+
+ Ninka is licensed under the GPLv2+:
Copyright (C) 2009-2014 Yuki Manabe and Daniel M. German
@@ -51,13 +48,10 @@ the above paper.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
- - splitter.pl is a derivative work of the Rule-based sentence
- splitter script by Paul Paul Clough. Please see splitter/README
- for details.
+ Ninka::SentenceExtractor is a derivative work of the rule-based sentence
+ splitter script by Paul Clough.
- - comments is based on a program to remove comments by Jon Newman,
- it is released under the GNU General Public License Version 2 or
- (at your option) any later version.
+ comments is based on a program to remove comments by Jon Newman.
* Requirements
@@ -70,40 +64,25 @@ the above paper.
* How to install
1. Unpack the distribution in a directory.
- 2. Optional: Build and install comments (make sure it is somwehere in the
- path) (see directory comments)
-
+ 2. Optional: Build and install comments (make sure it is somewhere in the path) (see directory comments)
-* Usage:
+* Usage
-Ninka uses a pipe model (see below). Each step of the "pipe" creates a
-file, but
+ninka [options] filename
-ninka.pl [options] [filename]
+Available options:
-Available options
+ -i create intermediary files
-v verbose
- -d delete intermediate files
- -C force creation of comments file
- -c stop after creation of comments
- -S force creation of sentences file
- -s stop after creation of sentences
- -G force creation of goodsent file
- -g stop after creation of goodsent
- -T force creation of senttok file
- -t stop after creation of senttok
- -L force creation of license file
- -f force all processing
-
Example:
- ninka.pl foo.c
+ ninka -i foo.c
It will create five files:
- 1. foo.c.comments: extracted the first two comments blocks, where
- the license is usually
+ 1. foo.c.comments: extracted the first comment blocks, where
+ the license is usually included
2. foo.c.sentences: creates the list of sentences in the license
statement
3. foo.c.goodsent: contains sentences that are likely to be part of
@@ -117,69 +96,60 @@ It will create five files:
- Licenses
- Unmatched sentences in *.senttok that were not matched
-
-
+The files are not required for Ninka's functionality. But they can help
+to debug license detection issues.
* Ninka model
Ninka uses a pipe-model. Each stage of the pipe does something very specific:
- 1. Comment extractor.
+1. Comment extractor
- - directory: extComments
+ - Module: Ninka::CommentExtractor
- - command: extComments.pl, might use comments (included in distribution)
+ - Purpose: Extracts top comments of source code.
+ If no comment extractor is known for the language,
+ then extracts top lines from source (currently 700)
- - Purpose: Extracts top comments of source code. If no
- comment extractor is known for the language, then extracts top lines from source (currently 700)
-
- - Creates <filename>.comments file
+ - Output: <filename>.comments
2. Split sentences in comments
- - directory: splitter
-
- - command: splitter.pl
-
- - Purpose: Ninka works by matching sentences of licenses, hence
- it needs to properly break text into sentences.
-
- - Outputs <filename>.sentences
-
-3. Filter "good" sentences.
+ - Module: Ninka::SentenceExtractor
- - directory filter
+ - Purpose: Ninka works by matching sentences of licenses,
+ hence it needs to properly break text into sentences.
- - command: filter.pl
+ - Output: <filename>.sentences
- - Purpose: some sentences are related to a license, some are
- not. It is valuable to know if a file contains lines that look
- like a license or not (e.g. to know that a file has no license)
+3. Filter "good" sentences
- - Outputs: <filename>.goodsent, and <filename>.badsent (not used)
+ - Module: Ninka::SentenceFilter
-4. Tokenizes sentences
+ - Purpose: Some sentences are related to a license, some are not.
+ It is valuable to know if a file contains lines that look like
+ a license or not (e.g. to know that a file has no license).
- - Directory senttok
+ - Output: <filename>.goodsent and <filename>.badsent
- - command: senttok.pl
+4. Tokenize sentences
- - Purpose: It creates a file that corresponds to the recognized
- sentence tokens. For each sentence, it outputs its sentence token, or unknown otherwise.
+ - Module: Ninka::SentenceTokenizer
- - Outputs: <filename>.senttok
+ - Purpose: It creates a file that corresponds to the recognized sentence tokens.
+ For each sentence, it outputs its sentence token, or unknown otherwise.
-5. Matches sentences to licenses
+ - Output: <filename>.senttok
- - Directory matcher
+5. Match sentences to licenses
- - Command: matcher.pl
+ - Module: Ninka::LicenseMatcher
- - Purpose: looks at the sequence of sentence tokens and outputs the licenses found
+ - Purpose: It looks at the sentence tokens and outputs the licenses found.
- Output: <filename>.license
-The script ninka.pl takes care of all these steps, and optionally removes
+The script ninka takes care of all these steps, and optionally creates
intermediary files, and writes to the stdout the licenses found.
------
diff --git a/bin/ninka b/bin/ninka
index 4732cbe..9cfd6aa 100755
--- a/bin/ninka
+++ b/bin/ninka
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/perl
use strict;
use warnings;
@@ -19,7 +19,7 @@ sub parse_cmdline_parameters {
if (!getopts('iv', \%opts) || scalar(@ARGV) == 0) {
print STDERR "Ninka v${Ninka::VERSION}
-Usage: $0 [options] <filename>
+Usage: ninka [options] <filename>
Options:
-i create intermediary files
@@ -32,29 +32,79 @@ Options:
__END__
+=encoding utf8
+
=head1 NAME
-ninka
+ninka - source file license identification tool
+
+=head1 SYNOPSIS
+
+B<ninka> [options] F<filename>
=head1 DESCRIPTION
-Scans a file and returns the found licenses.
+Scans a source file and returns the found licenses.
+
+=head1 OPTIONS
+
+=over
+
+=item B<-i>
+
+create intermediary files (for debugging)
+
+=item B<-v>
+
+verbose
+
+=back
+
+=head1 EXAMPLES
+
+=over
+
+=item B<ninka> F<foo.c>
+
+Determine the licenses in file F<foo.c>.
+
+=item B<ninka -i> F<foo.c>
+
+Determine the licenses in file F<foo.c> and create intermediary files (for debugging).
+
+=item find * | xargs -n1 -I@ B<ninka> '@'
+
+Determine the licenses of files in a directory.
+
+=back
+
+=head1 AUTHOR
+
+B<ninka> was written by Daniel M. German <dmg@uvic.ca> and Yuki Manabe <y-manabe@ist.osaka-u.ac.jp>.
+
+=head1 SEE ALSO
+
+Daniel M. German, Yuki Manabe and Katsuro Inoue. A sentence-matching method
+for automatic license identification of source code files. In 25nd IEEE/ACM
+International Conference on Automated Software Engineering (ASE 2010).
+
+You can download it from http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf.
=head1 COPYRIGHT AND LICENSE
-Copyright (C) 2009-2014 Yuki Manabe and Daniel M. German
+Copyright (C) 2009-2014 Yuki Manabe and Daniel M. German, 2015 René Scheibe
-This program is free software; you can redistribute it and/or modify
-it under the terms of the GNU Affero General Public License as
-published by the Free Software Foundation, either version 3 of the
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as
+published by the Free Software Foundation; either version 2 of the
License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-GNU Affero General Public License for more details.
+GNU General Public License for more details.
-You should have received a copy of the GNU Affero General Public License
+You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
=cut
diff --git a/lib/Ninka.pm b/lib/Ninka.pm
index 526aeab..8f454cd 100644
--- a/lib/Ninka.pm
+++ b/lib/Ninka.pm
@@ -68,7 +68,7 @@ __END__
=head1 NAME
-Ninka - Find licenses in source files.
+Ninka - source file license identification tool
=head1 SYNOPSIS
@@ -82,7 +82,7 @@ Ninka - Find licenses in source files.
=head1 DESCRIPTION
-Scans a file and returns the found licenses.
+Scans a source file and returns the found licenses.
=head1 COPYRIGHT AND LICENSE
diff --git a/man/ninka.1 b/man/ninka.1
deleted file mode 100644
index 9cd2d57..0000000
--- a/man/ninka.1
+++ /dev/null
@@ -1,83 +0,0 @@
-.TH NINKA 1.3 "May 2015" ninka
-.SH NAME
-ninka \- source file license identification tool
-.SH SYNOPSYS
-.SY ninka
-.OP \-vfCcSsGgTtLd
-.OP \-\-
-.RI [ file ]
-.YS
-
-.SH DESCRIPTION
-
-Analyses source files to determine the license they fall under. Takes a source
-file as input and outputs the file's license.
-
-.SH OPTIONS
-
-.IP \-v
-verbose
-
-.IP \-f
-force all processing
-
-.IP \-C
-force creation of comments
-.IP \-c
-stop after creation of comments
-
-.IP \-S
-force creation of sentences
-.IP \-s
-stop after creation of sentences
-
-.IP \-G
-force creation of goodsent
-.IP \-g
-stop after creation of goodsent
-
-.IP \-T
-force creation of senttok
-.IP \-t
-stop after creation of senttok
-
-.IP \-L
-force creation of matching
-
-.IP \-d
-delete intermediate files
-
-.IP \-\-
-Stop processing options
-
-.SH EXAMPLES
-
-.TP
-\fBninka\fR \fIfoo.c\fR
-Determine the licenses in file foo.c
-
-.TP
-.BI ninka\ \-d \ foo.c
-Determine the license in file foo.c and delete intermediary files
-
-.TP
-find * | xargs \-n1 \-I@ \fBninka\fR '@'
-Determine the licenses of files in a directory.
-
-
-.SH AUTHOR
-
-\fBninka\fR was written by Daniel M. German <dmg@uvic.ca> and Yuki Manabe
-<y-manabe@ist.osaka-u.ac.jp>. ninka itself is licensed under the AGPLv3+. This
-manpage was written by Ryan Kavanagh <ryanakca@kubuntu.org> for the Debian
-project and is also licensed under the AGPLv3+.
-
-.SH SEE ALSO
-
-Daniel M. German, Yuki Manabe and Katsuro Inoue. A sentence-matching method
-for automatic license identification of source code files. In 25nd IEEE/ACM
-International Conference on Automated Software Engineering (ASE 2010).
-
-You can email Daniel M. German <dmg@uvic.ca> for a copy or download it from
-.UR http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf
-.UE
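The README hunk above describes Ninka as a five-stage pipe:

1. comment extraction
2. sentence splitting
3. filtering of "good" sentences
4. sentence tokenizing
5. license matching

The self-contained Perl sketch below passes a C file through toy versions of those stages to show how data flows from one stage to the next. It is not the Ninka::* code. The sub names, the keyword filter, the token names (GPLv2orLater, GPLnoWarranty) and the matching rule are hypothetical placeholders. The real modules rely on their own sentence and license rule data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy version of the five-stage pipe described in the README.
# Hypothetical: these subs, token names and rules are NOT the Ninka::* API.

# Stage 4 data (hypothetical): sentence patterns mapped to sentence tokens.
my @TOKEN_RULES = (
    [ GPLv2orLater  => qr/GNU General Public License.*version 2.*any later version/i ],
    [ GPLnoWarranty => qr/WITHOUT ANY WARRANTY/i ],
);

# Stage 5 data (hypothetical): a license is reported when all its tokens occur.
my %LICENSE_RULES = (
    'GPLv2+' => [qw(GPLv2orLater GPLnoWarranty)],
);

# Stage 1: first /* ... */ block, or the top 700 lines if there is none.
sub extract_comments {
    my ($source) = @_;
    return $1 if $source =~ m{/\*(.*?)\*/}s;
    my @lines = split /\n/, $source;
    my $last  = $#lines < 699 ? $#lines : 699;
    return join "\n", @lines[0 .. $last];
}

# Stage 2: strip " * " decoration, collapse whitespace, split after periods.
sub split_sentences {
    my ($comments) = @_;
    (my $text = $comments) =~ s/^[ \t]*\*?[ \t]?//mg;
    $text =~ s/\s+/ /g;
    $text =~ s/^ | $//g;
    return grep { length } split /(?<=\.)\s+/, $text;
}

# Stage 3: keep "good" sentences that look license-related.
sub filter_sentences {
    return grep { /licen[cs]|copyright|warrant|free software/i } @_;
}

# Stage 4: map a sentence to its token, or UNKNOWN.
sub tokenize_sentence {
    my ($sentence) = @_;
    for my $rule (@TOKEN_RULES) {
        my ($token, $re) = @$rule;
        return $token if $sentence =~ $re;
    }
    return 'UNKNOWN';
}

# Stage 5: report every license whose required tokens were all seen.
sub match_license {
    my @tokens = @_;
    return 'NONE' unless @tokens;    # nothing license-like at all
    my %seen = map { $_ => 1 } @tokens;
    my @found;
    for my $license (sort keys %LICENSE_RULES) {
        my @missing = grep { !$seen{$_} } @{ $LICENSE_RULES{$license} };
        push @found, $license unless @missing;
    }
    return @found ? join(';', @found) : 'UNKNOWN';
}

sub run_pipeline {
    my ($source) = @_;
    my @good   = filter_sentences(split_sentences(extract_comments($source)));
    my @tokens = map { tokenize_sentence($_) } @good;
    return match_license(@tokens);
}

my $sample = <<'END_C';
/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 */
int main(void) { return 0; }
END_C

print run_pipeline($sample), "\n";    # prints GPLv2+
```

Keeping each stage a separate function mirrors the README's point about the pipe model. Every intermediate list (sentences, good sentences, tokens) could be dumped to a file, much as "ninka -i" writes the .comments, .sentences, .goodsent and .senttok files for debugging.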