author | René Scheibe <rene.scheibe@gmail.com> | 2015-06-02 00:03:09 +0200 |
---|---|---|
committer | René Scheibe <rene.scheibe@gmail.com> | 2015-06-04 17:44:53 +0200 |
commit | 9f3023e62659702d85a1fccbc5d49a4bb8392ba1 | |
tree | d7fa3f232308c52c16f3e7437e91983935c80b6d | |
parent | 4419dba89e92b471d1b869de8ece10ddb567be4f | |
update documentation for the changed license & modularized implementation
* also replace the separately maintained man page with one auto-generated from POD
* documentation of the "ninka" script is maintained
as POD directly inside the script itself (bin/ninka)
* at build time, "make" generates the man page under blib/man1/ninka.1p
-rw-r--r-- | Changes | 2
-rw-r--r-- | Makefile.PL | 4
-rw-r--r-- | README | 118
-rwxr-xr-x | bin/ninka | 70
-rw-r--r-- | lib/Ninka.pm | 4
-rw-r--r-- | man/ninka.1 | 83
6 files changed, 109 insertions, 172 deletions
@@ -22,7 +22,7 @@
         * ninka.pl: fixed bug in finding the path of where ninka was being executed from (reported by Ryan Biesemeyer)
 
-        * Fixed quotes in perl (René bScheibe)
+        * Fixed quotes in perl (René Scheibe)
 
 2015-01-05 dmg <dmg@uvic.ca>
 
diff --git a/Makefile.PL b/Makefile.PL
index 95f568b..b29cf02 100644
--- a/Makefile.PL
+++ b/Makefile.PL
@@ -7,7 +7,7 @@ WriteMakefile(
     NAME => 'Ninka',
     VERSION_FROM => 'lib/Ninka.pm',
     ABSTRACT_FROM => 'lib/Ninka.pm',
-    LICENSE => 'agpl_3',
+    LICENSE => 'gpl_2',
     AUTHOR => [
         'Daniel M. German <dmg@uvic.ca>',
         'Yuki Manabe <y-manabe@ist.osaka-u.ac.jp>',
@@ -38,7 +38,7 @@ WriteMakefile(
         resources => {
             homepage => 'http://ninka.turingmachine.org/',
             repository => 'https://github.com/dmgerman/ninka',
-            license => 'http://www.gnu.org/licenses/agpl-3.0.html',
+            license => 'http://www.gnu.org/licenses/gpl-2.0.html',
         },
     },
 );
@@ -11,16 +11,13 @@ under which a source file is made available.
 This tool uses a source file as input and outputs the licenses
 identified within that file.
 
-If you need to know the detail of Ninka, please see the following
-paper:
+If you need to know the detail of Ninka, please see the following paper:
 
 Daniel M. German, Yuki Manabe and Katsuro Inoue. A sentence-matching
 method for automatic license identification of source code files. In
 25nd IEEE/ACM International Conference on Automated Software
 Engineering (ASE 2010). You can email me (dmg@uvic.ca) for a copy or
-download it from
-
-http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf
+download it from http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf.
 
 If you use Ninka for research purposes, we would appreciate you cite
 the above paper.
@@ -28,13 +25,13 @@ the above paper.
 
 * Contributors
 - Paul Clough for his code to split sentences
-- Anthony Kohan for writing the excel and sqlite backends.
-- Armijn Hemel from Tjaldur Software Governance Solutions for multiple bug reports and suggestions
+- Anthony Kohan for writing the excel and sqlite backends
+- Armijn Hemel from Tjaldur Software Governance Solutions for multiple bug reports and suggestions
+- René Scheibe for modularizing the code
 
 * License
 
-  Except for the directories comments and splitter, Ninka is licensed
-  under the GPLv2+
+  Ninka is licensed under the GPLv2+:
 
   Copyright (C) 2009-2014 Yuki Manabe and Daniel M. German
 
@@ -51,13 +48,10 @@ the above paper.
   You should have received a copy of the GNU General Public License
   along with this program. If not, see <http://www.gnu.org/licenses/>.
 
-  - splitter.pl is a derivative work of the Rule-based sentence
-    splitter script by Paul Paul Clough. Please see splitter/README
-    for details.
+  Ninka::SentenceExtraxtor is a derivative work of the rule-based sentence
+  splitter script by Paul Paul Clough.
 
-  - comments is based on a program to remove comments by Jon Newman,
-    it is released under the GNU General Public License Version 2 or
-    (at your option) any later version.
+  comments is based on a program to remove comments by Jon Newman.
 
 * Requirements
@@ -70,40 +64,25 @@ the above paper.
 * How to install
 
   1. Unpack the distribution in a directory.
-  2. Optional: Build and install comments (make sure it is somwehere in the
-     path) (see directory comments)
-
+  2. Optional: Build and install comments (make sure it is somwehere in the path) (see directory comments)
 
-* Usage:
+* Usage
 
-Ninka uses a pipe model (see below). Each step of the "pipe" creates a
-file, but
+ninka [options] filename
 
-ninka.pl [options] [filename]
+Available options:
 
-Available options
+  -i create intermediary files
   -v verbose
-  -d delete intermediate files
-  -C force creation of comments file
-  -c stop after creation of comments
-  -S force creation of sentences file
-  -s stop after creation of sentences
-  -G force creation of goodsent file
-  -g stop after creation of goodsent
-  -T force creation of senttok file
-  -t stop after creation of senttok
-  -L force creation of license file
-  -f force all processing
 
-
 Example:
 
-  ninka.pl foo.c
+  ninka -i foo.c
 
 It will create five files:
 
-  1. foo.c.comments: extracted the first two comments blocks, where
-     the license is usually
+  1. foo.c.comments: extracted the first comments blocks, where
+     the license is usually
      included
   2. foo.c.sentences: creates the list of sentences in the license statement
   3. foo.c.goodsent: contains sentences that are likely to be part of
@@ -117,69 +96,60 @@ It will create five files:
      - Licenses
      - Unmatched sentences in *.senttok that were not matched
 
-
-
+The files are not required for Ninka's functionality. But they can help
+to debug license detection issues.
 
 * Ninka model
 
 Ninka uses a pipe-model. Each stage of the pipe does something very specific:
 
- 1. Comment extractor.
+1. Comment extractor
 
-   - directory: extComments
+   - Module: Ninka::CommentExtractor
 
-   - command: extComments.pl, might use comments (included in distribution)
+   - Purpose: Extracts top comments of source code.
+     If no comment extractor is known for the language,
+     then extracts top lines from source (currently 700)
 
-   - Purpose: Extracts top comments of source code. If no
-     comment extractor is known for the language, then extracts top lines from source (currently 700)
-
-   - Creates <filename>.comments file
+   - Output: <filename>.comments
 
 2. Split sentences in comments
 
-   - directory: splitter
-
-   - command: splitter.pl
-
-   - Purpose: Ninka works by matching sentences of licenses, hence
-     it needs to properly break text into sentences.
-
-   - Outputs <filename>.sentences
-
-3. Filter "good" sentences.
+   - Module: Ninka::SentenceExtractor
 
-   - directory filter
+   - Purpose: Ninka works by matching sentences of licenses,
+     hence it needs to properly break text into sentences.
 
-   - command: filter.pl
+   - Output: <filename>.sentences
 
-   - Purpose: some sentences are related to a license, some are
-     not. It is valuable to know if a file contains lines that look
-     like a license or not (e.g. to know that a file has no license)
+3. Filter "good" sentences
 
-   - Outputs: <filename>.goodsent, and <filename>.badsent (not used)
+   - Module: Ninka::SentenceFilter
 
-4. Tokenizes sentences
+   - Purpose: Some sentences are related to a license, some are not.
+     It is valuable to know if a file contains lines that look like
+     a license or not (e.g. to know that a file has no license).
 
-   - Directory senttok
+   - Output: <filename>.goodsent and <filename>.badsent
 
-   - command: senttok.pl
+4. Tokenize sentences
 
-   - Purpose: It creates a file that corresponds to the recognized
-     sentence tokens. For each sentence, it outputs its sentence token, or unknown otherwise.
+   - Module: Ninka::SentenceTokenizer
 
-   - Outputs: <filename>.senttok
+   - Purpose: It creates a file that corresponds to the recognized sentence tokens.
+     For each sentence, it outputs its sentence token, or unknown otherwise.
 
-5. Matches sentences to licenses
+   - Output: <filename>.senttok
 
-   - Directory matcher
+5. Match sentences to licenses
 
-   - Command: matcher.pl
+   - Module: Ninka::LicenseMatcher
 
-   - Purpose: looks at the sequence of sentence tokens and outputs the licenses found
+   - Purpose: It looks at the sentence tokens and outputs the licenses found.
 
   - Output: <filename>.license
 
-The script ninka.pl takes care of all these steps, and optionally removes
+The script ninka takes care of all these steps, and optionally creates
 intermediary files, and writes to the stdout the licenses found.
 
 ------
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/perl
 use strict;
 use warnings;
 
@@ -19,7 +19,7 @@ sub parse_cmdline_parameters {
     if (!getopts('iv', \%opts) || scalar(@ARGV) == 0) {
         print STDERR "Ninka v${Ninka::VERSION}
 
-Usage: $0 [options] <filename>
+Usage: ninka [options] <filename>
 
 Options:
   -i create intermediary files
@@ -32,29 +32,79 @@ Options:
 
 __END__
 
+=encoding utf8
+
 =head1 NAME
 
-ninka
+ninka - source file license identification tool
+
+=head1 SYNOPSYS
+
+B<ninka> [options] F<filename>
 
 =head1 DESCRIPTION
 
-Scans a file and returns the found licenses.
+Scans a source file and returns the found licenses.
+
+=head1 OPTIONS
+
+=over
+
+=item B<-i>
+
+create intermediary files (for debugging)
+
+=item B<-v>
+
+verbose
+
+=back
+
+=head1 EXAMPLES
+
+=over
+
+=item B<ninka> F<foo.c>
+
+Determine the licenses in file F<foo.c>.
+
+=item B<ninka -i> F<foo.c>
+
+Determine the licenses in file F<foo.c> and create intermediary files (for debugging).
+
+=item find * | xargs -n1 -I@ B<ninka> '@'
+
+Determine the licenses of files in a directory.
+
+=back
+
+=head1 AUTHOR
+
+B<ninka> was written by Daniel M. German <dmg@uvic.ca> and Yuki Manabe <y-manabe@ist.osaka-u.ac.jp>.
+
+=head1 SEE ALSO
+
+Daniel M. German, Yuki Manabe and Katsuro Inoue. A sentence-matching method
+for automatic license identification of source code files. In 25nd IEEE/ACM
+International Conference on Automated Software Engineering (ASE 2010).
+
+You can download it from http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf.
 
 =head1 COPYRIGHT AND LICENSE
 
-Copyright (C) 2009-2014 Yuki Manabe and Daniel M. German
+Copyright (C) 2009-2014 Yuki Manabe and Daniel M. German, 2015 René Scheibe
 
-This program is free software; you can redistribute it and/or modify
-it under the terms of the GNU Affero General Public License as
-published by the Free Software Foundation, either version 3 of the
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as
+published by the Free Software Foundation; either version 2 of the
 License, or (at your option) any later version.
 
 This program is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-GNU Affero General Public License for more details.
+GNU General Public License for more details.
 
-You should have received a copy of the GNU Affero General Public License
+You should have received a copy of the GNU General Public License
 along with this program. If not, see <http://www.gnu.org/licenses/>.
 
 =cut
diff --git a/lib/Ninka.pm b/lib/Ninka.pm
index 526aeab..8f454cd 100644
--- a/lib/Ninka.pm
+++ b/lib/Ninka.pm
@@ -68,7 +68,7 @@ __END__
 
 =head1 NAME
 
-Ninka - Find licenses in source files.
+Ninka - source file license identification tool
 
 =head1 SYNOPSIS
 
@@ -82,7 +82,7 @@ Ninka - Find licenses in source files.
 
 =head1 DESCRIPTION
 
-Scans a file and returns the found licenses.
+Scans a source file and returns the found licenses.
 
 =head1 COPYRIGHT AND LICENSE
 
diff --git a/man/ninka.1 b/man/ninka.1
deleted file mode 100644
index 9cd2d57..0000000
--- a/man/ninka.1
+++ /dev/null
@@ -1,83 +0,0 @@
-.TH NINKA 1.3 "May 2015" ninka
-.SH NAME
-ninka \- source file license identification tool
-.SH SYNOPSYS
-.SY ninka
-.OP \-vfCcSsGgTtLd
-.OP \-\-
-.RI [ file ]
-.YS
-
-.SH DESCRIPTION
-
-Analyses source files to determine the license they fall under. Takes a source
-file as input and outputs the file's license.
-
-.SH OPTIONS
-
-.IP \-v
-verbose
-
-.IP \-f
-force all processing
-
-.IP \-C
-force creation of comments
-.IP \-c
-stop after creation of comments
-
-.IP \-S
-force creation of sentences
-.IP \-s
-stop after creation of sentences
-
-.IP \-G
-force creation of goodsent
-.IP \-g
-stop after creation of goodsent
-
-.IP \-T
-force creation of senttok
-.IP \-t
-stop after creation of senttok
-
-.IP \-L
-force creation of matching
-
-.IP \-d
-delete intermediate files
-
-.IP \-\-
-Stop processing options
-
-.SH EXAMPLES
-
-.TP
-\fBninka\fR \fIfoo.c\fR
-Determine the licenses in file foo.c
-
-.TP
-.BI ninka\ \-d \ foo.c
-Determine the license in file foo.c and delete intermediary files
-
-.TP
-find * | xargs \-n1 \-I@ \fBninka\fR '@'
-Determine the licenses of files in a directory.
-
-
-.SH AUTHOR
-
-\fBninka\fR was written by Daniel M. German <dmg@uvic.ca> and Yuki Manabe
-<y-manabe@ist.osaka-u.ac.jp>. ninka itself is licensed under the AGPLv3+. This
-manpage was written by Ryan Kavanagh <ryanakca@kubuntu.org> for the Debian
-project and is also licensed under the AGPLv3+.
-
-.SH SEE ALSO
-
-Daniel M. German, Yuki Manabe and Katsuro Inoue. A sentence-matching method
-for automatic license identification of source code files. In 25nd IEEE/ACM
-International Conference on Automated Software Engineering (ASE 2010).
-
-You can email Daniel M. German <dmg@uvic.ca> for a copy or download it from
-.UR http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf
-.UE
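The POD added to bin/ninka suggests `find * | xargs -n1 -I@ ninka '@'` for scanning a directory, and the README lists one intermediary file per pipeline stage when `-i` is given. Since ninka takes a single filename, a sketch that hands only regular files to it (`find *` also lists directories) through find's `-exec ... \;` form, plus a helper that lists the `-i` output in pipeline order, could look like this. Both functions and the `NINKA` override variable are hypothetical, not part of ninka; the suffix list is taken from the README's stage outputs.

```shell
#!/bin/sh
# Hypothetical helpers; neither is part of ninka itself.

# Run ninka once per regular file under a directory (default: "."). ninka
# accepts a single filename, so find's "-exec ... \;" form starts one process
# per file. NINKA is an assumed override for the command to run (e.g. the
# path to bin/ninka in a checkout); it defaults to "ninka".
ninka_tree() {
    find "${1:-.}" -type f -exec "${NINKA:-ninka}" {} \;
}

# Print the intermediary files that "ninka -i FILE" left next to FILE, in
# pipeline order (comments, sentences, goodsent/badsent, senttok, license),
# skipping any suffix that does not exist.
ninka_intermediates() {
    for ext in comments sentences goodsent badsent senttok license; do
        if [ -e "$1.$ext" ]; then
            printf '%s\n' "$1.$ext"
        fi
    done
}
```

For example, `ninka_tree src > licenses.txt` scans everything under src/, and `ninka -i foo.c && ninka_intermediates foo.c` lists the debugging files for a single input. `ninka_tree` deliberately does not pass `-i`: the intermediary files would otherwise be written into the tree while find is still walking it.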