Diffstat (limited to 'doc/xmlwf.1')
-rw-r--r-- | doc/xmlwf.1 | 146 |
1 files changed, 93 insertions, 53 deletions
diff --git a/doc/xmlwf.1 b/doc/xmlwf.1
index b2c5616..cc213b8 100644
--- a/doc/xmlwf.1
+++ b/doc/xmlwf.1
@@ -3,7 +3,7 @@
 .\" <http://shell.ipoline.com/~elmert/comp/docbook2X/>
 .\" Please send any bug reports, improvements, comments, patches,
 .\" etc. to Steve Cheng <steve@ggi-project.org>.
-.TH "XMLWF" "1" "22 April 2002" "" ""
+.TH "XMLWF" "1" "24 January 2003" "" ""
 .SH NAME
 xmlwf \- Determines if an XML document is well-formed
 .SH SYNOPSIS
@@ -12,12 +12,13 @@ xmlwf \- Determines if an XML document is well-formed
 .SH "DESCRIPTION"
 .PP
-\fBxmlwf\fR uses the Expat library to determine
-if an XML document is well-formed. It is non-validating.
+\fBxmlwf\fR uses the Expat library to
+determine if an XML document is well-formed. It is
+non-validating.
 .PP
-If you do not specify any files on the command-line,
-and you have a recent version of xmlwf, the input
-file will be read from stdin.
+If you do not specify any files on the command-line, and you
+have a recent version of \fBxmlwf\fR, the
+input file will be read from standard input.
 .SH "WELL-FORMED DOCUMENTS"
 .PP
 A well-formed document must adhere to the
@@ -26,7 +27,8 @@ following rules:
 \(bu
 The file begins with an XML declaration. For instance,
 <?xml version="1.0" standalone="yes"?>.
-\fBNOTE:\fR xmlwf does not currently
+\fBNOTE:\fR
+\fBxmlwf\fR does not currently
 check for a valid XML declaration.
 .TP 0.2i
 \(bu
@@ -48,33 +50,37 @@ or double).
 .PP
 If the document has a DTD, and it strictly complies with that
 DTD, then the document is also considered \fBvalid\fR.
-xmlwf is a non-validating parser -- it does not check the DTD.
-However, it does support external entities (see the -x option).
+\fBxmlwf\fR is a non-validating parser --
+it does not check the DTD. However, it does support
+external entities (see the \fB-x\fR option).
 .SH "OPTIONS"
 .PP
 When an option includes an argument, you may specify the argument either
-separate ("d output") or mashed ("-doutput"). xmlwf supports both.
+separately ("\fB-d\fR output") or concatenated with the
+option ("\fB-d\fRoutput"). \fBxmlwf\fR
+supports both.
 .TP
 \fB-c\fR
-If the input file is well-formed and xmlwf doesn't
-encounter any errors, the input file is simply copied to
+If the input file is well-formed and \fBxmlwf\fR
+doesn't encounter any errors, the input file is simply copied to
 the output directory unchanged.
-This implies no namespaces (turns off -n) and
-requires -d to specify an output file.
+This implies no namespaces (turns off \fB-n\fR) and
+requires \fB-d\fR to specify an output file.
 .TP
 \fB-d output-dir\fR
 Specifies a directory to contain transformed
 representations of the input files.
-By default, -d outputs a canonical representation
+By default, \fB-d\fR outputs a canonical representation
 (described below).
-You can select different output formats using -c and -m.
+You can select different output formats using \fB-c\fR
+and \fB-m\fR.
 The output filenames will
 be exactly the same as the input filenames or "STDIN" if the input is
-coming from STDIN. Therefore, you must be careful that the
+coming from standard input. Therefore, you must be careful that the
 output file does not go into the same directory as the input
-file. Otherwise, xmlwf will delete the input file before
-it generates the output file (just like running
+file. Otherwise, \fBxmlwf\fR will delete the
+input file before it generates the output file (just like running
 cat < file > file in most shells).
 Two structurally equivalent XML documents have a byte-for-byte
@@ -86,36 +92,45 @@ http://www.jclark.com/xml/canonxml.html .
 .TP
 \fB-e encoding\fR
 Specifies the character encoding for the document, overriding
-any document encoding declaration. xmlwf
-has four built-in encodings:
+any document encoding declaration. \fBxmlwf\fR
+supports four built-in encodings:
 US-ASCII,
 UTF-8,
 UTF-16, and
 ISO-8859-1.
-Also see the -w option.
+Also see the \fB-w\fR option.
 .TP
 \fB-m\fR
 Outputs some strange sort of XML file that completely
 describes the the input file, including character postitions.
-Requires -d to specify an output file.
+Requires \fB-d\fR to specify an output file.
 .TP
 \fB-n\fR
 Turns on namespace processing. (describe namespaces)
--c disables namespaces.
+\fB-c\fR disables namespaces.
 .TP
 \fB-p\fR
 Tells xmlwf to process external DTDs and parameter
 entities.
-Normally xmlwf never parses parameter entities.
--p tells it to always parse them.
--p implies -x.
+Normally \fBxmlwf\fR never parses parameter
+entities. \fB-p\fR tells it to always parse them.
+\fB-p\fR implies \fB-x\fR.
 .TP
 \fB-r\fR
-Normally xmlwf memory-maps the XML file before parsing.
--r turns off memory-mapping and uses normal file IO calls instead.
+Normally \fBxmlwf\fR memory-maps the XML file
+before parsing; this can result in faster parsing on many
+platforms.
+\fB-r\fR turns off memory-mapping and uses normal file
+IO calls instead.
 Of course, memory-mapping is automatically turned off
-when reading from STDIN.
+when reading from standard input.
+
+Use of memory-mapping can cause some platforms to report
+substantially higher memory usage for
+\fBxmlwf\fR, but this appears to be a matter of
+the operating system reporting memory in a strange way; there is
+not a leak in \fBxmlwf\fR.
 .TP
 \fB-s\fR
 Prints an error if the document is not standalone.
@@ -127,17 +142,21 @@ Turns on timings. This tells Expat to parse the entire file,
 but not perform any processing.
 This gives a fairly accurate idea of the raw speed of Expat itself
 without client overhead.
--t turns off most of the output options (-d, -m -c, ...).
+\fB-t\fR turns off most of the output options
+(\fB-d\fR, \fB-m\fR, \fB-c\fR,
+\&...).
 .TP
 \fB-v\fR
-Prints the version of the Expat library being used, and then exits.
+Prints the version of the Expat library being used, including some
+information on the compile-time configuration of the library, and
+then exits.
 .TP
 \fB-w\fR
-Enables Windows code pages.
-Normally, xmlwf will throw an error if it runs across
-an encoding that it is not equipped to handle itself. With
--w, xmlwf will try to use a Windows code page. See
-also -e.
+Enables support for Windows code pages.
+Normally, \fBxmlwf\fR will throw an error if it
+runs across an encoding that it is not equipped to handle itself. With
+\fB-w\fR, xmlwf will try to use a Windows code
+page. See also \fB-e\fR.
 .TP
 \fB-x\fR
 Turns on parsing external entities.
@@ -164,34 +183,40 @@ And here are some examples of external entities:
 .fi
 .TP
 \fB--\fR
-For some reason, xmlwf specifically ignores "--"
-anywhere it appears on the command line.
+For some reason, \fBxmlwf\fR specifically
+ignores "--" anywhere it appears on the command line.
 .PP
-Older versions of xmlwf do not support reading from STDIN.
+Older versions of \fBxmlwf\fR do not support
+reading from standard input.
 .SH "OUTPUT"
 .PP
-If an input file is not well-formed, xmlwf outputs
-a single line describing the problem to STDOUT.
-If a file is well formed, xmlwf outputs nothing.
+If an input file is not well-formed,
+\fBxmlwf\fR prints a single line describing
+the problem to standard output. If a file is well formed,
+\fBxmlwf\fR outputs nothing.
 Note that the result code is \fBnot\fR set.
 .SH "BUGS"
 .PP
 According to the W3C standard, an XML file without a
 declaration at the beginning is not considered well-formed.
-However, xmlwf allows this to pass.
+However, \fBxmlwf\fR allows this to pass.
 .PP
-xmlwf returns a 0 - noerr result, even if the file is
-not well-formed. There is no good way for a program to use
-xmlwf to quickly check a file -- it must parse xmlwf's STDOUT.
+\fBxmlwf\fR returns a 0 - noerr result,
+even if the file is not well-formed. There is no good way for
+a program to use \fBxmlwf\fR to quickly
+check a file -- it must parse \fBxmlwf\fR's
+standard output.
 .PP
-The errors should go to STDERR, not stdout.
+The errors should go to standard error, not standard output.
 .PP
-There should be a way to get -d to send its output to STDOUT
-rather than forcing the user to send it to a file.
+There should be a way to get \fB-d\fR to send its
+output to standard output rather than forcing the user to send
+it to a file.
 .PP
-I have no idea why anyone would want to use the -d, -c
-and -m options. If someone could explain it to me, I'd
-like to add this information to this manpage.
+I have no idea why anyone would want to use the
+\fB-d\fR, \fB-c\fR, and
+\fB-m\fR options. If someone could explain it to
+me, I'd like to add this information to this manpage.
 .SH "ALTERNATIVES"
 .PP
 Here are some XML validators on the web:
@@ -201,3 +226,18 @@ http://www.hcrc.ed.ac.uk/~richard/xml-check.html
 http://www.stg.brown.edu/service/xmlvalid/
 http://www.scripting.com/frontier5/xml/code/xmlValidator.html
 http://www.xml.com/pub/a/tools/ruwf/check.html
+.fi
+.SH "SEE ALSO"
+.PP
+
+.nf
+The Expat home page: http://www.libexpat.org/
+The W3 XML specification: http://www.w3.org/TR/REC-xml
+.fi
+.SH "AUTHOR"
+.PP
+This manual page was written by Scott Bronson <bronson@rinspin.com> for
+the Debian GNU/Linux system (but may be used by others). Permission is
+granted to copy, distribute and/or modify this document under
+the terms of the GNU Free Documentation
+License, Version 1.1.