Search

New Stemmers

Design overview of the search mechanism.

Search is implemented entirely on the client side; no server is involved. User queries are processed by JavaScript inside the browser, which finds matching results by comparing the query with a simplified 'index' that also resides in JavaScript. The search mechanism has two main parts.

Indexing: First we need to traverse the content in the docs/content folder and index the words in it. This is done by webhelpindexer.jar in the xsl/extensions/ folder. You can invoke it with the ant index command from the root of the webhelp directory. The source of webhelpindexer has moved to its own location at trunk/xsl-webhelpindexer/. Check out the DocBook trunk SVN directory to get this source. Then make your changes and recompile by simply running the ant command. I assume it can be opened in the NetBeans IDE with one click. If you are using IntelliJ IDEA, you can simply create a new project from existing sources. The indexer supports word scoring and stemming of words, and it supports English, German, and French. For CJK (Chinese, Japanese, Korean) languages, it uses bi-gram tokenizing to break up the words, since CJK languages do not have spaces between words.
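The bi-gram tokenizing mentioned above can be illustrated with a small sketch. This is not the indexer's own code (the indexer is a Java program); it is a minimal JavaScript illustration of the general technique, and the function name bigrams is hypothetical.

```javascript
// Hypothetical helper illustrating bi-gram tokenizing in general.
// It splits a run of characters into overlapping two-character tokens.
function bigrams(text) {
  // Array.from iterates by code point, so characters outside the
  // Basic Multilingual Plane are not cut in half.
  const chars = Array.from(text);
  if (chars.length === 1) return chars; // a lone character is its own token
  const tokens = [];
  for (let i = 0; i < chars.length - 1; i++) {
    tokens.push(chars[i] + chars[i + 1]);
  }
  return tokens;
}

// Four characters yield three overlapping pairs: 東京, 京都, 都庁
console.log(bigrams("東京都庁"));
```

Because every adjacent pair becomes a token, a query can be split the same way and matched against the index without knowing where the word boundaries are.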
When ant index is run, it generates five output files:
- htmlFileList.js - This contains an array named fl, which stores details of all the files indexed by the indexer. In addition, the doStem flag in this file defines whether stemming should be used. It defaults to false.
- htmlFileInfoList.js - This includes some metadata about the indexed files in an array named fil. Each entry holds the file name, the (HTML) title of the file, and a summary of its content. The format looks like: fil["4"]= "ch03.html@@@Developer Docs@@@This chapter provides an overview of how webhelp is implemented.";
- index-*.js (three index files) - These three files store the actual index of the content. The index is added to an array named w.
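As a rough illustration of the fil entry format described above, the following sketch splits one entry into its three "@@@"-separated fields. The helper name parseFileInfo is hypothetical and is not part of the webhelp scripts; only the fil array and the sample entry come from the text above.

```javascript
// Sample entry in the format written to htmlFileInfoList.js.
const fil = [];
fil["4"] = "ch03.html@@@Developer Docs@@@This chapter provides an overview of how webhelp is implemented.";

// Hypothetical helper: split an entry into file name, title, and summary.
function parseFileInfo(entry) {
  const [file, title, summary] = entry.split("@@@");
  return { file, title, summary };
}

const info = parseFileInfo(fil["4"]);
console.log(info.file);  // ch03.html
console.log(info.title); // Developer Docs
```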
Querying: Query processing happens entirely on the client side. The following JavaScript files handle it.
- nwSearchFnt.js - This handles the user query and returns the search results. It tokenizes the query words, drops unnecessary punctuation and common words, applies stemming if the DocBook language supports it, and so on.
- {$indexer-language-code}_stemmer.js - This includes the stemming library. The nwSearchFnt.js file calls the stemmer method in this file for stemming. For example: var stem = stemmer(foobar);
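The query-side steps listed above (tokenizing, dropping punctuation and common words, then stemming) can be sketched roughly as follows. This is a simplified sketch under stated assumptions, not the code in nwSearchFnt.js: the names prepareQuery, STOP_WORDS, and toyStemmer are hypothetical, the stop-word list is invented for the demo, and toyStemmer only stands in for the stemmer function supplied by the language-specific stemmer file.

```javascript
// Hypothetical stop-word list; the list used by the real script may differ.
const STOP_WORDS = new Set(["a", "an", "and", "the", "of", "to", "in"]);

// Hypothetical helper: turn a raw query into a list of search terms.
// `stem` is optional; pass a stemmer function when stemming is enabled.
function prepareQuery(query, stem) {
  return query
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)                          // split on anything that is not a letter or digit
    .filter((w) => w.length > 0 && !STOP_WORDS.has(w)) // drop empty tokens and common words
    .map((w) => (stem ? stem(w) : w));                 // stem each remaining word if requested
}

// Stand-in stemmer for the demo: it only strips a trailing "s".
const toyStemmer = (w) => w.replace(/s$/, "");

// Yields the terms: design, stemmer, file
console.log(prepareQuery("The design of stemmers and files!", toyStemmer));
```

Applying the same normalization to the query as to the indexed words is what allows a query such as "stemmers" to match content indexed under "stemmer".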
