Setting Up Searches on the WN Server

Version 2.0.3


One of the design goals of WN is to provide the maintainer with tools to create extensive navigational aids for the server. A variety of search mechanisms are available for this purpose.

5.1 Title Searches

In response to the URL:

<http://host/dir/search=title>

the server will provide an HTML form (automatically generated or prepared by the maintainer) asking for a regular expression search term. When a term is supplied, the server will search the index.cache files in /dir and in designated subdirectories for items whose titles contain a match for the search term. An HTML document with a menu of these items is returned. Subdirectories are designated for recursive searching by an entry in the directory record of the index file like:

Subdirs=dir1,dir2,dir3

You can customize the prompt for the search term by creating an HTML form whose ACTION is the URL "http://host/dir/search=title" and which uses the GET method to return the search term with "NAME=query".
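As a rough illustration of the request such a form produces, the following Python sketch builds the URL a browser would send when the form is submitted with GET. The parameter name "query" comes from the text above; the function name search_url and the host and directory values are placeholders chosen for this example only.

```python
from urllib.parse import urlencode


def search_url(host, directory, mode, term):
    """Build the GET URL a search form would request for the given mode."""
    # "host" and "directory" are placeholders; "query" is the field name
    # the form is expected to use for the search term.
    base = f"http://{host}/{directory.strip('/')}/search={mode}"
    return f"{base}?{urlencode({'query': term})}"


if __name__ == "__main__":
    # Characters that are special in URLs are percent-encoded by urlencode.
    print(search_url("host", "dir", "title", "^Intro"))
```

The same helper covers the other directory-level modes described in this chapter by passing, for example, "keyword" or "context" as the mode.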

The regular expressions recognized by the WN server are the same as those of the UNIX grep(1) utility (though this utility is not itself used, as the server has its own regular expression functions). The more general regular expressions used, for example, in the UNIX egrep(1) utility are not supported by WN.

5.2 Keyword Searches

Keyword searches work like title searches, except that matches are sought in keywords instead of titles. Keywords for HTML documents are automatically obtained from <META> headers. For other documents (or for HTML documents) they can be manually supplied in the index file. This is done by including a line like:

Keywords=keyword1, keyword2, etc.

in the relevant document's record in the index file. The URL that invokes this search is:

<http://host/dir/search=keyword>

5.3 Title/Keyword Searches

These work like the keyword and title searches above, except that the match can be in either the keywords or the title. The URL to use as the ACTION in a form or simply to invoke the search is:

<http://host/dir/search=synopsis>

If a recursive title, keyword or fielded search is requested and some directories have restricted access, only those directories which have the same access file or the same password realm as the directory where the search started will be searched. In fact, if an "Accessfile=" directive is used, the path must be the same for both directories (and so must be of the form "Accessfile=~/dir/.access" or "Accessfile=/dir/.access" rather than "Accessfile=.access").
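The rule above can be modelled in a few lines of Python. This is only a sketch of the rule as stated, not the server's code: the Access class and the function names are invented for illustration, and the treatment of unrestricted subdirectories (always searched) is an assumption the text does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Access:
    """Hypothetical summary of a directory's access settings."""
    accessfile: Optional[str] = None  # value of "Accessfile=", if any
    realm: Optional[str] = None       # password realm, if any


def is_restricted(access):
    return access.accessfile is not None or access.realm is not None


def shares_access(start, sub):
    """True if sub has the same access file path or realm as start."""
    same_file = (
        sub.accessfile is not None
        and sub.accessfile == start.accessfile
        # A relative path such as ".access" names a different file in
        # each directory, so only "/..." and "~/..." forms can match.
        and sub.accessfile.startswith(("/", "~/"))
    )
    same_realm = sub.realm is not None and sub.realm == start.realm
    return same_file or same_realm


def is_searched(start, sub):
    # Assumption: a subdirectory with no restriction is always searched.
    return not is_restricted(sub) or shares_access(start, sub)
```

For example, a search started in a directory with "Accessfile=/dir/.access" would descend into a subdirectory using that same path but skip one using "Accessfile=/other/.access".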

5.4 Fielded Searches for User Supplied Fields

The maintainer can supply up to 20 additional field values associated with a document. These are used for searching purposes in the same way that "Keywords=" values are. This is intended to give some additional "keyword-like" fields, for example, document author or document id number. It works exactly like keywords, except that these values are not extracted from HTML files but must be created with a line like:

Field3=any text here

in the index file. The '3' in this example can be replaced with any number from 0 to 19. The URL to use as the ACTION in a form or simply to invoke the search in the example above is:

<http://host/dir/search=field3>

As with keyword and title searches, the search term for a fielded search can be any regular expression of the kind accepted by the UNIX grep(1) utility.
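To make the numbering concrete, here is a small Python sketch that recognizes a "FieldN=" line from an index file and derives the matching search mode. The function names are hypothetical, and rejecting numbers outside 0 to 19 with an exception is a choice made for this example; the text does not say how the server treats such lines.

```python
import re

MAX_FIELDS = 20  # the text allows Field0 through Field19


def parse_field_line(line):
    """Return (field number, text) for a 'FieldN=text' line, else None."""
    match = re.fullmatch(r"Field(\d+)=(.*)", line.strip())
    if match is None:
        return None
    number = int(match.group(1))
    if number >= MAX_FIELDS:
        raise ValueError(f"field number {number} is outside 0-{MAX_FIELDS - 1}")
    return number, match.group(2)


def field_search_mode(number):
    """Return the URL suffix that searches the given field."""
    return f"search=field{number}"


if __name__ == "__main__":
    number, text = parse_field_line("Field3=any text here")
    print(number, text, field_search_mode(number))
```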

5.5 Context Searches

Unlike keyword, title and fielded searches, this is a full text search of all "text/*" documents in one directory (not its subdirectories). These searches are also limited to the main files -- they will not find matches in wrappers and included files. The returned HTML document contains a list of titles of documents containing a match, each with a sublist of the lines from that document containing the match. This provides one line of context for each match. For HTML documents, selecting the matched expression in one of these lines takes you to the document with your viewer focused on the matching location. The primary intent of this feature is to provide full text searching for an HTML "document" which might consist of a substantial number of files. The text of the HTML response containing the matches can be customized with a Searchwrapper directive.

The URL to use as the ACTION in a form or simply to invoke the search is:

<http://host/dir/search=context>

It is possible to mark HTML documents with comments so that only part of them is searched. This is done with lines consisting of the comment "" which turns off searching until the line consisting of "" is encountered.

5.6 Grep Searches

A grep search is just like a context search, except that only a list of anchors pointing to files containing a match is returned. There are no lines of context showing the match. To do a grep search on the files in directory dir use:

<http://host/dir/search=grep>

5.7 Line Searches

A line search is just like a context search, except that only one list of all matching lines is returned, instead of the matching lines being sublists of a list of files containing a match. That is, all the items in sublists of a context search are concatenated in one large list of lines containing matches. The matching items are still anchors pointing to items in their respective files. To do a line search on the files in directory dir use:

<http://host/dir/search=line>
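The relationship between context, grep and line searches can be sketched in Python as three views of the same set of matches. This is an illustration only: the function names are invented, the files are passed in as a plain dictionary of title to text, and Python's re module stands in for the server's grep-style regular expressions, which differ in syntax.

```python
import re


def context_search(files, pattern):
    """Map each matching file's title to the lines that contain a match."""
    regex = re.compile(pattern)
    result = {}
    for title, text in files.items():
        hits = [line for line in text.splitlines() if regex.search(line)]
        if hits:
            result[title] = hits
    return result


def grep_search(files, pattern):
    """Only the titles of files containing a match, with no context lines."""
    return list(context_search(files, pattern))


def line_search(files, pattern):
    """All matching lines, concatenated into one flat list."""
    matches = context_search(files, pattern)
    return [line for hits in matches.values() for line in hits]
```

In the real server each returned item is an anchor pointing into the file; the sketch keeps only the titles and line text.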

5.8 File Context and Grep Searches

A file context search is just like a context search, except that it is limited to a single file. The file grep search returns a text/html document containing the lines in the file matching the regular expression. These lines will be converted to plain text and surrounded by <pre> and </pre> tags. This is done because isolated tags or partial tags taken from an HTML document would be unlikely to function properly. It is likely that you will want to use a Searchwrapper directive with a file grep search.

The URLs to invoke these searches on file foo are:

<http://host/dir/foo;search=context>

<http://host/dir/foo;search=grep>
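A minimal Python sketch of the file grep output described above follows. It assumes that "converted to plain text" means escaping the HTML markup characters so the tags are displayed rather than interpreted; the function name file_grep is hypothetical, and Python's re module again stands in for the server's grep-style expressions.

```python
import html
import re


def file_grep(text, pattern):
    """Return the matching lines of one file, escaped and wrapped in <pre>."""
    regex = re.compile(pattern)
    # Escaping keeps isolated or partial tags from being interpreted.
    hits = [html.escape(line) for line in text.splitlines() if regex.search(line)]
    return "<pre>\n" + "\n".join(hits) + "\n</pre>"


if __name__ == "__main__":
    print(file_grep("<h1>WN server</h1>\n<p>Other</p>", "server"))
```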

5.9 Search a Directory or Search a Hierarchy?

The searches described above are of two types: those that search the index.cache files, like field, keyword and title searches, and those that do full text searches of multiple files, like context, grep, file grep and line searches. The first type recursively descends all subdirectories listed in a "Subdirs=" directive and searches all the index.cache files. The second type searches only the files in the one directory specified in the search.

The reason for this is efficiency. Context, grep, file grep and line searches are not a replacement for WAIS, glimpse, webglimpse or some other indexed search engine. The intent of these searches is to allow a full text search of a single conceptual HTML document that is made up of a number of files with links. These searches work well with something like the HTML 4.0 specification (see http://hopf.math.nwu.edu/html4/), which consists of a number of files of moderate size, but they would be very slow with 1000 files spread out in a data hierarchy. For that you really need an indexing search engine.

Another limitation of these searches is that they will not find matches in wrappers and included files.

On the other hand, all of the searching methods listed above are built into the server and require no additional effort from the maintainer. You don't need to produce or maintain an index. They are simply referenced with URLs like "<http://host/dir/search=context>" where /dir is any directory containing files to be served and an index.cache listing them. Of course, search permission can be denied for any directory or any file contained in that directory.

5.10 List Searches

The server will search an HTML document looking for an unordered list of anchors linking to WWW objects. The contents of each anchor will be searched for a match to the supplied regular expression. The search returns an HTML document containing an unordered list of those anchors with a match. This is quite useful when combined with the wn_mkdigest utility which creates HTML documents to be searched in this way from files with internal structure like mail or news digests, mailing lists, etc.

The URL to invoke this search on file foo is:

<http://host/dir/foo;search=list>

5.11 Index Searches

Indexed searches can be supported in WN by auxiliary modules. Two such modules, wnseven_m and wnsectsearch, are provided as examples, and maintainers may wish to create others. To use such a module, give your form an ACTION like http://host/dir/search=index.

Then in the index file in the directory dir you should have a line like:

Search-Module=/full/path/to/searchmod

The program searchmod should read the environment variable QUERY_STRING and return a partial HTML document. Typically the program returns an unordered list of anchors to documents containing a match to the query string. This list can be wrapped by including a "Searchwrapper=" entry in the directory record. If it is not, a default wrapper with text like "Here are the matches for your search" is supplied.
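A search module along these lines might look like the following Python sketch. Only the use of QUERY_STRING and the unordered list of anchors come from the text; everything else is an assumption made for illustration, including the DOCUMENTS table, the substring match on titles, and the guess that the query string may arrive either as "query=term" or as the bare term.

```python
import html
import os
from urllib.parse import parse_qs

# Hypothetical title -> URL table; a real module would consult its own index.
DOCUMENTS = {
    "Mail digest": "/dir/mail.html",
    "News digest": "/dir/news.html",
}


def render_matches(query_string, documents):
    """Return an unordered list of anchors whose titles contain the term."""
    # Assumption: accept "query=term"; otherwise treat the whole string as the term.
    term = parse_qs(query_string).get("query", [query_string])[0]
    lines = ["<ul>"]
    for title, url in documents.items():
        if term and term.lower() in title.lower():
            lines.append(f'<li><a href="{html.escape(url)}">{html.escape(title)}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)


if __name__ == "__main__":
    # The partial HTML document is written to standard output.
    print(render_matches(os.environ.get("QUERY_STRING", ""), DOCUMENTS))
```

Because the output is only a fragment, it relies on the default wrapper or a "Searchwrapper=" file to supply the rest of the page.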

5.12 Search Modes

The different types of searches (e.g. keyword, context, etc.) are called the modes of the search. Normally the mode is set by adding, for example, "search=context" to the end of a URL. However, if an HTML form is used to initiate the search, it may be desirable to allow the mode to be selected by a form variable. Thus an HTML form like:

<form action="search=mode" method="GET">
Enter your search term <input name="query" size=15>
<input type="submit" value="Search"> by
<input type="radio" name="mode" value="title" checked> title or
<input type="radio" name="mode" value="keyword"> keywords
</form>

will execute either a title or a keyword search depending on whether the user selects the radio button for "title" or "keyword". The URL requested will end with "search=mode", but the text after "search=" could be anything: the "mode=title" pair (if that is what is selected) in the query part of the URL overrides whatever follows "search=" in the base URL.
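The override rule can be sketched in Python as follows. The function name effective_mode is hypothetical and the parsing is a simplification for illustration, not a description of how the server itself parses requests.

```python
from urllib.parse import parse_qs, urlsplit


def effective_mode(url):
    """Return the search mode, letting a "mode" form variable win."""
    parts = urlsplit(url)
    # Mode named after "search=" in the last path segment, if any.
    last_segment = parts.path.rsplit("/", 1)[-1]
    path_mode = None
    if "search=" in last_segment:
        path_mode = last_segment.split("search=", 1)[1]
    # Mode supplied by the form in the query part of the URL, if any.
    form_mode = parse_qs(parts.query).get("mode", [None])[0]
    return form_mode or path_mode


if __name__ == "__main__":
    print(effective_mode("http://host/dir/search=mode?query=wn&mode=title"))
```

With no "mode" variable in the query part, the mode written after "search=" is used, so a plain "search=context" URL behaves as before.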

5.13 Searchwrappers

By default, when a search is done an HTML file is created with text like "Here are the matches for ...". You can customize this response with the "Searchwrapper=" directive in either a file record or the directory record of the index file.

The line:

Searchwrapper=swrap.html

specifies that the HTML file swrap.html in the current directory should be used as a wrapper for the output of all searches on this directory (if it is a directory record entry) or file (if it is a file directive). This wrapper differs from other wrappers in that it can have only a single "" line. An unordered list of anchors to the matching items will be inserted at the location of this line. You can, of course, insert the client supplied search term by use of the line "" in this file.

The remainder of this file can be anything you wish and is often an HTML form allowing subsequent searches.

If a search fails to find any matches then a default HTML response indicating this is sent. This response can also be customized but only if a "Searchwrapper=" is also used. The line:

Nomatchsub=foo.html

specifies that the HTML file foo.html in the current directory should be used for the output of all searches (title, keyword, etc) on this directory (or file if it is a file directive) which return no matches. If "Nomatchsub=" is used and a "Searchwrapper=" has not been defined an error is logged and the nomatchsub file is ignored. The nomatchsub file must be in the directory being searched and its name must not contain a '/'.
