An Overview of the WN server

Version 2.0.3

An HTTP server should do more than just serve files. It should play an active role in both navigation and presentation issues. It is my hope that this server provides better tools for the creative webmaster.
- John Franks

WN is a server for the Hypertext Transfer Protocol HTTP/1.1. Its primary design goals are security, robustness, and flexibility, in that order. One of its objectives is to provide functionality usually available only with complex CGI programs, without the necessity of writing or using these programs. (Of course, CGI/1.1 is fully supported for those who want it.) Despite this extensive functionality, the WN executable is substantially smaller than the CERN httpd, NCSA httpd or Apache servers.

WN was planned with a focus on serving HTML documents. This means such things as enabling full text searching of a single logical HTML document which may consist of many files on the server, or allowing users to search all titles on the server and obtain a menu of matching items, or allowing users to download a total logical document for printing which, in fact, consists of many linked files on the server. All of these are done in a way which is transparent to the user (and largely transparent to the maintainer)! The "User's Guide for the WN Server", which this chapter is part of, provides a good example of many of these features.

Another feature not found in many other servers is conditionally served text. Often a server maintainer may wish to serve different versions of a document to different clients. By adding simple HTML comments to documents and marking those documents to be "parsed" by the server, the maintainer can arrange that different sections or entirely different documents are sent to clients, based on such things as the client's domain name, IP address, browser type, browser "Accept" header, "Cookie" header, etc. This feature is described in more detail in the section "Conditional Text: If, Else, and Endif" in this guide.

But these are only examples of many new tools WN makes available to webmasters.

The design and security mechanisms of WN differ substantially from those of the httpd servers available from CERN and NCSA, so a brief description of how they work is useful.

1.1 How WN Works

Files served by an HTTP server may have many attributes relevant to their serving. These attributes include content-type, optional title, optional expiration date, optional keywords, whether the file should be parsed for server-side includes, access restrictions, etc. Some servers try to encode this information in ad hoc ways, in a file name suffix, or in a global configuration file. The approach of WN is to keep this information in small databases, one for each directory in the document hierarchy.

The WN maintainer never needs to understand the format of these database files (named index.cache by default), but this format is very simple and a brief description will indicate how WN works. When the server receives a request, say for /dir/foo.html, it looks in the file /dir/index.cache which contains lines like:

file=foo.html&content=text/html&title=whatever...

If the server finds a line starting with "file=foo.html" then the file will be served. If such a line does not exist the file will not be served (unless special permission to serve all files in the directory has been granted). This is the basis of WN security. Unlike other servers, the default action for WN is to deny access to a file. A file can only be served if explicit permission to do so has been granted by entering it in the index.cache database or if explicit permission to serve all files in /dir has been given in the index.cache file in /dir. This database also provides other security functions. For example, restricting the execution of CGI/1.1 programs can be done on the basis of the ownership (or group ownership) of their index.cache files. There is no need to limit execution to programs located in particular designated directories. The location of a file in the data hierarchy should be orthogonal to security restrictions on it and this is the case with the WN server.
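The default-deny lookup described above can be sketched in a few lines of Python. This is only an illustration, not the server's actual code: it assumes each record is a single line of "key=value" fields joined by "&", as in the sample line, and the serve_all flag is a hypothetical stand-in for the "serve all files in this directory" permission.

```python
def parse_cache_line(line):
    """Split one 'key=value&key=value' record into a dict."""
    record = {}
    for field in line.strip().split("&"):
        key, _, value = field.partition("=")
        record[key] = value
    return record


def load_cache(text):
    """Map file name -> record for every line that starts with 'file='."""
    records = {}
    for line in text.splitlines():
        if line.startswith("file="):
            rec = parse_cache_line(line)
            records[rec["file"]] = rec
    return records


def may_serve(records, name, serve_all=False):
    """Default deny: a file is served only if it is listed, or if the
    whole directory has been explicitly opened (serve_all is a
    hypothetical flag used for illustration)."""
    return serve_all or name in records


# Contents of a made-up /dir/index.cache
CACHE_TEXT = "file=foo.html&content=text/html&title=whatever\n"
records = load_cache(CACHE_TEXT)

print(may_serve(records, "foo.html"))    # listed, so it may be served
print(may_serve(records, "secret.txt"))  # not listed, so it is denied
```

The point of the design is visible in may_serve: nothing about a file's name or location grants access, only its presence in the database for its directory.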

The index.cache database file has a number of other functions beyond its security role. Attributes of foo.html which can be computed before it is served and which don't often change are stored in the fields of the line starting file=foo.html. For example, the MIME content type "text/html" must be deduced from the filename suffix ".html". This is done once at the time index.cache is created and need not be done every time the file is served.

The title of a file is another example. With the WN server every file served has a title (even binaries) and optionally has a list of keywords, an expiration date, and other fields associated with it. For an HTML document the title and the keywords are automatically extracted from the header of the document and stored in fields of that file's line in its index.cache file. These are used for the built-in keyword and title searches which the server supports. The maintainer also has the option of adding his own fields to this database file. They could contain such things as document author, document id number, etc. These user defined fields can be searched with the built-in WN searches or their contents can be inserted into the document, on the fly, as it is served.

So how are the index.cache databases created? Their format is quite simple and a maintainer is free to create them any way she chooses, but normally they are created by the utility wndex (pronounced "windex"). This program, which is part of the WN distribution, is designed to produce the index.cache file from a file with a friendlier format with the default name "index". A very simple index file might look like:

File=foo.html

File=clap.au
Title=Sound of one hand clapping

File=hand
Title=Picture of one hand clapping
Content-type=image/gif

Of course if the file hand were named hand.gif the content-type line would not be necessary as wndex could deduce the type from the .gif suffix. Likewise it is not necessary to give a title for foo.html because wndex will read the HTML header from that file and extract the title and perhaps other things like keywords and expiration date.
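As a rough illustration of what a tool like wndex does, the following Python sketch turns a small index file into index.cache-style lines, deducing the content type from the file name suffix when none is given. It is not the real wndex. It assumes one directive per line with blank lines between records, the suffix table and the fallback type are hypothetical, and it makes no attempt to read titles out of HTML headers.

```python
import os
import re

# Hypothetical suffix table; the real wndex has its own mapping.
SUFFIX_TYPES = {".html": "text/html", ".gif": "image/gif", ".au": "audio/basic"}

INDEX_TEXT = """\
File=foo.html

File=clap.au
Title=Sound of one hand clapping

File=hand.gif
Title=Picture of one hand clapping
"""


def parse_index(text):
    """Split blank-line-separated 'Key=value' records into dicts."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        rec = {}
        for line in block.splitlines():
            key, _, value = line.partition("=")
            rec[key.strip().lower()] = value.strip()
        if "file" in rec:
            records.append(rec)
    return records


def to_cache_line(rec):
    """Build one cache line, computing the content type once, up front."""
    content = rec.get("content-type")
    if content is None:
        suffix = os.path.splitext(rec["file"])[1]
        # The fallback type is an assumption made for this sketch.
        content = SUFFIX_TYPES.get(suffix, "application/octet-stream")
    fields = [("file", rec["file"]), ("content", content)]
    if "title" in rec:
        fields.append(("title", rec["title"]))
    return "&".join(f"{key}={value}" for key, value in fields)


for rec in parse_index(INDEX_TEXT):
    print(to_cache_line(rec))
```

Because the type is worked out when the cache is built, the server never has to inspect the suffix at request time.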

1.2 Features of WN

The WN server has several features which are not available with other servers or only available through the use of CGI/1.1 programs.

1.2.1 Searching

One of the design goals of WN is to provide the maintainer with tools to create extensive navigational aids for the server. A variety of search mechanisms are available.

Title searches: In response to the URL <http://host/dir/search=title> the server will provide an HTML form (automatically generated or prepared by the maintainer) asking for a regular expression search term. When one is supplied, the server will search the index.cache files in /dir and designated subdirectories for items whose titles contain a match for the search term. An HTML document with a menu of these items is returned.
Keyword searches: Like title searches except matches are sought in keywords instead of titles. Keywords for HTML documents are automatically obtained from <META> headers. For other documents (or HTML documents) they can be manually supplied in the index file.
Title/Keyword search: Like the above except the match can be either in the keyword or the title.
User supplied field searches: Like keyword searches except matches are sought in user supplied fields. The user supplied fields can contain any text and are attached to a document by entering them in that document's record in the index file. Their purpose is to include items like a document id number or document author in the index.cache database. A field search could then, for example, produce all documents by a given author or, using regular expressions in the search term, a list of all documents whose id numbers satisfy certain criteria.
Context searches: Unlike the title and keyword searches this is a full text search of all text/* documents in one directory (not subdirectories). The returned HTML document contains a list of all the titles of documents containing a match together with a sublist of the lines from those documents containing the match. This provides one line of context for the match. For HTML documents the matched expression in each of these lines will be a highlighted anchor. Selecting one takes you to the document with your viewer focused on the matching location. The primary intent of this feature is to provide full text searching for an HTML "document" which might consist of a substantial number of files.
File context and grep searches: A file context search is just like a context search, except limited to a single file. The file grep search returns a text/html document containing the lines in the file matching the regular expression.
List searches: The server will search an HTML document looking for an unordered list of anchors linking to Web objects. The contents of each anchor will be searched for a match to the supplied regular expression. The search returns an HTML document containing an unordered list of those anchors with a match. This is quite useful with the wn_mkdigest utility which creates HTML documents to be searched in this way from files with internal structure like mail or news digests, mailing lists, etc.
Index searches: This is a mechanism by which arbitrary search engines can be linked to WN through a search-module. The server will provide the search term to the search-module and expects an HTML list of links to matching items to be returned.

All of the searching methods listed above except the index searches are built into the server and require no additional effort for the maintainer. They are simply referenced with URLs like <http://host/dir/search=context> where /dir is any directory containing files to be served and an index.cache listing them. Of course search permission can be denied for any directory or any file contained in that directory.
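To make the idea of a title search concrete, here is a minimal Python sketch that matches a regular expression against the titles in a set of records and returns an HTML menu of the hits. The records, the /dir/ base path and the exact shape of the output are assumptions made for illustration; the server's own output will differ.

```python
import html
import re

# Made-up records, as they might be read from an index.cache file.
RECORDS = [
    {"file": "foo.html", "title": "Overview of the server"},
    {"file": "clap.au", "title": "Sound of one hand clapping"},
    {"file": "hand.gif", "title": "Picture of one hand clapping"},
]


def title_search(records, pattern, base="/dir/"):
    """Return an HTML unordered list of links to items whose title
    contains a match for the regular expression."""
    regex = re.compile(pattern)
    items = []
    for rec in records:
        if regex.search(rec.get("title", "")):
            href = html.escape(base + rec["file"])
            text = html.escape(rec["title"])
            items.append(f'<li><a href="{href}">{text}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(title_search(RECORDS, "hand"))
```

A keyword or user supplied field search would have the same shape, with the match applied to a different field of each record.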

1.2.2 Parsed Text, Server-Side Includes and Wrappers

The WN server has extensive capabilities for automatically including files in one which is being served or "wrapping" a served file with another, i.e. pre-pending and post-pending information to a file being served. The latter is useful if you wish to place a standard message at the beginning or end (or both) of a large collection of files. For security, all files included in a file or used as a wrapper for it are listed in that file's index.cache file. This, combined with various available security options, like requiring that a served file and all its includes and wrappers have the same owner (or group owner) as the index.cache file listing them, provides a safe and productive Web environment.

One important application of wrappers is to customize the HTML documents returned listing the successful search matches. If a search item is given a wrapper the server assumes that it contains text describing the search and it merely inserts an unordered list of links to the matching items.

In addition to including files the output of programs may be inserted and the value of any user defined field in the index.cache database entry for a file may be inserted.

Parsed text may also conditionally insert items with a simple if - else - endif construct, based on Accept headers, User-Agent headers, Referer headers, etc.

1.2.3 Filters

An arbitrary filter can be assigned to any file to be served. A filter is a program which reads the file and has the program output served rather than the content of the file. The name of the filter is another field in the file's line in its index.cache file. One common use of this feature is for on-the-fly decompression. For example, a file can be stored in its compressed form and assigned a filter like the UNIX zcat(1) utility which uncompresses it. Then the client is served the uncompressed file but only the compressed version is stored on disk. As another example, you might use the UNIX nroff(1) utility, "nroff -man", as a filter to process UNIX man files before serving. There are many other interesting uses of filters. Be creative!
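The filter idea can be sketched in Python as follows. This is an in-process stand-in, not how WN runs filters: WN filters are external programs, whereas here the filter is simply a function from stored bytes to served bytes, and Python's gzip module plays the part of zcat.

```python
import gzip


def serve_with_filter(stored_bytes, filter_func=None):
    """Return the bytes to send to the client: the filter's output if
    a filter is assigned to the file, otherwise the file as stored."""
    if filter_func is None:
        return stored_bytes
    return filter_func(stored_bytes)


# Only the compressed form is kept "on disk".
stored = gzip.compress(b"hello from a compressed file\n")

# With a decompressing filter assigned, the client sees the original text.
served = serve_with_filter(stored, gzip.decompress)
print(served.decode())
```

Any other transformation, such as formatting a man page, fits the same slot as long as it reads the stored file and writes what should be served.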

1.2.4 Ranges

An arbitrary range of a file can be served. If the server is accessed via a URL like <http://host/dir/foo;lines=20-30> and foo is any text/* document, it will return a text/plain document consisting of lines 20 through 30 of the file foo. This is very useful for structured text files like address lists or digests of mail and news. A WN utility called wn_mkdigest will produce an HTML document with a list of links to separate sections (line ranges) of the structured file. The wn_mkdigest utility is executed with two regular expressions as arguments: one to match the section separator and the other to match the section title. For a mail digest, for example, these could be "^From" and "^Subject:" respectively. Then each section of the virtual document would begin at a line starting with "From" and would have the message subject as its title. A similar mechanism provides byte ranges from files.
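The two ideas in this section, serving a line range and splitting a structured file into titled sections, can be sketched in Python as below. This is an illustration under stated assumptions, not the wn_mkdigest implementation: the function names and the sample mail text are made up, and only the ";lines=first-last" URL form is taken from the text above.

```python
import re

# A made-up two-message mail digest.
MAIL = (
    "From alice Mon Jan 1\n"
    "Subject: First message\n"
    "\n"
    "Hello.\n"
    "From bob Tue Jan 2\n"
    "Subject: Second message\n"
    "\n"
    "Goodbye.\n"
)


def line_range(text, first, last):
    """Return lines first through last (1-based, inclusive) of text."""
    lines = text.splitlines(keepends=True)
    return "".join(lines[first - 1:last])


def digest_sections(text, separator, title):
    """Split text into sections that begin at lines matching separator;
    each section takes its title from its first line matching title."""
    sep_re, title_re = re.compile(separator), re.compile(title)
    sections = []
    for number, line in enumerate(text.splitlines(), start=1):
        if sep_re.search(line):
            sections.append({"title": None, "first": number, "last": number})
        elif sections:
            current = sections[-1]
            current["last"] = number
            if current["title"] is None and title_re.search(line):
                # Drop the matched prefix, keep the rest as the title.
                current["title"] = title_re.sub("", line, count=1).strip()
    return sections


def digest_links(name, sections):
    """Pair each section title with a line-range URL path for the file."""
    return [(s["title"], f"{name};lines={s['first']}-{s['last']}") for s in sections]


sections = digest_sections(MAIL, "^From", "^Subject:")
for link_title, path in digest_links("digest", sections):
    print(link_title, "->", path)
```

Each generated path points back at the same stored file, so the digest page needs no copies of the individual messages.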
