
Version 2.0.3

Creating Your WN Data Directory


3.1 The index file

In each directory of your data hierarchy you create a file called index with information about each file you want to serve. The simplest index file might contain the single line:

Attributes=serveall

which, once processed with the wndex utility (see section 3.3), grants the server permission to serve any file in the directory (but not in its subdirectories). For more information about this directive see section 3.7, "Serving Files not Listed in an index File", below. A more elaborate index file might look like the following:

Owner=mailto:webmistress@host.edu

File=file.txt
Title=This is a descriptive title for file.txt

# This is a comment
File=file2.html

File=soundfile
Title=This plays some sounds
Content-type=audio/basic

The file contains four groups of lines called records. The first record (the single line starting Owner= in this example) describes properties of the directory and is called the directory record. It can be empty, but in general it is a good idea for the directory record to contain an owner line, like the one above, referring to the maintainer of the directory.

The remainder of this index file has three file records describing three files, file.txt, file2.html and soundfile, in the directory which we wish to serve. The line starting with '#' is a comment. Wherever a '#' occurs the remainder of that line is treated as a comment (i.e. ignored).
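To make the record structure concrete, here is a minimal sketch that counts the records in an index file's text and the File= lines among them. It is not the wndex parser. It assumes, based on the example above, that records are separated by blank lines, that '#' starts a comment running to the end of the line, and that a line left empty once its comment is removed is simply skipped. The names count_records and strip_line are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch, not the wndex parser. Assumptions: records are
 * separated by lines that are empty or all whitespace; '#' starts a
 * comment that runs to the end of the line; a line that is empty only
 * after its comment is removed is ignored and does not end a record. */

/* Remove a trailing comment and surrounding whitespace, in place. */
static char *strip_line(char *s)
{
    char *hash = strchr(s, '#');
    if (hash != NULL)
        *hash = '\0';
    while (isspace((unsigned char)*s))
        s++;
    size_t n = strlen(s);
    while (n > 0 && isspace((unsigned char)s[n - 1]))
        s[--n] = '\0';
    return s;
}

/* Count the records in an index file's text; *files receives the
 * number of File= directives seen. */
int count_records(const char *text, int *files)
{
    char line[512];
    int records = 0, in_record = 0;

    *files = 0;
    while (*text != '\0') {
        const char *eol = strchr(text, '\n');
        size_t full = eol ? (size_t)(eol - text) : strlen(text);
        size_t len = full < sizeof line - 1 ? full : sizeof line - 1;

        memcpy(line, text, len);         /* over-long lines are truncated */
        line[len] = '\0';
        text += full + (eol ? 1 : 0);    /* step past the newline */

        char *s = line;
        while (isspace((unsigned char)*s))
            s++;
        if (*s == '\0') {                /* blank line: closes the record */
            in_record = 0;
            continue;
        }
        s = strip_line(s);
        if (*s == '\0')                  /* comment-only line: skipped */
            continue;
        if (!in_record) {                /* first directive opens a record */
            records++;
            in_record = 1;
        }
        if (strncmp(s, "File=", 5) == 0)
            (*files)++;
    }
    return records;
}
```

Fed the example index file above, this sketch should report four records and three File= lines, matching the description.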

The index file is processed with the utility wndex (pronounced "windex") to produce a small database called index.cache containing information about this directory and its contents. Detailed information on the wndex utility is given below, but simply running it with no arguments in a directory containing an index file will produce the index.cache file for that directory. This file contains all the information in the index file plus additional information gathered automatically about the files to be served. In particular the index.cache file will list the names of the files given in the File= lines of the index file. Any file on the server whose name is not listed in an index.cache file will not be served. This is the basis of WN security. For security reasons the server will refuse to use any index.cache file which is in reality a symbolic link to another file.
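The symbolic-link rule can be illustrated with POSIX lstat(), which reports on a link itself rather than on the file it points to. The following is a sketch of that idea only, not WN's code; the function name cache_file_usable is hypothetical, and accepting nothing but regular files is an extra assumption made here for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Hypothetical sketch of the rule described above. lstat() does not
 * follow symbolic links, so a link is seen as a link. Returns 1 if the
 * index.cache at path may be used, 0 otherwise. */
int cache_file_usable(const char *path)
{
    struct stat st;

    if (lstat(path, &st) != 0)
        return 0;                        /* missing or inaccessible */
    if (S_ISLNK(st.st_mode))
        return 0;                        /* a symbolic link: refuse it */
    return S_ISREG(st.st_mode) ? 1 : 0;  /* assumed: regular files only */
}
```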

The index.cache database has a number of other functions beyond its security role. Attributes of the files listed in the index file which can be computed before they are served and which don't often change are stored in the index.cache file. For example, the MIME content type of soundfile is read from the Content-type= line. The other files do not need such a line since wndex can deduce from the file name extensions that file.txt has type text/plain and file2.html has type text/html. This is done once at the time index.cache is created and need not be done every time the file is served. By the way, if the sound file were named soundfile.au it wouldn't need a Content-type line either.
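The suffix-based deduction that wndex performs can be pictured as a table lookup on the text after the last '.'. This sketch uses only the three suffix/type pairs mentioned in this section; the real correspondences come from the mime.types file described in section 3.7. The table and the function type_from_suffix are hypothetical, and the case-sensitive comparison is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative table built from the examples in this section only;
 * the real table is read from mime.types. */
static const struct { const char *suffix; const char *type; } suffix_types[] = {
    { "txt",  "text/plain" },
    { "html", "text/html" },
    { "au",   "audio/basic" },
};

/* Return the MIME type implied by the file name's suffix, or NULL if
 * the name has no suffix or the suffix is not in the table. Matching
 * is case-sensitive (an assumption of this sketch). */
const char *type_from_suffix(const char *name)
{
    const char *dot = strrchr(name, '.');

    if (dot == NULL || dot[1] == '\0')
        return NULL;
    for (size_t i = 0; i < sizeof suffix_types / sizeof suffix_types[0]; i++)
        if (strcmp(dot + 1, suffix_types[i].suffix) == 0)
            return suffix_types[i].type;
    return NULL;
}
```

Here soundfile yields NULL, which is why it needs a Content-type= line, while soundfile.au would map to audio/basic.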

The title of a file is another example of information stored in the index.cache file. With the WN server every file served has a title (even binaries) and optionally has a list of keywords associated with it. For an HTML document the title and the keywords are automatically extracted by wndex from the header of the document and stored in fields of that file's line in index.cache. These are used for the built-in keyword and title searches which the server supports.

3.2 File Ownership and Permissions

The files which you wish to serve should be owned by you, by their creator, or by whoever is in charge of maintaining them. They should not be owned by nobody, or by whatever user id the server runs under as set in config.h. This is because that user id should have the minimum permissions possible: it needs read access to the files to be served, but it has no need to write to those files or alter them in any way.

Thus normally the files served might be owned by the maintainer and have their permissions set to be world readable but writable only by the maintainer (or by no one).
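As a sketch of this recommendation, the check below tests a file's permission bits: it must be world readable, and neither its group nor other users may write to it. The helper name mode_is_recommended is hypothetical; WN itself is not documented here as performing such a check.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: does a served file's mode match the advice
 * above? World readable, and writable by nobody except (optionally)
 * its owner, so the server's user id can only read it. */
int mode_is_recommended(mode_t mode)
{
    if (!(mode & S_IROTH))
        return 0;                   /* not world readable */
    if (mode & (S_IWGRP | S_IWOTH))
        return 0;                   /* group or others could write it */
    return 1;
}
```

For example, mode 0644 (or 0444, writable by no one) passes, while 0600 and 0664 do not.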

Likewise the index.cache file, which controls access to everything in a directory, should be owned by the maintainer of that directory, and the only permission nobody should have for this file is read permission. In fact, for security reasons, if the server was started as root (and then switched to a safer user like nobody), wnd or wnsd will refuse to use any index.cache file which is owned by the user id (e.g. nobody) under which the server is running. This restriction does not apply if wnsd is run on an unprivileged port by an ordinary user, because such a user might not be able to make index.cache files owned by someone else.
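The ownership rule reduces to a comparison between the file's owner and the server's user id, applied only when the server began as root. In this sketch the caller is assumed to have already filled a struct stat (for example with lstat()); the name cache_owner_ok and its parameters are hypothetical, not taken from wnd or wnsd.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical sketch of the ownership rule, not WN source.
 * server_uid: the id the server runs under (e.g. nobody).
 * started_as_root: nonzero if the server began with root privileges. */
int cache_owner_ok(const struct stat *st, uid_t server_uid, int started_as_root)
{
    if (!started_as_root)
        return 1;                     /* ordinary user, unprivileged port */
    return st->st_uid != server_uid;  /* refuse files the server owns */
}
```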

There is one exception to the rule of having nothing owned by nobody (and that's not a double negative). The exception is the log files. These files must be writable by the server, and it generally seems sensible to have them owned by the user nobody under whose identity the server runs. The log file and the error log file can be specified on the command line when the server is run, or can be set in config.h with the #define WN_LOGFILE and #define WN_ERRLOGFILE macros.

3.3 Using the wndex Utility

Before describing the index file in greater detail we briefly explain the use of the program which reads this file and produces the index.cache database file. Simply running wndex with no arguments in a directory containing a file named index causes that file to be read and a file called index.cache to be created in that directory.

There are several command line arguments for wndex. The -r option causes wndex to recursively descend your data hierarchy using all subdirectories listed in the Subdirs= line of the directory record in the index file (see below).

The -i and -c options specify an alternate name for the index file and the index.cache file respectively. For example the command:

wndex -i foo -c bar

will attempt to use foo as the index file and produce the file bar instead of index.cache.

The -d option specifies a directory other than the current directory in which to find the index file and in which to create the index.cache and index.html files.

Finally the -q option (for quiet) suppresses the printing of any warning or informational messages by wndex.
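The options above map naturally onto POSIX getopt(). This sketch collects them into a structure; it also accepts -a, described in section 3.7. It is not how wndex parses its arguments. The structure, its field names, and parse_wndex_args are hypothetical, and the assumption that -d takes the directory as its own argument is made here for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Hypothetical option structure; names are illustrative only. */
struct wndex_opts {
    int recurse;             /* -r: descend into Subdirs= directories */
    int quiet;               /* -q: suppress warnings and messages    */
    int serveall;            /* -a: as if Attributes=serveall were set */
    const char *index_name;  /* -i name: alternate index file         */
    const char *cache_name;  /* -c name: alternate index.cache file   */
    const char *dir;         /* -d dir: assumed to take an argument   */
};

/* Parse wndex-style options with POSIX getopt(); returns 0 on success,
 * -1 on an unknown option or a missing option argument. */
int parse_wndex_args(int argc, char *argv[], struct wndex_opts *o)
{
    int c;

    o->recurse = o->quiet = o->serveall = 0;
    o->index_name = "index";
    o->cache_name = "index.cache";
    o->dir = ".";
    optind = 1;                  /* start scanning at argv[1] */
    while ((c = getopt(argc, argv, "rqai:c:d:")) != -1) {
        switch (c) {
        case 'r': o->recurse = 1; break;
        case 'q': o->quiet = 1; break;
        case 'a': o->serveall = 1; break;
        case 'i': o->index_name = optarg; break;
        case 'c': o->cache_name = optarg; break;
        case 'd': o->dir = optarg; break;
        default:  return -1;     /* getopt has already printed a message */
        }
    }
    return 0;
}
```

With the command wndex -i foo -c bar shown earlier, index_name becomes "foo", cache_name becomes "bar", and the flags stay off.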

3.4 The Directory Record

The first group of lines in an index file provides information about the directory itself and the collection of files it contains rather than about any single file in the directory. It is called the directory record. This beginning collection of lines might look like:

Owner=mailto:you@host.edu
SearchWrapper=dir_search_wrap
Accessfile=/dir/access
Subdirs=dir1,dir2,directory3

The Owner= line specifies the owner of items in the directory (which is used in the HTTP/1.1 headers sent by the server).

The SearchWrapper= line specifies a "wrapper" for the various searches of the directory. This is an HTML document which provides a customized response listing the items that match a search of the directory. For more details see the chapter "Parsed Text and Server Side Includes on the WN Server" in this guide.

The Accessfile= line specifies the name of the file which controls access (by IP address) to this directory. If this item is omitted then items in the directory may be served to anyone. For more information on using the access mechanism see the chapter "Limiting Access to Your WN Hierarchy" in this guide.

Finally the line starting with Subdirs= specifies the subdirectories of this directory which you wish to have recursively searched when a title or keyword search is done on this directory. More information about searching can be found in the chapter "Setting Up Searches on the WN Server" in this guide.
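The Subdirs= value is a comma-separated list of names, which both recursive searches and wndex -r walk. A minimal sketch of splitting such a value follows; split_subdirs and MAX_NAME are hypothetical, empty fields are skipped, and surrounding whitespace is not trimmed, since the example above contains none.

```c
#include <string.h>

#define MAX_NAME 64   /* illustrative limit, not a WN constant */

/* Hypothetical sketch: split a Subdirs= value such as
 * "dir1,dir2,directory3" into names. Returns the number of names
 * stored in out (at most max). Empty fields are skipped. */
int split_subdirs(const char *value, char out[][MAX_NAME], int max)
{
    char buf[1024];
    int n = 0;

    strncpy(buf, value, sizeof buf - 1);   /* strtok modifies its input */
    buf[sizeof buf - 1] = '\0';
    for (char *tok = strtok(buf, ","); tok != NULL && n < max;
         tok = strtok(NULL, ",")) {
        strncpy(out[n], tok, MAX_NAME - 1);
        out[n][MAX_NAME - 1] = '\0';
        n++;
    }
    return n;
}
```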

For a complete list of the possible lines (called "directives") which a directory can have see the section "Directory Directives" in this guide.

3.5 File Records

After the directory record, an index file will typically have groups of lines called file records, each describing a particular file. A file record can be as simple as a single line like "File=file2.html" in the example above, or it can contain several lines describing the file. For a complete list of the possible lines (called "directives") which a file record can have, see the section "File Directives" in this guide.

3.6 Your Default Page

When someone sends a request to your server with only the server name and no file name like:

http://hopf.math.nwu.edu/

the WN server automatically translates this to:

http://hopf.math.nwu.edu/index.html

adding the file name "index.html". More generally if a request is made for a directory, say with the URL http://host/dir1/dir2/, this will be translated to a request for http://host/dir1/dir2/index.html.

If you wish the default file name in a particular directory to be something other than "index.html" you can use the Default-Document= directive in the directory record of your index file to change it. If you wish to change the default file name for all directories on the server you can change the #define INDEXFILE_NAME line in the config.h file and recompile.
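The translation of a directory request into a default document amounts to appending a name to any path ending in '/'. In this sketch, default_doc stands in for "index.html", the compiled-in INDEXFILE_NAME, or a per-directory Default-Document= value; map_request is a hypothetical name and the sketch is not the server's actual code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: map a request path to the file to send.
 * Writes the result to out; returns 0 on success, -1 if out is
 * too small. */
int map_request(const char *path, const char *default_doc,
                char *out, size_t outlen)
{
    size_t n = strlen(path);
    int written;

    if (n == 0)                          /* bare host name: the top directory */
        written = snprintf(out, outlen, "/%s", default_doc);
    else if (path[n - 1] == '/')         /* a directory: add the default name */
        written = snprintf(out, outlen, "%s%s", path, default_doc);
    else                                 /* a file: leave it unchanged */
        written = snprintf(out, outlen, "%s", path);
    return (written >= 0 && (size_t)written < outlen) ? 0 : -1;
}
```

So /dir1/dir2/ becomes /dir1/dir2/index.html, and / becomes /index.html.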

3.7 Serving Files not Listed in an index File

WN is also able to serve files without explicitly listing them in an index or index.cache file. This is done by putting the line:

Attributes=serveall

in the directory record of the index file for a directory, or by running wndex with the -a option. Either of these specifies that any file in the directory which does not start with the character '.' and does not contain a '~' may be served, not just those listed in the index file. The files index and index.cache will still not be served. (Indeed, if the -a option is used with wndex there need not even be an index file, because an index.cache file will be created just as if the Attributes=serveall directive had been used.)

Note: When this directive is used in a directory protected by an access file or a password file, be sure that these files have names that start with '.' or contain a '~'. Better yet, put these files in a different directory from which nothing is served.
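The eligibility rule for Attributes=serveall can be written as a short predicate over a file name. This is a sketch of the rule as described above, not WN source; the function name serveall_allows is hypothetical. Rejecting names containing '/' reflects the earlier remark that serveall does not extend to subdirectories, and is an assumption about how such names would be handled.

```c
#include <string.h>

/* Hypothetical sketch of the serveall rule, not WN source. Returns 1
 * if a name in the directory may be served under Attributes=serveall. */
int serveall_allows(const char *name)
{
    if (name[0] == '\0' || name[0] == '.')
        return 0;          /* hidden names, e.g. an access or password file */
    if (strchr(name, '~') != NULL)
        return 0;          /* names containing '~', such as backups */
    if (strchr(name, '/') != NULL)
        return 0;          /* assumed: this directory only, no subdirectories */
    if (strcmp(name, "index") == 0 || strcmp(name, "index.cache") == 0)
        return 0;          /* the control files themselves */
    return 1;
}
```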

When the Attributes=serveall directive is used the server will attempt to set the content type correctly based on the file name suffix using the same default correspondences between type and suffix that wndex uses. Indeed when wndex is run on a directory with the Attributes=serveall directive, it behaves as if all files in the directory (except those starting with '.' or containing a '~') were listed with a File= directive. If the Attributes=serveall line (and the corresponding entry it creates in the index.cache file) are not present then only the files explicitly listed with a File= directive will be served.

The default correspondences between file name suffixes and MIME types are specified in the "mime.types" file. A default version of the file is in /lib/mime.types. The mime.types file should be installed in a known location. The default location is in the WN src hierarchy, but this can be changed by specifying a different value when the configure program is run or by editing the value of #define MIME_TYPE_FILE in config.h. The mime.types file exists so that you can add to it if you wish to add new kinds of documents to your server; its format is explained in the file itself. If this file cannot be opened, wndex will use compiled-in defaults which are the same as the contents of the default version of this file.

The mime.types file is read whenever wndex is run, so wndex always knows the latest additions. It is also read by the stand-alone server wnsd (but not by wnd) when it is started or restarted, for use with directories that have the Attributes=serveall directive; wnsd only takes note of new suffixes and their MIME types. You cannot change the MIME type corresponding to one of the standard suffixes (as listed in the default mime.types file). To do that you need to change the server source and recompile.
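Since the authoritative description of the format lives in the comments of mime.types itself, the following sketch rests on an assumption: that each line holds a MIME type followed by whitespace-separated suffixes, with '#' starting a comment, as in the widely used NCSA/Apache-style mime.types files. Check the header of your own copy before relying on it. The function mime_line_maps is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch. Assumed line layout: a MIME type followed by
 * zero or more suffixes separated by blanks; '#' begins a comment.
 * Copies the type into type_out and returns 1 if this line maps
 * suffix, else returns 0. */
int mime_line_maps(const char *line, const char *suffix,
                   char *type_out, size_t outlen)
{
    char buf[512];
    char *hash, *type, *tok;

    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    if ((hash = strchr(buf, '#')) != NULL)
        *hash = '\0';                      /* drop comment text */
    type = strtok(buf, " \t\r\n");
    if (type == NULL)
        return 0;                          /* blank or comment-only line */
    while ((tok = strtok(NULL, " \t\r\n")) != NULL) {
        if (strcmp(tok, suffix) == 0) {
            snprintf(type_out, outlen, "%s", type);
            return 1;
        }
    }
    return 0;
}
```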

It is fine to have file records in an index file which also has the Attributes=serveall directive. In this case the file directives take precedence. Thus if you had an index file consisting of:

Attributes=serveall

File=foo.html
Content-type=application/postscript

the server would consult the file record for "foo.html" first, see that it is of type application/postscript (it would be silly to actually do this, of course), and use that type. But another file "bar.html" in the directory would still be served with the type indicated by its suffix. Files with no file record in the index file and no recognized suffix will be given the default content type, which can be set with the Default-Content= directive.
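The order of precedence just described (an explicit Content-type= in the file record, then the suffix, then the Default-Content= value) can be sketched as follows. The structure file_record, the two-entry suffix_type stand-in for mime.types, and resolve_type are all hypothetical; they do not reflect the layout of index.cache.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical structure for illustration only. */
struct file_record {
    const char *name;          /* value of File= */
    const char *content_type;  /* value of Content-type=, or NULL */
};

/* Tiny stand-in for the mime.types table. */
static const char *suffix_type(const char *name)
{
    const char *dot = strrchr(name, '.');

    if (dot != NULL && strcmp(dot, ".html") == 0)
        return "text/html";
    if (dot != NULL && strcmp(dot, ".txt") == 0)
        return "text/plain";
    return NULL;
}

/* Precedence: the file record's Content-type=, then the suffix, then
 * the directory's Default-Content= value. */
const char *resolve_type(const char *name, const struct file_record *recs,
                         size_t nrecs, const char *default_content)
{
    for (size_t i = 0; i < nrecs; i++)
        if (strcmp(recs[i].name, name) == 0 && recs[i].content_type != NULL)
            return recs[i].content_type;
    const char *t = suffix_type(name);
    return t != NULL ? t : default_content;
}
```

With a single record giving foo.html the type application/postscript, foo.html resolves to that type, bar.html to text/html, and a file with no recognized suffix to whatever default is passed in.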

When wndex is run on an index file with the Attributes=serveall directive all the files currently in that directory which can be served are given entries in the index.cache file. Title and keyword searches only see files listed in an index.cache file. Likewise context and grep searches only seek matches in files listed in the index.cache file. Thus if a file is added to a directory with the Attributes=serveall directive it will not be visible to searches unless wndex is re-run in that directory. If it has not been re-run the file will still be served, however. Still, it is good practice to re-run wndex every time you add or delete a file in a directory with the Attributes=serveall directive. (Of course, it is required to do this for a directory without the Attributes=serveall directive.) There is no need to re-run wndex if you only change an existing file, unless you change its title or keywords.

There is no way to use wrappers or includes for files not listed in the index file. So generally, the few seconds it takes to add a document's name and a descriptive title to your index file and then to run wndex will pay off.

If you do not wish the Attributes=serveall directive to be allowed on your server you can disable it by uncommenting the "#define NO_SERVEALL" line in the config.h file. This does not affect the ability of wndex to write index.cache entries for all files in a directory with the Attributes=serveall directive, but it means the server will only serve files listed in an index.cache file.

3.8 Customized Error Messages

There are three situations when the client request will be denied but for which you can supply customized error messages. These are requests for non-existent files, requests for files which require a password but for which no valid password was given, and requests from an invalid host for files limited to certain hosts. The lines:

No-Such-File-URL=http://host/dir/nosuch.html
Access-denied-URL=http://host/dir/noaccess.html
Auth-denied-file=~/dir/nopassword.html

in a directory record of an index file specify the URLs to which clients are redirected when a non-existent file is requested (first line) and when a document protected by an access control file is requested from an invalid host (second line). The last line specifies a file to be sent when a password-protected file is requested without a password or with an invalid password. For technical reasons this could not work as a redirection.

In the first two lines above (specifying redirection) the URLs given can be relative URLs, so the lines:

No-Such-File-URL=/dir/nosuch.html
Access-denied-URL=noaccess.html

are valid. Default values for these three directives may be specified by editing the config.h file and recompiling the server. More information on customized error messages can be found in section "Directory Directives" in this guide.


WN version 2.0.3
Copyright © 1998 John Franks <john@math.nwu.edu>
licensed under the OpenContent Public License
last-modified: Fri, 09 Oct 1998 18:18:09 GMT