index
file
In each directory of your data hierarchy you create a file called
index
with information about each file you want to serve.
The simplest index
file might contain the single line:
Attributes=serveall
which when properly processed will grant the server permission to serve
any file in the directory (but not in subdirectories). For more
information about this directive see the section on the serveall
attribute below. A more
elaborate index
file might look like the following:
Owner=mailto:webmistress@host.edu
File=file.txt
Title=This is a descriptive
title for file.txt
# This is a comment
File=file2.html
File=soundfile
Title=This plays some sounds
Content-type=audio/basic
The file contains four groups of lines called records. The first record
(the single line starting Owner=
in this example)
describes properties of the directory and is called the directory record. It can be empty, but in
general it is a good idea for the directory record to contain an owner
line, like the one above, referring to the maintainer of the directory.
The remainder of this index
file has three file records describing three files,
file.txt
, file2.html
and
soundfile
, in the directory which we wish to serve. The line
starting with '#
' is a comment. Wherever a '#
'
occurs the remainder of that line is treated as a comment (i.e. ignored).
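The record structure described above can be sketched in a few lines of Python. This is only an illustration, not how wndex itself is implemented: it assumes that every File= line opens a new file record, that everything before the first File= line is the directory record, and that '#' is never escaped. The function names are hypothetical.

```python
def strip_comment(line):
    """Drop everything from the first '#' to the end of the line."""
    return line.split("#", 1)[0].rstrip()


def parse_index(text):
    """Split index-file text into a directory record and file records.

    Assumption (illustration only): each File= line starts a new file
    record; lines before the first File= line form the directory record.
    """
    directory, files = [], []
    current = directory
    for raw in text.splitlines():
        line = strip_comment(raw)
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not key=value
        key, value = line.split("=", 1)
        if key == "File":
            current = []          # open a new file record
            files.append(current)
        current.append((key, value))
    return directory, files


sample = """\
Owner=mailto:webmistress@host.edu
File=file.txt
Title=This is a descriptive title for file.txt
# This is a comment
File=file2.html
File=soundfile
Title=This plays some sounds
Content-type=audio/basic
"""

directory_record, file_records = parse_index(sample)
print(directory_record)
print([dict(record)["File"] for record in file_records])
```

Run on the sample, the sketch yields one directory record and three file records, matching the four records counted in the text.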
The index
file is processed with the utility wndex
(pronounced "windex") to produce a
small database called index.cache
containing information
about this directory and its contents. Detailed information on the wndex
utility is given below, but simply
running it with no arguments in a directory containing an
index
file will produce the index.cache
file
for that directory. This file contains all the information in the
index
file plus additional information gathered
automatically about the files to be served. In particular the
index.cache
file will list the names of the files given in
the File=
lines of the
index
file. Any file on the server whose name is not listed
in an index.cache
file will not be served. This is the
basis of WN security. For security reasons the server will
refuse to use any index.cache
file which is in reality a
symbolic link to another file.
The index.cache
database has a number of other functions
beyond its security role. Attributes of the files listed in the
index
file which can be computed before they are served and
which don't often change are stored in the index.cache
file.
For example, the MIME
content type of soundfile
is read from the Content-type=
line.
The other files do not need such a line since wndex
can deduce from the file name
extensions that file.txt
has type text/plain
and file2.html
has type text/html
. This is
done once at the time index.cache
is created and need not be
done every time the file is served. By the way, if the sound file were
named soundfile.au
it wouldn't need a
Content-type
line either.
The title of a file is another example of information stored in the
index.cache
file. With the WN server every file
served has a title (even binaries) and optionally has a list of keywords
associated with it. For an HTML document the title and the keywords are
automatically extracted by wndex
from
the header of the document and stored in fields of that file's line in
index.cache
. These are used for the built-in keyword and
title searches which the server supports.
The files which you wish to serve should be owned by you, or by their
creator, or by whoever is in charge of maintaining them. They should not
be owned by nobody
or whatever user id the server runs under
as set in config.h
.
This is because the nobody
id should have the minimum
permissions possible. It needs to have read access to the files to be
served, but it has no need to be able to write to those files or alter
them in any way.
Thus normally the files served might be owned by the maintainer and have their permissions set to be world readable but writable only by the maintainer (or by no one).
Likewise the index.cache
file which controls access to
everything in a directory should be owned by the maintainer of that
directory and the only permission nobody
should have for
this file is read permission. In fact, for security reasons, if the
server was started as root
(and then switched to a safer
user like nobody
) wnd
or wnsd
will
refuse to use any index.cache
file which is owned by the
user id (e.g. nobody
) under which the server is running.
This restriction does not apply if wnsd
is run on an
unprivileged port by an ordinary user, because such a user might not be
able to make index.cache
files owned by someone else.
There is one exception to the rule of having nothing owned by
nobody
(and that's not a double negative). The exception is
the log files. These files must be writable by the server and it
generally seems sensible to have them owned by the user
nobody
under whose identity the server runs. The log file
and the error log file can be specified on the command line when the
server is run or can be set in the config.h
file with the #define WN_LOGFILE
and #define WN_ERRLOGFILE
macros.
wndex
Utility
Before describing the index
file in greater detail we
briefly explain the use of the program which reads this file and produces
the index.cache
database file. Simply running
wndex
with no arguments in a directory containing a file
named index
causes that file to be read and a file called
index.cache
to be created in that directory.
There are several command line arguments for wndex
. The -r
option causes
wndex
to recursively descend your data hierarchy using all
subdirectories listed in the Subdirs=
line of the
directory record in the index
file (see below).
The -i
and -c
options specify an
alternate name for the index
file and the
index.cache
file respectively. For example the command:
wndex -i foo -c bar
will attempt to use foo
as the index
file and
produce the file bar
instead of index.cache
.
The -d
option specifies
a directory other than the current directory in which to find the
index
file and in which to create the
index.cache
and index.html
files.
Finally the -q
option
(for quiet) suppresses the printing of any warning or informational
messages by wndex
.
The first group of lines in an index
file provides
information about the directory itself and the collection of files it
contains rather than about any single file in the directory. It is
called the directory record. This beginning collection of lines might
look like:
Owner=mailto:you@host.edu
SearchWrapper=dir_search_wrap
Accessfile=/dir/access
Subdirs=dir1,dir2,directory3
The Owner=
line
specifies the owner of items in the directory (which is used in the HTTP/1.1 headers sent by the
server).
The SearchWrapper=
line
specifies a "wrapper" for the various searches of the directory, that is,
an HTML document which provides a customized response listing the
matching items in one of the various searches of the directory. For more
details see the chapter "Parsed Text and Server Side
Includes on the WN Server" in this guide.
The Accessfile=
line specifies the name of the file which controls access (by IP address)
to this directory. If this item is omitted then items in the directory
may be served to anyone. For more information on using the access
mechanism see the chapter "Limiting Access to Your
WN Hierarchy" in this guide.
Finally the line starting with Subdirs=
specifies the
subdirectories of this directory which you wish to have recursively
searched when a title or keyword search is done on this directory. More
information about searching can be found in the chapter "Setting Up Searches on the WN Server" in
this guide.
For a complete list of the possible lines (called "directives") which a directory can have see the section "Directory Directives" in this guide.
After the directory record, an index
file will
typically have groups of lines called file records, each describing a
particular file. A file record can be as simple as a single line like
the line "File=file2.html
" in the
example above or it can contain several lines describing the file. For a
complete list of the possible lines (called "directives") which a file
can have see the section "File Directives"
in this guide.
When someone sends a request to your server with only the server name and no file name like:
http://hopf.math.nwu.edu/
the WN server automatically translates this to:
http://hopf.math.nwu.edu/index.html
adding the file name "index.html
". More generally if a
request is made for a directory, say with the URL
http://host/dir1/dir2/
, this will be translated to a request
for http://host/dir1/dir2/index.html
.
If you wish the default file name in a particular directory to be
something other than "index.html
" you can use the Default-Document=
directive in the directory record of your index
file to
change it. If you wish to change the default file name for all
directories on the server you can change the #define INDEXFILE_NAME
line in the config.h
file
and recompile.
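The translation of a directory request into a request for the default document can be pictured with the following Python sketch. It is an illustration only: the function name translate_path is hypothetical, and the default_document parameter stands in for whatever name the Default-Document= directive or config.h supplies.

```python
DEFAULT_DOCUMENT = "index.html"  # the server-wide default named in the text


def translate_path(url_path, default_document=DEFAULT_DOCUMENT):
    """Append the default document when the request names a directory."""
    if not url_path:
        url_path = "/"  # a bare server name is a request for the root
    if url_path.endswith("/"):
        return url_path + default_document
    return url_path


print(translate_path("/"))
print(translate_path("/dir1/dir2/"))
print(translate_path("/dir1/dir2/", default_document="welcome.html"))
```

Requests that already name a file are returned unchanged.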
index
File
WN is also able to serve files without explicitly listing them
in an index
or index.cache
file. This is done
by putting the line:
Attributes=serveall
in the directory record of the index
file for a directory or
by running wndex
with the -a
option. Either of these
specifies that any file in this directory whose name does not start with the
character '.
', or contain a '~
', may be served,
not just those listed in the index
file. The files
index
and index.cache
will also not be served.
(Indeed if the -a
option
is used with wndex
there need not even
be an index
file, because an index.cache
file
will be created just as if the Attributes=serveall
directive had been used.)
Note: When this directive is used in a directory protected by an accessfile
or a password file be sure that these files have names that start with '.
', or contain a '~
'. Or better, put these files in a different directory from which nothing is served.
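The naming rule for Attributes=serveall can be expressed as a small predicate. The Python sketch below is illustrative only and the function name is hypothetical; it encodes just the exclusions stated above (a leading '.', a '~' anywhere in the name, and the index and index.cache files themselves).

```python
EXCLUDED_NAMES = {"index", "index.cache"}  # never served, per the text


def served_by_serveall(name):
    """Return True if `name` may be served under Attributes=serveall."""
    if name in EXCLUDED_NAMES:
        return False
    if name.startswith(".") or "~" in name:
        return False
    return True


for candidate in ["page.html", ".access", "notes.txt~", "index", "index.cache"]:
    print(candidate, served_by_serveall(candidate))
```

This also shows why an access or password file named with a leading '.' stays private in such a directory.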
When the Attributes=serveall
directive is used the server will attempt to set the content type
correctly based on the file name suffix using the same default
correspondences between type and suffix that wndex
uses. Indeed when wndex
is run on a directory with the
Attributes=serveall
directive, it behaves as if all files in the directory (except those
starting with '.
' or containing a '~
') were
listed with a File=
directive. If the Attributes=serveall
line (and the corresponding entry it creates in the
index.cache
file) are not present then only the files
explicitly listed with a File=
directive will be
served.
The default correspondences between file name suffixes and MIME types are
specified in the "mime.types
" file. A default version of
the file is in /lib/mime.types
. The mime.types
file should be installed in a known location. The default location is in
the WN src
hierarchy, but this can be changed by
specifying a different value when the configure
program
is run or by editing the value of #define MIME_TYPE_FILE
in config.h
. The
mime.types
file exists so that you can add to it if you wish
to add new kinds of documents to your server. The format of the file is
explained in the file. If this file cannot be opened then wndex
will use compiled-in defaults, which
are the same as the contents of the default version of this file.
The mime.types
file is read whenever wndex
is run so wndex
always knows the latest additions.
This file is also read by wnsd
(but not wnd
) on
startup for use with directories with the Attributes=serveall
directive. The wnsd
stand-alone server reads this file when
it is started or restarted, but only takes note of new suffixes and their
MIME types. You cannot change the MIME type corresponding to one of the
standard suffixes (as listed in the default mime.types
file). To do that you need to change the server source and recompile.
It is fine to have file records in an index
file which also
has the Attributes=serveall
directive. In this case the file directives take precedence. Thus if
you had an index
file consisting of:
Attributes=serveall
File=foo.html
Content-type=application/postscript
the server would consult the file record for "foo.html
"
first and see that it is of type application/postscript
(it
would be silly to actually do this, of course) and use that type. But
another file "bar.html
" in the directory would also be
served with the type indicated by its suffix. Files with no file record
in the index
file and no recognized suffix will be given the
default content type which can be set with the Default-Content=
directive.
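The order of precedence just described (explicit file record first, then suffix, then the default) might be sketched like this in Python. The suffix table holds only the three correspondences mentioned in this guide, and the fallback value "application/octet-stream" is a placeholder assumption, not the server's documented default; the function and variable names are hypothetical.

```python
import os

# Only the suffix correspondences mentioned in this guide.
SUFFIX_TYPES = {".txt": "text/plain", ".html": "text/html", ".au": "audio/basic"}

# Placeholder assumption standing in for the Default-Content= value.
DEFAULT_TYPE = "application/octet-stream"


def content_type(name, file_records, default_type=DEFAULT_TYPE):
    """Pick a content type: file record, then suffix, then the default."""
    explicit = file_records.get(name, {}).get("Content-type")
    if explicit:
        return explicit  # a file record takes precedence
    suffix = os.path.splitext(name)[1]
    return SUFFIX_TYPES.get(suffix, default_type)


records = {"foo.html": {"Content-type": "application/postscript"}}
print(content_type("foo.html", records))
print(content_type("bar.html", records))
print(content_type("README", records))
```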
When wndex
is run on an
index
file with the Attributes=serveall
directive all the files currently in that directory which can be served
are given entries in the index.cache
file. Title and keyword searches only see files listed in
an index.cache
file. Likewise context and grep searches only seek matches in files
listed in the index.cache
file. Thus if a file is added to
a directory with the Attributes=serveall
directive it will not be visible to searches unless wndex
is re-run in that directory. If it
has not been re-run the file will still be served, however. Still, it is
good practice to re-run wndex
every
time you add or delete a file in a directory with the Attributes=serveall
directive. (Of course, it is required to do this for a directory without
the Attributes=serveall
directive.) There is no need to re-run wndex
if you only change an existing file,
unless you change its title or keywords.
There is no way to use wrappers or includes for
files not listed in the index
file. So generally, the few
seconds it takes to add a document's name and a descriptive title to your
index
file and then to run wndex
will pay off.
If you do not wish the Attributes=serveall
directive to be allowed on your server you can disable it by uncommenting
the "#define NO_SERVEALL
"
line in the config.h
file.
This does not affect the ability of wndex
to write index.cache
entries for all files in a directory with the Attributes=serveall
directive. But it means the server will only serve files listed in an
index.cache
file.
There are three situations in which a client request will be denied but for which you can supply customized error messages: requests for non-existent files, requests for files which require a password but for which no valid password was given, and requests from an invalid host for files limited to certain hosts. The lines:
No-Such-File-URL=http://host/dir/nosuch.html
Access-denied-URL=http://host/dir/noaccess.html
Auth-denied-file=~/dir/nopassword.html
in a directory record of an index
file customize these responses. The first two specify URLs to which clients are redirected when a non-existent
file is requested and when a document protected by an access control file is requested from an
invalid host, respectively. The last line specifies a file to be sent when a password
protected file is requested without a password or with an invalid
password. For technical reasons it wouldn't work to have this be a
redirection.
In the first two lines above (specifying redirection) the URLs given can be relative URLs, so the lines:
No-Such-File-URL=/dir/nosuch.html
Access-denied-URL=noaccess.html
are valid. Default values for these three directives may be specified by
editing the config.h
file
and recompiling the server. More information on customized error
messages can be found in section "Directory
Directives" in this guide.