CGI stands for Common Gateway Interface. It provides a standard for Web servers to interact with programs which are not part of the server but may produce output which you wish to serve.
Many functions which are done by CGI programs on other servers are built-in features of WN. If your needs can be met by these features then not only will you save yourself considerable effort in creating, setting up, and maintaining programs, but the built-in feature will perform much more efficiently and much more securely than a CGI program.
These features include the ability to respond with different text or entirely different documents based on the client request, the client's hostname, IP address, user-agent, or the "referer" (the document containing the link). For information about this see the chapter "Parsed Text and Server Side Includes on the WN Server" in this guide. Also, support for "imagemaps" or clickable images is built in, so there is no need to use CGI for this. See the chapter "Clickable Images and Imagemap files on the WN Server" in this guide. Finally, WN supports a variety of methods of searching your data, including by title, keyword, or full text. See the chapter "Setting Up Searches on the WN Server" in this guide.
If these features do not meet your needs and something like a CGI program will, then you may wish to consider using a WN filter. These have most of the functionality of CGI programs, but are somewhat more secure and have one advantage: the output of filters can be parsed while CGI output cannot.
It would be nice if one could simply indicate in the appropriate index
file that a
particular file is a CGI program which should be executed rather than
served. Unfortunately, the CGI protocol makes it impossible to implement
this in an efficient way.
There are two mechanisms in fairly common use with other servers for indicating that a file is a CGI program and WN supports them both. The first is to give the file name a special extension (by default it is ".cgi") which indicates that it is a CGI program. Thus any file you serve with the name "something.cgi" will be treated as a CGI program. The special extension ".cgi" can be changed by redefining the macro "#define CGI_EXT" by editing the file config.h and recompiling servers.
The second mechanism is to have specially named directories with the property that any file in that directory will be assumed to be a CGI program. The default for this special name is "cgi-bin". Thus, if you have a directory /cgi-bin in your hierarchy the server will assume that any file served from that directory is a CGI program. Of course, as always, only files listed in that directory's index file will be servable. No files in subdirectories of /cgi-bin can be served. This is because the server will always interpret a request for "/cgi-bin/foo/bar" as meaning run the program "/cgi-bin/foo" with the PATH_INFO CGI environment variable set to "bar". Thus if "foo" is actually a directory and "bar" a file in it, the request will fail.
There is no need for /cgi-bin to be at the top of your hierarchy. It could be anywhere in the hierarchy. And, in fact, you can have as many directories named "cgi-bin" as you like. They will all be treated the same. The special name "cgi-bin" can be changed by redefining the macro "#define CGI_BIN" by editing the file config.h and recompiling servers.
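The two conventions described above can be illustrated with a small Python sketch. It is not WN's implementation, only a restatement of the rules in the text: the function name split_cgi_request and its parameters are hypothetical, and the leading slash kept in PATH_INFO follows the usual CGI convention (the text above writes the value simply as "bar").

```python
def split_cgi_request(url_path, cgi_ext=".cgi", cgi_bin="cgi-bin"):
    """Return (script_path, path_info) if url_path names a CGI program, else None.

    Hypothetical helper: mirrors the two rules in the text, not WN's source code.
    """
    parts = url_path.split("/")  # "/cgi-bin/foo/bar" -> ["", "cgi-bin", "foo", "bar"]

    # Rule 2: the component right after any directory named cgi_bin is the
    # program; everything after it becomes PATH_INFO.
    for i, part in enumerate(parts[:-1]):
        if part == cgi_bin and parts[i + 1]:
            script = "/".join(parts[: i + 2])
            rest = parts[i + 2 :]
            path_info = "/" + "/".join(rest) if rest else ""
            return script, path_info

    # Rule 1: a file name carrying the special extension is a CGI program.
    name = parts[-1]
    if name.endswith(cgi_ext) and name != cgi_ext:
        return url_path, ""

    return None
```

Under these assumptions a request for "/cgi-bin/foo/bar" yields the program "/cgi-bin/foo" with "/bar" as extra path information, which is why a file inside a subdirectory of cgi-bin can never be reached.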
It is beyond the scope of this document to provide an extensive tutorial in writing CGI programs. There is an online tutorial at WDVL.internet.com and another available from NCSA. A collection of links to CGI information is available at www.stars.com.
We will provide only a simple example of a CGI program written in Perl. More examples can be found in the /docs/examples directory of the WN distribution.
#!/usr/local/bin/perl
# Simple example of CGI program.

# Escape HTML metacharacters so that client-supplied values
# (query string, referer, user agent) cannot inject markup.
sub esc {
    my $s = defined $_[0] ? $_[0] : "";
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

print "Content-type: text/html\r\n";
# The first line must specify content type. Other
# optional headers might go here.
print "\r\n";
# A blank line ends the headers. All header lines should
# end with CRLF ("\r\n"), but other lines don't need to.
# From now on everything goes to the client
print "<body>\n";
print "<h2>A few CGI environment variables:</h2>\n\n";
print "REMOTE_HOST = ", esc($ENV{REMOTE_HOST}), "<br>\n";
print "HTTP_REFERER = ", esc($ENV{HTTP_REFERER}), "<br>\n";
print "HTTP_USER_AGENT = ", esc($ENV{HTTP_USER_AGENT}), "<br>\n";
print "QUERY_STRING = ", esc($ENV{QUERY_STRING}), "<br>\n";
print "<p>\n";
print "</body>\n";
Notice that the first thing the program does is provide the HTTP/1.1 "Content-type:" header line. It may be followed by other optional headers you want the server to send. The end of these headers is indicated by a blank line. Of course the server will add additional headers.
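To show why the blank line matters, here is a rough Python sketch of how a consumer of CGI output could separate the header lines from the body. It is an illustration only and makes no claim about how WN does this internally; the function name split_cgi_output is hypothetical, and accepting a bare LF blank line as well as CRLF is an assumption made for leniency.

```python
def split_cgi_output(raw):
    """Split raw CGI output (bytes) into a dict of headers and the body bytes.

    Hypothetical helper: header names are lower-cased for easy lookup.
    """
    # The headers end at the first blank line, written as CRLF CRLF or LF LF.
    found = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    found = [(pos, sep) for pos, sep in found if pos != -1]
    if not found:
        raise ValueError("no blank line terminating the headers")
    pos, sep = min(found)  # whichever blank line comes first

    head, body = raw[:pos], raw[pos + len(sep):]
    headers = {}
    for line in head.splitlines():
        name, _, value = line.decode("latin-1").partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, body
```

Everything before the blank line is treated as header text and everything after it is passed through untouched, so a program that forgets the blank line leaves the reader with no way to tell where the document starts.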
By default the WN server assumes that the output of any CGI program is "dynamic", or different each time the program is run, and is also "non-cachable". Hence the server behaves as if the "Attributes=dynamic,non-cachable" directive had been used. The "Attributes=dynamic" directive causes the server not to send a last modified date or a content length since they might be constantly changing. The "Attributes=non-cachable" directive attempts to dissuade clients and proxies from caching the output by sending an appropriate HTTP header.
If, in fact, the output of your program is always the same, you can use the "Attributes=nondynamic" directive. Also, if you wish it to be cached you must use the "Attributes=cachable" directive. In particular, if you want the browser "back" button to return users to a CGI generated page after they have followed a link you may need "Attributes=cachable" (especially with an HTML "<form method="post">") since otherwise the browser may not even cache the page in memory.
The program above is a good example of one which should not be cached, as it prints out the client's hostname, user agent and the URL of the document which contains the link to this CGI program. The CGI program gets this information about the client from environment variables set by the server. A complete list of the standard CGI environment variables and a description of what they contain, plus a description of some additional non-standard ones supplied by the WN server, can be found in the appendix "CGI and Other Environment Variables on the WN server" in this guide.
In addition to setting these environment variables appropriately the server will change the current working directory of the CGI process to the directory in which the CGI program is located.
Note: In general a CGI program has complete control over its output, so it is responsible for doing things which the server might do for a static document. This means that you cannot use many of the WN features with CGI output. In particular the server will not use a filter or parse it for "<!-- #include -->", etc. The CGI program must do these things for itself. Also the server will not provide ranges specified in the "Range:" header. Instead the contents of this header are passed to the program in the environment variable HTTP_RANGE, so the program can do the range processing.
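As a sketch of what doing the range processing yourself can involve, the Python function below interprets a single byte range of the form found in a "Range:" header. It assumes HTTP_RANGE holds a value such as "bytes=0-99"; the function name slice_for_range is hypothetical, and multi-range requests are deliberately not handled (the function returns None so the caller can fall back to sending the whole document).

```python
import re


def slice_for_range(data, http_range):
    """Return (start, end, chunk) for a single byte range, or None.

    Hypothetical helper. `data` is the full output as bytes and `http_range`
    is the value of the HTTP_RANGE environment variable.
    """
    match = re.fullmatch(r"bytes=(\d*)-(\d*)", http_range.strip())
    if not match or not data:
        return None  # absent, malformed, or multi-range: serve everything
    first, last = match.groups()
    size = len(data)

    if first == "":
        if last == "" or int(last) == 0:
            return None
        # Suffix form "bytes=-N": the final N bytes of the document.
        start, end = max(size - int(last), 0), size - 1
    else:
        start = int(first)
        end = min(int(last), size - 1) if last else size - 1
        if start > end:
            return None  # range starts past the end, or is reversed

    return start, end, data[start:end + 1]
```

A program using something like this would read os.environ.get("HTTP_RANGE", ""), and when a tuple comes back, send only the chunk together with whatever headers describe the partial response.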
One thing you should be aware of in writing programs is that the WN server does not send the UNIX stderr(3) stream to the error log file, but leaves it at its default, the terminal from which the server is invoked. This allows the maintainer to set it to a file of her choice or leave it directed to the console window in which wnsd was invoked. To redirect it to a file called "my.errs" simply run wnsd with a command like:
wnsd <options> 2>my.errs
if you are using a UNIX sh(1) Bourne-like shell. This can be useful when debugging CGI programs because their errors are typically sent to the UNIX stderr(3) stream so you can easily view them with the UNIX tail(1) utility like:
tail -f my.errs
rather than have them buried in a log file.
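A CGI program can take advantage of this by writing its own diagnostics to stderr. The Python sketch below is one minimal way to do so; the helper name log_debug is hypothetical, and it assumes that the CGI process inherits the server's stderr as described above and that SCRIPT_NAME is set in its environment.

```python
import os
import sys


def log_debug(message, stream=None):
    """Write one tagged diagnostic line to stderr and return the line written.

    Hypothetical helper; pass `stream` to send the line somewhere else.
    """
    stream = sys.stderr if stream is None else stream
    script = os.environ.get("SCRIPT_NAME", "cgi")
    line = "[%s pid %d] %s\n" % (script, os.getpid(), message)
    stream.write(line)
    stream.flush()  # flush so the line shows up promptly in "tail -f"
    return line
```

Tagging each line with the script name and process id makes it easier to tell concurrent requests apart when several programs share the same error file.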
Sometimes you may have a number of files which are to be processed by the same CGI program. In that case you might consider designating a "handler" for these files instead of putting the name of the CGI program in the URL for each of them.
The file directive:
CGI-Handler=bar.cgi
causes the program "bar.cgi" to be run and its output to be served in place of the document requested. This is a way to designate a CGI program to handle a file somewhat like a filter. The name of the program need not be in the URL since it is in the index file. So when http://host/path2/foo.html is requested this will cause the handler, bar.cgi, to be run with the CGI environment variable PATH_INFO set to /path2/foo.html. In normal use the program bar.cgi will do something to the file foo.html and serve the output. It is useful if you want a number of files in a directory to be handled by the same CGI program. Note the file foo.html need not be used in any way by the program, but it must exist or else the server will treat it as a non-existent file.
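A handler of this kind might look roughly like the following Python sketch, which serves the requested file as escaped, preformatted text. The function names are hypothetical, and the sketch assumes that the filesystem location of the requested document is available in the standard CGI variable PATH_TRANSLATED; check the environment variable appendix for what WN actually supplies to a handler.

```python
import html


def render_page(path_info, text):
    """Build a complete CGI response showing `text` as preformatted HTML."""
    body = "<h2>%s</h2>\n<pre>%s</pre>\n" % (html.escape(path_info), html.escape(text))
    return "Content-type: text/html\r\n\r\n" + body


def handler_response(environ):
    """Return the CGI response for the document named in the environment.

    Hypothetical handler: PATH_INFO is the requested URL path, and
    PATH_TRANSLATED is assumed to be the matching filesystem path.
    """
    path_info = environ.get("PATH_INFO", "")
    file_path = environ.get("PATH_TRANSLATED", "")
    try:
        with open(file_path, encoding="utf-8", errors="replace") as f:
            text = f.read()
    except OSError:
        return "Content-type: text/html\r\n\r\n<p>Cannot read the requested document.</p>\n"
    return render_page(path_info, text)
```

A real handler script would finish with something like sys.stdout.write(handler_response(os.environ)). Keeping the response construction in a function that takes the environment as an argument makes it easy to exercise without a running server.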
The directory directive "Default-CGI-Handler=handler.cgi" specifies that all files in the directory should be treated as if the "CGI-Handler=" file directive had been set to handler.cgi. To override this setting and specify no CGI handler use the "CGI-Handler=<none>" directive.
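The precedence between the two directives can be summarized in a few lines of Python. This is only a restatement of the rule above with the directives represented as plain dictionaries; the function name and the dictionary representation are hypothetical and say nothing about how WN stores its index data.

```python
def effective_cgi_handler(file_directives, dir_directives):
    """Return the handler that applies to a file, or None if there is none.

    Hypothetical helper: a file-level CGI-Handler wins over the directory's
    Default-CGI-Handler, and the special value "<none>" disables handling.
    """
    handler = file_directives.get(
        "CGI-Handler", dir_directives.get("Default-CGI-Handler")
    )
    if handler is None or handler == "<none>":
        return None
    return handler
```

For example, with a directory default of handler.cgi, a file with no directive of its own is handled by handler.cgi, a file with "CGI-Handler=bar.cgi" is handled by bar.cgi, and a file with "CGI-Handler=<none>" is served normally.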
The security of CGI programs is an extremely important issue, but one which is beyond the scope of this document. I highly recommend the Safe CGI Programming document maintained by Paul Phillips and the WWW Security FAQ maintained by Lincoln Stein.