Using CGI Programs on the WN Server

Version 2.0.3
[Previous] [Next] [Up] [Top] [Search] [Index]

Using CGI Programs on the WN Server

CGI stands for Common Gateway Interface. It provides a standard for Web servers to interact with programs which are not part of the server but may produce output which you wish to serve.

16.1 Do You Need a CGI Program?

Many functions which are done by CGI programs on other servers are built in features of WN. If your needs can be met by these features then not only will you save yourself considerable effort in creating, setting up, and maintaining programs, but the built in feature will perform much more efficiently and much more securely than a CGI program.

These features include the ability to respond with different text or entirely different documents based on the the client request, the client's hostname, IP address, user-agent, or the "referer", the document containing the link. For information about this see the chapter "Parsed Text and Server Side Includes on the WN Server" in this guide. Also support for "imagemaps" or clickable images is built in so there is no need to use CGI for this. See the chapter "Clickable Images and Imagemap files on the WN Server" in this guide. Finally WN supports a variety of methods of searching your data including by title, keyword, or full text. See the chapter "Setting Up Searches on the WN Server" in this guide.

If these features do not meet your needs and something like a CGI program will, then you may wish to consider using a WN filter. These have most of the functionality of CGI programs, but are somewhat more secure and have one advantage: the output of filters can be parsed while CGI output cannot.

16.2 How Does the Server Recognize a CGI Program?

It would be nice if one could simply indicate in the appropriate index file that a particular file is a CGI program which should be executed rather than served. Unfortunately, the CGI protocol makes it impossible to implement this in an efficient way.

There are two mechanisms in fairly common use with other servers for indicating that a file is a CGI program and WN supports them both. The first is to give the file name a special extension (by default it is ".cgi") which indicates that it is a CGI program. Thus any file you serve with the name "something.cgi" will be treated as a CGI program. The special extension ".cgi" can be changed by redefining the macro "#define CGI_EXT" by editing the file config.h and recompiling servers.

The second mechanism is to have specially named directories with the property that any file in that directory will be assumed to be a CGI program. The default for this special name is "cgi-bin". Thus, if you have a directory /cgi-bin in your hierarchy the server will assume that any file served from that directory is a CGI program. Of course, as always, only files listed in that directory's index file will be servable. No files in subdirectories of /cgi-bin can be served. This is because the server will alway interpret a request for "/cgi-bin/foo/bar" as meaning run the program "/cgi-bin/foo" with the PATH_INFO CGI environment variable set to "bar". Thus if "foo" is actually a directory and "bar" a file in it, the request will fail.

There is no need for /cgi-bin to be at the top of your hierarchy. It could be anywhere in the hierarchy. And, in fact, you can have as many directories named "cgi-bin" as you like. They will all be treated the same. The special name "cgi-bin" can be changed by redefining the macro "#define CGI_BIN" by editing the file config.h and recompiling servers.

16.3 How Does a CGI Program Work?

It is beyond the scope of this document to provide an extensive tutorial in writing CGI programs. There is an online tutorial at WDVL.internet.com and another available from NCSA. A collection of links to CGI information is available at www.stars.com.

We will provide only a simple example of a CGI program written in perl. More examples can be found in the /docs/examples directory of the WN distribution.

#!/usr/local/bin/perl # Simple example of CGI program. print "Content-type: text/html\r\n"; # The first line must specify content type. Other # optional headers might go here. print "\r\n"; # A blank line ends the headers. All header lines should # end with CRLF ("\r\n"), but other lines don't need to. # From now on everything goes to the client print "<body>\n"; print "<h2>A few CGI environment variables:</h2>\n\n"; print "REMOTE_HOST = $ENV{REMOTE_HOST}<br>\n"; print "HTTP_REFERER = $ENV{HTTP_REFERER}<br>\n"; print "HTTP_USER_AGENT = $ENV{HTTP_USER_AGENT}<br>\n"; print "QUERY_STRING = $ENV{QUERY_STRING}<br>\n"; print "<p>\n"; print "</body>\n";

Notice that the first thing the program does is provide the HTTP/1.1 "Content-type:" header line. It may be followed by other optional headers you want the server to send. The end of these headers is indicated by a blank line. Of course the server will add additional headers.

By default the WN server assumes that the output of any CGI program is "dynamic" or different each time the program is run and is also "non-cachable". Hence the server behaves as if the "Attributes=dynamic,non-cachable" directive had been used. The "Attributes=dynamic" causes the server not to send a last modified date or a content length since they might be constantly changing. The "Attributes=non-cachable" attempts to dissuade clients and proxies from caching the output by sending an appropriate HTTP header.

If, in fact, the output of your program is always the same, you can use the "Attributes=nondynamic" directive. Also if you wish it to be cached you must use the "Attributes=cachable" directive. In particular, if you want the browser "back" button to return users to a a CGI generated page after they have followed a link you may need "Attributes=cachable" (especially with an HTML "<form action="post">") since otherwise the browser may not even cache the page in memory.

The program above is a good example of one which should not be cached as it prints out the client's hostname, user agent and the URL of the document which contains the link to this CGI program. The CGI program gets this information about the client from environmental variables set by the server. A complete list of the standard CGI environment variables and a description of what they contain plus a description of some additional non-standard ones supplied by the WN server can be found in the appendix "CGI and Other Environment Variables on the WN server" in this guide.

In addition to setting these environment variables appropriately the server will change the current working directory of the CGI process to the directory in which the CGI program is located.

Note: In general a CGI program has complete control over its output, so it is responsible for doing things which the server might do for a static document. This means that you cannot use many of the WN features with CGI output. In particular the server will not use a filter or parse it for "", etc. The CGI program must do these things for itself. Also the server will not provide ranges specified in the "Range:" header. Instead the contents of this header is passed to the program in the environment variable HTTP_RANGE, so the program can do the range processing.

One thing you should be aware of in writing programs is that the WN server does not send the UNIX stderr(3) stream to the error log file, but leaves its default the terminal from which the server is invoked. This allows the maintainer to set it to a file of her choice or leave it directed to the console window in which wnsd was invoked. To redirect it to a file called "my.errs" simply run wnsd with a command like:

wnsd <options> 2>my.errs

if you are using a UNIX sh(1) Borne-like shell. This can be useful when debugging CGI programs because their errors are typically sent to the UNIX stderr(3) stream so you can easily view them with the UNIX tail(1) utility like:

tail -f my.errs

rather than have them buried in a log file.

16.4 CGI Handlers

Sometimes you may have a number of files which are to be processed by the same CGI program or program. In that case you might consider designating a "handler" for these files instead of putting the the name of the CGI program in the URL for each of them.

The file directive:

CGI-Handler=bar.cgi

causes the program "bar.cgi" to be run and its output to be served in place of the document requested. This is a way to designate a CGI program to handle a file somewhat like a filter. The name of the program need not be in the URL since it is in the index file. So when http://host/foo.html is requested this will cause the handler, bar.cgi, to be run with the CGI environment variable PATH_INFO set to /path2/foo.html. In normal use the program bar.cgi will do something to the file foo.html and serve the output. It is useful if you want a number of files in a directory to be handled by the same CGI program. Note the file foo.html need not be used in any way by the program, but it must exist or else the server will treat it as a non-existent file.

The directory directive "Default-CGI-Handler=handler.cgi" specifies that all files in the directory should be treated as if the "CGI-Handler=" file directive had been set to handler.cgi. To override this setting and specify no CGI handler use the "CGI-Handler=<none>" directive.

16.5 How Can CGI Programs be Made Safe?

This is an extremely important issue, but one which is beyond the scope of this document. I highly recommend the Safe CGI Programming maintained by Paul Phillips and the WWW Security FAQ maintained by Lincoln Stein.

[Previous] [Next] [Up] [Top] [Search] [Index]