Bob's Blinking Search Engine
by Bob Crispen
crispen@hiwaay.net
27 July 1997

INTRODUCTION.

This set of programs allows you to index all or part of your
website, and to search for words in the index.  The search
results are displayed as a webpage with links.  Clicking on
a link displays a page that looks exactly like your original
webpage, but with the search words highlighted.  Highlighting
consists of making the words red and blinking (yuck).  If
your website has lots of those disgusting <blink> tags or a
lot of red text, you may not find these programs that useful.
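
As a rough illustration of the highlighting idea (this is not the
code in highlight.c), the sketch below wraps each match of one
search word in blink and red font tags and skips anything inside
an HTML tag.  The exact markup, the function name highlight_line,
and the substring matching are all assumptions made for the
example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed markup; the real highlight.c may emit something different. */
#define HL_OPEN  "<blink><font color=\"red\">"
#define HL_CLOSE "</font></blink>"

/* Case-insensitive test: do the first n characters of a match b? */
static int same_prefix(const char *a, const char *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (a[i] == '\0' ||
            tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Append n bytes of src to out at *pos; return 0 if they do not fit. */
static int append(char *out, size_t outsz, size_t *pos,
                  const char *src, size_t n)
{
    if (*pos + n + 1 > outsz)
        return 0;
    memcpy(out + *pos, src, n);
    *pos += n;
    out[*pos] = '\0';
    return 1;
}

/*
 * Copy line into out, wrapping each occurrence of word that lies
 * outside an HTML tag.  Returns 0 on success, -1 if out is too small.
 */
int highlight_line(const char *line, const char *word,
                   char *out, size_t outsz)
{
    size_t wlen = strlen(word);
    size_t pos = 0;
    int in_tag = 0;

    if (outsz == 0)
        return -1;
    out[0] = '\0';

    while (*line != '\0') {
        if (*line == '<')
            in_tag = 1;
        if (!in_tag && wlen > 0 && same_prefix(line, word, wlen)) {
            if (!append(out, outsz, &pos, HL_OPEN, strlen(HL_OPEN)) ||
                !append(out, outsz, &pos, line, wlen) ||
                !append(out, outsz, &pos, HL_CLOSE, strlen(HL_CLOSE)))
                return -1;
            line += wlen;
            continue;
        }
        if (*line == '>')
            in_tag = 0;
        if (!append(out, outsz, &pos, line, 1))
            return -1;
        line++;
    }
    return 0;
}
```

Called with the line <a href="boy.html">A boy</a> and the word
"boy", this leaves the href alone and wraps only the visible "boy".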

These programs are placed in the public domain.  They are free
of license fees, except as noted in Step Five below.  However,
they require you to have two other programs, uncgi (which you
can get at http://www.midwinter.com/~koreth/uncgi.html) and
swish (Simple Web Indexing System for Humans, which you can
get at http://www.eit.com/software/swish/swish.html).  Those
programs are generally freeware, but please check their
licenses to make sure they're something you can live with.


CONTENTS.

	Readme			this file
	site_search.html	a typical webpage to control the search
	site_search		a script that does the search
	get_link		a script that generates a page from a link
	build_page.c		C source to generate the results page
	highlight.c		C source to highlight search words


INSTALLATION.

Step One: Install Swish and use it to index your site.  The
instructions in the HTML file that comes with Swish are fairly
self-explanatory.  Make a .conf file, using the sample, and generate
your own .swish file or files (depending on how many indexes you
want).  Use the command line interface to make sure swish has indexed
all the files you want.

Step Two: Install uncgi and create an uncgi-bin directory in your
hierarchy.  Try the example in the documentation to make sure it's
working.  You may need to name the executable uncgi.cgi and refer to it
that way in your web page if your site requires CGI files to be named
xxx.cgi.

Note that my site requires CGI programs to run on port 8000, which is
why ":8000" appears in the URLs in several programs and scripts.  If
your site does not, remove the ":8000" everywhere in the source files
and scripts.

Step Three: Copy highlight.c and build_page.c to some directory in your
web hierarchy (cgi-bin is a common place).  Modify them so that they
point to your files and compile them.

Step Four: Copy site_search.html, site_search, and get_link to your
uncgi-bin directory.  Modify these files so that they point to your
.swish file, your swish program, your highlight, and your build_page.
site_search and get_link must be executable (e.g., chmod 755
site_search get_link).

Step Five: Try it out.  If you have problems, I am available for
consultation at $100 per hour (which means I do *not* want to be
bothered).


KNOWN BUGS.

(1) If there are two search strings (e.g., "boy" and "girl") and the
*first* appears in a line *twice*, followed by the *second*, only *one*
occurrence of the first string is highlighted.  If that first
occurrence falls inside a tag, it is not highlighted at all, so only
the second string may end up highlighted on that line.  Since I can't
think of a case where *nothing* on the line gets highlighted, I decided
to leave it that way.

(2) Some strings (e.g., "VRML") don't get into the index.  This is a
bug in SWISH.

(3) If you have one or more newlines between <title> and </title>,
SWISH produces some extra junk in its title for the page.  This is a
bug either in SWISH or in your webpage.

(4) Files that have no </head> or <body> tag will show errors, both on
included frames and on images, and won't have any search words
highlighted.  You should make sure your webpages have at least one of
these tags, as I have no intention of correcting this.

(5) Pages that have frames will not show the highlighting on the
included frame.  The workaround, since I have no idea how to fix it, is
to put a distinctive string inside your <title> container in your
frames page (e.g., "frames") and then set your SWISH .conf file so that
it doesn't index those pages.

(6) The programs find and highlight search words inside the <noframes>
section of frames pages, where frames-capable browsers won't display
them.  The workaround is the same as for (5).

(7) This isn't a bug, but if you haven't used a search engine on your
site before, you'll quickly notice that you need descriptive titles
for all your pages.

(8) Sometimes a highlighted word, particularly one that Netscape colors
itself (e.g., link text), will blink, but it won't be red after the
first blink.  I don't know why this is, and suspect it's a "feature" of
Netscape which overrides the <font> tag color specification with its
own color.

(9) Since the number of MIME types is very large and my patience
is very small, you may need to edit highlight.c to add the MIME
types for some of your favorites.
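
For item (9), one common way to keep such a list easy to extend is
a small lookup table keyed on file extension.  The sketch below is
only an illustration under that assumption; the table, the names
mime_table and mime_for_path, and the fallback type are hypothetical
and are not taken from highlight.c.

```c
#include <stddef.h>
#include <string.h>

/* One row per file extension; add rows for your own favorites. */
struct mime_entry {
    const char *ext;
    const char *type;
};

static const struct mime_entry mime_table[] = {
    { ".html", "text/html"  },
    { ".htm",  "text/html"  },
    { ".txt",  "text/plain" },
    { ".gif",  "image/gif"  },
    { ".jpg",  "image/jpeg" },
    { ".wrl",  "model/vrml" }
};

/*
 * Return the MIME type for path based on its extension, or a
 * generic binary type when the extension is missing or unknown.
 * The comparison is case-sensitive.
 */
const char *mime_for_path(const char *path)
{
    const char *slash = strrchr(path, '/');
    const char *dot = strrchr(slash != NULL ? slash : path, '.');
    size_t i;

    if (dot != NULL) {
        for (i = 0; i < sizeof mime_table / sizeof mime_table[0]; i++) {
            if (strcmp(dot, mime_table[i].ext) == 0)
                return mime_table[i].type;
        }
    }
    return "application/octet-stream";
}
```

With a table like this, supporting another type means adding one
row rather than another branch in the code.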

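The page problems in items (3) and (4) can be caught before you
index.  The following is a minimal sketch of such a check, assuming
the whole page has already been read into a NUL-terminated string;
the function names are made up for this example and nothing like
them ships with these programs.

```c
#include <ctype.h>
#include <stddef.h>

/* Return the first case-insensitive match of needle in text, or NULL. */
static const char *find_nocase(const char *text, const char *needle)
{
    size_t i;

    for (; *text != '\0'; text++) {
        for (i = 0; needle[i] != '\0'; i++) {
            if (tolower((unsigned char)text[i]) !=
                tolower((unsigned char)needle[i]))
                break;
        }
        if (needle[i] == '\0')
            return text;
    }
    return NULL;
}

/* Item (4): 1 if the page has a </head> or <body> tag, else 0. */
int has_head_or_body(const char *html)
{
    return find_nocase(html, "</head") != NULL ||
           find_nocase(html, "<body") != NULL;
}

/*
 * Item (3): 1 if there is a line break between <title> and
 * </title>, 0 if there is none, -1 if no complete title was found.
 */
int title_has_newline(const char *html)
{
    const char *start = find_nocase(html, "<title");
    const char *end;

    if (start == NULL)
        return -1;
    end = find_nocase(start, "</title");
    if (end == NULL)
        return -1;
    for (; start < end; start++) {
        if (*start == '\n' || *start == '\r')
            return 1;
    }
    return 0;
}
```

A page for which has_head_or_body returns 0, or title_has_newline
returns 1, is worth fixing before you run the indexer.
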
Bob Crispen

