
FAQ (Frequently Asked Questions)

This document should help you solve problems that remain after you have read the text file manual (ReadMe).

 1. Requirements
1.1 Which requirements are there for running the search engine?
1.2 I work on Windows/MacOS but my site runs on Unix
 2. Installation and Configuration
2.1 Can I install the search engine on my website by myself?
2.2 How do I call the search engine?
2.3 How can I configure the search engine?
2.4 I want different categories of my site to be searched
2.5 How do I fit the search engine to my existing webdesign?
2.6 Internationalization ("I18n"): Changing of language settings
2.7 How do I apply the flat or indexed search method?
2.8 How can I change the appearance of the dynamically created HTML?
2.9 I want to start the search engine out from my own input field
2.10 I want to link to the search engine with changed default settings
 3. Functionality
3.1 In which directories of my website does the search take place?
3.2 Which files of my website will be searched?
3.3 Is it possible to search Word documents?
3.4 How are the index files organized?
3.5 Which file details will be printed in the result pages?
3.6 How will the results be sorted?
3.7 How fast does the search run and where is the limit?
3.8 How do I maximize the search speed in a large site?
 4. Security
4.1 Is HomepageSearchEngine a secure program?
4.2 What about privacy?
4.3 How does the secure data transfer work?

1. Requirements

1.1 Which requirements are there for running the search engine?

Webspace on a Windows or Unix (incl. MacOS X) platform with the right to run custom CGI programs. In most cases you have this right if a cgi-bin directory exists. You don't need anything else on the server: neither Perl, nor a database, nor any other application. On the client, only a webbrowser is needed to use and administer the search engine. Cookies are neither required nor used. If you have a small, static website, you don't have to do anything once the search engine has been installed: the "on-the-fly" search method does the entire work each time a user runs a search. It is sufficient for the webmaster to access her webspace via FTP.

But you can also decide to apply the "flat" or the "indexed" search method *, which does not need to collect the file list and/or extract all files on every search. This may be necessary on a large website to keep the search time short. In that case shell access (via Telnet/SSH) is recommended; for automated indexing you also need the right to run a custom cron job. If your ISP only grants you FTP access, the webserver should run under the same user ID as your account.

*) These search methods are only available in the Pro edition.

1.2 I work on Windows/MacOS but my site runs on Unix

No problem - download the matching Unix package and install it via FTP on the Unix target machine.

If you want to test the search engine on your local hard disk under Windows/MacOS, you also need the package for Windows or MacOS, respectively, and, of course, webserver software (e.g. Apache). The packages differ only in the executable file and its associated libraries (shared objects). All other files, including the index files, can be used on all platforms.

2. Installation and Configuration

2.1 Can I install the search engine on my website by myself?

Yes, the basic installation is very easy: Create an installation directory hse in the cgi-bin directory (or in whichever directory you have the right to execute programs in) and upload the executable file HomepageSearchEngine.exe (for Windows) or HomepageSearchEngine.cgi.bin (for Unix) into it with your FTP client. The extension "bin" is meant to ensure that the file is uploaded in binary mode. Under Unix, rename the uploaded file to HomepageSearchEngine.cgi afterwards and give it the permissions rwx r-x r-x (chmod 755). Finally, upload the library files (.dll and/or .so files). Detailed installation instructions can be found in the manual (ReadMe).

If Microsoft IIS (under Windows) is used as the webserver software, you should take a look at our IIS support page. In any case, if you want, we can do the installation for you, for free.

2.2 How do I call the search engine?

You only need to point your webbrowser to the URL of the file HomepageSearchEngine.exe or HomepageSearchEngine.cgi, respectively. So, call the URL
http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe (Windows) or
http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.cgi (Unix), respectively.
www.yourdomain.tld has to be replaced by your real domain name, of course. After this step you should see a message that directs you to upload the configuration file. You do not *need* to call the CGI application from a form on a separate page, since it generates one itself.

2.3 How can I configure the search engine?

The search engine is configured in the file hse.ini by editing the values of directives. For easier referencing, each directive is marked with a number. You only need to set the values of 2 directives to run the search engine; all others are optional. These 2 directives define (1.1) the path (basepath) and (1.2) the corresponding absolute URL (baseurl) of the base directory from which the file search should start.

The path can be specified absolutely (e.g. basepath = E:\Inetpub\wwwroot\startdir on Windows or /web/myuserdir/wwwroot/startdir on Unix) or relatively (e.g. ../../startdir), so that the hse.ini file on your Windows development server can be fully compatible with your Unix production server.

The corresponding URL would be e.g. baseurl = http://www.yourdomain.tld/startdir. For maximum compatibility between different servers, you may also specify baseurl = /startdir instead. The full URL will then be constructed using your server's ServerName variable.
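
The relationship between basepath and baseurl can be illustrated with a small Python sketch. It is not taken from the program itself: the function names are hypothetical, the http scheme is an assumption, and server_name merely stands in for the webserver's ServerName value.

```python
import posixpath


def resolve_baseurl(baseurl: str, server_name: str, scheme: str = "http") -> str:
    """Turn a host-relative baseurl such as "/startdir" into a full URL.

    Assumption: the scheme defaults to http and server_name stands in
    for the webserver's ServerName value.
    """
    if baseurl.startswith("/"):
        return f"{scheme}://{server_name}{baseurl}"
    return baseurl  # already a full URL


def file_url(basepath: str, baseurl: str, file_path: str) -> str:
    """Map a file below basepath to the matching URL below baseurl."""
    relative = posixpath.relpath(file_path, basepath)
    return baseurl.rstrip("/") + "/" + relative
```

With basepath = /web/myuserdir/wwwroot/startdir and baseurl = /startdir, a file docs/page.htm below the base directory would map to a URL ending in /startdir/docs/page.htm on the server's host name.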

Open the configuration file with a text editor (e.g. Notepad) and make your individual settings. The file is self-explanatory. Finally, upload the configured file into the installation directory.

You can use up to 9 additional (different) configuration files with one installation. To do so, upload the conf (configuration) sub directory into your installation directory and place each additional configuration file into one of the configuration directories ("1" to "9") residing there.

2.4 I want different categories of my site to be searched

If you set (7.1) categories_nr = none in the configuration file, there is *no* possibility to search different categories. Then the search will always be performed under the basepath directory.

But, if you set categories_nr to a number from 1 to 99, a drop down menu with this number of possible selections appears in the pre-built input form. The name of the first possible selection (category 1) is set in (7.2) categories_name1, its starting directory in (7.3) categories_dir1 and its search source in (7.4) categories_source1. The corresponding directives for category 2 are categories_name2 etc.

Files and directories listed in the ban_list and in search_always will also be taken into account for the categories. Check the setup for each category by searching for list:files (also see section 3.2).

Note: This feature is not available in the Free edition.

2.5 How do I fit the search engine to my existing webdesign?

To do so, upload the (static) HTML template file hse_template.html found in the program package into the installation directory (or into another configuration directory). You will see that the default upper and lower parts of the search page change. By editing this file you can apply your personal design to the search page. If you want to use relative links, be sure to make them all relative to the executable file, even if the template file resides in one of the optional configuration directories. It is best, however, to make all links absolute, beginning with "/" (the web root directory).

Alternatively, you may use a dynamic HTML template *, which lets you take advantage of server-side scripting technologies such as SSI, PHP and so on. For details please refer to chapter 6.7 of the Manual.

See section 2.8 on how to customize the HTML code dynamically created by HSE, including the input form and the look of the results.

*) Dynamic HTML templates are not available in the Free edition.

2.6 Internationalization ("I18n"): Changing of language settings

By default, the program's output language is English. If you upload the language files hse_lang.txt and hse_help.txt into the executable's directory, the text of these files is used as the *default* output language instead. The lang (language) sub directory of the program package contains ready-made language files for several languages. The files for each language are stored in their own directory, named after the language's ISO 639 code; the language files for Spanish, for example, are located in the directory es. Even if you change the default language by uploading your preferred language files, you should upload the entire lang sub directory. You can then easily switch between all available languages, including the associated international settings (3.1) encoding, (3.2) date_format, (3.3) decimal_sep and (3.4) dir, by calling the CGI application with the delivery parameter lang:

Either as form (see section 2.9) with the entry
   <input name="lang"    type="hidden" value="es">
or as direct link (see section 2.10) with the entry
         /cgi-bin/hse/HomepageSearchEngine.exe?lang=es
You can also edit the language files yourself to add or modify a language. Please inform us when you create working language files for a language that is not yet supported; you will then get a licensed Pro edition of our search engine for free. hse_lang.txt contains the core translation, while hse_help.txt holds the content of the help window. The size of the help window can be adjusted with the directives (5.3) helpwindow_width and helpwindow_height.

2.7 How do I apply the flat or indexed search method *?

On large websites, the on-the-fly search method may be too slow. Then, you should apply the (faster) flat (file-list based) search method or the (fastest) indexed search method.

The indexed search method is always applied if the index file pair for the current category exists. The index file pair consists of the file hse_indexNR_html.txt, which holds the content of all HTML files, and the file hse_indexNR_nonhtml.txt, which holds the content of all Non-HTML files. NR represents the number of the current category.

The flat search method is applied if the file-list file pair for the current category exists but the index file pair does not. The file-list file pair consists of the file hse_listNR_html.txt, which holds the file paths of all HTML files, and the file hse_listNR_nonhtml.txt, which holds the file paths of all Non-HTML files.
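
The selection rule described above can be sketched in Python as follows. This only illustrates the documented precedence; it is not the program's actual code. The function names are hypothetical, an incomplete pair is treated as missing (an assumption), and the file names used when no category is given (empty NR) are assumed by analogy for the file-list pair.

```python
def file_pair(kind: str, category: str = "") -> tuple:
    """Return the (HTML, Non-HTML) file names for kind "index" or "list"."""
    return (
        f"hse_{kind}{category}_html.txt",
        f"hse_{kind}{category}_nonhtml.txt",
    )


def choose_search_method(existing_files, category: str = "") -> str:
    """Pick the search method from the set of file names that exist."""
    present = set(existing_files)

    def pair_exists(kind: str) -> bool:
        # Assumption: both files of a pair must be present.
        return all(name in present for name in file_pair(kind, category))

    if pair_exists("index"):
        return "indexed"      # index pair wins over everything else
    if pair_exists("list"):
        return "flat"         # file-list pair only
    return "on-the-fly"       # neither pair exists
```

For category 1, for example, the presence of hse_list1_html.txt and hse_list1_nonhtml.txt alone would select the flat method, while adding the two hse_index1 files would switch to the indexed method.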

The file-list and index file pairs are created by the HomepageSearchEngine Shell Executable, either directly on the system shell or via the web-based Admin Area.

Execute HomepageSearchEngine Shell Executable by entering 'HomepageSearchEngine' ENTER (on Windows) or './HomepageSearchEngine.cgi' ENTER (on Unix) to bring up this screen:

HomepageSearchEngine 3.62 Pro Shell Executable (c) 2006 ANET.at

Help is available by executing 'HomepageSearchEngine -help'.


The latter (entering 'HomepageSearchEngine -help') will show you all available commands and options, including the commands to make the file-lists and create the index files:

HomepageSearchEngine makelist
HomepageSearchEngine index

Power users will prefer the index file pairs to be created automatically every day in batch mode. This could be done on Unix as a cron job calling the shipped Shell script hse_cronjob.sh. Under Windows use the shipped Batch script hse_cronjob.bat instead.

A less powerful but more user-friendly alternative is indexing via the Admin Area. This allows, for instance, easy "one-click indexing", even on large sites (by applying the incremental indexing method).

Instead of creating the index files directly on the production server, you can also create them on your local hard drive where you have mirrored the site, regardless of the platform. Just be sure to use the correct executable on your development platform. No webserver is required to be installed. Finally upload the index files via FTP onto the production server.

*) These search methods are only available in the Pro edition.

2.8 How can I change the appearance of the dynamically created HTML?

Most style properties are controlled by an external style sheet file, which resides at /hse/HomepageSearchEngine.css by default. Make sure that you have uploaded it to this location and that the HTML template file points to it! The format of most elements can be influenced via special classes in the style sheet.

If you want to call the search engine from a self-made input form, you may also want to suppress the pre-built input form. You can do so by setting (5.1) searchbox_place = none in the hse.ini file. The appearance of the results pages can be customized further in section 6 of the hse.ini file.

2.9 I want to start the search engine out from my own input field

If you don't want to start the search from the pre-built input form but from a self-made form (e.g. directly from a small input field), use delivery parameters within HTML code like the following:
<form action="/cgi-bin/hse/HomepageSearchEngine.exe">
   <input name="terms"     type="text"   size="15">
   <input name="cat"       type="hidden" value="1">
   <input name="submit"    type="hidden">
</form>
This will search the titles and the full text in category 1. Of course, you can also specify another category. If the input form exists on a page belonging to category 2, it may be a good idea to change the cat value to 2. If you do not deliver a "cat" value, the entire web site (starting from the basepath) will be searched!

You can add additional code within the form area if you
  do not want to combine all search terms with AND,
  prefer the advanced input form by default,
  do not want to match case,
  want to find only whole words,
  do not want to search the titles,
  also want to search in the description and keywords meta tags,
  do not want to search the full text,
  also want to search in the alternative texts
  and the text of Non-HTML files,
  want a certain number of found files shown per results page,
  want to change the sorting criterion from hits to date,
  want to preset the language incl. the international settings
  or want to use an alternative configuration set:
   <input name="and"       type="hidden" value="off">
   <input name="extra"     type="hidden" value="on">
   <input name="matchcase" type="hidden" value="off">
   <input name="noparts"   type="hidden" value="on">
   <input name="title"     type="hidden" value="off">
   <input name="meta"      type="hidden" value="on">
   <input name="text"      type="hidden" value="off">
   <input name="alt"       type="hidden" value="on">
   <input name="nonhtml"   type="hidden" value="on">
   <input name="hits"      type="hidden" value="20">
   <input name="sort"      type="hidden" value="date">
   <input name="lang"      type="hidden" value="de">
   <input name="conf"      type="hidden" value="1">
When using your own input form, you can disable the pre-built input form (see section 2.8).

Full documentation of the calling options can be found in chapter 6.13 of the Manual.

2.10 I want to link to the search engine with changed default settings

If you make a simple link to the search engine
<a href="/cgi-bin/hse/HomepageSearchEngine.exe">
   default search
</a>
you will get the input form in the simple look, with the titles and the full text of the web pages as search source and a maximum of 10 hits per results page. But you can also change the presets of the input form by adding the names of the corresponding delivery parameters known from section 2.9, using the following syntax:
<a href="/cgi-bin/hse/HomepageSearchEngine.exe?extra;nonhtml">
   custom search
</a>
Note that the ; character acts as the separator between parameters, and *not* the & character that many other CGI applications use, which breaks valid HTML when written unescaped. The value "on" does not need to be specified explicitly, but all others do:
         /cgi-bin/hse/HomepageSearchEngine.exe?conf=1
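
If such links are generated by a script, the syntax above could be produced with a small helper like the following Python sketch. The function name is hypothetical, and percent-encoding the values is an assumption based on common URL practice rather than something this FAQ specifies.

```python
from urllib.parse import quote


def build_search_link(executable: str, **params: str) -> str:
    """Join delivery parameters with ';' and omit the value "on"."""
    parts = []
    for name, value in params.items():
        if value == "on":
            parts.append(name)  # "on" is implied by the bare name
        else:
            # Assumption: values are percent-encoded like in any URL.
            parts.append(f"{name}={quote(str(value), safe='')}")
    if not parts:
        return executable
    return executable + "?" + ";".join(parts)
```

Calling build_search_link("/cgi-bin/hse/HomepageSearchEngine.exe", extra="on", nonhtml="on") yields the "custom search" link shown above.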

3. Functionality

3.1 In which directories of my website does the search take place?

The start directory for the search is set through the directive basepath and, if you use categories, through the additional directive categories_dirNR for each category. For instance, if you have set basepath = /web/www.xy.com/httpd/htdocs and categories_dir1 = english, your start directory for category 1 is /web/www.xy.com/httpd/htdocs/english. The search takes place in this directory *and* in all its sub directories, unless you exclude sub directories from being inspected:

To exclude directories, specify their names in the list (2.1) exclude_dirs of the hse.ini file. Each directory is opened to inspect its sub directories, but only if its name does not match one specified in this list. You can use the wildcard symbol *, which stands for zero or more arbitrary characters. System directories created by Microsoft FrontPage are never inspected.

To exclude all sub directories in all categories (which will improve the search speed), use the wildcard symbol mentioned above in exclude_dirs:
exclude_dirs = *
If you want to exclude sub directories only in certain categories, add the file path string -/*/* to the list of (7.4) categories_sourceNR:
e.g. categories_source1 = -/*/* (for category 1).
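
The effect of exclude_dirs on the directory walk can be sketched in Python. This is a minimal illustration, not the program's implementation: the function names are hypothetical, the comparison is assumed to be case-sensitive, and the automatic skipping of FrontPage system directories is not modeled.

```python
import os
import re


def wildcard_match(pattern: str, name: str) -> bool:
    """True if name matches pattern completely; '*' = zero or more characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name, flags=re.DOTALL) is not None


def inspected_dirs(start_dir: str, exclude_dirs) -> list:
    """List start_dir (as ".") and every sub directory that gets inspected."""
    result = []
    for current, subdirs, _files in os.walk(start_dir):
        result.append(os.path.relpath(current, start_dir))
        # Prune in place so that os.walk never opens an excluded directory.
        subdirs[:] = sorted(
            d for d in subdirs
            if not any(wildcard_match(p, d) for p in exclude_dirs)
        )
    return result
```

With exclude_dirs = * every sub directory name matches, so only the start directory itself remains in the result.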

3.2 Which files of my website will be searched?

First, all files in your start directory are inspected and all binary files (such as Microsoft Word .doc files or images) are excluded. The remaining text files can be searched. Depending on which search sources the visitor has chosen, these can be web pages (HTML files) and/or Non-HTML text files.

All text files with the extension html, htm, shtml, phtml, php, php3, asp, aspx, jsp, cfm, mv, xml or wml (all case-insensitive) are recognized as web pages, all other text files as Non-HTML text files. Among the latter, RichTextFormat files (extension rtf) are treated in a special way so that only the real text content is searched.
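
The classification by extension can be sketched in Python as follows. The function name and the returned labels are hypothetical; how binary files are detected is not described here and is left out, and treating the rtf extension case-insensitively is an assumption.

```python
import os

# Extensions recognized as web pages, as listed above.
HTML_EXTENSIONS = {
    "html", "htm", "shtml", "phtml", "php", "php3",
    "asp", "aspx", "jsp", "cfm", "mv", "xml", "wml",
}


def classify_text_file(filename: str) -> str:
    """Classify a text file as "html", "rtf" or "nonhtml" by its extension."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension in HTML_EXTENSIONS:
        return "html"
    if extension == "rtf":
        return "rtf"      # Non-HTML file that gets special text extraction
    return "nonhtml"
```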

In the (2.2) ban_list you can specify strings (separated by blanks) to exclude certain files from the search. If a file's rear part of its URL matches a string from the ban_list, that file is banned. This "rear part of the URL" always begins with a slash (/) and is the part starting from the start directory. Note that a file will only be banned if its rear part of the URL matches one string *completely* - from the beginning to the end. The * wildcard character may be required at the beginning and/or at the end to match sub strings. If you have, for instance, the sub directory private with a file called secretfile.htm in the start directory, that file's "rear part of the URL" is /private/secretfile.htm.

Add the string /private/secretfile.htm to the ban_list to ban that one file. To ban all files under the /private/ directory, add /private/*. To ban all files called secretfile.htm, add */secretfile.htm. To exclude all files ending with .log, add *.log.

To reflect these examples, the .ini file must contain the line
ban_list = /private/* */secretfile.htm *.log
Directories and files beginning with _ or . are always in the ban_list.

Exceptions from the ban_list can be made with the directive (2.3) search_always. If the rear part of a file's URL matches a string from search_always, the file is always searched, regardless of the ban_list value. If you want to make all files searchable that have a certain name, e.g. public.htm (even under the /private/ directory), set
search_always = */public.htm
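
How ban_list and search_always interact can be sketched in Python. The function names are hypothetical and the sketch covers only the rules stated above: complete matches, the * wildcard, and search_always taking precedence. The implicit ban of names beginning with _ or . is not modeled.

```python
import re


def wildcard_match(pattern: str, text: str) -> bool:
    """True if text matches pattern from beginning to end; '*' = any characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


def is_searchable(rear_url: str, ban_list: str, search_always: str = "") -> bool:
    """Decide whether the file with this "rear part of the URL" is searched.

    ban_list and search_always are blank-separated strings, as in hse.ini.
    """
    if any(wildcard_match(p, rear_url) for p in search_always.split()):
        return True  # exceptions override the ban_list
    return not any(wildcard_match(p, rear_url) for p in ban_list.split())
```

With ban_list = /private/* */secretfile.htm *.log and search_always = */public.htm, the file /private/public.htm stays searchable while /private/secretfile.htm is banned.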

To view which files will be searched in the currently selected category (both without and with the "Search text of Non-HTML files" checkbox activated), type in the search string list:files.

You can also exclude HTML files from being searched, without the need to set anything in the configuration file. To do so, use the "robots" meta tag within the files in question, as used to exclude them from being indexed by robots (including the HomepageSearchEngine spider):

<meta name="robots" content="noindex"></meta>

Details can be found in chapter 6.12 of the Manual.

If you want to make parts of webpages unsearchable, put them between a

<span class="HSE-nosearch"> </span> or
<div class="HSE-nosearch"> </div> area.
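
The effect of such an area can be illustrated with Python's standard HTML parser. This is a sketch of the idea only, under the assumption that everything inside the marked span or div, including nested elements, is skipped; the class and function names are hypothetical and the program's own extraction may differ.

```python
from html.parser import HTMLParser


class SearchableText(HTMLParser):
    """Collect text, skipping span/div elements with class HSE-nosearch."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside an HSE-nosearch area

    def handle_starttag(self, tag, attrs):
        if tag not in ("span", "div"):
            return
        if self.skip_depth:
            self.skip_depth += 1  # nested span/div inside a skipped area
        elif "HSE-nosearch" in (dict(attrs).get("class") or "").split():
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if self.skip_depth and tag in ("span", "div"):
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def searchable_text(html: str) -> str:
    parser = SearchableText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```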

3.3 Is it possible to search Word documents?

Yes and no. By default, Microsoft Word saves its documents in its own binary format, known as .doc files, which can only be read by a Word interpreter. Thus, such files cannot be searched by our search engine. But if you save your Word documents as "Rich Text Format" with the extension .rtf, they can be searched by HomepageSearchEngine directly on your webserver when the "Search text of Non-HTML files" option is on. Only the real text content will be searched. This feature may be of interest mainly when the search engine is used in an intranet.

3.4 How are the index files * organized?

The index files, which can be created and modified by the Shell Executable, are tab-separated text (.txt) files. If you don't use categories (that means *no* cat parameter is delivered to the search engine on calling it), the search takes place in the index file pair hse_index_html.txt and hse_index_nonhtml.txt. If you select a category no. NR (that means the cat=NR parameter is delivered to the search engine on calling it), the search takes place in hse_indexNR_html.txt and hse_indexNR_nonhtml.txt (where NR stands for a number from 1 to 99).

Each row of the index file consists of 9 columns (1-9) and contains all information for one file:

1 » 2 » 3 » 4 » 5 » 6 » 7 » 8 » 9 »

1...URL to the file
2...file size in KB
3...date of the last update in seconds since January 1, 1970
4...title of the file, if present
5...content of the description meta tag, if present
6...content of the keywords meta tag, if present
7...content of all alt attributes of the img tags, if present
8...extracted full text, if present
9...URL to a custom icon image, if present

» stands for the tabulator character.
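
Reading one row of such an index file could look like the following Python sketch. The class and function names are hypothetical; the sketch assumes that empty optional columns are present as empty fields, that the timestamp is a plain integer, and it keeps the file size as text because its exact number format is not documented here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class IndexEntry:
    url: str
    size_kb: str          # kept as text: number format is an open assumption
    updated: datetime
    title: str
    description: str
    keywords: str
    alt_texts: str
    full_text: str
    icon_url: str


def parse_index_row(row: str) -> IndexEntry:
    """Split one tab-separated index row into its 9 columns."""
    fields = row.rstrip("\r\n").split("\t")
    fields += [""] * (9 - len(fields))  # pad missing trailing columns
    (url, size_kb, mtime, title, description,
     keywords, alt_texts, full_text, icon_url) = fields[:9]
    # Column 3 holds seconds since January 1, 1970 (interpreted as UTC here).
    updated = datetime.fromtimestamp(int(mtime), tz=timezone.utc)
    return IndexEntry(url, size_kb, updated, title, description,
                      keywords, alt_texts, full_text, icon_url)
```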

*) The index feature is only available in the Pro edition.

3.5 Which file details will be printed in the result pages?

In the Plus and Pro editions, the result pages can be customized through section 6 of the hse.ini file. The details to be printed for each found file are specified in directive (6.3) results_details.

The directive (6.5) description specifies the kind of description shown for each hit. Care is always taken that no words are cut off in the description text. If you specify a number C of characters, the first C characters of the first description meta tag present, if any, are shown:
<meta name="description" content="This is the description">
If there is no description meta tag in the file, the first C characters of the text from the beginning of the file's body will be displayed instead.
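
The "first C characters without cutting words" rule can be sketched in Python. The function names are hypothetical, and the handling of whitespace and of a single word longer than C characters are assumptions.

```python
def shorten_description(text: str, max_chars: int) -> str:
    """Return at most max_chars characters of text without cutting a word."""
    text = " ".join(text.split())        # normalize whitespace (assumption)
    if len(text) <= max_chars:
        return text
    if text[max_chars] == " ":
        return text[:max_chars]          # the limit falls on a word boundary
    head = text[:max_chars]
    last_space = head.rfind(" ")
    # A single word longer than the limit cannot be kept whole.
    return head[:last_space] if last_space > 0 else head


def pick_description(meta_description: str, body_text: str, max_chars: int) -> str:
    """Prefer the description meta tag; fall back to the start of the body."""
    return shorten_description(meta_description or body_text, max_chars)
```

For the meta tag shown above, a limit of 10 characters would give "This is" rather than "This is th".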

Additionally or alternatively, you may also choose a "Google-like" style by specifying a given number M of matches. This will display M lines each containing the text around one match. Note that the search speed decreases with a higher number of such specified matches. For best search speeds, some may specify not to show a description at all.

3.6 How will the results be sorted?

In the results page(s), all found files will be sorted in a ranking list. The method can be chosen by the visitor:

The default option, sorting by the number of matches, brings up the file with the most matches as the first hit and the one with the fewest matches as the last hit. Should some files have the same number of matches, the more recently updated file is ranked higher. If such files also do not differ in the date of modification, they are ordered alphabetically by file path.

If somebody is interested in the most up-to-date hits, she may choose to sort by time of the last update. All update times are taken into account with an accuracy of one day (which means that the time of day at which a file was updated does *not* matter). If some files have been updated on the same day, the file with more matches is ranked higher. If such files also do not differ in the number of matches, they are ordered alphabetically by file path.

Some sites include a systematically named collection of files where the only interesting thing is if a file matches or not. In such cases, an alphabetic sorting by the path name may be preferred.
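
The three orderings can be expressed as sort keys, as in the following Python sketch. The names are hypothetical, including the order value "path". The sketch further assumes that days are counted in UTC, that the one-day accuracy also applies to the tie-break of the default ordering, and that "alphabetically" means plain string comparison.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    path: str
    matches: int
    mtime: int  # seconds since January 1, 1970


def sort_hits(hits, order: str = "hits"):
    """Sort hits by "hits" (default), "date" or "path"."""

    def day(hit: Hit) -> int:
        return hit.mtime // 86400  # one-day accuracy

    keys = {
        "hits": lambda h: (-h.matches, -day(h), h.path),
        "date": lambda h: (-day(h), -h.matches, h.path),
        "path": lambda h: h.path,
    }
    return sorted(hits, key=keys[order])
```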

3.7 How fast does the search run and where is the limit?

The on-the-fly method is satisfactorily fast (max. 1 to 5 seconds search time) when searching up to approx. 2 MB of text in about 200 files at once. Some webservers also have no problems with larger amounts (see reference sites), but for most, the practical limit will probably be in this range. For larger websites it is recommended to split the site into several categories and/or to apply the indexed search method *. This makes it possible to search the complete content of several thousand files, containing many megabytes, at once within a very short time.

*) The index feature is only available in the Pro edition.

3.8 How do I maximize the search speed in a large site?

To keep the search time satisfactorily short (< 5 seconds) even on large sites (say, from 10,000 files or 100 MB upwards), some optimizations may be required. Proceed in the following order until the performance is good enough.

  1. Be sure that the index files *) do not contain unnecessary files
    (test it by searching for list:files)
  2. In section (6.5) of the hse.ini file, set
    description = 250 characters + 0 matches
  3. In section (6.6) of the hse.ini file, set
    highlight-style = none
  4. In section (6.3) of the hse.ini file, remove the "description" keyword from the results_details directive
  5. Set the following default delivery parameters:
    and=on;matchcase=on;noparts=on;nonhtml=off

*) The index feature is only available in the Pro edition.

4. Security

4.1 Is HomepageSearchEngine a secure program?

Yes, all of our web applications are programmed by us using a strict method that minimizes the possibility of introducing security risks. HTML code entered by the user is never sent back to the webbrowser, to avoid unknown interpretations that may result in unwanted actions. Files and directories do *not* need write permissions for all (Owner, Group and Others), and in no case will any file's permissions be changed in a way that would cause a security risk.

If the Admin Console is activated, it can only be used after authenticating with a username/password pair. If this does not happen via a secure (https) connection, a warning is printed. Passwords are never stored in clear text, but encrypted irreversibly using the DES (Data Encryption Standard) algorithm and stored outside of the document root. Cookies are not used.

4.2 What about privacy?

We guarantee that none of our products includes any spyware, trojans or viruses. We strictly reject such practices. Stability, security and protecting your privacy are the most important principles for us. For instance, you are never required to accept cookies.

4.3 How does the secure data transfer work?

The secure data transfer that is used on this site by the Online Order form to transfer the credit card number, and by our application SecureTransfer to transfer any sensitive data, ensures that the data is always transmitted in encrypted form all the way from your computer to a computer in our office. For this purpose, ANET runs its own certified Secure Server, https://www.anet.at.
The data is encrypted on its way to the Secure Server using the SSL (Secure Sockets Layer) protocol with high-grade encryption.

There it is encrypted again, with strong encryption, using the cryptography standard PGP (Pretty Good Privacy), implemented with GPG (GNU Privacy Guard), and finally mailed to us. So the data can only be decoded in our office, which requires owning the private key and knowing the passphrase.

HomepageSearchEngine.com © 1999-2008, ANET.at
