ANET.at HomepageSearchEngine

===========
M a n u a l
===========

Version 3.62 beta6+ (released on December 12, 2006)
(c) 1999-2007 ANET.at
Developed by Robert Allerstorfer

This is the Manual for the HomepageSearchEngine (HSE) main package, which is the Core software bundle required for all available version types, including the free time-limited Trial version (expires on February 28, 2007) and the Free edition, as well as the Pro and Plus editions.

Homepage: http://www.HomepageSearchEngine.com/ (English) or
          http://www.HomepageSearchEngine.de/ (German/Deutsch)

This version includes support for the following 25 languages: Arabic, simplified Chinese, traditional Chinese, Croatian, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Norwegian, Polish, Portuguese, Romanian, Russian, (Latin) Serbian, Spanish, Swedish, Thai and Turkish.

________
Contents

 1. About this software
 2. System requirements
 3. Which packages do I need?
 4. File structure of the main package
 5. Quick install guide (for the advanced or impatient user)
 6. Installation Manual
     1. The installation directory ("/cgi-bin/hse")
     2. Executable file ("HomepageSearchEngine.exe" on Windows or "HomepageSearchEngine.cgi" on Unix) and libraries
     3. Main Configuration file ("hse.ini") and Queries file ("hse_queries.ini")
     4. IMPORTANT - Admin Area, Creating an Admin account and the Users file
     5. Style Sheet definition file ("/hse/HomepageSearchEngine.css")
     6. Static HTML template file ("hse_template.html")
     7. Dynamic HTML template file
     8. Language files ("hse_lang.txt" and "hse_help.txt")
     9. Language and Configuration sub directories / delivery parameters ("lang" and "conf")
    10. Shell Executable: creating file-lists, indexes and using other tools (Pro edition only)
    11. IMPORTANT - Testing your installation: search for "list:files"
    12. Excluding certain HTML files or parts of them from being searched
    13. Options to call the search engine
    14. Optional turn from the Trial to the Free version with the Free key ("hse_key.cgi")
 7. Special Shell Executable features and using the cronjob script (Pro edition only)
     1. Spidering and URL Grabbing: Searching of any sites
 8. Updating from a previous version
     1. Updating from v3.42
     2. Updating from v3.5
     3. Updating from v3.5x
     4. Updating from v3.6x
     5. Clean installation and updating from versions earlier than 3.42
 9. Debugging
10. Known issues
11. To-Do's
     1. Internationalisation
     2. New features
12. Support
13. Credits
14. History of version changes ("change log")
15. License agreement

______________________
1. About this software

This software is intended to search the actual content of HTML pages (both static and dynamically generated ones), plain text, RTF and PDF files. The resulting output is valid XHTML 1.0. The found files, or the content of any other URL, can be viewed with all matches highlighted in a desired style. The main purpose is to search medium-sized websites on the Internet or an intranet, but it may also be used to search documentation or other content written in HTML on your local hard disk.

______________________
2. System requirements

Webspace on a Win32 or supported Unix system with the right to run your own CGI programs. If the webspace is remotely hosted, access via FTP/SFTP is required for installation. The webspace can also be on the local hard disk or on a CD-ROM, provided that webserver software and a web browser are (or will be) installed. On large websites, and for optimal use of the indexing functionality, shell access (usually via Telnet/SSH) is recommended.

About 3 MB of memory is the minimum required; at least 8 MB of available memory is recommended. On Apache webservers, the allowed memory can be limited with the "RLimitMEM" directive. We recommend setting "RLimitMEM 8388608" (which means 8 MB) or a higher value.

____________________________
3. Which packages do I need?

Be sure to download the main package containing the latest available version supporting your target platform from

    http://www.HomepageSearchEngine.com/download_en.phtml

Packages for both Windows and Unix are available. They differ only in the executable file and its associated libraries (within the "cgi-bin/hse" sub directory). If you want to use the search engine on Unix, it is strongly recommended to first run the "platform.cgi" script found in the "cgi-bin/platform" sub directory of the distributed package or at

    http://www.HomepageSearchEngine.com/_download/platform.tbz2

The file name of the distributed main package follows this scheme:

    "HSE-<version>_<platform>.<ext>"

    HSE.............stands for "HomepageSearchEngine"
    <version>.......version number (eg. "3.62")
    <platform>......platform the webserver is running on (target platform).
                    The following 5 platforms are supported:

                    Windows platforms:
                    ------------------
                    "Win32".......Windows 32bit (Microsoft Windows 2003, XP, 2000, NT, Me, 98) on Intel x86 processors

                    Unix and compatible platforms:
                    ------------------------------
                    "FreeBSD".....FreeBSD v4.0 or higher on Intel x86 processors
                    "Linux".......GNU/Linux (aka "Linux") 2.x Kernel on Intel x86 processors (i386/i586/i686)
                                  (all current glibc 2 based distributions like Debian, Fedora Core, Red Hat, SuSE, Sun RaQ3 or higher boxes)
                    "MacOSX"......Apple MacOS X v10.x (with its Darwin Kernel) on Power Macintosh (ppc) processors
                    "Solaris".....Sun Solaris v7 or higher (8, 9 or 10), which corresponds to SunOS v5.7 or higher
                                  on Sun sparc (sun4x series) processors
    <ext>...........Filename extension:

                    for Windows target platforms:
                    -----------------------------
                    "zip".....ZIP-compressed file
                              You can unpack it using a popular unpack utility like WinZip or 7-Zip.

                    for Unix target platforms:
                    --------------------------
                    "tbz2"....TapeArchive (tar) format BZip2 compressed file
                              If you work on a Windows machine you can also unpack it using WinZip (as of version 11.0) or 7-Zip.
                              You can also unpack it directly on the Unix machine's command line by typing

                                  tar -xvjf HSE-<version>_<platform>.tbz2

                              (where "<version>" and "<platform>" have to be replaced by the real strings).

Make sure to unpack the package including sub directories, without truncating long file names and preserving the case of the file names.

The platform specific packages can be downloaded directly from their corresponding platform directories at

    http://www.HomepageSearchEngine.com/_download/<platform>/

For instance, all available files for GNU/Linux platforms can be found at

    http://www.HomepageSearchEngine.com/_download/Linux/

If your site contains characters in simplified Chinese, traditional Chinese or Japanese, you should also download the so-called "Far East Pack" from its matching platform directory. The file name of that package follows the scheme "FarEastPack_<platform>.<ext>". For instance, the Far East Pack for GNU/Linux platforms is available at

    http://www.HomepageSearchEngine.com/_download/Linux/FarEastPack_Linux.tbz2

Alternatively, the latest main packages are always available at the following direct URLs:

    http://www.HomepageSearchEngine.com/_download/HSE_FreeBSD.tbz2   for FreeBSD (x86)
    http://www.HomepageSearchEngine.com/_download/HSE_Linux.tbz2     for GNU/Linux (x86)
    http://www.HomepageSearchEngine.com/_download/HSE_MacOSX.tbz2    for Apple MacOS X (ppc)
    http://www.HomepageSearchEngine.com/_download/HSE_Solaris.tbz2   for Sun Solaris (sparc)
    http://www.HomepageSearchEngine.com/_download/HSE_Win32.zip      for Windows 32bit

_____________________________________
4. File structure of the main package

There are 3 main directories whose contents go into different locations on your server machine, reflecting their different nature:

    1. the webserver's CGI executables (cgi-bin) directory
    2. the webserver's document root directory
    3. your home directory (outside any directory accessible by the webserver)

    1. + cgi-bin         CGI applications. To be put into the webserver's CGI executables (cgi-bin) directory.
       |  |
       |  + hse          HSE's program (installation) directory - should correspond to the URL "/cgi-bin/hse"
       |  |
       |  + platform     Platform Detector; an optional tool to find out which package you need on a Unix platform
       |
    2. + htdocs          HTML (web) documents. To be put into the webserver's document root directory.
       |  |
       |  + hse          HSE's web documents directory - should correspond to the URL "/hse"
       |
    3. + tools           Tools. Can be put into the user's home directory (outside any directory accessible by the webserver)
          |
          + hse          HSE's non-web directory

HSE's program directory ("cgi-bin/hse") contains the platform specific Executable file and some associated libraries (shared objects) as well as a bundle of platform independent files.

___________________________________________________________
5. Quick install guide (for the advanced or impatient user)

Assuming you host your site on a Unix platform and you have shell access to this machine, you can follow these quick instructions. If you don't understand them, read through the Installation Manual (chapter 6) instead.

We assume that you have uploaded the matching package into your home directory, that your web document root directory is "~/htdocs" and that your script directory is "~/cgi-bin". The example below is given for version 3.62 for GNU/Linux platforms. If you have a package for another version number and/or platform, replace "3.62" and/or "Linux" by the proper strings.

(1) On the shell, go into your home directory. Unpack and install the package by entering

    cd ~
    # back up any already existing directories
    mv cgi-bin/hse cgi-bin/hse.old
    mv htdocs/hse htdocs/hse.old
    mv hse hse.old
    tar -xvjf HSE-3.62_Linux.tbz2
    cd HSE-3.62_Linux
    mv cgi-bin/hse ../cgi-bin/
    mv htdocs/hse ../htdocs/
    mv tools/hse ../
    cd ../cgi-bin/hse
    chmod 755 HomepageSearchEngine.cgi.bin
    mv HomepageSearchEngine.cgi.bin HomepageSearchEngine.cgi

    After installing, you may want to remove the "HSE-3.62_Linux.tbz2" file and the "HSE-3.62_Linux" directory:

    cd ../..
    rm -rf HSE-3.62_Linux*

(2) Open "hse.ini" and configure the 2 directives in its sections 1.1 and 1.2.

(3) Call the URL to HSE,

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.cgi

    and follow the online instructions (regarding the Admin account and HTML template).

(4) "Fine tune" the hse.ini file (descriptions are included in the file itself).

(5) Test your configuration by calling HSE's URL again and searching for "list:files".

______________________
6. Installation Manual

6.1 The installation directory ("/cgi-bin/hse")
-----------------------------------------------

The HomepageSearchEngine executable file (with its associated files and sub directories) should be installed into a directory of its own (referred to as "the installation directory") within a directory reserved for executing CGI programs (referred to as "the CGI directory"). The CGI directory's name is usually 'cgi-bin', and we recommend naming the installation directory 'hse', so that the corresponding absolute URL of the installation directory will be "/cgi-bin/hse".

The "cgi-bin/hse" directory of the distributed package has the same structure that the installation directory on your target web server machine must have:

    + hse         Root program directory - corresponds to the URL "/cgi-bin/hse". Contains the Executable.
      |
      + lib       Required library sub directory. Contains all additional shared object (.so) files needed by the Executable.
      |
      + conf      Optional configuration sub directory (will be described in chapter 6.9)
      |
      + lang      Optional language sub directory (will be described in chapter 6.9)

In addition, HomepageSearchEngine usually needs a 'tmp' directory within its installation directory to store temporary files. That directory will be created automatically if it does not exist. Thus, the user HomepageSearchEngine is running as should have permission to write (and delete) within the installation directory.
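
The write permission mentioned above can be checked before the first call. The following is only a minimal sketch in Python, not part of HSE: the function name "can_write_and_delete" is hypothetical, and the check is only meaningful when it runs as the same user the webserver runs CGI programs as.

```python
import os
import tempfile


def can_write_and_delete(install_dir: str) -> bool:
    """Return True if the current user can create and remove a file in install_dir.

    Hypothetical helper (not shipped with HSE). It creates a throw-away file
    and deletes it again, which is what HSE needs for its 'tmp' directory.
    """
    try:
        fd, path = tempfile.mkstemp(prefix="hse_write_test_", dir=install_dir)
        os.close(fd)
        os.remove(path)
        return True
    except OSError:
        # Covers a missing directory as well as missing permissions.
        return False
```

If the function returns False for your installation directory, adjust the directory's owner or permissions before continuing.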
If you intend to use HomepageSearchEngine on Windows under Microsoft IIS (Internet Information Services), please first consult our IIS support page at

    http://www.HomepageSearchEngine.com/iis_en.phtml

6.2 Executable file ("HomepageSearchEngine.exe" on Windows or "HomepageSearchEngine.cgi" on Unix) and libraries
---------------------------------------------------------------------------------------------------------------

The (platform specific) executable file residing in the (root) program directory of the Windows package is called "HomepageSearchEngine.exe" and requires a main library called "p2xlib.dll" in the same directory. In the Unix packages, the executable in the program directory is called "HomepageSearchEngine.cgi.bin"; it has to be renamed to "HomepageSearchEngine.cgi" once it resides on the server.

First, upload the Executable into the installation directory on the server. Make sure to upload those platform specific files in binary mode and preserve the filename's case! Normally, you don't have to worry about the mode since the file extensions should force the correct one. On Unix, rename "HomepageSearchEngine.cgi.bin" to "HomepageSearchEngine.cgi" afterwards and set the file permissions to 'rwx r-x r-x' ("chmod it to 755"). All other files should be readable by the executable without any change. On some servers you may need to chmod these files to 644 (rw- r-- r--).

Secondly, upload the "lib" library sub directory with all the (platform specific) .so files in it. Again, make sure to upload in binary mode and preserve the case.

Now, you should be able to point your web browser to the URL of the Executable, such as

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe (on Windows) or
    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.cgi (on Unix)

and you should see a welcome message and instructions on what to do next.
Of course, you have to replace "www.yourdomain.tld" by the real domain name (or IP address) of the webserver ("tld" stands for a top level domain such as "com").

6.3 Main Configuration file ("hse.ini") and Queries file ("hse_queries.ini")
----------------------------------------------------------------------------

The Configuration file "hse.ini" residing in the root program directory is a plain text file containing all configurable directives, each following the syntax "directive = value" on one line. Whitespace at the beginning and end of a line will be ignored. A line can be continued on the following line if it ends with a backslash (\). Lines with ';' or '#' as the first (non-whitespace) character are treated as comments and will be ignored. Descriptions of all directives, including possible values and examples, are in the hse.ini file itself.

Here is an overview of all the directives with the sections they are assigned to and their default values. The default value applies if the directive is set to a blank value or is not specified at all.

       directive:               default value:

(1) Directory locations

    The following two values are the only ones that *must* be checked and probably edited:

(1.1)  basepath                 ../../ (or DOCUMENT_ROOT environment variable if exists)
(1.2)  baseurl                  /

    All following settings are optional. You may want to keep the default values for the first time.

(1.4)  cgiurl

(2) Files ex-/including

(2.1)  exclude_dirs
(2.2)  ban_list
(2.3)  search_always

(3) International settings

(3.1)  encoding                 iso-8859-1
(3.2)  date_format              M D, Y
(3.3)  decimal_sep              .
(3.4)  dir                      ltr
(3.5)  utf8                     off

(4) Security tuning

(4.1)  debug_level              0
(4.2)  max_found_files          500
(4.3)  cgi_timeout              10
(4.4)  allowed_referer_sites
(4.5)  protection_time          3

(5) Properties of the pre-built input form (search box)

(5.1)  searchbox_place          bottom
(5.2)  searchbox_align          auto
(5.3)  helpwindow_width         620
       helpwindow_height        690

    All following settings are for advanced users and do not affect the Free edition.

(6) Results pages customizing

(6.1)  template_url
(6.2)  results_global           search_string + options + time + summary + totalmatches
(6.3)  results_details          icon + rank + head:title (100) + description \
                                + url + size + matches + update
(6.4)  imgsrc_webpage           built-in
       imgsrc_rtfdoc            built-in
       imgsrc_pdfdoc            built-in
       imgsrc_textfile          built-in
(6.5)  description              250 characters + 1 matches (40)
(6.6)  highlight-style          background-color:yellow
(6.7)  target
(6.8)  notarget_list
(6.9)  results_href             highlightmatches + gotofirstmatch + maxsize:150
                                (in the Free edition: "none")
(6.10) imgsrc_previous          built-in
       imgsrc_next              built-in
(6.11) query-links              print:'Try this search in the entire Web with ' query:'Google' \
                                print:' or in the Usenet with ' query:'Google Groups'

                                or - if "lang=de" is delivered (as described in chapter 6.9 below) -

                                print:'Probiere diese Suche im deutschsprachigen Web mit ' query:'Google.de' \
                                print:' oder im deutschsprachigen Usenet mit ' \
                                query:'Google.de Groups'

                                (in the Free edition: "none")

(7) Categories

(7.1)  categories_nr            1
                                (in the Free edition: "none")
(7.2)  [ categories_nameNR ]
       categories_name1         whole website
(7.3)  [ categories_dirNR ]
(7.4)  [ categories_sourceNR ]

Open the configuration file with a text editor (such as Notepad on Windows) and set the preferred values. At first, only configure the directives of section 1 (Directory locations) and save the edited file. It can be saved in DOS, Unix or Mac format. Then upload that file together with the file called "hse_queries.ini" (Queries file) into your installation root directory. The latter can optionally be used to customize the global search engines to be queried, as set in the directives "query-links" (6.11) and "categories_sourceNR" (7.4) of the main configuration file.

If you point your web browser to the URL of the Executable again, you should see the first graphical screen, advising you what to do next.

Later, you may want to "fine tune" the hse.ini file and upload it again, replacing the original one. Finally, test your configuration by searching for "list:files" without *and with* the "Search text of Non-HTML files" checkbox switched on. If you are using categories (defined in section 7), repeat that search within all categories. This will also show you whether you want to exclude some directories and files by modifying the "exclude_dirs" and "ban_list" values.

6.4 IMPORTANT - Admin Area, Creating an Admin account and the Users file
------------------------------------------------------------------------

Now a link to the Admin Area appears. This is

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe?admin

(or equivalent). You should go to this link now, because the first time this URL is accessed, the user will be asked to create a username/password pair for an Admin account.
Once the first account has been created, you may log in with that user data in the future to access administration tools such as a Console to run the HomepageSearchEngine Shell Executable. You may also create additional Admin accounts. So make sure not to forget your Admin login data!

The created username/password pairs will be stored in a text file called "hse_users.cgi". Although this users file's extension is ".cgi", it is not executable. The purpose of its "false" extension is to prevent it from being read, for higher security. The passwords are encrypted using the irreversible (one-way) DES algorithm and work on both Windows and Unix platforms. The users file has the same format as the AuthUserFile that the .htaccess method uses for protecting directories. Therefore, you can also use the Admin Area to create such AuthUserFiles.

If you want to disable access to the Admin Area for security or any other reasons, just copy an *empty* file called "hse_users.cgi" into the installation directory.

6.5 Style Sheet definition file ("/hse/HomepageSearchEngine.css")
-----------------------------------------------------------------

Upload the .css file found in the "htdocs/hse" directory into HSE's web documents directory (/hse). This Style Sheet definition file is required by the HomepageSearchEngine Executable to properly display its generated pages. It will be loaded via the HTML template file described in chapters 6.6 and 6.7 below.

6.6 Static HTML template file ("hse_template.html")
---------------------------------------------------

You should then upload "hse_template.html" found in "cgi-bin/hse" into the installation directory. After refreshing your web browser at the search engine's URL, you will see that the upper and lower parts of the page have changed. Edit this file to fit your desired design and upload it again. In its head there is a reference to the Style Sheet definition file mentioned above.
Be sure that its URL ("/hse/HomepageSearchEngine.css" by default) points to the location you have uploaded the file to. You will then see the styles that affect all elements which HSE creates on the results pages. You may want to edit the style sheet. If you want to use relative links, be sure to make them all relative to the executable file! However, it is best to make all links absolute, beginning with "/" (the web root directory).

Note that the design always stays the same, since this template produces static HTML. This is the easiest way and may be sufficient for most sites. If you are a more demanding webmaster, you may want to use a dynamic HTML template instead (see next chapter).

The border between the upper and lower part is marked by a special border line in the template file. Never remove that line!

6.7 Dynamic HTML template file
------------------------------

Advanced users may prefer using a dynamic instead of a static HTML template file. This allows communication between HSE and a server-side script language like PHP, leading to more flexibility. For example, it is possible to integrate HSE into a cookie based user access system written in PHP.

For this purpose, put a template file into HSE's web documents directory ('/hse' by default). You can name it as you like, but you must use the file extension that is required by your server (most likely '.shtml' for SSI and '.phtml' or '.php' for PHP). Make sure that the border line (as in the static HTML template) is present. Sample SSI-enabled and PHP-enabled dynamic HTML template files, called "hse_template.shtml" and "hse_template.phtml" respectively, are in the "htdocs/hse" directory of this package. Again, remember that all links should be absolute, beginning with "/" (the web root directory)!

Once you have edited and uploaded your custom dynamic HTML template, you must specify its absolute URL in section (6.1) - template_url - of your hse.ini file, eg.

    template_url = http://www.yourdomain.tld/hse/hse_template.php

To ensure the highest compatibility between different servers, you may drop the "http://" prefix and the host name and use

    template_url = /hse/hse_template.php

instead. Then, the full URL will be constructed using your server's environment variable HTTP_HOST by prefixing "http://HTTP_HOST". For example:

  - If the full URL to HSE is "http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.cgi", HTTP_HOST equals "www.yourdomain.tld" and the example above resolves to the URL "http://www.yourdomain.tld/hse/hse_template.php".
  - If you call HSE via "http://www.yourdomain.tld:81/cgi-bin/hse/HomepageSearchEngine.cgi", HTTP_HOST equals "www.yourdomain.tld:81" and the example above resolves to the URL "http://www.yourdomain.tld:81/hse/hse_template.php".
  - If you call HSE via SSL (using the "https" protocol), the example above resolves to the URL "https://www.yourdomain.tld/hse/hse_template.php".

If the dynamic template file resides in an area that requires authorization, you must specify the login information (username and, if required, the corresponding password) HSE should use to authenticate, using the syntax

    template_url = https://username:password@www.yourdomain.tld/hse/hse_template.php

If you are using HSE within a website that uses PHP code to restrict access based on different user levels, you may want to deliver some user properties from the HSE CGI application to the PHP environment.
Possible user properties are:

    REMOTE_ADDR....the user's IP address (= the value of the REMOTE_ADDR environment variable)
    COOKIE_VALUE...the value of the cookie with the specified name COOKIE_NAME

When you specify template_url like

    template_url = /hse/hse_template.php?ip;cookie=COOKIE_NAME

(where COOKIE_NAME must be replaced by the real name of the cookie you want to get), the above mentioned values are available to PHP within the template's QUERY_STRING, like this:

    ip=REMOTE_ADDR;cookie=COOKIE_VALUE;

Based on the user information sent from HSE to the PHP template, you can instruct the PHP template to send information back to HSE via a special meta-tag that will either allow or disallow access. When the template contains the meta-tag, access will be denied. This is especially useful when different configuration sets (see below) are used for each user level. Look inside the provided sample templates for more information on this.

6.8 Language files ("hse_lang.txt" and "hse_help.txt")
------------------------------------------------------

If you want the program's output in a default language other than English, also upload "hse_lang.txt" and "hse_help.txt", found in the matching language directory of the "lang" sub directory, into the installation directory. The name of each language directory LANG is the 2-letter ISO 639-1 language code (optionally with an additional "-" character followed by a 2-letter regional code) of the language it holds. These codes and their associated international settings for the 25 currently supported languages are:

    language code | language            | encoding     | date_format | decimal_sep | dir
    ------------------------------------------------------------------------------------
    ar            | Arabic              | windows-1256 | DD/M/Y      | .           | rtl
    cs            | Czech               | iso-8859-2   | D. M. Y     | ,           | ltr
    da            | Danish              | iso-8859-1   | D. M Y      | ,           | ltr
    de            | German              | iso-8859-1   | D. M Y      | ,           | ltr
    el            | Greek               | iso-8859-7   | D M Y       | ,           | ltr
    en            | English             | iso-8859-1   | M D, Y      | .           | ltr
    es            | Spanish             | iso-8859-1   | M D, Y      | ,           | ltr
    fi            | Finnish             | iso-8859-1   | D. M Y      | ,           | ltr
    fr            | French              | iso-8859-1   | D M Y       | ,           | ltr
    hr            | Croatian            | iso-8859-2   | D. M Y      | ,           | ltr
    hu            | Hungarian           | iso-8859-2   | Y. M D.     | ,           | ltr
    it            | Italian             | iso-8859-1   | M D, Y      | ,           | ltr
    ja            | Japanese            | shift_jis    | Y.M.DD      | .           | ltr
    nl            | Dutch               | iso-8859-1   | D M Y       | ,           | ltr
    no            | Norwegian           | iso-8859-1   | D. M Y      | ,           | ltr
    pl            | Polish              | iso-8859-2   | D. M Y      | ,           | ltr
    pt            | Portuguese          | iso-8859-1   | M D, Y      | ,           | ltr
    ro            | Romanian            | iso-8859-2   | D M, Y      | .           | ltr
    ru            | Russian             | windows-1251 | D M Y       | ,           | ltr
    sr            | (Latin) Serbian     | iso-8859-2   | D. M Y      | ,           | ltr
    sv            | Swedish             | iso-8859-1   | D M Y       | ,           | ltr
    th            | Thai                | tis-620      | DD/M/Y      | .           | ltr
    tr            | Turkish             | iso-8859-9   | D. M Y      | ,           | ltr
    zh-cn         | simplified Chinese  | gb2312       | Y.M.DD      | .           | ltr
    zh-tw         | traditional Chinese | big5         | Y.M.D       | .           | ltr

If there are no language files for your preferred language, or if you want to change the current wording to fit your needs, you can edit the distributed language files. If you would like to receive a full version of our search engine for free, please contact us before you start creating a new language file set.

6.9 Language and Configuration sub directories / delivery parameters ("lang" and "conf")
----------------------------------------------------------------------------------------

The package's "cgi-bin/hse" directory contains a "lang" sub directory, which holds all available language directories containing the language files, and a "conf" sub directory, which is the container for configuration directories named "1", "2", .. to "9" that can be filled with additional configuration sets.

Upload the entire "lang" sub directory into the installation directory to get the option to switch between languages. On Unix, make sure all directories are chmod'ed 755.
You can then change the language and its associated international settings - as stated in the table above - by delivering the "lang" parameter with the name of the language directory (the language code) as its value. The separating character for blocks of thousands will always be " " (eg. "1 679 matches") unless you deliver "lang=en", which changes that character to "," (eg. "1,679 matches"). The value of the "lang" delivery parameter will also be sent to the server as the accepted language ("Accept-Language" HTTP header), which results in the environment variable "HTTP_ACCEPT_LANGUAGE" being set to this language code. This may be useful for a script that automatically detects the user's preferred language. You can try the included dynamic HTML template file called "hse_template.shtml" (see chapter 6.7 above) to see how this works.

For instance, calling

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe?lang=de

changes the language and all its associated international settings to German and sets the "HTTP_ACCEPT_LANGUAGE" environment variable to the "de" value.

Similarly, if you have uploaded the "conf" sub directory into the installation directory, you can deliver a "conf" parameter with the name of an existing configuration directory (a number from 1 to 9) as its value. For instance, calling

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe?conf=1

forces the search engine not to use any of the default configuration files (residing in the main installation directory), but to use those found in the directory "1" instead. So you can use one installation of HomepageSearchEngine with up to 10 different configuration sets. Please see the WhatsThis.txt file residing in the conf/1 directory for details.

NOTE: Each uploaded configuration directory must at least contain the hse.ini file. The distributed package contains only one configuration directory, namely "1". If you want to upload this directory, make sure to fill it with an hse.ini file!
If you create or upload additional directories ("2" to "9"), make sure that all of them contain an hse.ini file.

You can disable access to a configuration set by setting

    allowed_referer_sites = none

in the hse.ini file residing in the corresponding configuration directory. That .ini file does not need to contain anything else. This may be especially useful if you only use configuration sub directories, but don't want the main configuration set to be used. To do so, place an hse.ini file as mentioned above into your root installation directory. Or, use the file "hse.ini.disabled" found in the package's "cgi-bin/hse" directory (after renaming it to "hse.ini"). If the URL

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe

is then called, nothing appears but a message like

    "ERROR: Sorry, this application is set not to be callable from the site 'www.yourdomain.tld'."

Since all categories of one configuration set share the same "hse.ini" configuration file, that set has exactly one default character encoding (which is either a Unicode Encoding such as "utf-8" or a byte based Encoding - previously called "character set" - such as "iso-8859-1"), specified by the "encoding" directive. So be sure that all categories used within a configuration set represent files encoded in that Encoding. Similarly, only deliver a "lang" parameter to a configuration set when its associated Encoding matches the one specified in the hse.ini file. For example, if the hse.ini file has defined the default "encoding = iso-8859-1" you may call

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe?conf=1;cat=1;lang=en

and

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe?conf=1;cat=2;lang=de

but not

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe?conf=1;cat=3;lang=cs

since the latter would assume the iso-8859-2 encoding.
Instead, use another configuration set (for example "2") with an hse.ini file specifying "encoding = iso-8859-2" and associated only with files encoded in that Encoding. Then, you may call

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe?conf=2;cat=1;lang=cs

6.10 Shell Executable: creating file-lists, indexes and using other tools (Pro edition only)
--------------------------------------------------------------------------------------------

Especially on large websites, you may want to reduce the search time by searching in an index instead of searching the files directly. The content of all matching HTML files will be stored in a tab-separated text file called "hse_indexNR_html.txt". The file "hse_indexNR_nonhtml.txt" holds the content of all matching Non-HTML files. Both files represent the index file pair for category NR. If the index file *pair* for the current category is present, it will be used; otherwise the flat or the on-the-fly search method will be applied.

To create the index files, go into the installation directory on the command prompt (shell) and execute the executable file. To do this, shell access (via Telnet, SSH or direct access) is required. On Windows, you have to type something like

    cd "D:\InetPub\wwwroot\cgi-bin\hse"
    HomepageSearchEngine

(with or without its ".exe" extension) while on Unix, you have to type something like

    cd /home/myusername/cgi-bin/hse
    ./HomepageSearchEngine.cgi

If you do not have shell access, you can use the web based Console, which is part of the Admin Area, to execute the executable file on the shell (the executable file then behaves as the "Shell Executable"). Just point your web browser to

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe?admin

(or equivalent). Remember that you need to log in with a username/password pair created in step 6.4 above.
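
To see whether the indexed search would be used for a category, you can check that both files of its index pair exist. The Python sketch below is not part of HSE. The function names are hypothetical, and it assumes that "NR" in the file names is replaced by the plain category number without padding and that an unnumbered pair ("hse_index_html.txt" and "hse_index_nonhtml.txt") serves as the main pair; verify both against the files HSE actually creates on your system.

```python
from pathlib import Path


def index_pair(install_dir, category=None):
    """Return the two index file paths for a category (hypothetical helper).

    Assumption: NR is the plain category number; None selects the
    unnumbered main pair.
    """
    nr = "" if category is None else str(category)
    base = Path(install_dir)
    return (base / f"hse_index{nr}_html.txt",
            base / f"hse_index{nr}_nonhtml.txt")


def has_index_pair(install_dir, category=None):
    """True only if BOTH files of the pair exist, since HSE uses the pair."""
    return all(path.is_file() for path in index_pair(install_dir, category))
```

For example, has_index_pair("/home/myusername/cgi-bin/hse", 1) tells you whether category 1 has a complete pair in that directory.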
Executing the Shell Executable with the '-help' (or '--help') argument will show how it can be used:

USAGE: HomepageSearchEngine COMMAND [-OPTION[=VALUE]] [-OPTION[=VALUE]] ...

COMMAND and its associated [optional] OPTIONs can be one of:

  pdfconvert  [-strictcheck] [-listonly] [-nocache] -dir=DIR [-quiet]
              [-debug[=options|all]] [-batchmode] [-help]
  spider      [-conf=DIR] -cat=NR [-lang=LANG] [-max=NR_URLS] -url=URL
              [-querystrings[=all|"NAME[=VAL];[NAME[=VAL];]"]] [-pdf2txt]
              [-prerobotsfile[=FILE]] [-nocleanup] [-nobackup]
              [-debug[=options|robotrules|all]] [-batchmode] [-help]
  geturls     [-conf=DIR] -cat=NR [-lang=LANG] [-nocache] [-nobackup]
              [-debug[=options|all]] [-batchmode] [-help]
  makelist    [-conf=DIR] [-cat=NR] [-nononhtml | -nohtml] [-sort=METHOD] [-nobackup]
              [-debug[=options|all]] [-batchmode] [-help]
  index       [-conf=DIR] [-cat=NR] [-nononhtml | -nohtml] [-part=PART[/TOTALPARTS]]
              [-nocheck] [-nobackup] [-debug[=options|all]] [-batchmode] [-help]
  changeurls  [-conf=DIR] [-cat=NR] [-prefix=URL] [-nobackup]
              [-debug[=options|all]] [-batchmode] [-help]

DESCRIPTION of the COMMANDs:

  pdfconvert  Converts PDF to text files recursively, starting at a given directory.
  spider      Spiders a remote site recursively, starting at a given URL.
  geturls     Get remote URLs and stores their content in files on your site.
  makelist    Makes the file-list(s) required for indexing all or specific categories.
  index       Indexes all or specific categories.
  changeurls  Changes URLs in specific index file pairs.

OPTIONs available for all commands:

  -debug            Prints additional information useful for debugging. When no value or the 'all' value
  [=options|all]    is assigned ('-debug=all'), all available debug information will be printed.
                    To be verbose only about the received options, specify '-debug=options'.
                    Some commands may have additional possible debug values, described in their own
                    help screens.
  -batchmode        Turns on batch mode (Does not ask any questions). Required to run out from a script.
  -help             Displays help for the given command. Without a command, displays general help
                    (this screen).

Additional Options available for all commands except "pdfconvert":

  -conf=DIR         Specifies DIR (1..9) to be used as configuration directory ./conf/DIR
                    If not set, the main directory (where the Executable lives) will be used.
  -cat=NR           Specifies the category number NR (1..99) to be used.
                    With the 'spider' command, it tells which URL-list file should be created and which
                    directory should be prepared for use by the 'geturls' command.
                    With the 'geturls' command, it tells which URL-list file should be read and which
                    directory should be used to store the remote URLs' content.
                    With the 'makelist' command, it tells which file-list file pairs should be created.
                    With the 'index' command, it tells which index file pairs should be created.
                    If not set, all file pairs will be created.
                    With the 'changeurls' command, it tells which index file pair should be modified.
                    If not set, the main index file pair (hse_index_html.txt and hse_index_nonhtml.txt)
                    will be modified.
  -nobackup         Does not backup a file before overwriting it. Useful when available disk space is
                    limited to a small size.

Additional Options available for the command "pdfconvert":

  -dir=DIR          (Mandatory) The start directory to be looked for *.pdf (and *.pdf.txt) files.
  -strictcheck      Determines supported PDF files using a stricter check. Only files in PDF version up
                    to 1.5 are accepted (otherwise, the maximum allowed version is 1.6). Also does not
                    accept files that have space in their path.
  -listonly         Switches to listonly mode. Only lists found PDF files, without actually writing or
                    removing anything. A list of supported PDF files will be printed to STDOUT. To get a
                    clean list, containing only file pathes, also apply the '-quiet' option. As in normal
                    operation mode, another list, pointing out unsupported PDF files will be printed to
                    STDERR.
  -quiet            Does not print verbose output. Useful in conjunction with the '-listonly' option.

Additional Options available for the command "spider":

  -url=URL          (Mandatory) The URL the spider should begin collecting (internal) links from.
  -max=NR_URLS      The maximum number of URLs NR_URLS (-1..1000000) to be got and checked.
                    Defaults to 500.
  -querystrings     If a link contains a query string (the string after a '?' character, if present),
  [=                keep it. When no value or the 'all' keyword is assigned ('-querystrings=all'), all
  all|              variables within the query string will be kept in the URL. If you only want to keep
  "NAME[=VAL];      certain GET variables, specify a string enclosed between double-quotes, containing
  [NAME[=VAL];]"    each variable name NAME (with an optional value VAL) as 'NAME;' (or 'NAME=VAL;').
  ]                 For example, if you set '-querystrings="lang;conf;"' and the spider finds a link to
                    'file.cgi?conf=1&id=xy&lang=en', that link will be verified as
                    'file.cgi?lang=en&conf=1'. You can also use the '*' character as wildcard for zero
                    or more characters (eg. '-querystrings="file=*.pdf;"').
  -pdf2txt          Adds *.pdf URLs to the URL-list if corresponding *.pdf.txt URLs exist.
  -prerobotsfile    Prepends the content of an own robot rules file the site's /robots.txt file.
  [=FILE]           FILE is a file name, relative to the Executable's "robotrules" sub directory and
                    defaults to "pre-robots.txt".
  -nocleanup        Does not clean up the directory containing the previously grabbed files.

Additional Options available for the commands "spider" and "geturls":

  -lang=LANG        Sends the LANG value (ISO 639 language code; eg. 'en') to the server as accepted
                    language.

Additional Option available for the commands "pdfconvert" and "geturls":

  -nocache          Does not use the cache originating from already processed files with the same
                    Last-Modified date.

Additional Option available for the command "makelist":

  -sort=METHOD      Specifies the method to sort the file-list. METHOD can be 'date' (default) to sort
                    the files by date of last modification (latest files first), 'name' to sort
                    alphabetically (by their path names) or 'none' for no sorting (to have the same
                    order as an on-the-fly search uses).

Additional Options available for the commands "makelist" and "index":

  -nononhtml        Does not create the file-list or index for Non-HTML files.
  -nohtml           Does not create the file-list or index for HTML files.

Additional Option available for the command "index":

  -part=PART[/TOTALPARTS]
                    Enables incremental indexing, specifying the part number PART of TOTALPARTS.
                    TOTALPARTS can be a number of 2..50 total parts. If not specified, it defaults to 4.
                    The 'index' command must be executed TOTALPARTS times, beginning with PART number 1
                    (-part=1), incremented by 1 each time (-part=2, -part=3 and -part=4).
  -nocheck          Does not check the index file for correct content after creating it, to save memory.

Additional Option available for the command "changeurls":

  -prefix=URL       Use the specified URL as the start URL to be prefixed to each 'file' field

As you can see, the Shell Executable can be called together with a command and a number of options. A command is always a single word, while an option always begins with the minus ("-") sign (two minus signs also work).

To index a site, a file-list must first be made (using the 'makelist' command), which is then used to create the index files (using the 'index' command). Detailed information can be obtained by executing

    HomepageSearchEngine makelist -help

and

    HomepageSearchEngine index -help

The most powerful way to index your site is to have the index files created automatically every day. This can be done on Unix using the shell script "hse_cronjob.sh" or on Windows using the batch script "hse_cronjob.bat" found in the package's "tools/hse" directory. Details are available in the "hse_cronjob_ReadMe.txt" file residing there.
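As an illustration of the 'makelist' / 'index' sequence and of the "-part=PART[/TOTALPARTS]" option from the help screen above, the following sketch builds the command lines for an incremental indexing run. It is not taken from the "hse_cronjob" scripts: the function name is hypothetical, and it assumes that a single 'makelist' call followed by TOTALPARTS 'index' calls is the intended order.

```python
def incremental_index_commands(total_parts=4, cat=None, executable="HomepageSearchEngine"):
    """Build the command lines for one 'makelist' run plus an incremental 'index' run.

    Assumption: 'makelist' is run once, then 'index' is run once per part,
    starting with part 1. '-batchmode' is added because the help screen says
    it is required when running from a script.
    """
    if not 2 <= total_parts <= 50:
        raise ValueError("TOTALPARTS must be in the range 2..50")
    cat_opt = [] if cat is None else [f"-cat={cat}"]
    commands = [[executable, "makelist", *cat_opt, "-batchmode"]]
    for part in range(1, total_parts + 1):
        commands.append(
            [executable, "index", *cat_opt, f"-part={part}/{total_parts}", "-batchmode"]
        )
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for command in incremental_index_commands(total_parts=4, cat=1):
        print(" ".join(command))
```

The commands are only printed here; to actually run them from the installation directory, each list could be passed to Python's subprocess.run().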
Instead of creating the index files directly on the production server, you can also create them on your local hard drive where you have mirrored the site, regardless of the platform. Just be sure to use the correct executable for your development platform. No webserver needs to be installed. Finally, upload the index files via FTP onto the production server.


6.11 IMPORTANT - Testing your installation: search for "list:files"
-------------------------------------------------------------------

The best thing to do when installation is finished is to call the search engine in the "advanced search" form and then search for the term "list:files". You will then see which search method will be applied and which files will be searched. If the resulting list has been collected on-the-fly, you will also know how many files and directories had to be inspected, as well as the required CPU time. This may reveal unnecessary items and the need to add some directory names to the "exclude_dirs" directive of the hse.ini file.

REPEAT this step after all the 4 checkboxes regarding the parts of the web pages are *disabled* and the (last) checkbox "Search text of Non-HTML files" is enabled. You will then see which Non-HTML files will also be searched. You may find that unwanted files are included that have made the index file very large. In this case, reconfigure the .ini file and re-index your site.

If you have set up more than one category, check all categories.

Note that the "list:files" output does not work when the "debug_level" directive in your .ini file is set to a value higher than 2. So be sure that this directive is kept at its default value of 0 or is set to 1 or 2 when you want to be able to view this list.


6.12 Excluding certain HTML files or parts of them from being searched
----------------------------------------------------------------------

There may be reasons to exclude certain areas within several HTML files from being searched.
Put such areas inside a "span" or "div" element assigned to the "HSE-nosearch" class to force HomepageSearchEngine not to look inside them:

    <span class="HSE-nosearch">This text will never be looked up by HomepageSearchEngine</span>

A "div" element can be used instead of the "span" element. This has the same effect on HSE but may produce a different HTML output.
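To make the effect of the "HSE-nosearch" class more tangible, here is a rough sketch of how text inside such elements could be left out before searching. It only mimics the behaviour described above using Python's standard HTML parser; it says nothing about how HSE implements this internally, and the names NoSearchFilter and searchable_text are made up for this example.

```python
from html.parser import HTMLParser


class NoSearchFilter(HTMLParser):
    """Collects page text, skipping everything inside HSE-nosearch span/div elements."""

    def __init__(self):
        super().__init__()
        self.parts = []       # text fragments that remain searchable
        self.skip_tag = None  # "span" or "div" while inside an excluded area
        self.depth = 0        # nesting depth of skip_tag inside the excluded area

    def handle_starttag(self, tag, attrs):
        if self.skip_tag is not None:
            if tag == self.skip_tag:
                self.depth += 1  # nested element of the same kind
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag in ("span", "div") and "HSE-nosearch" in classes:
            self.skip_tag = tag
            self.depth = 1

    def handle_endtag(self, tag):
        if self.skip_tag is not None and tag == self.skip_tag:
            self.depth -= 1
            if self.depth == 0:
                self.skip_tag = None  # excluded area is closed

    def handle_data(self, data):
        if self.skip_tag is None:
            self.parts.append(data)


def searchable_text(html):
    """Return the text of an HTML fragment without the excluded areas."""
    parser = NoSearchFilter()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

For example, searchable_text('Menu <span class="HSE-nosearch">Home | Contact</span> Welcome') keeps only the text outside the span.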
You can also exclude entire HTML files from being searched, without the need to specify them in the configuration file. To do so, use the "robots" meta tag within the files in question, as used to exclude them from being indexed by robots:

    <meta name="robots" content="DIRECTIVES" />

where DIRECTIVES can contain one or more (comma-separated) of the directives "none", "noindex", "nofollow" and "search". The following table shows how these directives will instruct the HomepageSearchEngine spider (as well as other robots) to either index or not index the page and either follow or not follow links:

    Action:                   DIRECTIVES:
    index   + follow     =>               (or: "index, follow")
    index   + nofollow   =>   "nofollow"  (or: "index, nofollow")
    noindex + follow     =>   "noindex"   (or: "noindex, follow")
    noindex + nofollow   =>   "none"      (or: "noindex, nofollow")

Thus, if an HTML file contains the tag

    <meta name="robots" content="none" />

that file will be completely ignored by the HomepageSearchEngine spider. It will also never be searched or indexed by HomepageSearchEngine. If you want to allow HSE to search it, but keep it away from robots, add "search" to the DIRECTIVES:

    <meta name="robots" content="none, search" />

When you index your site (by applying the "index" command) using the "-debug" option, you will see all the files being skipped because of their robots meta tag's content.


6.13 Options to call the search engine
--------------------------------------

Since HomepageSearchEngine creates a pre-built input form (search box) automatically, there is no need to call the search engine from another form. However, webmasters who want to fully use their own design may want to disable the pre-built search box by setting

    searchbox_place = none

in section (5.1) of the hse.ini file. If you create your own input form to call HomepageSearchEngine, it should include at least something like the following HTML code:
This will create a small text input box (with a width of 15 characters) that calls the search engine (at the location '/cgi-bin/hse/HomepageSearchEngine.exe') with the search terms entered into that box once the ENTER key has been hit. The name of that text input field must be "terms" so that the parameter "terms" will be delivered to the search engine. The "submit" delivery parameter tells the search engine to print an error message when no search term has been submitted.

If you have enabled the pre-built input form, you can force it to be shown in the Advanced form by adding this additional code within the form area:

    <input type="hidden" name="extra" value="on" />

All the options you can select in that advanced input form can be pre-selected by your own form. Most people add those parameters as hidden form fields, in the same way as shown above. These delivery parameters have the following names and are pre-set to the following default values when their names are not included in the calling form:

    name:       default value:   meaning:
    and         on               combine all search terms with logical AND
    extra       off              do not show the input form in advanced mode (with extra options),
                                 but in simple mode
    matchcase   on               restrict the search to force matching case
    noparts     off              do not restrict the search to find only whole words
    hits        10               show at most 10 hits per results page
    sort        hits             sort results by the number of matches (hits)

If you want to change a default "off" value to "on", you must explicitly add these form fields with their new values. The same applies if you want to change the "hits" value from "10" to a number from 2 to 200. In the same way, the "sort" value can be changed to "date" or "name".

The delivery parameters corresponding to the five possible search sources have the names shown in the following table.
If *none* of them are specified explicitly (none of their names are included in the calling form), they are pre-set to the following values:

    title       on               search in the title parts of the web pages
    meta        off              do not search in the description- and keywords- meta tags parts
                                 of the web pages
    text        on               search in the full text of the web pages
    alt         off              do not search in the alternative texts of the images parts
                                 of the web pages
    nonhtml     off              do not search text of Non-HTML files

If you want to change this pre-set combination, you must include all parameters that should be set to "on". For instance, if you want to include Non-HTML text files in addition to titles and full text of web pages, you must include all three parameters "title", "text" and "nonhtml" with their value set to "on".

In addition to the parameters corresponding to the advanced input form, you may set a "cat" parameter with a value from 1 to 99 to specify the category to be searched in. The configuration set to be used can be specified by delivering a "conf" parameter with a value from 1 to 9; and the language settings can be specified by delivering a "lang" parameter with the language code as its value (see chapter 6.8 for details).

If you provide a special parameter called "append", its value will be appended to the links on the results page. An example where this feature may be useful are shopping carts that use a dynamically generated ID to identify the shopper. That ID (the "append" value) will re-appear in the URL of the resulting links in the way 'URL?append=APPEND' where 'URL' is the URL to the found file and 'APPEND' is the value assigned to the "append" parameter.
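The delivery parameters described in this chapter can also be combined into a calling URL instead of a form. The sketch below shows one way to do this; it is an illustration only. The helper name build_search_url is hypothetical, and it assumes that the parameters are accepted in a GET query string separated by ";" (as in the "conf=1;cat=1;lang=en" examples of chapter 6.9) and that listing the search sources explicitly with the value "on" works as described above.

```python
from urllib.parse import quote_plus

# Assumed location of the executable; adjust to your own installation.
BASE_URL = "http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe"


def build_search_url(terms, sources=("title", "text"), base_url=BASE_URL, **params):
    """Build a calling URL from search terms, search sources and other parameters.

    sources: names of the search sources to switch on ("title", "meta",
             "text", "alt", "nonhtml"); all of them must be listed explicitly.
    params:  further delivery parameters such as cat, conf, lang, hits or sort.
    """
    pairs = [("terms", terms)]
    pairs += [(name, "on") for name in sources]
    pairs += sorted(params.items())
    query = ";".join(f"{quote_plus(str(key))}={quote_plus(str(value))}" for key, value in pairs)
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Search titles, full text and Non-HTML files of category 2, 20 hits per page.
    print(build_search_url("index files", sources=("title", "text", "nonhtml"),
                           cat=2, hits=20, lang="en"))
```

Note that the "and" parameter cannot be passed as a Python keyword argument directly because "and" is a reserved word; it can be passed as build_search_url("a b", **{"and": "on"}).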
If you set something like

    results_href = /cgi-bin/hse/passurl.cgi?url=URL

in section (6.8) of your hse.ini file, your own CGI application ("passurl.cgi" located at "/cgi-bin/hse") could then split the ID out of the delivered URL and redirect the result link to your shopping cart application, including the ID that has originally been generated by this application. Thus, the shopper's cart will not be dropped. Such a helper application "passurl.cgi" is included in the "/cgi-bin/hse" directory of the distributed package. It is a plain text Perl script. Instructions on how to use it can be found in the file itself.


6.14 Optional turn from the Trial to the Free version with the Free key ("hse_key.cgi")
---------------------------------------------------------------------------------------

You can switch the behaviour of the search engine from the Pro Trial to the Free version ("Free edition") by downloading

    http://www.homepagesearchengine.com/_download/key_HSE-Free.zip

and copying the key file "hse_key.cgi.bin" found in that additional package into the directory where the executable file resides. Finally, you must rename the key file to its final name "hse_key.cgi".

The Free edition is not time limited, however, it is feature limited. For details, please refer to

    http://free.HomepageSearchEngine.com/

If you want to use HomepageSearchEngine without both time and feature limitations, you have to purchase a license key at

    http://www.HomepageSearchEngine.com/order_en.phtml

Thank you!

____________________________________________________________________________________
7. Special Shell Executable features and using the cronjob script (Pro edition only)

As you have learned in the chapter how to index your site (6.10), the commands available in the Shell Executable are not limited to those used to index your site. Here you will find a description of other useful Shell Executable features.

7.1 Spidering and URL Grabbing: Searching of any sites
------------------------------------------------------

This feature allows you to search the content of any sites, regardless of where they are hosted, from indexes located on your server. You may also want to include your own site if it contains dynamic content. Of course, all results point to the original URLs. The last modified date of each locally stored file is the same as that of the URL on the remote site.

The index creation, including all the involved commands, is usually processed using a script, typically on a regular basis, as a cronjob. In the package's "tools/hse" directory there is such a cronjob script, called "hse_cronjob.sh" (for Unix) and "hse_cronjob.bat" (the equivalent for Windows). It includes a detailed description of what is going on, based on an example as summarized below (taken from the Windows version).

The directory where all the "grabbed" files are stored is specified in the "hse.ini" file. Assuming we want the site "www.site1.tld" searchable as category 1 of 3 total categories, make sure your "hse.ini" file contains something like this:

    basepath = D:\InetPub\wwwroot
    categories_nr = 3
    categories_name1 = www.site1.tld in English
    categories_dir1 = hse/_sites/en/www.site1.tld
    categories_source1 =

The five steps required to create the index of a spidered site are:

(1) Spider one or more entire sites to create the URL-list required to grab URLs, using the command

    HomepageSearchEngine spider

This automatically generates a URL-list file called "hse_urllistNR.csv", holding all URLs from a given website (or sub part of it). The spider starts at a given URL and follows links down to a default or specified limit, similar to the "GNU Wget" utility. But, unlike Wget, HSE's spider only stores URLs after having verified them to hold content of a 'text/html' MIME type, including dynamic content created by .cgi files.
Detailed information can be obtained by executing

    HomepageSearchEngine spider -help

To generate the URL-list file associated to category 1, "hse_urllist1.csv", representing the English "www.site1.tld" site, we execute:

    HomepageSearchEngine spider -cat=1 -lang=en -url=http://www.site1.tld/ -nobackup

The following command would do the same, but with no limit:

    HomepageSearchEngine spider -cat=1 -lang=en -max=-1 -url=http://www.site1.tld/ -nobackup

Some URLs respond with different content, depending on the "Accept-Language" value you send. If you visit the URL "http://www.HomepageSearchEngine.com/" having "en" set in your browser as your primary accepted language, you will get the English start page "index.phtml". If you have set "de" you will get the German page "index_de.phtml" instead. When the HomepageSearchEngine Executable is used with the "spider" command, it acts as a browser (HTTP client) and you can set your preferred accepted language using the "-lang" option. Thus, providing the option "-lang=de" instead of "-lang=en" would spider the German part of the specified site instead of the English one.

(2) Grab the content of remote URLs to your site, using the command

    HomepageSearchEngine geturls

This grabs the content of all the URLs contained in the URL-list and stores them on your site. Detailed information can be obtained by executing

    HomepageSearchEngine geturls -help

If we execute

    HomepageSearchEngine geturls -cat=1 -lang=en -nobackup

all the fetched files listed in the "hse_urllist1.csv" file will be stored in the directory "D:\InetPub\wwwroot\hse\_sites\en\www.site1.tld", determined from the "hse.ini" file as mentioned above.

Although the most convenient way to create the URL-list file is to do it automatically using the spider command, you can also create it manually.
The .csv file must consist of lines in the format

    URL|file

such as

    http://www.homepagesearchengine.com/index.phtml|http_}}www.homepagesearchengine.com}index.phtml.html
    /somescript.php?someparameter=somevalue|somefilename.html

A sample "hse_urllist1.csv" file is included in the package's "cgi-bin/hse" directory. Since the URL grabbing results in a set of static HTML files, let the files all be stored with the ".html" extension, regardless of the extension the originating URL's file has. Therefore, each line in the URL-list file should end with ".html". If a line begins with "/" (instead of a fully qualified URL such as http://www.somedomain.tld/), the "/" will be replaced by the baseurl value of the hse.ini file. Note that in this case the baseurl value must be fully qualified (beginning with 'http://' or 'https://'), otherwise the URL cannot be grabbed.

(3) Make the file-list required to create the index, and
(4) Create the index, using the commands

    HomepageSearchEngine makelist

and

    HomepageSearchEngine index

The usage of these two commands has already been discussed in chapter 6.10 above. Since we have only HTML files, we execute them using the '-nononhtml' option:

    HomepageSearchEngine makelist -cat=1 -nononhtml -nobackup
    HomepageSearchEngine index -cat=1 -nononhtml -nobackup -nocheck

(5) Rename the URLs in the indexes back to their original ones, using the command

    HomepageSearchEngine changeurls

This changes the URLs in the created index back to their original locations, so that the results will link to the proper locations. Detailed information can be obtained by executing

    HomepageSearchEngine changeurls -help

The same URL-list file as used in step 1 will be used again. The full command we execute is

    HomepageSearchEngine changeurls -cat=1 -nobackup

Note: If you don't want to use the indexed search method, but the on-the-fly method instead, you can still use that URL-list file to redirect your visitors to those changed locations.
In that case, you must be sure to have set the "changeurls" keyword in section (6.8) of your hse.ini file, eg.

    results_href = changeurls + highlightmatches + gotofirstmatch

Once you have edited your cronjob script and tested it successfully, you may want to run it automatically on a regular basis, eg. every day at 4 o'clock in the morning. Consult the "hse_cronjob_ReadMe.txt" found in the package's "tools/hse" directory on how to do this.

___________________________________
8. Updating from a previous version

To explore all new features, a "clean" installation is recommended - as described in chapter 8.5 below. The impatient may want to continue using a part of the current installation by following the update instructions below instead. Be sure to backup your current installation files before upgrading and start with the instructions for your matching version.


8.1 Updating from v3.42
-----------------------

(1) Replace the Language sub directory ("lang").
(2) If you are using a custom input form that preserves the previous form settings, replace "hse_customform.js".

Continue with the instructions in the next step.


8.2 Updating from v3.5
----------------------

(3) Upload the robot rules sub directory ("robotrules") if you are using the spider.

Continue with the instructions in the next step.


8.3 Updating from v3.5x
-----------------------

(4) Replace the Style Sheet file "HomepageSearchEngine.css" and, if necessary, edit it to fit your desired styles. Ensure that the HTML template is referencing the .css file properly.
(5) Replace the configuration file(s) "hse.ini" and edit them to fit your desired settings.
(6) Upload (or replace, respectively) all library files.

Continue with the instructions in the next step.


8.4 Updating from v3.6x
-----------------------

(7) Replace the executable file.


8.5 Clean installation and updating from versions earlier than 3.42
-------------------------------------------------------------------

(1) Make a new "clean" installation into a new directory, eg. called "hse-new".
(2) Once the new installation works fine, remove the old installation (eg. the directory "hse") and rename the new directory "hse-new" to "hse".

____________
9. Debugging

If you think the CGI application doesn't run properly, you can run it in "Debug mode", which may help you to find bugs. Start the application by typing the following URL into your browser's input field:

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe?debugmode

(or equivalent). If any system messages (errors or warnings etc.) occur, they will be passed to the browser window.

The Debug mode can be enhanced (to the "Enhanced Debug mode") via the start URL

    http://www.yourdomain.tld/cgi-bin/hse/HomepageSearchEngine.exe?debug

(or equivalent). This may also help to find bad configurations. Note that this output is not available when the "debug_level" directive in your hse.ini file is set to a value higher than the default one (0).

________________
10. Known issues

Currently, the following behaviour is known not to work properly under some circumstances.

- On systems with little available memory, an Error 500 ("Server Error") may occur when clicking on a result link to a larger file, when the highlightmatches/gotofirstmatch feature is enabled (which is the default).
  If you cannot get or configure more available memory, change section (6.8) of your hse.ini file as mentioned below, in the given order, until you succeed:
  (1) Disable the gotofirstmatch feature by setting 'results_href = highlightmatches'
  (2) Decrease the value of SIZE - for instance, set 'results_href = highlightmatches + maxsize:100'
  (3) Disable both the highlightmatches and gotofirstmatch features altogether by setting 'results_href = none'

- Non-ASCII characters (for instance, the German "Umlaute") will always be searched case-sensitive unless you set "utf8 = on" and specify the proper Encoding via the "encoding" directive in the hse.ini file. In addition, you must apply the indexed search method.

- When HTML elements in searchable documents have attributes, their values may not contain the ">" character. Use the "&gt;" entity instead. For instance, an attribute such as alt="a > b" must be replaced by alt="a &gt; b".
  Otherwise, a search may find code fragments included in such elements.

___________
11. To-Do's

The following tasks are currently waiting on the To-Do-list to be done in the future.


11.1 Internationalisation
-------------------------

(1) Translation of the language core file
    + Translation of line 61 from the following languages: Arabic, simplified Chinese, traditional Chinese, Czech, Danish, Finnish, Greek, Hungarian, Norwegian, Romanian, (Latin) Serbian, Swedish, Thai and Turkish
    + Translation of lines 70-73 from the following languages: simplified Chinese, traditional Chinese, Czech, Danish, Greek, Norwegian, Swedish, Thai and Turkish
(2) Translation of the language help file
    + Translation of the rear part of line 5 from the following languages: simplified Chinese, traditional Chinese, Czech, Danish, Greek, Norwegian, Swedish and Turkish
    + Translation of lines 32-33 from the following languages: Arabic, simplified Chinese, traditional Chinese, Czech, Danish, Finnish, Greek, Hungarian, Norwegian, Romanian, Swedish and Turkish
    + All for (Latin) Serbian
+ Hebrew translation (everybody is welcome to create it - as always, you will get a full version for free)
+ Translations into all other not yet supported languages are always welcome, too


11.2 New features
-----------------

+ Statistical options
+ Index files stored in UTF-8 format
+ Option to limit the search to files not older than a specified time
+ Search again in results
+ Web interface to configuration
+ MySQL backend for HSE's index
+ CD-ROM version for Windows

___________
12. Support

If you have problems or suggestions, please first check the Frequently Asked Questions at

    http://www.HomepageSearchEngine.com/faq_en.phtml

Be sure to use the latest version available if you run into a problem or if you are looking for improved functionality. The latest version may already have solved your problem and/or implemented your requested feature.
If your problem still could not be solved, don't hesitate to contact us using our feedback form at

    http://www.homepagesearchengine.com/feedback_en.phtml#2

If your server runs on a Unix platform, please be sure to install platform.cgi first and include the *full* URL to it in your message. Thank you.

___________
13. Credits

Thanks go out to (second names in alphabetical order):

+ Geir Juul Aslaugberg for his translation into Norwegian
+ Peter Bickel (www.polarpixel.de) for creating the HSE logo
+ Rémy Bieber for his translation into French
+ Raphael Boos for creating the shortcut icons for the HSE website
+ "David" Chang Shih Chun for his translation into Japanese
+ Ricardo Contreras for his translation into Spanish
+ Miguel Duclós for his translation into Portuguese
+ Emad Felemban from the Umm Al-Qura University (www.uqu.edu.sa) for his Arabic translation of the language core file
+ Abdullah Ghaze Fitaihi for his Arabic translation of the language help file
+ Nicola Gatta for her translation into Italian
+ Jozsef Tamas Herczeg for his translation into Hungarian
+ Mats Ingelström for his translation into Swedish
+ V.J. Janak for his translation into Russian
+ Pryme Sinista Jinx (www.linuxtr.com) for his translation into Turkish
+ Elena N. Kharlamova for her update of the Russian translation
+ Yannis Kotsis for his translation into Greek
+ Olivier Michenaud (www.dixiedisques.com) for his update of the French translation
+ Fabian Milos for his translation into Czech
+ Sanja Nesic for her translation into (Latin) Serbian
+ Krzysztof Palka for his translation into Polish
+ Xuguang Pan for his translation into simplified Chinese
+ Fragiskos Remoundos for his Greek translation of the language help file
+ Adrian Roye for his translation into Romanian
+ Hari Sersic for his translation into Croatian
+ Kimura Shinsuke for his update of the Japanese translation
+ Frans Storr-Hansen for his translation into Danish
+ Ylikorkala Tapio for his translation into Finnish
+ Itamar Vieira for his update of the Portuguese translation
+ Wojciech Nowakowski for his update of the Polish translation
+ Teerachai "Tee" Yongchaitrakul for her translation into Thai
+ Sander van Yperen for his Dutch translation of the language core file
+ Shi Jian Zhuang for his translation into traditional Chinese
+ Jan Zonjee for his Dutch translation of the language help file
+ all the people out there for providing us with continued feedback and support

_____________________________________________
14. History of version changes ("change log")

Please refer to the file "history.txt".

_____________________
15. License agreement

Please refer to the file "license.txt".