ReadMe for the HomepageSearchEngine cronjob sample scripts
==========================================================
last updated on 2006-11-17

The most powerful way to index your site is to have the index files
created automatically every day. This can be done on Unix using the
shell script "hse_cronjob.sh" (shell access is required) or on Windows
using the batch script "hse_cronjob.bat" found in this directory.

Place the script file in a subdirectory of your home directory called
"hse". Be sure to edit it first! A detailed description of what to do
is in the file itself.

Before you set up the cron job, test the script by changing into its
directory and executing it manually, by entering

    ./hse_cronjob.sh > hse_cronjob.log      (on Unix)
    hse_cronjob > hse_cronjob.log           (on Windows)

on the command line. This creates the log file "hse_cronjob.log", which
you should check to confirm that everything went fine. Re-run the
cronjob script each time you change either the script itself or one of
the configuration files it uses, and take a close look at the resulting
log file.

On Unix, the following command is preferred, especially if you are
using the spider within the cronjob script:

    date; time ./hse_cronjob.sh > hse_cronjob.log; date

This additionally prints the start time, the length of time the script
needed to finish, and the end time.

Here are the details on how to set up the cronjob to run the script
automatically on Unix or on Windows, respectively:


On Unix:
--------

The purpose of the shell script "hse_cronjob.sh" is to be executed as a
cron job. It works on all shells. Here are the instructions on how to
install this cron job:

(1) Log in to the shell and test whether you are allowed to set up a
    cron job by entering the command

        crontab -l

(1a) If you receive a message such as

        You are not allowed to use crontab

     you must ask your admin either to give you the right to edit the
     cron table or to do this for you.
(1b) Otherwise, you can continue with step 2.

(2) After you have edited your "hse_cronjob.sh" file and tested it by
    executing it manually, copy it into an "hse" subdirectory of your
    home directory (~) and make sure that it has execute permission for
    the owner (chmod 700 would be best). Make sure that the file stays
    there and keeps these permissions.

(3) Enter

        crontab -e

    to edit your cron table, usually in the vi editor. Move to the end
    of the file and open a new line there by pressing "G" (SHIFT+G)
    followed by "o"; this also switches vi into insert mode. Then input
    the following line:

        0 4 * * * ~/hse/hse_cronjob.sh

    This executes your "hse_cronjob.sh" file every day at 4:00 in the
    morning. If you want your cronjob to be executed only once a week,
    let's say every Sunday, change the first 5 fields to "0 4 * * 0".

    Leave vi's insert mode by hitting the "ESC" key. Finally, press and
    hold the "SHIFT" key while hitting the "Z" key twice. This saves
    the new cron table and exits crontab. You should receive a message
    like "crontab: installing new crontab".

(3a) Note that "hse_cronjob.sh" writes its output to STDOUT. This may
     result in an email containing that output being sent to the user
     the cron job runs as. To prevent your system from sending such a
     mail, and to make troubleshooting easier, it is recommended to
     redirect the output of "hse_cronjob.sh" to a log file. To do this,
     first try to execute

        ~/hse/hse_cronjob.sh > ~/hse/hse_cronjob.log 2>&1

     on the command line. If this command cannot be executed, first
     change to the Bourne Shell by entering "sh" and try again. If it
     works, use the cron table entry

        0 4 * * * ~/hse/hse_cronjob.sh > ~/hse/hse_cronjob.log 2>&1

     instead of the one shown above. This creates a "hse_cronjob.log"
     file which always holds the output of the last batch process,
     including any error messages your system writes to STDERR. Make
     sure that you have write permission for that file (chmod 600 would
     be good).
(4) On the next day, some time after 4 o'clock server time, check
    whether the indexing has been performed successfully by searching
    for "list:files" with the HomepageSearchEngine CGI application. You
    should see something like:

        The following files will be searched:
        Category 1: some category name (indexed search)
        hse_index1_html.txt (1767 KB; Updated on August 8, 2006, 4:02)


On Windows:
-----------

If you have direct access to a Windows 2003/2000/NT server machine, you
can use "Scheduled Tasks" from the Control Panel, pointing to the
location of the "hse_cronjob.bat" batch script file. It is recommended
to redirect its output (STDOUT as well as STDERR) to a log file. For
example, use

    C:\path_to_your_home\hse\hse_cronjob.bat > hse_cronjob.log 2>&1

as the entry in the Start Task box. This creates a "hse_cronjob.log"
file which always holds the output of the last batch process.


Support:
~~~~~~~~

If you need assistance with your cronjob script file, please execute it
in the form

    ./hse_cronjob.sh > hse_cronjob.log 2>&1     (on Unix)
    hse_cronjob > hse_cronjob.log 2>&1          (on Windows)

and, when asking for support, send us your hse_cronjob script file, the
"hse_cronjob.log" log file, your hse.ini and the URL-list .csv file.

Thank you.
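

Example: checking the log file on Unix
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Unix instructions above recommend redirecting the cron job's output
to "~/hse/hse_cronjob.log" and looking at that file after each run. As
a rough illustration only, the first part of that check could be
scripted as below. The function name "check_hse_log" is hypothetical
and is not part of the HomepageSearchEngine distribution. It only
tests that the log file exists, is not empty and was modified within
the last day. It does not interpret the log's contents, because the
format of that output is not described here.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not shipped with HomepageSearchEngine).
# Usage: check_hse_log [path-to-log]
# Defaults to the log file location used in the cron table entry above.
check_hse_log() {
    log="${1:-$HOME/hse/hse_cronjob.log}"

    # -s is true only if the file exists and has a size greater than zero.
    if [ ! -s "$log" ]; then
        echo "missing or empty: $log"
        return 1
    fi

    # find prints the file name only if it was modified less than
    # one day (24 hours) ago; an empty result means the log is old.
    if [ -z "$(find "$log" -mtime -1 2>/dev/null)" ]; then
        echo "stale: $log"
        return 2
    fi

    echo "ok: $log"
    return 0
}
```

After defining the function in your shell, you could call it as
"check_hse_log" or "check_hse_log ~/hse/hse_cronjob.log". A result
other than "ok" suggests that the cron job did not run or produced no
output. In every case the log file itself should still be read.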