@echo off && setlocal

rem HomepageSearchEngine cronjob script for Windows (last updated on 2006-11-17)
rem All lines beginning with "rem " are comments.

rem This batch script can be used to:
rem (1) optionally convert all supported PDF files under your home directory into plain text format
rem (2) optionally spider one or more entire sites and create the URL-list required to grab URLs (only required if you will perform step 3)
rem (3) optionally grab the content of remote URLs to your site
rem (4) make the file-list required for the flat or the indexed search method
rem (5) index your site
rem (6) optionally rename the URLs in the index back to their original ones (only required if you have performed step 3)
rem (7) optionally do some additional tasks
rem Details can be found in chapter 7.1 ("Spidering and URL Grabbing: Searching of any sites") of the Manual (ReadMe.txt)
rem
rem This script is an example with 3 categories:
rem As category 1, we want to search the English site "www.site1.tld". This could be our own website, probably containing dynamic content.
rem As category 2, we want to search the German site "www.site2.tld". This could be another company site, hosted elsewhere.
rem As category 3, we want to search both of the above sites at once.
rem Since these sites use the same character encoding (iso-8859-1), we can use the same configuration set for both.
rem
rem Make sure your "hse.ini" file contains something like this:
rem
rem basepath = D:\Inetpub\wwwroot
rem
rem categories_nr = 3
rem
rem categories_name1 = www.site1.tld in English
rem categories_name2 = www.site2.tld in German
rem categories_name3 = all the above sites
rem
rem categories_dir1 = hse/_sites/en/www.site1.tld
rem categories_dir2 = hse/_sites/de/www.site2.tld
rem categories_dir3 =
rem
rem categories_source1 =
rem categories_source2 =
rem categories_source3 =

time /t
echo Starting building search index
echo.

rem Edit the following path to point to your hse directory!
pushd C:\Inetpub\wwwroot\cgi-bin\hse

rem (1) determine unsupported PDF files and convert supported PDF files into plain text format
echo.
echo Now performing step 1: converting PDFs
echo.
rem Ensure DIR is set to your home directory (or the directory under which the PDF files you want to make searchable reside)
set DIR=C:\Inetpub\wwwroot
HomepageSearchEngine pdfconvert -dir=%DIR% 2> %DIR%\unsupported_pdfs.txt

rem (2) automatically generate the URL-list files from all sites
echo.
echo Now performing step 2: spidering
echo.
rem Ensure each SITE value is set properly. Then, only the "-cat" and "-lang" options have to be checked:
set SITE=www.site1.tld
HomepageSearchEngine spider -cat=1 -lang=en -pdf2txt -url=http://%SITE%/ -nobackup -batchmode
set SITE=www.site2.tld
HomepageSearchEngine spider -cat=2 -lang=de -pdf2txt -url=http://%SITE%/ -nobackup -batchmode

rem (3) grab the content of the URLs listed in the URL-list files
echo.
echo Now performing step 3: grabbing the URLs' contents
echo.
HomepageSearchEngine geturls -cat=1 -lang=en -nobackup -batchmode
HomepageSearchEngine geturls -cat=2 -lang=de -nobackup -batchmode

rem (4) make the file-list file pairs
echo.
echo Now performing step 4: making the file-list
echo.
HomepageSearchEngine makelist -cat=1 -nobackup -batchmode
HomepageSearchEngine makelist -cat=2 -nobackup -batchmode

rem (5) create the index file pairs
echo.
echo Now performing step 5: indexing
echo.
HomepageSearchEngine index -cat=1 -nobackup -nocheck -batchmode
HomepageSearchEngine index -cat=2 -nobackup -nocheck -batchmode

rem (6) change the URLs in the indexes back to their original ones
echo.
echo Now performing step 6: changing the URLs' names
echo.
HomepageSearchEngine changeurls -cat=1 -nobackup -batchmode
HomepageSearchEngine changeurls -cat=2 -nobackup -batchmode

rem (7) do some additional tasks
echo.
echo Now performing step 7: doing some additional tasks
echo.
rem Provide the index for category 3 by merging the indexes from categories 1 and 2.
rem First, clean up category 3's index:
del /Q hse_index3_*.txt
copy /Y hse_index1_html.txt + hse_index2_html.txt hse_index3_html.txt
copy /Y hse_index1_nonhtml.txt + hse_index2_nonhtml.txt hse_index3_nonhtml.txt

rem Also provide category 3's index as the main index (used if HomepageSearchEngine is called without a "cat" delivery parameter).
rem To do so, make a copy of it (commented out on Windows since that may require significantly more disk space):
rem
rem copy /Y hse_index3_html.txt hse_index_html.txt
rem copy /Y hse_index3_nonhtml.txt hse_index_nonhtml.txt

rem Finally, we change the current directory back to the original one:
popd

echo.
time /t
echo Finished building search index
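
rem Optional sanity check (an illustrative sketch, not part of the original script):
rem report whether the merged category 3 index files from step 7 exist.
rem HSEDIR is a hypothetical helper variable; it is assumed to match the hse
rem directory used by the "pushd" command above and must be edited with it.
set HSEDIR=C:\Inetpub\wwwroot\cgi-bin\hse
for %%F in (hse_index3_html.txt hse_index3_nonhtml.txt) do (
  if exist "%HSEDIR%\%%F" (
    echo Found %%F
  ) else (
    echo Warning: %%F is missing
  )
)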