KSearch v1.4

FAQs

More help can be found in our Discussion Forum.

General
  • What is KSearch?

    KSearch is a fast, efficient, and highly configurable search engine and web site indexer. It includes almost all of the features available in the most popular commercial search engines, yet it is completely free. Key features include phrase searching, case-sensitive searching, whole-word searching, + and - logical operators, searching within previous results, a fully customizable HTML template, and much more.

  • How does KSearch work?

    The KSearch indexer starts at a specified directory and crawls through all of its files and subdirectories, except for those listed in a special ignore-files list. For each file, the indexer removes HTML tags and embedded scripts and styles, translates Latin-1 characters to their English equivalents, and then indexes the remaining content. The title, description, links, meta-description, meta-keywords, meta-author, modification date, and file path are saved in a DBM database or a flat file. To increase search speed, the contents of each file can also be saved in the database or flat file. The search script counts the occurrences of the query terms in each document, ignoring terms listed in a special stop-terms file, either by reading the file information stored in the DBM database or flat file or by searching the files on the fly. The results are sorted according to the user's preference and returned to the user through a customized HTML template.
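
    The indexing and counting steps above can be illustrated with a short sketch. KSearch itself is written in Perl; this Python version only mirrors the described pipeline, and the helper names, the regular expressions, the use of Unicode decomposition to fold accented Latin-1 letters, and the lower-casing of terms are all assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical illustration of the pipeline described above; not KSearch's code.
SCRIPT_STYLE_RE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
TERM_RE = re.compile(r"[a-z0-9]+")


def fold_latin1(text):
    """Approximate 'translate Latin-1 characters to English equivalents'
    by decomposing accented letters and dropping the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def extract_text(html):
    """Remove embedded scripts and styles first, then all remaining tags."""
    without_code = SCRIPT_STYLE_RE.sub(" ", html)
    return fold_latin1(TAG_RE.sub(" ", without_code))


def count_query_terms(html, query_terms, stop_terms):
    """Count how often each query term that is not a stop term occurs in one document."""
    terms = TERM_RE.findall(extract_text(html).lower())
    wanted = [t.lower() for t in query_terms if t.lower() not in stop_terms]
    return {t: terms.count(t) for t in wanted}
```

    For example, counting "cafe" and "the" in a page when the stop-terms set contains "the" returns a count for "cafe" only, and words inside <script> blocks are never counted.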

  • What do I need to use KSearch?

    KSearch can be installed on Unix or Windows platforms. Perl 5.003 or higher must be installed on your system to run both the indexer and the search script. The search script uses the Benchmark and CGI Perl modules, which are included in the standard Perl distribution. KSearch uses the best Perl DBM library module available on your system: SDBM_File is included in the standard Perl distribution, but DB_File or GDBM_File may be required for very large sites. If you want to index PDF files, you must install Xpdf from http://www.foolabs.com/xpdf/; the KSearch indexer uses pdftotext from Xpdf to convert PDF files to text. The disk space needed depends on your configuration and the size of your web site. As a guide to size requirements, the index for the 5 MB Perl manual is 4.9 MB. KSearch can also be configured to use a very compact index to minimize disk usage.


Installation and Usage Troubleshooting
  • How do I open the "ksearch.tar.gz" file?

    First decompress the file by typing "gunzip ksearch.tar.gz", and then type "tar xvf ksearch.tar" to extract its contents. On Windows platforms, WinZip (winzip.com) can open "tar.gz" archives.
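
    If neither gunzip/tar nor WinZip is at hand, Python's standard tarfile module can decompress and extract the archive in one step on any platform. This is a sketch; extract_archive is a hypothetical helper name, and the destination directory is up to you.

```python
import tarfile


def extract_archive(archive_path, dest_dir):
    """Decompress and extract a .tar.gz archive (like gunzip followed by tar xvf)."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()  # member list, similar to tar's verbose output
        if hasattr(tarfile, "data_filter"):
            archive.extractall(dest_dir, filter="data")  # refuse unsafe member paths
        else:
            archive.extractall(dest_dir)  # older Pythons without extraction filters
    return names
```

    For example, extract_archive("ksearch.tar.gz", ".") unpacks the archive into the current directory.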

  • How do I install KSearch?

    After extracting the archive, open the README file for detailed instructions on completing the installation.

    General Installation Steps:

    1. Set the path to perl on the first line of "indexer.pl" and "ksearch.cgi".
    2. Set the appropriate file permissions.
    3. Edit the "configuration/configuration.pl" file.
    4. Edit the "configuration/ignore_files.txt" and "configuration/stop_terms.txt" files.
    5. Run the indexer by typing "indexer.pl". Some systems may require you to type "./indexer.pl".


  • How do I prevent certain files and directories from being indexed?

    To skip files and directories during indexing, add their full paths to the "configuration/ignore_files.txt" file, one path per line.
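
    As an illustration of how a one-path-per-line ignore list can prune a directory crawl, here is a minimal Python sketch. The function names are hypothetical, and the exact matching rules KSearch applies (for example, how it treats trailing slashes) are assumptions; the same load_list helper would also read a one-term-per-line file such as stop_terms.txt.

```python
import os


def load_list(path):
    """Read a one-entry-per-line file such as ignore_files.txt, skipping blank lines."""
    with open(path, encoding="latin-1") as handle:
        return {line.strip() for line in handle if line.strip()}


def crawl(start_dir, ignored):
    """Yield full paths of files under start_dir, skipping ignored files and directories."""
    ignored = {os.path.normpath(p) for p in ignored}
    for root, dirs, files in os.walk(start_dir):
        # Pruning dirs in place stops os.walk from descending into ignored directories.
        dirs[:] = [d for d in dirs if os.path.normpath(os.path.join(root, d)) not in ignored]
        for name in files:
            full = os.path.normpath(os.path.join(root, name))
            if full not in ignored:
                yield full
```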

  • How do I stop the search engine from finding certain terms?

    Add the terms you do not want indexed to the "configuration/stop_terms.txt" file, one term per line.

  • How do I stop the search engine from finding common terms that may be in nearly all my documents but not in the "stop_terms.txt" list?

    Set $IGNORE_COMMON_TERMS to the maximum percentage of files in which a term may appear. Terms that exceed this percentage are added to 'stop_terms.txt'. For example, if it is set to 80, terms found in more than 80% of all files are added to 'stop_terms.txt'.
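
    The threshold rule above amounts to a document-frequency cut-off. The Python sketch below shows only the arithmetic; find_common_terms is a hypothetical name, and representing each file as a set of its terms is an assumption.

```python
def find_common_terms(docs_terms, max_percent):
    """Return the terms that appear in more than max_percent of all files.

    docs_terms: one set of terms per indexed file.
    """
    total = len(docs_terms)
    doc_freq = {}
    for terms in docs_terms:
        for term in terms:
            doc_freq[term] = doc_freq.get(term, 0) + 1
    # "Over 80%" is read strictly: a term in exactly 8 of 10 files is not common.
    return {t for t, n in doc_freq.items() if n * 100 > max_percent * total}
```

    Comparing n * 100 with max_percent * total keeps the test in integer arithmetic when both inputs are integers, so there is no rounding at the boundary.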

  • What is the full path?

    The full path is the absolute path of a file or directory. On Unix systems, you can display the full path of the current directory by typing "pwd". We recommend asking your host service or system administrator for the full path to your web site for running CGI scripts. On Windows, you may need to include the drive letter, for example "C:\WINDOWS\kscripts\".

  • How do I set the Perl path in "indexer.pl" and "ksearch.cgi"?

    On Unix platforms, the first line of each script tells the system that the file is a Perl program. The first line reads "#!/usr/bin/perl", which is a common location for Perl. Type "which perl" to find the Perl path on your system. If it is different, change the first line to the correct path, keeping the "#!" at the front.
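
    Editing the first line by hand is all that is required. If you prefer to script the change, a small Python helper such as the hypothetical set_perl_path below can do it; when no path is given it uses shutil.which("perl"), the Python counterpart of "which perl". Treat it as a convenience sketch and keep a backup copy of each script.

```python
import shutil


def set_perl_path(script_path, perl_path=None):
    """Replace the '#!' line of a Perl script with the given (or detected) Perl path."""
    perl_path = perl_path or shutil.which("perl")  # like running `which perl`
    if perl_path is None:
        raise FileNotFoundError("perl not found on PATH; pass perl_path explicitly")
    with open(script_path, encoding="latin-1", newline="") as handle:
        lines = handle.readlines()
    if not lines or not lines[0].startswith("#!"):
        raise ValueError(f"{script_path} does not start with a #! line")
    # Keep any switches after the interpreter, e.g. "-w" in "#!/usr/bin/perl -w".
    parts = lines[0][2:].strip().split(None, 1)
    switches = " " + parts[1] if len(parts) > 1 else ""
    lines[0] = "#!" + perl_path + switches + "\n"
    with open(script_path, "w", encoding="latin-1", newline="") as handle:
        handle.writelines(lines)
```

    For example, set_perl_path("ksearch.cgi", "/usr/local/bin/perl") rewrites only the first line and leaves the rest of the script untouched.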

  • How do I set file permissions?

    On Unix platforms, type "chmod 755 filename" to give everyone read and execute permission (the owner can also write), or "chmod 744 filename" to give the owner full access and everyone else read-only access, where filename is the name of the file. On Windows NT, right-click the file or directory, click Properties on the shortcut menu, and then click Permissions on the Security tab. See the README file for the permissions required for each file.
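
    The same permission bits can be set from Python with os.chmod, and stat.filemode shows what each octal mode means in "ls -l" notation. This is a sketch for Unix-like systems (on Windows, os.chmod only toggles the read-only flag); set_mode is a hypothetical helper, and the README remains the authority on which file needs which mode.

```python
import os
import stat


def set_mode(path, mode):
    """Equivalent of `chmod <mode> path`; returns the resulting ls-style mode string."""
    os.chmod(path, mode)
    return stat.filemode(os.stat(path).st_mode)


# What the two modes mentioned in this FAQ mean:
# 0o755 -> "-rwxr-xr-x": owner read/write/execute, group and others read/execute
# 0o744 -> "-rwxr--r--": owner read/write/execute, group and others read only
```

    For example, set_mode("ksearch.cgi", 0o755) has the same effect as "chmod 755 ksearch.cgi".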

  • When I run the indexer, I get the error message 'dbm store returned -1, errno 28, key "trap" at - line 3.' What does this mean?

    This means that your system has neither the DB_File nor the GDBM_File module, and that you have reached the key/value size limit of the DBM database in use. You will have to install either DB_File or GDBM_File from CPAN (http://www.perl.com/CPAN-local). Most systems already have DB_File.

  • I do not have DB_File. Is there an alternative?

    If you do not have DB_File, you can use a flat file database to avoid memory limits and database access problems. To do this, set $USE_DBM = 0 in configuration.pl.

  • How do I add a search box on my site?

    To add a search box, insert the following HTML code:

    <FORM ACTION="http://my_web_site.com/cgi-bin/ksearch/ksearch.cgi" METHOD="GET" NAME="search">
    <INPUT TYPE="text" NAME="terms"><INPUT TYPE="submit" VALUE="Search">
    </FORM>
            

    Be sure to change "my_web_site.com" to your actual domain, and adjust the path if ksearch.cgi is installed in a different directory.
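
    Because the form uses METHOD="GET", the browser sends the contents of the text box as a "terms" parameter in the query string. The Python sketch below builds the same kind of URL, which can be handy for linking directly to a search; search_url is a hypothetical helper, the domain and path are the placeholders from the form above, and any other parameters ksearch.cgi may accept are not covered.

```python
from urllib.parse import urlencode


def search_url(base_url, terms):
    """Build the URL that a GET form with a single 'terms' field submits to."""
    # urlencode uses quote_plus: spaces become '+', and a literal '+' becomes '%2B'.
    return base_url + "?" + urlencode({"terms": terms})


print(search_url("http://my_web_site.com/cgi-bin/ksearch/ksearch.cgi", "perl dbm"))
# prints http://my_web_site.com/cgi-bin/ksearch/ksearch.cgi?terms=perl+dbm
```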

  • How do I configure the search engine for speed?

    Set $IGNORE_COMMON_TERMS to the maximum percentage of files in which an indexed term may appear. Common terms that are not already in 'stop_terms.txt' will then be ignored. In addition, the search engine is slightly faster when DB_File is used. See the README documentation for details.

  • How do I configure the search engine to save disk space?

    Set $SAVE_CONTENT to 0 to use a compact search index database, and set $MAKE_LOG to 0 so that no log file of the indexing run is created. Be aware that this configuration can make searches very slow.

  • My descriptions always show the same terms that are in my navigation bar. How do I show useful content in the descriptions?

    Descriptions may always show the same content if your pages all start with the same navigation bar. You can show more useful content by setting $DESCRIPTION_START to the number of the term at which the description should start. Terms are numbered from 0, so if you set $DESCRIPTION_START to 50, the description starts at the 51st term in the file.
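
    The zero-based offset can be sketched in a few lines of Python. make_description, its length parameter (the number of terms shown), and the fallback for pages shorter than the offset are hypothetical; they only illustrate how $DESCRIPTION_START counts terms.

```python
def make_description(text, description_start=0, length=25):
    """Return `length` terms of the page text, starting at term number `description_start` (0-based)."""
    terms = text.split()
    # Fall back to the beginning if the page is shorter than the offset (an assumption).
    if description_start >= len(terms):
        description_start = 0
    return " ".join(terms[description_start:description_start + length])
```

    For a page whose terms are w1, w2, ..., w100, make_description(text, 50, 3) returns "w51 w52 w53", i.e. the description starts at the 51st term.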

  • Why does the search seem to take longer than the indicated search time?

    The search time is the CPU time actually used to complete the search. It does not represent the "wallclock" time, which also includes delays from CPU scheduling and other factors, and it does not include time lost to busy networks. CPU time is a more accurate measure of the work done by the search engine itself.
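
    The difference between the two measurements can be seen with Python's standard timers: time.process_time() counts only the CPU time used by the process (the kind of figure the search time reports), while time.perf_counter() measures elapsed wallclock time, including time spent waiting. In this sketch the sleep stands in for waiting on a busy network; the workload itself is arbitrary.

```python
import time


def timed(work):
    """Run work() and return (cpu_seconds, wallclock_seconds)."""
    cpu_start, wall_start = time.process_time(), time.perf_counter()
    work()
    return time.process_time() - cpu_start, time.perf_counter() - wall_start


def search_then_wait():
    sum(i * i for i in range(200_000))  # CPU-bound work, counted by both timers
    time.sleep(0.2)  # waiting uses almost no CPU time, so only wallclock time grows


cpu, wall = timed(search_then_wait)
print(f"CPU time: {cpu:.3f}s  wallclock time: {wall:.3f}s")
```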

  • What is the score?

    Unless weights are used, the score is simply the percentage of matching characters in the document (0.00 - 100.00).
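
    One possible reading of this rule, as a Python sketch: mark the characters covered by occurrences of the query terms and divide by the document length. How KSearch treats overlapping matches, case, and weights is not described here, so the case-insensitive matching and counting each character once are assumptions.

```python
import re


def score(document, query_terms):
    """Percentage (0.00 - 100.00) of the document's characters covered by query-term matches."""
    if not document:
        return 0.0
    covered = set()
    for term in query_terms:
        for match in re.finditer(re.escape(term), document, re.IGNORECASE):
            covered.update(range(match.start(), match.end()))  # overlaps counted once
    return round(100 * len(covered) / len(document), 2)
```

    For example, score("perl is fun", ["perl"]) returns 36.36, since 4 of the 11 characters match.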

  • How do I index PDF files?

    To index PDF files, you must install Xpdf from http://www.foolabs.com/xpdf/. The KSearch indexer uses pdftotext from Xpdf to convert PDF documents to text. Xpdf is freely available for Unix and Windows platforms under the GNU General Public License. Once you have installed Xpdf, set $PDF_TO_TEXT to the full path of the 'pdftotext' executable in the Xpdf package, and then index your web site as usual. Note: this option creates a security risk, because the indexer script must run pdftotext through a shell command. Do not use it if you do not trust the content of the PDF files you want to index. Results for phrase searches may also be less accurate because of pdftotext's output format: columns of text are kept as columns in the conversion rather than turned into continuous content.
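
    For comparison, here is a Python sketch of calling pdftotext directly. pdftotext takes an input PDF and an output file name, and an output name of "-" sends the text to standard output. Passing the arguments as a list runs the program without a shell, so file names are never interpreted by a shell; this removes one class of problems but does not make untrusted PDFs safe. The function names are hypothetical, and decoding the output as Latin-1 is an assumption that should match your pdftotext encoding setting.

```python
import subprocess


def build_pdftotext_command(pdftotext_path, pdf_path):
    """Argument list for `pdftotext <file.pdf> -` (text written to stdout)."""
    return [pdftotext_path, pdf_path, "-"]


def pdf_to_text(pdftotext_path, pdf_path, timeout=60):
    """Run pdftotext without a shell and return the extracted text."""
    result = subprocess.run(
        build_pdftotext_command(pdftotext_path, pdf_path),
        capture_output=True,  # collect stdout/stderr (Python 3.7+)
        check=True,  # raise CalledProcessError if pdftotext fails
        timeout=timeout,  # give up on a PDF that hangs the converter
    )
    return result.stdout.decode("latin-1")
```

    Here pdftotext_path plays the role of the $PDF_TO_TEXT setting, for example pdf_to_text("/usr/local/bin/pdftotext", "manual.pdf").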



Common Problems and Returned Errors
  • Cannot open ".../database/files: No such file or directory..."?

    This error may occur if there is no database directory in the ksearch folder, if your server does not grant CGI scripts write access, if your file permissions are set incorrectly, or if the DBM module is not installed correctly. Both the indexer and the search script require that your server allow CGI scripts write access. If your server grants write access to CGI scripts and you still get this error, try using a flat file database by setting $USE_DBM = 0 in configuration.pl.
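
    The first few causes can be checked quickly. The Python sketch below reports whether a database directory exists and whether the current user can write to it and enter it. check_database_dir is a hypothetical helper, the path is only an example, and the result is meaningful only when run as the same user that executes your CGI scripts.

```python
import os


def check_database_dir(path):
    """Return a list of problems that would stop the indexer writing its database."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    if not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable by this user")
    if not os.access(path, os.X_OK):
        problems.append(f"{path} cannot be entered (missing execute permission)")
    return problems


print(check_database_dir("ksearch/database") or "database directory looks usable")
```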

  • When I run the indexer from the web, the script ends prematurely.

    Most servers impose a time limit (timeout) on HTTP requests to avoid tying up server resources. If you index a relatively large site from a web browser, you will probably reach this limit. With the current version of KSearch, the only way around it is to ask your host service or system administrator to increase the time limit so that the server does not time out.

  • When I try running the indexer or search script from the web, my browser returns the contents of the file instead of running the script.

    When running the indexer from the web, be sure to run indexer.cgi (not indexer.pl). Also make sure that your server is configured to execute .cgi files as Perl scripts, particularly in the directory where KSearch is installed.


KScripts.com © Copyright 2000, All rights reserved