Search Engines

From CollectiveAccess Documentation

[Valid for v1.3]

Providence features a modular search facility that allows you to choose from several low-level search engines. To support a given search engine you must write a Providence plug-in for it. The plug-in is composed of two classes implementing the IWLPlugSearchEngine and IWLPlugSearchEngineResult interfaces (defined in app/lib/core/Plugins). Each class implements "glue" between Providence and the given search engine. The IWLPlugSearchEngine implementor handles indexing and searching: it accepts tokens to index and searches to execute, and returns the outcome of each executed search as an instance of the IWLPlugSearchEngineResult implementor.
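
The division of labour between the two classes can be illustrated with a minimal sketch. It is written in Python for brevity and is not Providence code: the class names SearchEngine, SearchResult, ListResult and InMemoryEngine and all method names are hypothetical stand-ins for the roles played by the IWLPlugSearchEngine and IWLPlugSearchEngineResult implementors, and the actual interface methods should be taken from app/lib/core/Plugins.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import List


class SearchResult(ABC):
    """Stand-in for the result role; method names are hypothetical."""

    @abstractmethod
    def num_hits(self) -> int: ...

    @abstractmethod
    def row_ids(self) -> List[int]: ...


class SearchEngine(ABC):
    """Stand-in for the engine role; method names are hypothetical."""

    @abstractmethod
    def index_tokens(self, row_id: int, field: str, tokens: List[str]) -> None: ...

    @abstractmethod
    def search(self, field: str, term: str) -> SearchResult: ...


class ListResult(SearchResult):
    """Wraps the matching row ids returned by an engine."""

    def __init__(self, ids):
        self._ids = sorted(ids)

    def num_hits(self) -> int:
        return len(self._ids)

    def row_ids(self) -> List[int]:
        return list(self._ids)


class InMemoryEngine(SearchEngine):
    """Toy back end: a dict mapping (field, token) to a set of row ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def index_tokens(self, row_id, field, tokens):
        for token in tokens:
            self._postings[(field, token.lower())].add(row_id)

    def search(self, field, term):
        # The engine hands back a result object, never raw back-end data.
        return ListResult(self._postings.get((field, term.lower()), set()))


engine = InMemoryEngine()
engine.index_tokens(1, "title", ["Blue", "vase"])
engine.index_tokens(2, "title", ["Red", "vase"])
result = engine.search("title", "vase")
print(result.num_hits(), result.row_ids())
```

The calling code only ever sees the two abstract roles, which is what lets one back end be swapped for another without touching the rest of the application.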

Query syntax

No matter the back-end search engine, your plug-in is expected to implement the Lucene query syntax. Parsing of this syntax is provided by the Zend PHP implementation of Lucene located in app/lib/core/Zend/Zend_Search.
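
To give a feel for the kind of input a plug-in has to cope with, the sketch below breaks a query into fielded terms, quoted phrases and boolean operators. It is a hand-rolled Python tokenizer for a small subset of the Lucene syntax, written only for illustration; it is not the Zend parser that Providence uses, it skips grouping parentheses, and the function name tokenize_query is an assumption of this example.

```python
import re

# One token is either a boolean operator, or an optional "field:" prefix
# followed by a quoted phrase or a bare term.
TOKEN = re.compile(
    r'(?P<op>\b(?:AND|OR|NOT)\b)'
    r'|(?:(?P<field>[\w.]+):)?(?:"(?P<phrase>[^"]*)"|(?P<term>[^\s"()]+))'
)


def tokenize_query(query):
    """Split a query string into (kind, ...) tuples; the field is None when absent."""
    tokens = []
    for m in TOKEN.finditer(query):
        if m.group("op"):
            tokens.append(("OP", m.group("op")))
        elif m.group("phrase") is not None:
            tokens.append(("PHRASE", m.group("field"), m.group("phrase")))
        else:
            tokens.append(("TERM", m.group("field"), m.group("term")))
    return tokens


for token in tokenize_query('title:vase AND artist:"Jane Doe" NOT broken'):
    print(token)
```

A real plug-in would receive the parsed query from the Zend parser and translate it into whatever the underlying engine understands.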

Available plug-ins

Since every available open-source search engine has some disadvantage, we want to provide as many options for Providence as possible. The following search engine plug-ins are implemented, in development or planned. Feel free to add your own here!


SQL-based "SQLSearch"

SQLSearch is an engine that employs regular MySQL tables to store an inverted index. This technique was used in the 0.5x versions of CollectiveAccess and provided reasonable performance and scalability combined with easy deployment (no configuration is required). For versions 0.6 and 1.0, alternative engines that leveraged existing code were explored (e.g. PHP Lucene, MySQL FULLTEXT, SOLR). While ultimately workable, none of the other options combines the deployment and performance characteristics of the inverted index approach. Thus, as of version 1.1, a new Unicode-friendly "SqlSearch" plug-in has been implemented as an alternative to PHP Lucene and MySQL FULLTEXT, the other "easy deploy" options. As of version 1.1, SqlSearch is the default search engine option, and as of version 1.3 it is the only supported "easy deploy" option.

Pros: Performance and scalability are generally good; deployment is effortless

Cons: Indexing can be slow; disk space requirements for indices can be large

Status: Implemented
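
The inverted index idea behind this engine can be sketched with two ordinary SQL tables. The example below is a conceptual illustration only: it uses SQLite from the Python standard library instead of MySQL so that it runs without setup, and the table names (words, word_index), column names and helper functions are hypothetical rather than Providence's actual schema.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (word_id INTEGER PRIMARY KEY, word TEXT UNIQUE NOT NULL);
    CREATE TABLE word_index (
        word_id INTEGER NOT NULL REFERENCES words(word_id),
        row_id  INTEGER NOT NULL,
        PRIMARY KEY (word_id, row_id)
    );
""")


def tokenize(text):
    # \w is Unicode-aware for Python str, so accented letters stay inside words.
    return re.findall(r"\w+", text.lower())


def index_row(row_id, text):
    """Store one (word, row) pair per distinct word in the text."""
    for word in set(tokenize(text)):
        conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (word,))
        word_id = conn.execute(
            "SELECT word_id FROM words WHERE word = ?", (word,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO word_index (word_id, row_id) VALUES (?, ?)",
            (word_id, row_id),
        )


def search_all(query):
    """Return the ids of rows containing every word of the query (implicit AND)."""
    words = sorted(set(tokenize(query)))
    if not words:
        return []
    placeholders = ", ".join("?" for _ in words)
    sql = f"""
        SELECT wi.row_id
        FROM word_index wi JOIN words w ON w.word_id = wi.word_id
        WHERE w.word IN ({placeholders})
        GROUP BY wi.row_id
        HAVING COUNT(DISTINCT w.word_id) = ?
        ORDER BY wi.row_id
    """
    return [r[0] for r in conn.execute(sql, (*words, len(words)))]


index_row(1, "Blue ceramic vase")
index_row(2, "Red ceramic bowl")
index_row(3, "Caf\u00e9 menu, blue ink")

print(search_all("ceramic"))
print(search_all("blue ceramic"))
print(search_all("caf\u00e9"))
```

Even this toy version writes one index row per distinct word per record, which hints at why indexing time and index size grow quickly with the amount of text.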


ElasticSearch

ElasticSearch is a simple, fast and increasingly popular search engine.

Pros: Performance and scalability are very good

Cons: Requires you to run an ElasticSearch installation, which means running a Java application stack. This is often not an option for installations with limited IT resources.

Status: Implemented. Notes on usage and implementation here


Which one do I use?

For new installations, use the default SqlSearch engine. It requires no special setup or configuration and can handle significant volumes of data. As your database grows, you may elect to deploy ElasticSearch, as it often provides significantly better performance than any other available engine. ElasticSearch does require a bit of expertise to set up, however, and may be impractical to run on shared servers.

What about the others?

  1. The MySQL FULLTEXT engine is still usable for 1.2 and earlier installations; if you don't want to change to SqlSearch, you don't have to. SqlSearch is basically an improved version of FULLTEXT, so you should notice little if any disruption from the change.
