Search Indexing Configuration

From CollectiveAccess Documentation

Revision as of 12:30, 17 July 2016

The search_indexing.conf file controls which data in your CollectiveAccess database are searchable, and how. Only data elements configured in search_indexing.conf are searchable. Note that configuration of CollectiveAccess's browse system is completely independent of search: it is possible to search on data that are not browsable, and to browse on elements that are not indexed for search. See the browse configuration documentation for details about configuring browse.

Organization

The file is divided into sections for each item type to be indexed. The key for each section is the table name. Within each section are sub-sections for item fields as well as for related items. Content in related records may be indexed against the item. For example, you may have an object record indexed by its various fields (accession number, condition, appraised value) as well as by content in related entities (name of artist, nationality of artist), places (place of manufacture), location, and more. Indexing for each item type is configured independently. You may have objects with indexing taken from related entities, while omitting related object data from entity indexing, for instance.

Item sub-sections

Within a section for a given item type are several sub-sections:

Fields

The fields sub-section determines which fields are indexed for search and which options each field carries. By default the configuration indexes every custom metadata element created in the system by a user (via the "special field" _metadata) as well as a list of fields "baked into" the database, such as type (type_id), access and status. An element must be defined as a field in this sub-section (through a special field, in the default configuration, or by a user) in order to be indexed. User-defined field entries are only necessary if the _metadata field is not used, or if you want to index an intrinsic bundle that is not indexed by default.
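
As a rough illustration of the paragraph above, a fields sub-section that relies on _metadata and lists a few intrinsic fields individually might look like the following. This is a sketch, not a copy of any shipped configuration: it assumes that empty braces are accepted to mean "no options", so compare it with the default search_indexing.conf in your installation before relying on it.

```
ca_objects = {
	fields = {
		# special field: index every user-created metadata element
		_metadata = { },
		# intrinsic fields "baked into" the database, listed individually
		# (empty braces are an assumption meaning "no options")
		type_id = { },
		access = { },
		status = { }
	}
}
```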

Special fields

In addition to the default and any user-defined fields there are several "special fields." Special fields always start with an underscore character.

Option | Description
_metadata | Forces indexing of all attributes created in the system by a user.
_count | Embeds the number of related rows for a given table in the index. You can specify this for both relationship and primary tables. The field is named <table_name>.count, for example object_representations.count for table 'object_representations'. This can be used to find rows that have, or don't have, related rows in a given table.

When specified on a primary table (e.g. ca_entities, ca_occurrences), counts are indexed in aggregate as well as for each type. For relationship tables (e.g. ca_objects_x_entities), counts are indexed in aggregate as well as for each relationship type. For example, to query on a specific type or types:

ca_entities.count/individual:3 (finds records with exactly three related entities of type "individual")
ca_objects_x_entities.count/artist:[2 to 4] (finds objects with between two and four entities related as artist)

Field-level options

A variety of options are available for defined fields.

Option | Description | Example syntax
STORE | Forces the value to be stored in the index, if possible. This can speed display of the content in a search but may slow down indexing and increases index size. | not applicable
DONT_TOKENIZE | Indexes the value as-is, rather than breaking it into separate values on whitespace characters, such as spaces or line breaks, or on punctuation characters. | not applicable
DONT_INCLUDE_IN_SEARCH_FORM | Excludes the field from the fields available in user-defined search forms. | not applicable
BOOST | A numeric "boost" value for the index field. Higher values will cause search hits on the boosted field to count for more when sorting by relevance. | BOOST = 100
INDEX_AS_IDNO | Causes the value to be indexed with various permutations for flexible retrieval as a record identifier. For example, if this option is used then a search for KA1 would return KA.0001. | not applicable
INDEX_ANCESTORS | Enables hierarchical indexing for the field, assuming it is in a hierarchical table. All values for this field in records above the subject in the hierarchy are indexed against the subject. | not applicable
INDEX_ANCESTORS_START_AT_LEVEL | Forces hierarchical indexing to start X levels down from the root. This allows you to omit the very highest, and least selective, levels of the hierarchy when indexing. If omitted, indexing starts from the hierarchy root. | INDEX_ANCESTORS_START_AT_LEVEL = 2
INDEX_ANCESTORS_MAX_NUMBER_OF_LEVELS | Sets the maximum number of levels above the subject to be indexed. If omitted, all levels of the hierarchy above the subject are indexed. | INDEX_ANCESTORS_MAX_NUMBER_OF_LEVELS = 3
INDEX_ANCESTORS_AS_PATH_WITH_DELIMITER | Sets a delimiter to place between each level of the hierarchy prior to indexing the entire hierarchy path above the subject. This is useful when you want to treat the hierarchy path as an identifier. | INDEX_ANCESTORS_AS_PATH_WITH_DELIMITER = .

Here's an example of a field, idno, that uses multiple options:

ca_objects = {
		fields = {

			idno = { STORE, DONT_TOKENIZE, INDEX_AS_IDNO, BOOST = 100 },
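
The hierarchical options can be combined in the same way. The following sketch is illustrative only: it assumes that ca_places is a hierarchical table with an idno field in your installation, and the level values are arbitrary, taken from the example syntax column above.

```
ca_places = {
	fields = {
		# index idno values from ancestor places against each place,
		# skipping the top of the hierarchy, limiting depth to three levels,
		# and joining the levels into a single path with "." as delimiter
		idno = { INDEX_ANCESTORS, INDEX_ANCESTORS_START_AT_LEVEL = 2, INDEX_ANCESTORS_MAX_NUMBER_OF_LEVELS = 3, INDEX_ANCESTORS_AS_PATH_WITH_DELIMITER = . }
	}
}
```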

Access Points

The access points sub-section (use key _access_points) defines aliases for specific indexed elements or groups of elements. It also allows a user to set attributes to be used in search forms as well as search shortcuts.

Search Shortcuts

With _access_points you can create shortcuts to be used in any search system-wide, including Basic Search, Quick Search, Find in the Hierarchy bundle, and Advanced Search.

Let's say you want to create a search shortcut for a "Materials" element on your object record. In the Access points sub-section of the objects section of your configuration file:

ca_objects = {
	# ------------------------------------
	_access_points = {

you would add the "Materials" access point. The shortcut name you choose (let's say "mat") goes on the left side of the equals sign:

ca_objects = {
	# ------------------------------------
	_access_points = {
		mat = {
			fields = [ca_objects.material],
			options = { DONT_INCLUDE_IN_SEARCH_FORM }
		},

Within the square brackets to the right of the fields equals sign, give the CA table name, then a period, then the attribute's element code (here, ca_objects.material).

Now you can quickly search for materials anywhere in your system using the syntax:

mat:stone

It is also possible to create shortcuts that bundle several elements together. A search on the access point will search all of the included fields at the same time. Separate the fields with commas:

style = {
	fields = [ca_objects.material, ca_objects.medium, ca_objects.technique],

Remember that if you want to search for multiple words within your single access point, quotation marks should enclose the whole string:

style:"stone sculpture"

A search for simply:

style:stone sculpture

would search for stone in the Materials, Medium and Technique fields AND for sculpture in any field. That would most likely still return useful (but different) search results. Similarly, there should be no space between the colon and the search term (e.g. style: stone), because the search will "break" on the space and the search performed will be a universal query for stone.

If your target element for a search shortcut is a container, make sure to include the full path in the form ca_table.elementCode.subElementTarget, for example:

			fields = [ca_objects.description.description_source],	
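
Put together, a complete access point for a container sub-element might look like the sketch below. The shortcut name desc_source is hypothetical, and description and description_source are simply the example element codes used above; substitute the codes defined in your own installation.

```
ca_objects = {
	_access_points = {
		# hypothetical shortcut name; searched as desc_source:term
		desc_source = {
			# full path: table name, container element code, sub-element code
			fields = [ca_objects.description.description_source],
			options = { DONT_INCLUDE_IN_SEARCH_FORM }
		}
	}
}
```

With an entry like this in place, a query such as desc_source:catalogue would be limited to that sub-element.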

Search forms

You may have noticed that in the code examples above an option was used:

options = { DONT_INCLUDE_IN_SEARCH_FORM }

This is because by default each defined metadata element will be pulled into the available elements for building search forms. Including your shortcut a second time would be redundant. However, if you're adding an access point that isn't already included (say, "filename" which until recently wasn't indexed by default but was stored in the database) you would define it here and remove the DONT_INCLUDE_IN_SEARCH_FORM option.

Note that all fields included in an access point must be included in the search index; in other words, they must appear in the fields list. All indexed fields automatically have access points created in the format tablename.fieldname (e.g. objects.title); indexed metadata also have access points in the format tablename.md_<element_id> (e.g. objects.md_5).
