Information Services

From CollectiveAccess Documentation

InformationService is a metadata element type that allows referencing external web services as metadata attached to CollectiveAccess records. It does this by performing a lookup operation at the remote service and then allowing you to pick a value from a result list:

[Screenshot: InformationServiceLookup.png]

It then stores some core information about the referenced piece of data and a reference (URI) to the original resource.

InformationService is also a plugin API that makes it easy to add support for other external services. The exact information stored locally differs from plugin to plugin.

Configuration

A basic configuration in an installation profile could look like this:

    <metadataElement code="my_element" datatype="InformationService">
      <labels>
        <label locale="en_US">
          <name>My InformationService Element</name>
        </label>
      </labels>
      <settings>
        <setting name="service"><!-- enter service here --></setting>
      </settings>
      <typeRestrictions>
        <restriction code="r1">
          <table>ca_objects</table>
          <settings>
            <setting name="minAttributesPerRow">0</setting>
            <setting name="maxAttributesPerRow">255</setting>
            <setting name="minimumAttributeBundlesToDisplay">1</setting>
          </settings>
        </restriction>
      </typeRestrictions>
    </metadataElement>

Note that the service setting is mandatory and defines the plugin used for that element. A list of available plugins is below.

Available plugins

Below is a list of existing plugins and available settings.

CollectiveAccess

This plugin allows you to reference records in remote CollectiveAccess instances. Available settings are as follows:

| Setting name | Description | Example |
| --- | --- | --- |
| service | Set service setting to 'CollectiveAccess' to use this plugin | CollectiveAccess |
| baseURL | URL used to query the information service | http://localhost/admin/ |
| table | Valid CollectiveAccess table name | ca_entities |
| user_name | User name to authenticate with on remote system | webservice |
| password | Password to authenticate with on remote system | / |
| labelFormat | Display template to format query result labels with | ^ca_entities.preferred_labels |
| detailFormat | Display template to format detailed information blocks with | ^ca_objects.preferred_labels (^ca_objects.idno) |

uBio

uBio is an initiative within the science library community to join international efforts to create and utilize a comprehensive and collaborative catalog of known names of all living (and once-living) organisms. Available settings for this implementation are:

| Setting name | Description | Example |
| --- | --- | --- |
| service | Set service setting to 'uBio' to use this plugin | uBio |
| keyCode | uBio key code. See http://www.ubio.org/index.php?pagename=xml_services for details. Default is the ubio_keycode setting in app.conf. | a1b2c3 |

Getty Linked Open Data Services

The Getty LOD Services are technically three different plugins that share a common code base. They allow referencing concepts in Getty's AAT, TGN and ULAN vocabularies via their SPARQL Linked Open Data web service. Set the service setting to AAT, TGN or ULAN to use the corresponding service. The plugin uses Getty's SPARQL endpoint and their full text indexes for fast lookups, and the full RDF representation of the concepts to display more detailed info and to make additional data available for search.

None of the three plugins has any custom settings at the element level, but they share a more comprehensive configuration in the configuration file linked_data.conf. The default configuration should work for most use cases. The file has three large blocks, one for each of the plugins (tgn, aat, ulan). Their format is identical and consists of three settings:

search_text: If set to 0, the luc:term field is used for searching, which contains only the terms/labels. If set to 1, the luc:text field is used instead, which can yield many more, but more erratic, results. See http://vocab.getty.edu/doc/queries/#Exact-Match_Full_Text_Search_Query. Example: 1

detail_view_info: List of attributes to show in the extended information panel. Information has to be in literal form, but can be pulled through related nodes (see below). Note that full URIs for both resources and literals have to be wrapped in < and >. See also http://www.easyrdf.org/docs/property-paths (this is the library used to traverse the graph). Available settings are:

label: Defines the label used for this field in the extended info panel.
uri: Optional setting that allows pulling information through related RDF nodes.
literal: Should resolve to an RDF literal and defines the actual text that is pulled in for display.
limit: Limits the number of related nodes that are processed. Crawling the RDF graph can get very slow for a large number of nodes.
stripAfterLastComma: Strips everything after (and including) the last comma in the individual literal string. This is useful for gvp:parentString, where the top-most category is usually not very useful.
invert: A setting handcrafted for gvp:parentString. It inverts the hierarchy path so that it starts with the most generic node.

Note that this data is only visible when you scroll down the extended info panel. It is not available for search or in bundle displays! See below for how to add data for search.

type = {
	label = Type,
	# use uri if you want to pull from a related node
	uri = <http://vocab.getty.edu/ontology#placeTypePreferred>,
	literal = <http://www.w3.org/2004/02/skos/core#prefLabel>,
	limit = 1,
},

or

parentString = {
	label = Full path,
	literal = <http://vocab.getty.edu/ontology#parentString>,
	stripAfterLastComma = 1,
	invert = 1,
},

additional_indexing_info: List of attributes to add to the search index (in addition to the display value). This allows you to use non-display information from the Getty services for search purposes. For instance, you might not want to display the full gvp:parentString for each related AAT keyword but you still want to search for the broader categories.

Note that the syntax is virtually identical to the detail_view_info setting above, except for the absence of the label setting.

altLabels = {
	uri = <http://www.w3.org/2008/05/skos-xl#altLabel>,
	literal = <http://vocab.getty.edu/ontology#term>
}
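
The effect of the stripAfterLastComma and invert settings described for detail_view_info can be illustrated with a small sketch. This is not the plugin's actual code: the function name formatParentString, the sample path, and the order of operations (strip first, then invert) are assumptions made for illustration only.

```php
<?php
// Illustrative only: mimics what stripAfterLastComma and invert are described
// as doing to a gvp:parentString value. Not the actual plugin code.
function formatParentString(string $parentString, bool $stripAfterLastComma, bool $invert): string {
    if ($stripAfterLastComma) {
        $pos = strrpos($parentString, ',');
        if ($pos !== false) {
            // Drop the last comma and everything after it (the top-most node).
            $parentString = substr($parentString, 0, $pos);
        }
    }
    if ($invert) {
        // Split on commas, trim each node, and reverse the order.
        $nodes = array_map('trim', explode(',', $parentString));
        $parentString = implode(', ', array_reverse($nodes));
    }
    return $parentString;
}

// Made-up hierarchy path, most specific node first.
$path = 'specific node, broader node, broadest node, top category';
echo formatParentString($path, true, true), "\n";
// prints: broadest node, broader node, specific node
```

The sketch assumes that individual node names contain no commas; a real hierarchy string may need more careful handling.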

Wikipedia

This service allows referencing Wikipedia articles. Available settings are:

| Setting name | Description | Example |
| --- | --- | --- |
| service | Set service setting to 'Wikipedia' to use this plugin | Wikipedia |
| lang | 2- or 3-letter language code for Wikipedia to use. Defaults to "en". See http://meta.wikimedia.org/wiki/List_of_Wikipedias | en |


This plugin also tries to pull in an abstract and a preview image for local display. Both the abstract and preview image are available in bundle displays. Suppose your Wikipedia metadata element has the code wikipedia. You can reference additional properties of a referenced article like this:

ca_objects.wikipedia.<property>

Where property is one of the following:

| Property | Description |
| --- | --- |
| image_thumbnail | Image thumbnail URL |
| image_thumbnail_width | Width of image thumbnail. Box is capped at 200px by 200px. |
| image_thumbnail_height | Height of image thumbnail. Box is capped at 200px by 200px. |
| image_viewer_url | (Valid for v1.5.1) URL for Wikipedia's full screen image viewer |
| title | Title of the Wikipedia article |
| pageid | Numeric page identifier |
| fullurl | URL for the article |
| canonicalurl | Canonical URL for the article |
| extract | Extract of the article. This is usually an HTML representation of the full article! |
| abstract | CollectiveAccess tries to extract the first paragraph from the full article representation above to provide a shorter abstract. This is usually the part of the article shown above the table of contents, but the extraction might fail for poorly formatted articles. |
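
As a rough illustration of the ca_objects.wikipedia.<property> pattern, the following sketch builds a specifier string and rejects property names that are not in the list above. The helper wikipediaBundleSpecifier and the constant WIKIPEDIA_PROPERTIES are hypothetical and not part of CollectiveAccess. The table ca_objects and element code wikipedia are the example values from the text.

```php
<?php
// Property names copied from the list above. The constant and the helper
// function are hypothetical and exist only for this illustration.
const WIKIPEDIA_PROPERTIES = [
    'image_thumbnail', 'image_thumbnail_width', 'image_thumbnail_height',
    'image_viewer_url', 'title', 'pageid', 'fullurl', 'canonicalurl',
    'extract', 'abstract',
];

function wikipediaBundleSpecifier(string $table, string $elementCode, string $property): string {
    if (!in_array($property, WIKIPEDIA_PROPERTIES, true)) {
        throw new InvalidArgumentException("Unknown Wikipedia property: {$property}");
    }
    // Join table, element code and property with dots.
    return "{$table}.{$elementCode}.{$property}";
}

echo wikipediaBundleSpecifier('ca_objects', 'wikipedia', 'abstract'), "\n";
// prints: ca_objects.wikipedia.abstract
```

A string of this form is presumably what you would use in a bundle display for the element.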

Implementing new plugins

InformationService implementations reside in app/lib/core/Plugins/InformationService and should implement IWLPlugInformationService and extend BaseInformationServicePlugin. The class name must be "WLPlugInformationService<Service>" and the file name "<Service>.php".

A plugin can provide additional settings using the static $s_settings variable, usually derived from $g_information_service_settings_<Service>. It should set the "NAME" property of the info array in the constructor.

The Wikipedia implementation is relatively simple and uses most of the available features (except getDataForSearchIndexing()) so you could use that as a template.

Core functions

The core functions you must implement are:

public function lookup($pa_settings, $ps_search, $pa_options=null);

Here $pa_settings is an array containing the settings for this particular element (including the ones you provided) and $ps_search is the search expression provided by the user. The function should return an array whose "results" key holds a list of results for the given search expression. Each result should have a label, url and idno.

public function getExtendedInformation($pa_settings, $ps_url);

This should return an array with the "display" key set to an HTML representation of the given record (identified by the URL/URI). You can either go and look the detailed data up remotely or, for instance, call getExtraInfo() to get locally stored data (see below).
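
The naming rules and the two core functions can be sketched as a minimal plugin. To make the sketch run on its own, it declares simplified stand-ins for IWLPlugInformationService and BaseInformationServicePlugin; the real definitions ship with CollectiveAccess and differ from these. The service name "Example", the in-memory record list and the example.org URLs are made up, and a real plugin would query a remote service instead.

```php
<?php
// Simplified stand-ins so the sketch is self-contained. In CollectiveAccess
// these are provided by the framework and look different.
interface IWLPlugInformationService {
    public function lookup($pa_settings, $ps_search, $pa_options = null);
    public function getExtendedInformation($pa_settings, $ps_url);
}

class BaseInformationServicePlugin {
    protected $info = [];
}

// Hypothetical settings global, following the naming pattern from the text.
global $g_information_service_settings_Example;
$g_information_service_settings_Example = [];

// Would live in a file named Example.php.
class WLPlugInformationServiceExample extends BaseInformationServicePlugin implements IWLPlugInformationService {
    public static $s_settings;

    // Made-up records used in place of a real remote service.
    private $records = [
        'http://example.org/record/1' => 'Swordsmiths',
        'http://example.org/record/2' => 'Goldsmiths',
    ];

    public function __construct() {
        global $g_information_service_settings_Example;
        self::$s_settings = $g_information_service_settings_Example;
        $this->info['NAME'] = 'Example';
    }

    // Returns an array whose "results" key lists matches with label, url, idno.
    public function lookup($pa_settings, $ps_search, $pa_options = null) {
        $results = [];
        foreach ($this->records as $url => $label) {
            if (stripos($label, $ps_search) !== false) {
                $results[] = ['label' => $label, 'url' => $url, 'idno' => basename($url)];
            }
        }
        return ['results' => $results];
    }

    // Returns an array whose "display" key holds HTML for the given record.
    public function getExtendedInformation($pa_settings, $ps_url) {
        $label = $this->records[$ps_url] ?? 'Unknown record';
        return ['display' => '<p>' . htmlspecialchars($label) . '</p>'];
    }
}

$plugin = new WLPlugInformationServiceExample();
print_r($plugin->lookup([], 'sword'));
```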

Optional functions

The functions listed below are optional and have default (empty) implementations in BaseInformationServicePlugin so it doesn't hurt to leave them out of your plugin entirely. They can be used to provide useful features though.

public function getExtraInfo($pa_settings, $ps_url);

Returns an array of key=>value pairs containing extra information to be stored locally, alongside the id, the display label and the URL. This data can be accessed using SearchResult::get(), so you should keep the keys alphanumeric, lowercase and without spaces.

public function getDataForSearchIndexing($pa_settings, $ps_url);

Returns a list of strings that are added to the search index for the record associated with this attribute. This allows you to add additional data points that can be used to find the CollectiveAccess record but are not necessarily available for display. Note that the data returned by getExtraInfo() is not indexed for search, so you might have to add the same data twice.
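
The return shapes of getExtraInfo() and getDataForSearchIndexing() can be sketched with plain functions. The helper names and the sample record below are hypothetical. Allowing underscores in keys is an assumption, based on the underscore-separated Wikipedia property names listed earlier.

```php
<?php
// Hypothetical helper: lowercase a field name and collapse anything that is
// not a letter or digit into a single underscore.
function normalizeExtraInfoKey(string $key): string {
    return trim(preg_replace('/[^a-z0-9]+/', '_', strtolower($key)), '_');
}

// Shape of a getExtraInfo() return value: key => value pairs.
function buildExtraInfo(array $remoteRecord): array {
    $extraInfo = [];
    foreach ($remoteRecord as $key => $value) {
        $extraInfo[normalizeExtraInfoKey($key)] = $value;
    }
    return $extraInfo;
}

// Shape of a getDataForSearchIndexing() return value: a flat list of strings.
// Extra info is not indexed automatically, so its values are repeated here.
function buildIndexData(array $extraInfo): array {
    return array_map('strval', array_values($extraInfo));
}

// Made-up record standing in for a response from a remote service.
$remoteRecord = ['Page ID' => 42, 'Full URL' => 'http://example.org/record/42'];
$extraInfo = buildExtraInfo($remoteRecord);
print_r($extraInfo);
print_r(buildIndexData($extraInfo));
```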

public function getDisplayValueFromLookupText($ps_text);

The default behavior is to use the (selected) label returned by the lookup() function as the display value for attribute values. That can be undesirable for use cases like the AAT, where on the one hand you want a lot of identifying information in the lookup dropdown, but on the other you probably don't care about all that info once the "relationship" has been created, because the keyword is doing its job in the background (making the associated record findable). Maybe you just want a simple and short label instead to save space.

This function allows you to mangle the lookup text to create a different display value. The lookup text usually has the URL in it, so you could even look up additional info to pull in here if you wanted. An example can be found in the AAT implementation, where we do some regular expression magic to convert lookup texts:

before: [300025342] swordsmiths [people in crafts and trades by product, people in crafts and trades]
after: swordsmiths
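
A conversion like the before/after pair above could be done with a single regular expression. This is a sketch only: the actual expression used in the AAT implementation is not shown here and may differ, and the function name displayValueFromLookupText is a stand-in for a plugin's getDisplayValueFromLookupText() method.

```php
<?php
// Sketch only: assumes lookup texts of the form "[<id>] <term> [<hierarchy>]".
function displayValueFromLookupText(string $ps_text): string {
    // Capture the term between the leading [id] and the trailing [hierarchy].
    if (preg_match('/^\[\d+\]\s*(.+?)\s*\[.*\]$/', $ps_text, $matches)) {
        return $matches[1];
    }
    // Fall back to the unchanged text if the pattern does not match.
    return $ps_text;
}

echo displayValueFromLookupText(
    '[300025342] swordsmiths [people in crafts and trades by product, people in crafts and trades]'
), "\n";
// prints: swordsmiths
```

Returning the input unchanged when the pattern does not match keeps the default behavior for labels in an unexpected format.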
