API:Metadata Elements and Attributes

From CollectiveAccess Documentation
Revision as of 20:11, 23 October 2012 by Jonathan

Configurable metadata attribute system

Most collections management systems employ a static schema for structuring stored data. Typically these schemas are designed by first establishing functional requirements for the system through careful analysis of the problem domain (museum collections, archival collections, natural history collections, etc.). From the requirements a set of key "entities" can be derived (e.g. "collection objects", "loans", "donors", "collections", "exhibitions"). Next, a discrete set of data to be collected for each type of entity is defined and the inter-relationships between entities are established, also based upon functional requirements. (More information on data modeling techniques can be found at http://en.wikipedia.org/wiki/Data_modeling and many other places on the Internet.) For systems based upon relational databases, as CA is, "entities" manifest as database tables, the set of collected data as fields in those tables, and the relationships between entities as primary key-foreign key relations between tables.
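
As a minimal sketch of that mapping, the following uses Python's built-in sqlite3 module to model two entities as tables joined by a primary key-foreign key relation. The table and column names (donors, collection_objects, donor_id and so on) are invented for illustration and are not taken from the CA schema.

```python
import sqlite3

# In-memory database; every table and column name here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.executescript("""
CREATE TABLE donors (
    donor_id INTEGER PRIMARY KEY,   -- primary key of the "donor" entity
    name     TEXT NOT NULL          -- a fixed field defined by the static schema
);
CREATE TABLE collection_objects (
    object_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    donor_id  INTEGER REFERENCES donors(donor_id)  -- relationship to donors
);
""")

conn.execute("INSERT INTO donors (donor_id, name) VALUES (1, 'Example Donor')")
conn.execute(
    "INSERT INTO collection_objects (object_id, title, donor_id) "
    "VALUES (10, 'Example Object', 1)"
)

# Follow the relationship from object to donor with a join.
rows = conn.execute(
    "SELECT o.title, d.name "
    "FROM collection_objects o JOIN donors d ON d.donor_id = o.donor_id"
).fetchall()
```

Adding a new kind of field in this model means altering a table, which is the inflexibility the rest of this page is concerned with.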

Static schemas, when designed well, can be efficient and expressive enough for most applications. However, as the word static implies, the structure is fixed and relatively inflexible. The schema for previous versions of CA was a static schema that defined core entities (collection objects, people [called entities in the schema - confusing in the context of this discussion, we know], geographic places, occurrences, collections, and controlled vocabularies) and a set of what we considered at the time to be standard fields for each.

A simple extension mechanism was provided in the form of type-dependent "attributes." This allowed one to define attribute-fields (called "metadata elements") with a specified datatype (text, number or date range) and designate them for potential application to any of the core entities, optionally restricted by an entity-specific type. Attributes were optional and repeatable - you could assign 100 attribute values or none.

In practical terms, what this allowed you to do was define sets of simple one-value fields (e.g. some text, a number or a date range) that would appear based upon the entity and entity-specific type, were optional and could repeat. You could have one set of attribute fields for objects that were video tapes and another for objects that were sculptures. Similarly, you could have one set of attribute fields for people ("entities") that were individuals and another for those that were actually organizations.

One additional mechanism was provided to users wishing to extend or customize the old schema: relabeling of existing fields. The meaning of any existing field in the schema could be redefined (or "abused" in the words of one of our users) by simply changing its name in the user interface. As lame as this sounds (and indeed it is lame) this is a primary customization path - along with custom development - offered by most existing collections database systems.

Our experience supporting previous versions of CollectiveAccess has taught us a number of practical lessons about the limitations of static schemas for museum collections management:

  • There really are no "standard" fields. There are standards and there are people that want to believe that everyone should use one (usually theirs). But the reality is that system requirements are driven by institutional realities that not only can be at odds with any given standard, but more often than not are divergent in significant and unavoidable ways.
  • There are more types of collections than you think. Not every collection is about objects, and not every institution is looking to do "traditional" collections management, so extensibility is important not just in terms of object records, but everywhere.
  • Simple fields are not enough. Offering the possibility of defining single-value fields works for many use cases. The most common request is "can we have some more text fields?" But there are also a lot of practical cases where the ability to define fields that are actually structured sets of many values is critical. For instance, if you want to implement something similar to the PBCore metadata system (and more and more of our users want to do that), simple attributes and a static schema aren't enough. (Unless the static schema is designed around PBCore, of course.)

Improvements to the metadata attribute system in Providence

In Providence, metadata "attributes" are now the primary means of modeling fields for various types of data. All content fields are now defined as groups (or "sets") of one or more attributes attached to the core database entities, with three exceptions: a selected group of "intrinsic" fields that hold non-repeating data common to any type of the entity (e.g. entity identifier, entity-specific type setting); "labels" (what used to be referred to as "title" or "name" in the old schema), which are now stored and treated separately; and certain types of specialized data, like representations and object events, which are stored in related tables.

Thus for any core entity (a collection object, for example) the only data fields that are always present are labels and the set of intrinsic fields (seven in total for objects, all of which are administrative). All other fields are configured on an as-needed basis for the installation.

If this sounds a little too squishy for your liking keep in mind that the schema still defines the set of core entities (which has been expanded somewhat), the basic fields defining those entities and the relationship paths between them (which have also been expanded). Flexibility comes from the new ability to "hang" a custom set of fields onto the core entities, to key the set of fields to specific entity-specific types and to construct fields that are actually "sets" of several discrete values. It also makes it possible to easily and uniformly support repeating fields when needed, and to cleanly support multilingual data without resorting to many hardcoded schema structures.

How the metadata attribute system in Providence works

The characteristics of the configured attribute-based fields in your system, called metadata element sets or simply element sets, are defined by rows in the ca_metadata_elements table. Each row represents a single primitive metadata field (or element) that can contain a single value. The table is hierarchical, allowing one to define attribute-based fields that are tree-structured element sets composed of multiple elements. No matter how many elements an element set is composed of, the element_id "of record" that is applied to an entity is always that of the element row at the root of the hierarchy - you cannot use a branch of an element set as a metadata element.
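
The "root of the hierarchy" rule can be sketched in a few lines of Python. This is an illustration only, not CA code: the Element class, the root_element_id function and the sample "dimensions" element set are all hypothetical, and only the field names element_id, parent_id and element_code come from the table described on this page.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Element:
    """Hypothetical stand-in for one row of ca_metadata_elements."""
    element_id: int
    parent_id: Optional[int]  # None plays the role of NULL for the root row
    element_code: str


def root_element_id(elements: Dict[int, Element], element_id: int) -> int:
    """Walk parent_id links upward until the root of the element set is reached."""
    current = elements[element_id]
    while current.parent_id is not None:
        current = elements[current.parent_id]
    return current.element_id


# A made-up element set: "dimensions" is the root, with two child elements.
elements = {
    e.element_id: e
    for e in [
        Element(1, None, "dimensions"),
        Element(2, 1, "width"),
        Element(3, 1, "height"),
    ]
}
```

Here root_element_id(elements, 3) resolves to 1, the element_id that would be "of record" for the whole set; the child rows 2 and 3 are never applied to an entity on their own.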

The fields of the ca_metadata_elements table are defined as follows:

| Field | Description | Default |
|---|---|---|
| element_id | Unique integer identifier of element row | Automatically assigned by the database |
| parent_id | The element_id of the parent row for an element; is NULL for the root of the hierarchy | NULL |
| list_id | A reference to a value list as defined in the ca_lists table to use in the user interface; note that for lists a reference to the selected list item in ca_list_items is stored as well as the text of the item, so if the list item entry changes so will the attribute value | NULL |
| element_code | A unique alphanumeric code, up to thirty characters in length, used to identify and refer to the element; this is used as an easier to work with alternative to element_id | none - must be specified and must be unique across all defined elements |
| documentation_url | An absolute URL referring to documentation for the element, typically usage guidelines and notes | <blank> - optional value |
| datatype | Integer indicating the data type of the element; supported values are defined in the attribute_types.conf configuration file | none - must be specified |
| settings | A serialized associative array of rules for validation of values; the exact content of the array is dependent upon the datatype and is encoded and decoded by the ca_metadata_elements class | none |
| rank | An arbitrary integer value used to order elements with the same parent; provides a means for specifying the order in which elements in a complex element set should be displayed | 0 |
| hier_left and hier_right | Used for hierarchical indexing. The fields contain the left and right-hand extent of the element in the nested set representation of the hierarchy. | set by the BaseModel libraries on insert(), update() and delete() |
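
To make the nested set representation behind hier_left and hier_right concrete, here is a rough Python sketch of the general technique. It is not the BaseModel implementation: the numbering scheme (integers starting at 1, siblings visited in element_id order rather than by rank) and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def nested_set_extents(parents: Dict[int, Optional[int]]) -> Dict[int, Tuple[int, int]]:
    """Assign (left, right) extents from a mapping of element_id -> parent_id."""
    children: Dict[Optional[int], List[int]] = {}
    for element_id, parent_id in parents.items():
        children.setdefault(parent_id, []).append(element_id)

    extents: Dict[int, Tuple[int, int]] = {}
    counter = 0

    def visit(element_id: int) -> None:
        nonlocal counter
        counter += 1
        left = counter                      # extent opens before any descendant
        for child in sorted(children.get(element_id, [])):
            visit(child)
        counter += 1
        extents[element_id] = (left, counter)  # extent closes after all descendants

    for root in sorted(children.get(None, [])):
        visit(root)
    return extents


def descendants(extents: Dict[int, Tuple[int, int]], element_id: int) -> List[int]:
    """An element is a descendant when its extent lies strictly inside another's."""
    left, right = extents[element_id]
    return sorted(e for e, (l, r) in extents.items() if left < l and r < right)


# Hypothetical hierarchy: 1 is the root, 2 and 3 are its children, 4 is under 3.
extents = nested_set_extents({1: None, 2: 1, 3: 1, 4: 3})
```

The point of storing the extents is that a whole subtree can be found with a single range comparison, without walking parent_id links row by row.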

Elements have display names, or labels, which are stored in the ca_metadata_element_labels table. A label can be defined for each locale in the system but need not be. If a locale is set for which a label is not defined then the default locale will be used. Element labels are used in the user interface to label individual components of an element set.
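
The fallback rule for labels can be sketched as follows. The function name, the dictionary representation of labels and the locale codes are hypothetical; the actual lookup is performed by CA against ca_metadata_element_labels.

```python
from typing import Dict, Optional


def label_for(labels: Dict[str, str], locale: str, default_locale: str) -> Optional[str]:
    """Return the label for the requested locale, else the default locale's label."""
    if locale in labels:
        return labels[locale]
    # No label defined for this locale: fall back to the default locale.
    return labels.get(default_locale)


# Hypothetical labels for one element, keyed by locale code.
labels = {"en_US": "Dimensions", "de_DE": "Abmessungen"}
german = label_for(labels, "de_DE", "en_US")    # a label exists for this locale
fallback = label_for(labels, "fr_FR", "en_US")  # none defined, default is used
```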

Element sets are bound to one or more of the core database entities (and optionally an entity-specific type) by entries in the ca_metadata_type_restrictions table. The fields are defined as:

| Field | Description | Default |
|---|---|---|
| restriction_id | Unique integer identifier for the restriction | Automatically assigned by the database |
| table_num | The table to which the element is bound; table_nums are defined in the "tables" section of the datamodel.conf file | none - must be specified |
| type_id | The type_id of the entity-specific type the element is bound to; is NULL if the element is bound to all types, otherwise refers to the primary key of the entity's type list (stored in ca_lists and ca_list_items; the type_id refers to a row in the ca_list_items table) | NULL |
| element_id | Reference to the root element in a set in ca_metadata_elements to which the restriction applies | none - must be specified |
| settings | A serialized associative array of rules for validation of values; the exact content of the array is dependent upon the datatype and is encoded and decoded by the ca_metadata_type_restrictions class | none |
| rank | An arbitrary integer value used to order elements; provides a means for specifying the order in which elements in a list should be displayed | 0 |
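
One way to read the restriction table is as a filter: given an entity's table and type, which element sets apply? The Python sketch below illustrates that reading. The class, the function, the tie-break on restriction_id and all of the numeric values (table_nums, type_ids, element_ids) are made up for the example; real table_nums come from datamodel.conf.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TypeRestriction:
    """Hypothetical stand-in for one row of ca_metadata_type_restrictions."""
    restriction_id: int
    table_num: int
    type_id: Optional[int]  # None plays the role of NULL: bound to all types
    element_id: int
    rank: int = 0


def elements_for(restrictions: List[TypeRestriction], table_num: int, type_id: int) -> List[int]:
    """Root element_ids bound to the given table and type, in display order."""
    matching = [
        r for r in restrictions
        if r.table_num == table_num and (r.type_id is None or r.type_id == type_id)
    ]
    # Order by rank; the restriction_id tie-break is an assumption of this sketch.
    matching.sort(key=lambda r: (r.rank, r.restriction_id))
    return [r.element_id for r in matching]


restrictions = [
    TypeRestriction(1, table_num=57, type_id=None, element_id=10, rank=2),
    TypeRestriction(2, table_num=57, type_id=100, element_id=11, rank=1),
    TypeRestriction(3, table_num=57, type_id=101, element_id=12, rank=1),
    TypeRestriction(4, table_num=20, type_id=None, element_id=10, rank=1),
]
```

With this data, a record in table 57 of type 100 gets element sets 11 and 10 in that order, while a record of type 101 gets 12 and 10: element set 10 has a NULL type_id and so applies to every type in the table.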

An element set definition is used to generate an on-screen data entry form fragment and to decode incoming form data. Or to put it another way, element sets are used as templates to instantiate attribute bundles - collections of user-entered values structured according to the element set from which they were minted.
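
The template idea might be sketched like this in Python. Everything here is an assumption for illustration: the two classes, the decode_form function and especially the "set_code.child_code" form field naming are invented, and CA's actual form handling is not described on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ElementSet:
    """Hypothetical template: a root element plus the codes of its child elements."""
    element_id: int
    element_code: str
    child_codes: List[str]


@dataclass
class AttributeBundle:
    """Values entered by a user, structured according to an element set."""
    element_id: int  # always the root element_id of the originating set
    values: Dict[str, str] = field(default_factory=dict)


def decode_form(element_set: ElementSet, form_data: Dict[str, str]) -> AttributeBundle:
    """Pick out the submitted values that belong to this element set."""
    values = {
        code: form_data.get(f"{element_set.element_code}.{code}", "")
        for code in element_set.child_codes
    }
    return AttributeBundle(element_id=element_set.element_id, values=values)


dimensions = ElementSet(1, "dimensions", ["width", "height"])
bundle = decode_form(
    dimensions,
    {"dimensions.width": "10 cm", "dimensions.height": "20 cm", "unrelated": "x"},
)
```

The same ElementSet can mint any number of bundles, which is how a single definition supports repeating values on a record.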

Configuring models to support attributes

To support attributes, the model class for the database table must extend either BaseModelWithAttributes in /app/lib/core/ or LabelableBaseModelWithAttributes in /app/lib/ca/ (if you also want your entity to support labels).

Implementing attribute types

[To come]

