Data Importer

From CollectiveAccess Documentation
Jump to: navigation, search
This page contains changes which are not marked for translation.

Other languages:English 100% • ‎French 100%

[Available for v1.4]

Contents

Overview

With the release of CollectiveAccess v1.4, users will be able to create their own mappings and migrate their source data directly from the command line or the Providence user interface (under "Import > Data") into a configuration of Providence. Running a data import involves seven basic steps, all of which are outlined in detail below.

1 - A user must create a mapping document that will serve as a crosswalk between the source data and the ultimate destination in CollectiveAccess.

2 - Before running an import, it is absolutely crucial to create a backup of the database in its current form by executing a data dump.

3 - When you have taken this pre-caution and your mapping document is complete it's time to run the import from either the command line or the graphical user interface.

4 - When the migration has run, check the data out in CollectiveAccess and look for errors or points of inconsistencies.

5 - Revise your mapping accordingly

6 - Load the data dump so that the system returns to its pre-import state (and thereby clears out the data that you are about to re-run.)

7 - Run the import again. That's it!

Supported input formats

Currently: XLSX, XLS, MYSQL, Filemaker XML, Inmagic XML, TEI XML, MARC, CollectiveAccess, and WorldCat (for migrations from one system to another).

Planned imminently: tab delimited, CSV, OAI-PMH/DC,

Creating a mapping

To create a mapping, first download the Excel-based import mapping template available here. (File:Data Import Mapping template.xlsx). A version with JSON validation is available on [Google Drive]. Once all of the mappings and settings have been entered into the template it can be loaded directly into CollectiveAccess (see Running a mapping below) and the page on Creating and Running a Mapping.

This mapping template operates under two basic assumption about your data. 1 - that each row in a data set corresponds to a single record, and 2 - that each column corresponds to a single metadata element.

Below we will go through each column of the mapping template. In the first column, you must set the Rule Type for each for each column of data in your data set. Refer to the table below for descriptions of each Rule Type.

Rule types

Rule type Description
Mapping Maps a data source column or table.field to a CollectiveAccess metadata element. Mappings can carry refineries (see below).
SKIP Use SKIP to ignore a data source column or table.field.
Constant Set a data source column or table.field to an arbitrary constant value. Include the chosen value in the Source column on the mapping spreadsheet. Matches on CollectiveAccess list item idno.
Rule Performs actions during the import (such as skipping or setting data) based on conditional expressions. Made up of two parts: "Rule triggers" and "Rule actions."
Setting Sets preferences for the mapping (see below).

Settings

Setting Description Parameter notes Example Currently supported?
name Human readable name of the mapping Arbitrary text My mapping Implemented
code Alphanumeric code of the mapping Arbitrary, no special characters or spaces my_mapping Implemented
inputFormats Sets types of source (input) data that can be handled by this import mapping. Values are format codes defined by the various DataReader plugins. You can specify multiple formats by specifying their format codes in a semicolon delimited list. If you omit this setting then the mapping is assumed to be valid for any data type which, of course, is very unlikely. file type XLSX Implemented
table Sets the table for the imported data Corresponds to the CollectiveAccess basic tables ca_objects Implemented
type Type to set all imported records to. If import includes a mapping to type_id, that will be privileged and the type setting will be ignored. CollectiveAccess list item idno posters Implemented
numInitialRowsToSkip The number of rows at the top of the data set to skip. Use this setting to skip over column headers in spreadsheets and similar data. numeric value 2 Implemented
existingRecordPolicy Determines how existing records in the CollectiveAccess system are checked for and handled for the mapping. Also determines how records created by the mapping are merged with other instances (idno and/or preferred label) in the data source. none; skip_on_idno; merge_on_idno; overwrite_on_idno; skip_on_preferred_labels; merge_on_preferred_labels; overwrite_on_preferred_labels; skip_on_idno_and_preferred_labels; merge_on_idno_and_preferred_labels; overwrite_on_idno_and_preferred_labels; merge_on_idno_with_replace; merge_on_preferred_labels_with_replace; merge_on_idno_and_preferred_labels_with_replace none Implemented
errorPolicy Determines how errors are handled for the import. "Stop" will halt the entire import on any error. ignore; stop ignore Implemented
archiveMapping Set to yes to save the mapping spreadsheet; no to delete it from the server after import yes; no yes Pending
archiveDataSets Set to yes to save the data spreadsheet or no to delete it from the server after import yes; no yes Pending
basePath For XML input formats only, basePath contains an XPath expression used to supply a set of XML nodes that are to be treated, for purposes of import, as individual records. Many XML formats require the basePath setting for data import to be possible. XML tags /text/body/div Implemented

Source

The Source column is used to set precisely which element from the data source is to be mapped or skipped. You can also set a constant data value, rather than a mapping, by setting the rule type to "Constant" and the Source column as the value or list item idno from your CollectiveAccess configuration.

Spreadsheets
If you wish to map from Column B of an Excel spreadsheet, you would list the Source as 2. (A = 1, B = 2, C = 3, and so on.) For this mapping, Column B of your source data would be pulled. If on the other hand, you wish to skip this column, you would set the Rule Type to Skip and the source value to 2.

XML
Set the Source column to the name of the XML tag, proceeded with a forward slash (i.e. /Sponsoring_Department or /inm:ContactName)

Source syntax for XML input formats

XPath, a query language for selecting nodes and computing values from an XML document, is supported for "Source" specification when importing XML. W3C offers a basic tutorial for writing XPath expressions. The most common XPath expressions you will need to import XML data can be found in the examples in the following table.

Imports Description XML example Mapping source syntax Notes
Items Imports element items
<text>
    <body>
        <div>
            ITEM
        </div>
    </body>
</text>
/text/body/div Example imports "ITEM".
Items with a particular attribute value Imports items only from elements with a certain attribute node value.
<text>
    <body>
        <div attribute="thistype!">
            ITEM
        </div>
    </body>
</text>
/text/body/div[@attribute='thistype!'] Maps element items with "thistype!" as attribute value.
Attribute value Imports the attribute node value itself.
<text>
    <body
        <div attribute="thistype!">
            ITEM
        </div>
    </body>
</text>
/text/body/div/@attribute Using the example, this mapping would import 'thistype!' itself, as opposed to "ITEM".
Items not of a particular attribute value Imports items in cases where the element does not have an attribute, or in cases where the attribute value is empty.
<text>
    <body>
        <div>
            ITEM_1
        </div>
        <div attribute="thistype!">
            ITEM_2
        </div>
    </body>
</text>"
/text/body/div[not(@attribute)] This only necessary in cases where there are other instances where the same element does have the attribute, like the example here. In this case, ITEM_1 would be imported, and ITEM_2 would not.

MARC
The Source value for MARC supports fields and indicators. i.e. 100/a (field=100; no indicators) 100/a/x (indicator 1=x) 100/a/xy (indicator 1=x; indicator 2=y). For subfields, format the import text with the option formatWithTemplate, i.e. {"formatWithTemplate": "^245/a ^245/b ^245/f ^245/g ^245/h ^245/k ^245/n ^245/p ^245/s"}

CollectiveAccess
Set the Source column to the ca_table.element_code of the originating data.

FMPXML/RESULT
FileMakerPro XMLRESULT. A few things to note here due to inclusion of invalid characters in field names in certain databases (i.e. ArtBase). Source field names in the mapping must follow these rules:

  • Field name should be preceeded with a forward slash (i.e. /Inventory::ArtistLast)
  • The importer does not trim trailing spaces in field names so watch out for that!
  • Only A-Z a-z 0-9 and these special characters are accepted _ - & # ? % :
  • For all other special characters, including a space, replace the character with a single _ (underscore).
  • If two invalid special characters fall in a row, use only a single _ (underscore) rather than two

CA table.element

You declared the data source in the previous column. Now, it's time to set the destination. Using the CA table.element column, set where you wish the Source data to be mapped to in CollectiveAccess. If you are setting the Source to Skip, of course, you do not need to complete this step. If you are mapping data or applying a constant value, you do need to set the destination. This is accomplished by writing the ca_table.element_code.

CA Table corresponds to the CollectiveAccess basic tables, while element_code is simply the unique code you assigned to a particular metadata element in your CA configuration.

For example, to map a Title column from your source data into CollectiveAccess, you may wish to set the CA table.element as:

ca_objects.preferred_labels

This would map the data from your Source declaration to the Title field in an Object record in CollectiveAccess.

Group

In many cases, distinct lines in a data set will map into their corresponding metadata elements that happen to be bundled together inside of a single container. For example, a common container is Date, wherein there are actually two metadata elements - one for the date itself, and the other a drop-down menu to declare the date's type (Date Created, Date Accessioned, etc.)

Let's say in your source data there is one column that contains date values, while the next column over contains the date types.

If the corresponding metadata elements in CA are bundled into a container, you must tell this to the mapping document by placing these Source elements into a group. Otherwise, the date value would be mapped to one container, while the date type would be mapped to another container (and each would be missing their counterpart!)

Declaring a Group is simple. Just assign a name to each line that is to be mapped into a single container.

If Source "2" is mapping to ca_objects_date.date_value, and Source "3" is mapping to ca_objects_date.date_type, then simply give each line the group name "Date." This will tell the mapping that these two lines are going to a single container - and won't create a whole new container for each.

Refineries

At present all of the existing refineries fall into one of five camps: Splitters, Makers, Joiners, Getters and Builders. Each framework is designed to take a specific data format and transform it via a specific behavior as it is imported into CollectiveAccess.

Splitters
Splitter refineries can either create records, match data to existing records (following a mapping's existingRecordPolicy) or break a single string of source data into several metadata elements in CollectiveAccess. Splitters for relationships are used when several parameters are required, such as setting a record type and setting a relationship type. Using the entitySplitter, a name in a single location (i.e. column) in a data source can be parsed (into first, middle, last, prefix, suffix, et al.) within the new record. Similarly the measurementSplitter breaks up, for example, a list of dimensions into to a CollectiveAccess container of sub-elements. "Splitter" also implies that multiple data elements, delimited in a single location, can be "split" into unique records related to the imported record.

Makers
Maker refineries are used to create CollectiveAccess tour/tour stop, object lot/object and list/list item pairings. These relationships are different than other CollectiveAccess relationships for two reasons. Firstly, they don't carry relationship types. Secondly, these relationships are always single to multiple: a tour can have many tour stops, but a tour stop can never belong to more than one tour. Similarly an object can never belong to more than one lot. List items belong to one and only one list. The Maker refinery is used for these specific cases where "relationshipType" and other parameters are unnecessary.

Joiners
In some ways Joiners are the opposite of Splitters. An entityJoiner refinery is used when two or more parts of a name (located in different areas of the data source) need to be conjoined into a single record. The dateJoiner makes a single range out of two or more elements in the data source.

Getters
Getters are designed specifically for MYSQL data mappings. These refineries map the repeating source data through the linking table to the correct CollectiveAccess elements.

Builders
Builders create an upper hierarchy above the to-be-imported data. Note that Splitters also create upper hierarchies with the parent parameter, but they do so above records related to the imported data. For example, let's say you were importing ca_collections and wanted to map a "Series" and "Sub-series" above imported "File" data. You'd use the collectionHierarchyBuilder refinery. However, if you were importing ca_objects and wanted to relate a "File" while building an upper hierarchy of "Series" and "Sub-series" you would use the collectionSplitter and the parent parameter.


entitySplitter

The entitySplitter creates an entity record or finds an exact match (on name) and creates a relationship. Breaks up parts of a name, sets type and other paramaters.

Refinery Refinery parameter Parameter notes Example
entitySplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
entitySplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found {"relationshipType": "^10"} or {"relationshipType": "author"}
entitySplitter entityType Accepts a constant list item idno from the list entity_types or a reference to the location in the data source where the type can be found {"entityType": "person"}
entitySplitter attributes Sets or maps metadata for the entity record by referencing the metadataElement code and the location in the data source where the data values can be found {
   "attributes": {
       "address": {
           "address1": "^24",
           "address2": "^25",
           "city": "^26",
           "stateprovince": "^27",
           "postalcode": "^28",
           "country": "^29"
       }
   }

}

entitySplitter attributes:idno To map source data to idnos in an entitySplitter, see the 'attributes' parameter above. An exception exists for when idnos are set to be auto-generated. To create auto-generated idnos within an entitySplitter, use the following syntax. "attributes": {"idno":"%"}
entitySplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"creator"}
entitySplitter entityTypeDefault Sets the default entity type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list entity_types {"entityTypeDefault":"individual"}
entitySplitter interstitial Sets or maps metadata for the interstitial movement Relationship record by referencing the metadataElement code and the location in the data source where the data values can be found. {
   "interstitial": {
       "relationshipDate": "^4"
   }

}

entitySplitter relatedEntities This allows you to create and/or relate additional entities to the entity being mapped. For example, if you are running an Object mapping and using an entitySplitter to generate related Individuals, but you also want to create entity records for each individual's affiliation, use this setting. "Name" is the name of the entity, which will be automatically split into pieces and imported. If you want to explicitly map pieces of a name (surname, forename) you can omit "name" and use "forename", "middlename", "surname", etc. "type", "attributes," and "relationshipType" operate just as they would in a regular splitter. {"relatedEntities": [{"type":"ind", "name": "^3", "attributes":{}, "relationshipType":"related"}]}
entitySplitter nonPreferredLabels Maps source data cells to ca_entities.nonpreferred_labels of the entity being generated or matched by the entitySplitter "nonPreferredLabels": [{"forename": "^5", "surname":"^6"}]
entitySplitter parents Builds hierarchical records above the entity being generated or matched by the entitySplitter. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules
{
   "parents": [
       {
           "idno": "^5",
           "name": "^6",
           "type": "organization",
           "attributes": {
               "ca_entities.description": "^7"
           }
       }
   ]

}

entitySplitter skipIfValue Skip if imported value is in the specified list of values. {"skipIfValue": "unknown"}

entityHierarchyBuilder

The entityHierarchyBuilder is used to create an upper hierarchy of occurrences only when the table of the import is ca_entities

entityHierarchyBuilder parents Maps the parent levels that should be built or matched hierarchically above the imported data. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules.
{

"parents": [

 {
  "idno": "^2", 
  "name": "^3", 
  "type": "org"
 }
]

}

entityJoiner

The entityJoiner merges data from two or more data sources (i.e. two columns on a spreadsheet) to make a single entity record. This refinery should be used when last and first names, for example, are in two different locations.

Refinery Refinery parameter Parameter notes Example
entityJoiner entityType Accepts a constant list item idno from the list entity_types or a reference to the location in the data source where the type can be found {"entityType": "person"}
entityJoiner entityTypeDefault Sets the default entity type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list entity_types. {"entityTypeDefault": "person"}
entityJoiner forename Accepts a constant value for the forename or a reference to the location in the data source where the forename can be found. {"forename":"^3"}
entityJoiner surname Accepts a constant value for the surname or a reference to the location in the data source where the surname can be found. {"surname": "^2"}
entityJoiner other_forenames Accepts a constant value for the entity's other forenames or a reference to the location in the data source where the other forenames can be found. {"other_forenames": "^10"}
entityJoiner middlename Accepts a constant value for the middlename or a reference to the location in the data source where the middlename can be found. {"surname": "^12"}
entityJoiner displayname Accepts a constant value for the displayname or a reference to the location in the data source where the displayname can be found. {"displayname": "^14"}
entityJoiner prefix Accepts a constant value for the prefix or a reference to the location in the data source where the prefix can be found. {"prefix": "^14"}
entityJoiner suffix Accepts a constant value for the suffix or a reference to the location in the data source where the suffix can be found. {"suffix": "^14"}
entityJoiner attributes Sets or maps metadata for the entity record by referencing the metadataElement code and the location in the data source where the data values can be found. {
   "attributes": {
       "agentDateSet": {
           "agentDate": "^12"
       }
   }

}

entityJoiner nonpreferred_labels List of non-preferred label values or references to locations in the data source where nonpreferred label values can be found. Use the split value for a label to indicate a value that should be split into entity label components before import. {
   "nonpreferred_labels": [
       {
           "forename": "^5",
           "surname": "^6"
       }
   ]

}
OR
{

   "nonpreferred_labels": [
       {
           "split": "^4"
       }
   ]

}

entityJoiner relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found {"relationshipType": "^10"}
entityJoiner relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess system. {"relationshipTypeDefault": "author"}
entityJoiner skipIfValue Skip if imported value is in the specified list of values. {"skipIfValue": "unknown"}
entityJoiner relatedEntities This allows you to create and/or relate additional entities to the entity being mapped. For example, if you are running an Object mapping and using an entityJoiner to generate related Individuals, but you also want to create entity records for each individual's affiliation, use this setting. "Name" is the name of the entity, which will be automatically split into pieces and imported. If you want to explicitly map pieces of a name (surname, forename) you can omit "name" and use "forename", "middlename", "surname", etc. "type", "attributes," and "relationshipType" operate just as they would in a regular splitter. {"relatedEntities": [{"type":"ind", "name": "^3", "attributes":{}, "relationshipType":"related"}]}
entityJoiner interstitial Sets metadata for the Relationship record (between the target of the mapping and the related entity via the splitter) {
   "interstitial": {
       "relationshipDate": "^4"
   }

}

collectionSplitter

The collectionSplitter creates a collection record or finds an exact match (on name) and creates a flat relationship to the imported record (i.e. object) with the parent parameter building an upper hierarchy above the related collection. To build a hierarchy above a collection when ca_collections are the table of the import use the collectionHierarchyBuilder.

Refinery Refinery parameter Parameter notes Example
collectionSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
collectionSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. Note (for object data): if the relationship type matches that set as the hierarchy control, the object will be pulled in as a "child" element in the collection hierarchy {"relationshipType": "part_of"}
collectionSplitter collectionType Accepts a constant list item idno from the list collection_types or a reference to the location in the data source where the type can be found {"collectionType": "box"}
collectionSplitter attributes Sets or maps metadata for the collection record by referencing the metadataElement code and the location in the data source where the data values can be found {
   "attributes": {
       "collectionDateSet": {
           "collectionDate": "^12"
       }
   }

}

collectionSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"part_of"}
collectionSplitter collectionTypeDefault Sets the default collection type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list collection_types {"collectionTypeDefault":"series"}
collectionSplitter parents Maps or builds the parent levels above the record laterally related to the imported data. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules .
{
   "parents": [
       {
           "idno": "^/inm:SeriesNo",
           "name": "^/inm:SeriesTitle",
           "type": "series",
           "attributes": { "ca_collections.description": "^7"}
       },
       {
           "idno": "^/inm:CollectionNo",
           "name": "^/inm:CollectionTitle",
           "type": "collection",
           "rules": [
               {
                   "trigger": "^/inm:Status = 'in progress'",
                   "actions": [
                       {
                           "action": "SET",
                           "target": "ca_collections.status",
                           "value": "edit"
                       }
                   ]
               }
           ]
       }
   ]

}

collectionSplitter nonPreferredLabels List of non-preferred labels to apply to collections. "nonPreferredLabels": "^4"
collectionSplitter interstitial Sets or maps metadata for the interstitial movement Relationship record by referencing the metadataElement code and the location in the data source where the data values can be found. {
   "interstitial": {
       "relationshipDate": "^4"
   }

}

collectionSplitter skipIfValue Skip if imported value is in the specified list of values. {"skipIfValue": "unknown"}

collectionHierarchyBuilder

The collectionHierarchyBuilder is used to create an upper hierarchy of collections only when the table of the import is ca_collections.

Refinery Refinery parameter Parameter notes Example
collectionHierarchyBuilder parents Maps the parent levels that should be built or matched hierarchical above the imported data. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules .
{
   "parents": [
       {
           "idno": "^/inm:SeriesNo",
           "name": "^/inm:SeriesTitle",
           "type": "series",
           "attributes": { "ca_collections.description": "^7"}
       },
       {
           "idno": "^/inm:CollectionNo",
           "name": "^/inm:CollectionTitle",
           "type": "collection",
           "rules": [
               {
                   "trigger": "^/inm:Status = 'in progress'",
                   "actions": [
                       {
                           "action": "SET",
                           "target": "ca_collections.status",
                           "value": "edit"
                       }
                   ]
               }
           ]
       }
   ]

}

placeSplitter

The placeSplitter creates a place record or finds an exact match (on name) and creates a relationship.

Refinery Refinery parameter Parameter notes Example
placeSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
placeSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. {"relationshipType": "location"}
placeSplitter placeType Accepts a constant list item idno from the list place_types or a reference to the location in the data source where the type can be found {"placeType": "country"}
placeSplitter attributes Sets or maps metadata for the place record by referencing the metadataElement code and the location in the data source where the data values can be found {
   "attributes": {
       "placeNote": "^12"
   }

}

placeSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"location"}
placeSplitter placeTypeDefault Sets the default place type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list place_types {"placeTypeDefault":"country"}
placeSplitter hierarchy Identifies the list code in the place_hierarchies list for the relevant place hierarchy. {"hierarchy": "dc"}
placeSplitter placeHierarchy Identifier of the place hierarchy to add places under. {"placeHierarchy": "dc"}
placeSplitter nonPreferredLabels List of non-preferred labels to apply to places. "nonPreferredLabels": "^4"
placeSplitter interstitial Sets or maps metadata for the interstitial movement Relationship record by referencing the metadataElement code and the location in the data source where the data values can be found. {
   "interstitial": {
       "relationshipDate": "^4"
   }

}

placeSplitter parents Place parents to create, if required. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules
{
   "parents": [
       {
           "idno": "^5",
           "name": "^6",
           "type": "country",
           "attributes": {
               "ca_places.description": "^7"
           }
       }
   ]

}

placeHierarchyBuilder

The placeHierarchyBuilder is used to create an upper hierarchy of places only when the table of the import is ca_places.

placeHierarchyBuilder parents Place parents to create, if required. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules
 {
    "parents": [
        {
            "idno": "^5",
            "name": "^6",
            "type": "country",
            "attributes": {
                "ca_places.description": "^7"
            }
        }
    ]
}

movementSplitter

The movementSplitter creates a movement record or finds an exact match (on name) and creates a relationship.

Refinery Refinery parameter Parameter notes Example
movementSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
movementSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. {"relationshipType": "related"}
movementSplitter movementType Accepts a constant list item idno from the list movement_types or a reference to the location in the data source where the type can be found. {"movementType": "movement"}
movementSplitter attributes Sets or maps metadata for the place record by referencing the metadataElement code and the location in the data source where the data values can be found {
   "attributes": {
       "movementNote": "^12"
   }

}

movementSplitter parents movement parents to create, if required.
movementSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"related"}
movementSplitter movementTypeDefault Sets the default movement type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list movement_types. {"movementTypeDefault":"movement"}
movementSplitter nonPreferredLabels List of non-preferred labels to apply to movements. { "nonPreferredLabels": "^4"}
movementSplitter interstitial Sets or maps metadata for the interstitial movement Relationship record by referencing the metadataElement code and the location in the data source where the data values can be found. {
   "interstitial": {
       "relationshipDate": "^4"
   }

}

objectLotSplitter

The objectLotSplitter creates an object lot record or finds an exact match (on name) and creates a relationship.

Refinery Refinery parameter Parameter notes Example
objectLotSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
objectLotSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. {"relationshipType": "part_of"}
objectLotSplitter objectLotStatus Accepts a constant list item idno from the list object_lot_statuses or a reference to the location in the data source where the status can be found. {"objectLotStatus": "pending_accession"}
objectLotSplitter objectLotStatusDefault Sets the default lot status that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list object_lot_statuses. {"objectLotStatusDefault":"accessioned"}
objectLotSplitter objectLotType Accepts a constant list item idno from the list object_lot_types or a reference to the location in the data source where the type can be found. {"objectLotType": "gift"}
objectLotSplitter attributes Sets or maps metadata for the object lot record by referencing the metadataElement code and the location in the data source where the data values can be found {
   "attributes": {
       "lotNote": "^12"
   }

}

objectLotSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"related"}
objectLotSplitter objectLotTypeDefault Sets the default object lot type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list object_lot_types. {"objectLotTypeDefault":"gift"}
objectLotSplitter nonPreferredLabels List of non-preferred labels to apply to object lots. { "nonPreferredLabels": "^4"}
objectLotSplitter interstitial Sets or maps metadata for the interstitial movement Relationship record by referencing the metadataElement code and the location in the data source where the data values can be found. {
   "interstitial": {
       "relationshipDate": "^4"
   }

}

objectSplitter

The objectSplitter creates an object record or finds an exact match (on name) and creates a relationship.

Refinery Refinery parameter Parameter notes Example
objectSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
objectSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. {"relationshipType": "related"}
objectSplitter objectType Accepts a constant list item idno from the list object_types or a reference to the location in the data source where the type can be found. {"objectType": "moving_image"}
objectSplitter attributes Sets or maps metadata for the object record by referencing the metadataElement code and the location in the data source where the data values can be found {
   "attributes": {
       "technique": "^10"
   }

}

objectSplitter parents Object parents to create, if required. {
   "parents": [
       {
           "idno": "^2",
           "name": "^3",
           "type": "video",
       },
       {
           "idno": "^5",
           "name": "^6",
           "type": "video",
           
       }
   ]

}

objectSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"related"}
objectSplitter objectTypeDefault Sets the default object type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list object_types. {"objectTypeDefault":"video"}
objectSplitter nonPreferredLabels List of non-preferred labels to apply to objects. { "nonPreferredLabels": "^4"}
objectSplitter interstitial Sets or maps metadata for the interstitial movement Relationship record by referencing the metadataElement code and the location in the data source where the data values can be found. {
   "interstitial": {
       "relationshipDate": "^4"
   }

}

objectHierarchyBuilder

The objectHierarchyBuilder is used to import a hierarchy of list items when the table of import is ca_objects

objectHierarchyBuilder parents Maps the parent levels that should be built or matched hierarchical above the imported data. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules
 {
    "parents": [
        {
            "idno": "^3",
            "name": "^4",
            "type": "photo",
            "attributes": {
                "ca_objects.description": "^10"
            }
        }
    ]
}

occurrenceSplitter

The occurrenceSplitter creates an occurrence record or finds an exact match (on name) and creates a relationship.

Refinery Refinery parameter Parameter notes Example
occurrenceSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
occurrenceSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. {"relationshipType": "related"}
occurrenceSplitter occurrenceType Accepts a constant list item idno from the list occurrence_types or a reference to the location in the data source where the type can be found. {"occurrenceType": "event"}
occurrenceSplitter attributes Sets or maps metadata for the occurrence record by referencing the metadataElement code and the location in the data source where the data values can be found {
   "attributes": {
       "dates": "^10"
   }

}

occurrenceSplitter parents Occurrence parents to create, if required. {
   "parents": [
       {
           "idno": "^2",
           "name": "^3",
           "type": "event",
       },
       {
           "idno": "^5",
           "name": "^6",
           "type": "series",
           
       }
   ]

}

occurrenceSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"related"}
occurrenceSplitter occurrenceTypeDefault Sets the default occurrence type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list occurrence_types. {"occurrenceTypeDefault":"event"}
occurrenceSplitter nonPreferredLabels List of non-preferred labels to apply to objects. { "nonPreferredLabels": "^4"}
occurrenceSplitter interstitial Sets or maps metadata for the interstitial movement Relationship record by referencing the metadataElement code and the location in the data source where the data values can be found. {
   "interstitial": {
       "relationshipDate": "^4"
   }

}

occurrenceHierarchyBuilder

The occurrenceHierarchyBuilder is used to create an upper hierarchy of occurrences only when the table of the import is ca_occurrences.

Refinery Refinery parameter Parameter notes Example
occurrenceHierarchyBuilder parents Maps the parent levels that should be built or matched hierarchical above the imported data. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules .
{
   "parents": [
       {
           "idno": "^3",
           "name": "^4",
           "type": "event",
           "attributes": {
               "ca_occurrences.description": "^10"
           }
       },
       {
           "idno": "^5",
           "name": "^6",
           "type": "event_series"
       }
   ]

}

listitemSplitter

The listitemSplitter creates a list item or finds an exact match (on name) and creates a relationship.

Refinery Refinery parameter Parameter notes Example
listItemSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
listItemSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. {"relationshipType": "location"}
listItemSplitter listItemType Accepts a constant list item idno from the list or a reference to the location in the data source where the type can be found. {"listItemType": "concept"}
listItemSplitter attributes Sets or maps metadata for the list value by referencing the metadataElement code and the location in the data source where the data values can be found. You usually don't set attributes for a list item, but you can here if you need to. {
   "attributes": {
       "listItemNote": "^12"
   }

}

listItemSplitter parents Maps or builds the parent levels above the record laterally related to the imported data. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules .
{
   "parents": [
       {
           "idno": "^12",
           "name": "^14",
           "type": "concept",
           "attributes": { "ca_list_items.description": "^7"}
       },
       {
           "idno": "^16",
           "name": "^17",
           "type": "guide",
           "rules": [
               {
                   "trigger": "^3= 'in progress'",
                   "actions": [
                       {
                           "action": "SET",
                           "target": "ca_list_items.status",
                           "value": "edit"
                       }
                   ]
               }
           ]
       }
   ]

}

listItemSplitter list Enter the list_code for the list that the item should be added to. This is mandatory - if you forget to set it or set it to a list_code that doesn't exist the mapping will fail.) {"list": "list_code"}
listItemSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"concept"}
listItemSplitter listItemTypeDefault Sets the default list item type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list list_item_types {"listItemTypeDefault":"concept"}
listItemSplitter interstitial Sets or maps metadata for the interstitial vocabulary Relationship record record by referencing the metadataElement code and the location in the data source where the data values can be found. {"interstitial": {
       "description": "^4"
   }

}

listItemHierarchyBuilder

The listItemHierarchyBuilder is used to import a hierarchy of list items when the table of import is ca_list_items

listItemHierarchyBuilder parents Maps the parent levels that should be built or matched hierarchical above the imported data. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules
{
    "parents": [
        {
            "name": "^4"
        }
    ]
}
listItemHierarchyBuilder list Identifies the root node of the list item list to add parent items to. This is normally the idno of the list that the base import relates to.
{
    "list": "object_classification"
}

listItemIndentedHierarchyBuilder

The listItemIndentedHierarchyBuilder builds hierarchical lists from data sources where "indents" or empty cells in Excel are used to indicate hierarchical structure. This refinery can be used either: (a) in "stand alone" mode, where the list is imported without relationships to any extant or newly created records or (b) in context of and as metadata for an import that maps authority records of another type (i.e. objects that carry the list items as metadata). In the case of (b) "indent" formatting would be used within the data source for the authority records.

An example of "indenting" used to indicate hierarchical structure:

Indent.png

Refinery Refinery parameter Parameter notes Example
listItemIndentedHierarchyBuilder levels List of sources for hierarchy levels "levels":["^1", "^2", "^3"]
listItemIndentedHierarchyBuilder levelTypes List of types for hierarchy levels "levelTypes":["concept", "concept", "concept"]
listItemIndentedHierarchyBuilder list Code of list to import items into "list": "categories"
listItemIndentedHierarchyBuilder mode Set to "returnData" to return the id of lowest item in the hierarchy to the importer; set to "processOnly" to create the list items in the hierarchy but not return values to the importer. Default is to process only. "mode": "processOnly"

storageLocationSplitter

The storageLocationSplitter creates a storage location or finds an exact match (on name) and creates a relationship.

Refinery Refinery parameter Parameter notes Example
storageLocationSplitter hierarchicalStorageLocationTypes Sets the storage location types used to label each level in a numerically expressed hierarchy {
   "hierarchicalStorageLocationTypes": [
       "room",
       "rack",
       "cabinet"
   ]

}

storageLocationSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter":";"}
storageLocationSplitter hierarchicalDelimiter Specifies the delimiter to on which to break when designating hierarchicalStorageLocationTypes {"hierarchicalDelimiter":"."}
storageLocationSplitter parents Maps or builds the parent levels above the record laterally related to the imported data. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules .
{
   "parents": [
       {
           "idno": "^10",
           "name": "^12",
           "type": "cabinet",
           "attributes": { "ca_storage_location.description": "^7"}
       },
       {
           "idno": "^14",
           "name": "^16",
           "type": "room",
           "rules": [
               {
                   "trigger": "^18= 'in progress'",
                   "actions": [
                       {
                           "action": "SET",
                           "target": "ca_storage_locations.status",
                           "value": "edit"
                       }
                   ]
               }
           ]
       }
   ]

}

storageLocationSplitter nonPreferredLabels List of non-preferred labels to apply to tour stops. {"nonPreferredLabels":"^5"}
storageLocationSplitter interstitial Sets metadata for the Relationship record (between the target of the mapping and the related entity via the splitter) {
   "interstitial": {
       "relationshipDate": "^4"
   }

}

storageLocationSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. {"relationshipType":"home"}
storageLocationSplitter storageLocationType Accepts a constant list item idno from the list storage_location_types or a reference to the location in the data source where the type can be found. {"storageLocationType":"room"}
storageLocationSplitter attributes Sets or maps metadata for the storage location record by referencing the metadataElement code and the location in the data source where the data values can be found. {"attributes":{"storage_description":"^7", "storage_dates":"8"}}
storageLocationSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess system {"relationshipTypeDefault":"located"}
storageLocationSplitter storageLocationTypeDefault Sets the default storage location type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list storage_location_types {"storageLocationTypeDefault":"permanent"}

storageLocationHierarchyBuilder

The storageLocationHierarchyBuilder is used to create an upper hierarchy of storage locations only when the table of the import is ca_storage_locations.

Refinery Refinery parameter Parameter notes Example
storageLocationHierarchyBuilder parents Maps the parent levels that should be built or matched hierarchical above the imported data. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules .
{
   "parents": [
       {
            "idno": "^2", 
             "name": "^3", 
            "type": "room"
           "attributes": {
               "ca_storage_locations.notes": "^11"
           }
       },
       {
           "idno": "^5",
           "name": "^6",
           "type": "building"
       }
   ]

}

loanSplitter

The loanSplitter creates a loan or finds an exact match (on name) and creates a relationship. NOTE: the loanSplitter creates new records and as a result full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)

Refinery Refinery parameter Parameter notes Example
loanSplitter loanType Accepts a constant list item from the list loan_types {"loanType":"out"}
loanSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. Note for object data: if the relationship type matches that which is set as the hierarchy control, the object will be pulled in as a "child" element in the loan hierarchy {"relationshipType":"part_of"}
loanSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter":"."}
loanSplitter attributes Sets or maps metadata for the loan record by referencing the metadataElement code and the location in the data source where the data values can be found. {
   "attributes": {
       "loanDate": "^11"
   }

}

loanSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess system {"relationshipTypeDefault":"part_of"}
loanSplitter loanTypeDefault Sets the default loan type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list loan_types. {"loanTypeDefault":"in"}
loanSplitter interstitial Sets metadata for the Relationship record (between the target of the mapping and the related entity via the splitter) {
   "interstitial": {
       "relationshipDate": "^4"
   }

}

loanSplitter parents Loan parents to create, if required. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules
{
   "parents": [
       {
           "idno": "^5",
           "name": "^6",
           "type": "in",
           "attributes": {
               "ca_loans.description": "^7"
           }
       }
   ]

}

measurementsSplitter

The measurementsSplitter formats data values that are mapped to an element of the datatype Length or Weight. NOTE: the measurementsSplitter does not create new records (it only maps data) and as a result the full container paths must not be specified in the attributes/elements parameter (i.e. just use subElement_code or measurementsWidth)

Refinery Refinery parameter Parameter notes Example
measurementsSplitter delimiter Sets the value of the delimiter to break on, separating measurement values {"delimiter": "x"}
measurementsSplitter units set to value of the measurement unit {"units": "in"}
measurementsSplitter elements maps the components of the dimensions to specific metadata elements {
   "elements": [
       {
           "quantityElement": "measurementWidth",
           "typeElement": "measurementsType",
           "type": "width"
       },
       {
           "quantityElement": "measurementHeight",
           "typeElement": "measurementsType2",
           "type": "height"
       }
   ]

}
Note: the typeElement and type sub-components are optional and should only be used in measurement containers that include a type drop-down.

measurementsSplitter attributes maps the other non-measurement elements that may be in the same container. Values here are set for all measurements being split. {
   "attributes": {
       "notes": "^1"
   }

}

tourStopSplitter

The tourStopSplitter creates a tour stop or finds an exact match (on name) and creates a relationship.

Refinery Refinery parameter Parameter notes Example
tourStopSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
tourStopSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. {"relationshipType": "location"}
tourStopSplitter tourStopType Accepts a constant list item idno from the list tour_stop_types or a reference to the location in the data source where the type can be found. {"tourStopType": "main_stop"}
tourStopSplitter attributes Sets or maps metadata for the tour stop record by referencing the metadataElement code and the location in the data source where the data values can be found. {
   "attributes": {
       "stopDescription": "^11"
   }

}

tourStopSplitter tour Identifies the tour to add the stop to. {"tour": "tour_code"}
tourStopSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"location"}
tourStopSplitter tourStopTypeDefault Sets the default tour stop type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list tour_stop_types {"tourStopTypeDefault":"main_stop"}
tourStopSplitter nonPreferredLabels List of non-preferred labels to apply to tour stops. { "nonPreferredLabels": "^4"}
tourStopSplitter interstitial Sets or maps metadata for the interstitial movement Relationship record by referencing the metadataElement code and the location in the data source where the data values can be found. {
   "interstitial": {
       "relationshipDate": "^4"
   }

}

tourMaker

The tourMaker creates a tour parent in a tour stop mapping.

Refinery Refinery parameter Parameter notes Example
tourMaker tourType Accepts a constant list item idno from the list tour_types or a reference to the location in the data source where the type can be found. {"tourType": "full_tour"}
tourMaker attributes Sets or maps metadata for the tour record by referencing the metadataElement code or the location in the data source where the data values can be found. {
   "attributes": {
       "tour_code": "^1"
   }

}

tourMaker tourTypeDefault Sets the default tour type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list tour_types {"tourTypeDefault": "full_tour"}

dateJoiner

The dateJoiner merges two separate data sources into one data range in a single field in CollectiveAccess.

Refinery Refinery parameter Parameter notes Example
dateJoiner mode Determines how dateJoiner joins date values together. Two-column range (aka "range") is the default if mode is not specified. Options are: multiColumnDate, multiColumnRange, range {"mode": "multiColumnDate"}
dateJoiner month Maps the month value for the date from the data source. (For Multi-column date) {"month": "^4"}
dateJoiner day Maps the day value for the date from the data source. (For Multi-column date) {"day": "^5"}
dateJoiner year Maps the year value for the date from the data source. (For Multi-column date) {"year": "^6"}
dateJoiner startDay Maps the day value for the start date from the data source. (For Multi-column range) {"startDay": "^4"}
dateJoiner startMonth Maps the month value for the start date from the data source. (For Multi-column range) {"startMonth": "^5"}
dateJoiner startYear Maps the year value for the start date from the data source. (For Multi-column range) {"startYear": "^6"}
dateJoiner endDay Maps the day value for the end date from the data source. (For Multi-column range) {"endDay": "^7"}
dateJoiner endMonth Maps the month value for the end date from the data source. (For Multi-column range) {"endMonth": "^8"}
dateJoiner endYear Maps the year value for the end date from the data source. (For Multi-column range) {"endYear": "^9"}
dateJoiner expression Date expression (For Two-column range) {"expression" : "^dateExpression"}
dateJoiner start Maps the date from the data source that is the beginning of the conjoined date range. (For Two-column range) {"start" : "^dateBegin"}
dateJoiner end Maps the date from the data source that is the end of the conjoined date range. (For Two-column range) {"end": "^dateEnd"}

ATRelatedGetter

The ATRelatedGetter works with the design of the ArchivistToolkit MYSQL database. It maps the repeating source data through the linking table to the correct CollectiveAccess elements.

Refinery Refinery parameter Parameter notes Example
ATRelatedGetter table Archivists Toolkit table. {"table":"ArchDescriptionRepeatingData"}
ATRelatedGetter discriminator Discriminator value. {"discriminator":"note"}
ATRelatedGetter key Archivists Toolkit key field. {"key":"DigitalObjects.digitalObjectId"}
ATRelatedGetter map Map of related table field values to CollectiveAccess element. {
   "map": {
       "descriptionText": "^noteContent",
       "descriptionSource": "repository"
   }

}

Options

Options allow you to set additional formatting and conditionals on data during import. Some Options are designed to actually set parameters on the mapping behavior, such as the skip options. skipGroupIfEmpty, for example, allows you to prevent the import of certain fields, depending on the presence of data in another related field. Other Options simply format data, such as formatWithTemplate, suffix, and convertNewlinesToHTML.

Option Description Parameter notes Example
skipGroupIfEmpty If the data value corresponding to this mapping is empty, skip the mappings for the other data in the group set to a non-zero value {"skipGroupIfEmpty": 1}
skipGroupIfValue If the data value corresponding to this mapping contains the set value, skip the mapping for the data and the mappings for the other data in the group arbitrary value {"skipGroupIfValue": ["n/a"]}
skipGroupIfNotValue If the data value corresponding to this mapping does not contain the set value, skip the mapping for the data and the mappings for the other data in the group arbitrary value {"skipGroupIfNotValue": ["n/a"]}
skipGroupIfExpression If the expression yields true, skip the mapping for the data and the mappings for the other data in the group arbitrary value {"skipGroupIfExpression":"^/inm:FileNo <> \"\""}
skipRowIfEmpty If the data value corresponding to this mapping is empty, do not import the row set to a non-zero value {"skipRowIfEmpty": 1}
skipRowIfValue If the data value corresponding to this row contains the set value, do not import the entire row arbitrary value {"skipRowIfValue": ["n/a"]}
skipRowIfNotValue If the data value corresponding to this row does not contains the set value, do not import the entire row arbitrary value {"skipRowIfNotValue": ["n/a"]}
refineries Select the refinery that preforms the correct function to alter your data source as it maps to CollectiveAccess make a selection from the available refineries dateJoiner
original_values Return-separated list of values from the data source to be replaced. For example photo is used in the data source, but photograph is used in CollectiveAccess. data source values sound recording
replacement_values Return-separated list of CollectiveAccess list item idnos that correspond to the mapped values from the original data source. For example sound recording (entered in the Original values column) maps to audio_digital, which is entered here in the Replacement values column. CollectiveAccess list item idnos audio_digital
default Value to use if data source value is blank. CollectiveAccess list item idnos {"default": "mixed"}
delimiter Delimiter to split repeating values on. delimiter value {"delimiter": ";"}
restrictToTypes Restricts the the mapping to only records of the designated type. For example the Duration field is only applicable to objects of the type moving_image and not photograph. CollectiveAccess list item idnos {
   "restrictToTypes": [
       "photograph",
       "other",
       "mixed",
       "text"
   ]

}

formatWithTemplate Formats a field to include words and data values via a set template. text and data source references {"formatWithTemplate": "Painting #^15 created by ^2"}
suffix Appends a text value to the end of the data value. arbitrary text {"suffix": " tons"}
excludeToTypes Not implemented. This possible option would be the inverse of restrictToTypes. {
   "excludeToTypes": [
       "photograph",
       "text"
   ]

}

maxLength The maximum length, in characters, to allow. Values exceeding this length will be truncated to the maximum. {"maxLength":100}
transformValuesUsingWorksheet Transform values using a worksheet for the mapping. See Transform Values Using Worksheet
{
 "transformValuesUsingWorksheet":"Worksheet Title"
}
errorPolicy Determines how errors are handled for the mapping. "Stop" will halt the entire import on any error for this mapping. ignore; stop ignore
convertNewlinesToHTML Convert newlines to HTML <BR/> tags in imported text. Value should be 0 or 1. Default is 0 – don't convert text. {"convertNewlinesToHTML":"1"}
useAsSingleValue Force a repeating value (Eg. repeating XML tags) to be imported as a single value concatenated with the delimiter. From version 1.5 Default is false. {"useAsSingleValue":"1"}
applyRegularExpressions Allows the user to supply a list of regular expressions and replacement values to apply to the value being imported immediately prior to import. It effectively allows the user to do arbitrary rewrites on source data to fix problems. See below for more details. match: a regular expression applied to source data values; replaceWith: if a match is found, it will be replaced with whatever is contained in "replaceWith".
{
   "applyRegularExpressions": [
       {
           "match": "([0-9]+)\\.([0-9]+)",
           "replaceWith": "\\1:\\2"
       },
   ]
}

Original Values and Replacement Values

In some cases, you may wish for the mapping to find certain values in your source data and replace them with new values. In the Original Value column, state all values that you wish to have replaced. Then, in the Replacement Value column, set their replacements. You can add multiple values to a single cell, so long as the replacement value matched the original value line by line.

This feature can be used to correct common misspellings in the source data, or to simply normalize names and terms, or to conform to element codes for fixed list values in CollectiveAccess.

Transform Values Using Worksheet

Using the Original and Replacement columns is sufficient for transforming a small range of values. But for really large transformation dictionaries, use the option "transformValuesUsingWorksheet" instead.

You can use this option to reference a list of values in a separate worksheet within the Excel mapping document. The formatting of the sheet should place original values in the first column, and replacement values in the second column.

The "transformValuesUsingWorksheet" mapping option takes a worksheet name and will load the first and second columns of that sheet as the original and replacement values respectively. When this option is set, any values in the "original values" and "replacement values" columns of the mapping worksheet are ignored, even if the "transformValuesUsingWorksheet" worksheet is empty or does not exist.

You refer to the sheet by name. So the options syntax in your Excel mapping doc would look like this:

{"transformValuesUsingWorksheet":"Worksheet Title"}

Again, if you need to transform a small number of values, simply use the Original/Replacement columns on the mapping worksheet. If you need to transform a really large list, create an additional worksheet, give it a name, add your original/replacement values in the first two columns, and set the transformValuesUsingWorksheet option.

applyRegularExpressions

This option allows the user to effectively rewrite messy and problematic source data using Perl compatible regular expressions as supported in the PHP programming language.

Here is an example. Let's say you are mapping duration data to a TimeCode element, and the source data syntax is invalid.

Invalid timecode format:
7.30.
Should be transformed to valid timecode format:
7:30

The invalid data can be transformed using the applyRegularExpressions option with the proper regular expressions.

{
   "applyRegularExpressions": [
       {
           "match": “([0-9]+)\\.([0-9]+)",
           "replaceWith": "\\1:\\2"
       },
       {
           "match": "[^0-9:]+",
           "replaceWith": ""
       }
   ]
}

match: a regular expression applied to source data values; replaceWith: if a match is found, it will be replaced with whatever is contained in "replaceWith".

In this example, the first regular expression matches <number>.<number> and replaces it with <number>:<number>. In other words, "7.30." becomes "7:30.". The [0-9]+ string matches sequences of 1 or more numbers. Since they’re in parenthesis they can be “back referenced” into the replaceWith part using the \\1 and \\2 placeholders.

The second regular expression matches any character that is not a number or a colon (the first one having reformatted any period between numbers as a colon) and replaces it with nothing – removing it in other words. This regular expression takes care of the erroneous period at the end of the invalid data."7:30." is transformed into "7:30" - a valid TimeCode input.

Note: The only deviation from the standard regular expressions language are the backslashes. Wherever you would use a single backslash in a regular expression, you need to use two in our mapping because JSON treats backslashes specially and demands that a literal \ be encoded as \\.

Rules

Rules allow you to set record-level conditionals in your mapping with target actions triggered by true or false outcomes. With Rules, you can manipulate the migration of specific data and/or set metadata based on expression statements. For example, let's say you want to skip a record if a certain element in your data source is exactly equal to a specific value. Rules allows you to set a target action, such as "SKIP," when a match is triggered.

Rules rely on a two part operation outlined in the import mapping. The first component is is called "Rule triggers" and it is an expression statement that results in a quantity that is evaluated by the data importer. The second part is the "Rule actions" that are performed based on the outcome of the expression.

Let's walk through an example. For an in depth look at writing "Rule triggers" via Expressions, read more here.

For our example, we are going to skip all records with the phrase "do not use" in the description. To do so we write an expression to match "do not use" in the required field, and then set the action to execute when the expression is true to be "SKIP." For the sake of this example we're importing an Excel spreadsheet and the description is in column 5:

This is how the rule should look in the import mapping:

Rule.png

Set "Rule" as your rule type and add the following to the Rule triggers column:

(^5 =~ /do not use/)

Where ^5 references column 5 and =~ invokes the regular expression operator. In the "actions" column is a simple reference to the action:

SKIP

Note that you can potentially add several actions to a single rule trigger by separating the actions with returns. For now, the only possible action is "SKIP", which skips the entire record rather than importing it. A "SET" action, which causes the setting of a field in the import to a specific value, is planned for development soon.

Running a mapping

To run the import through the graphical user interface, see here.

To run the import from the command line, follow these instructions:

Before you begin it’s a good idea to make an area for your import mappings and data that’s easily accessible without an inconveniently-long file path. For the sake of this example our import material will live in a Providence directory at:

/support/project/mappings and
/support/project/data

You'll also want to back-up your database

mysqldump -u#name -p#password project > ~/project_date.dump

The import is executed through caUtils. To see all utilities ask for help after cd-ing into support

cd /path_to_Providence/support
bin/caUtils help

To get further details about the load-import-mapping utility:

bin/caUtils help load-import-mapping

You’ll see that the only load-import-mapping command is the one that designates the mapping file. To load the mapping:

bin/caUtils load-import-mapping --file=project/mappings/mapping1.xlsx

Next you’ll be using the utility import-data. As you’ll see from

bin/caUtils help import-data

there are several options that allow you to designate the format, data source, log preferences, etc. To run the import:

bin/caUtils import-data --format=XLSX --mapping=mapping1 --source=project/data/Data.xlsx --log=project/log

With the PHP ncurses extension installed a display will provide moving status indicators including import progress and recent errors.

To modify your import and rerun the utility, simply restore your database

mysql -u#name -p#password project < ~/project_date.dump

and start the process again!

Common Problems

This section is intended to collect the most common mapping mistakes (and their solutions).

ca_object_lots - When mapping data the the Lots table, remember that Lots require more than just a label and type. They also have a non-optional "lot_status_id" that is a value taken from the "object lot statuses" list. It must be set to a valid value or the lot will fail to insert, and will throw an error message. So be sure set a constant value to ca_object_lots.lot_status_id. You can take this value from the list "object lot statuses" under Manage --> Lists & Vocabularies. Also, remember that id numbers for ca_object_lots don't map to the normal idno but rather ca_object_lots.idno_stub.

Groups - When mapping to to more than one metadata element that are inside of single Container, don't forget to use the Group column to tie each mapping to the same container. You can do this by assigning an arbitrary "group name" to each mapping.

Open/LibreOffice Spreadsheets - If you are loading data from a spreadsheet created by Open/LibreOffice there is a problem with the date detector and you should save the file as an ODF Spreadsheet (*.ods). You can still use the XLS / XLSX importers with these spreadsheets. The data mapping works fine in XLSX format.

ca_entities.preferred_labels - When running a mapping for which ca_entities is the target table, and the source data for entity names is formatted as "first name last name" (i.e. Jane Doe), intuitively you might assume that the target must simply be set to ca_entities.preferred_labels. However, an additional step is required. If the former were set as-is, the mapping would not parse the forename and surname. Surname then would be blank, and the mapping would get skipped. Therefore, an entitySplitter is actually still required in these instances:

1. Set the target ca_table.element to "ca_entities.preferred_labels"
2. Set the refinery to entitySplitter.
3. Set the refinery parameters to:
{"entityType": "entity_type", "skipIfValue": ["unknown"]}

Geocode Attributes - geocode elements take a text format that looks like this:

 <optional descriptive text> [lat1,long1;lat2,long2;lat3,long3] 

where the semicolon list of lat/long pairs can go on as long as is needed so when you import to these fields you will need to format the data according to this structure.

JSON formatting - If you are getting a lot of errors related to a refinery parameter (or if certain fields simply aren't importing), you may have an improperly formatted JSON. For example, if you notice that none of your attributes are importing through an entitySplitter, you may have simply missed a necessary comma in your parameter. To quickly validate your JSON, go to | jsonlint.com

Connecting to another CollectiveAccess system to import data - The CollectiveAccess input type allows you to connect to another CollectiveAccess system for selective import of data. The input type requires an access URL in a very specific format. The format is http://[username]:[password]@[hostname]/[path_to_CollectiveAccess]/[table name]?q=[search]

The items in brackets should be replaced with values appropriate for the target system. The username and password should be a valid login with web service privileges on the target system. The hostname is the name of the target server (Eg. archive.mydomain.com). "path_to_CollectiveAccess" is the part of the URL used to access CA between the host name and CA-specific directories. If the target CA system is running in the root of the web server then this will be blank. "table name" is one of the basic tables in CollectiveAccess (Eg. ca_objects, ca_entities), and specifies what type of record is to be imported. The "search" is a text search to use when retrieving records for import. Set this to an asterisk ("*") if you want to import all records in the specified table. Putting it all together a CA import URL that imports all objects in the target system might look something like this:

http://administrator:3cd324@archive.mydomain.com/ca_objects?q=*

Namespaces

Variants
Actions
Navigation
Toolbox
User
Personal tools