Data Importer/en

From CollectiveAccess Documentation
Jump to: navigation, search

<languages/> [Available for v1.4]

Overview

With the release of CollectiveAccess v1.4, users will be able to create their own mappings and migrate their source data directly from the command line or the Providence user interface (under "Import > Data") into a configuration of Providence. Running a data import involves seven basic steps, all of which are outlined in detail below.

1 - A user must create a mapping document that will serve as a crosswalk between the source data and the ultimate destination in CollectiveAccess.

2 - Before running an import, it is absolutely crucial to create a backup of the database in its current form by executing a data dump.

3 - When you have taken this pre-caution and your mapping document is complete it is now time to run the import from either the command line or the graphical user interface.

4 - When the migration has ran, check the data out in CollectiveAccess and look for errors or points of inconsistencies.

5 - Revise your mapping accordingly

6 - Load the data dump so that the system returns to its pre-import state (and thereby clears out the data that you are about to re-run.)

7 - Run the import again. That's it!

Supported input formats

Currently: XLSX, XLS, MYSQL, Filemaker XML, Inmagic XML and MARC

Planned imminently: tab delimited, CSV, OAI-PMH/DC,

Creating a mapping

To create a mapping, first download the Excel-based import mapping template available here. (File:Data Import Mapping template.xlsx). Once all of the mappings and settings have been entered into the template it can be loaded directly into CollectiveAccess (see Running a mapping below) and the page on Creating and Running a Mapping.

This mapping template operates under two basic assumption about your data. 1 - that each row in a data set corresponds to a single record, and 2 - that each column corresponds to a single metadata element.

Below we will go through each column of the mapping template. In the first column, you must set the Rule Type for each for each column of data in your data set. Refer to the table below for descriptions of each Rule Type.

Rule types

Rule type Description
Mapping Maps a data source column or table.field to a CollectiveAccess metadata element. Mappings can carry refineries (see below).
SKIP Use SKIP to ignore a data source column or table.field.
Constant Set a data source column or table.field to an arbitrary constant value. Include the chosen value in the Source column on the mapping spreadsheet. Matches on CollectiveAccess list item idno.
Rule Performs actions during the import (such as skipping or setting data) based on conditional expressions. Made up of two parts: "Rule triggers" and "Rule actions."
Setting Sets preferences for the mapping (see below).

Settings

Setting Description Parameter notes Example Currently supported?
name Human readable name of the mapping Arbitrary text My mapping Implemented
code Alphanumeric code of the mapping Arbitrary, no special characters or spaces my_mapping Implemented
inputFormats Sets types of source (input) data that can be handled by this import mapping. Values are format codes defined by the various DataReader plugins. You can specify multiple formats by specifying their format codes in a semicolon delimited list. Currently supported format codes: XLSX, MYSQL. Soon: CSV, TAB, MARC. If you omit this setting then the mapping is assumed to be valid for any data type which, of course, is very unlikely. file type XLSX;CSV;TAB Implemented
table Sets the table for the imported data Corresponds to the CollectiveAccess basic tables ca_objects Implemented
type Type to set all imported records to. If import includes a mapping to type_id, that will be privileged and the type setting will be ignored. CollectiveAccess list item idno posters Implemented
numInitialRowsToSkip The number of rows at the top of the data set to skip. Use this setting to skip over column headers in spreadsheets and similar data. numeric value 2 Implemented
existingRecordPolicy Determines how existing records in the CollectiveAccess system are checked for and handled for the mapping. Also determines how records created by the mapping are merged with other instances (idno and/or preferred label) in the data source. none; skip_on_idno; merge_on_idno; overwrite_on_idno; skip_on_preferred_labels; merge_on_preferred_labels; overwrite_on_preferred_labels; skip_on_idno_and_preferred_labels; merge_on_idno_and_preferred_labels; overwrite_on_idno_and_preferred_labels none Implemented
errorPolicy Determines how errors are handled for the import. "Stop" will halt the entire import on any error. ignore; stop ignore Implemented
archiveMapping Set to yes to save the mapping spreadsheet; no to delete it from the server after import yes; no yes Pending
archiveDataSets Set to yes to save the data spreadsheet or no to delete it from the server after import yes; no yes Pending

Source

The Source column is used to set precisely which element from the data source is to be mapped or skipped. You can also set a constant data value, rather than a mapping, by setting the rule type to "Constant" and the Source column as the value or list item idno from your CollectiveAccess configuration.

Spreadsheets
If you wish to map from Column B of an Excel spreadsheet, you would list the Source as 2. (A = 1, B = 2, C = 3, and so on.) For this mapping, Column B of your source data would be pulled. If on the other hand, you wish to skip this column, you would set the Rule Type to Skip and the source value to 2.

XML
Set the Source column to the name of the XML tag, proceeded with a forward slash (i.e. /Sponsoring_Department or /inm:ContactName)

MARC
The Source value for MARC supports fields and indicators. i.e. 100/a (field=100; no indicators) 100/a/x (indicator 1=x) 100/a/xy (indicator 1=x; indicator 2=y). For subfields, format the import text with the option formatWithTemplate, i.e. {"formatWithTemplate": "^245/a ^245/b ^245/f ^245/g ^245/h ^245/k ^245/n ^245/p ^245/s"}

CA table.element

You declared the data source in the previous column. Now, it's time to set the destination. Using the CA table.element column, set where you wish the Source data to be mapped to in CollectiveAccess. If you are setting the Source to Skip, of course, you do not need to complete this step. If you are mapping data or applying a constant value, you do need to set the destination. This is accomplished by writing the ca_table.element_code.

CA Table corresponds to the CollectiveAccess basic tables, while element_code is simply the unique code you assigned to a particular metadata element in your CA configuration.

For example, to map a Title column from your source data into CollectiveAccess, you may wish to set the CA table.element as:

ca_objects.preferred_labels

This would map the data from your Source declaration to the Title field in an Object record in CollectiveAccess.

Group

In many cases, distinct lines in a data set will map into their corresponding metadata elements that happen to be bundled together inside of a single container. For example, a common container is Date, wherein there are actually two metadata elements - one for the date itself, and the other a drop-down menu to declare the date's type (Date Created, Date Accessioned, etc.)

Let's say in your source data there is one column that contains date values, while the next column over contains the date types.

If the corresponding metadata elements in CA are bundled into a container, you must tell this to the mapping document by placing these Source elements into a group. Otherwise, the date value would be mapped to one container, while the date type would be mapped to another container (and each would be missing their counterpart!)

Declaring a Group is simple. Just assign a name to each line that is to be mapped into a single container.

If Source "2" is mapping to ca_objects_date.date_value, and Source "3" is mapping to ca_objects_date.date_type, then simply give each line the group name "Date." This will tell the mapping that these two lines are going to a single container - and won't create a whole new container for each.

Refineries

At present all of the existing refineries fall into one of four camps: Splitters, Makers, Joiners and Getters. Each framework is designed to take a specific data format and transform it via a specific behavior as it is imported into CollectiveAccess.

Splitters
Splitter refineries can either create records, match data to existing records (following a mapping's existingRecordPolicy) or break a single string of source data into several metadata elements in CollectiveAccess. Splitters for relationships are used when several parameters are required, such as setting a record type and setting a relationship type. Using the entitySplitter, a name in a single location (i.e. column) in a data source can be parsed (into first, middle, last, prefix, suffix, et al.) within the new record. Similarly the measurementSplitter breaks up, for example, a list of dimensions into to a CollectiveAccess container of sub-elements. "Splitter" also implies that multiple data elements, delimited in a single location, can be "split" into unique records related to the imported record.

Makers
Maker refineries are used to create CollectiveAccess tour/tour stop, object lot/object and list/list item pairings. These relationships are different than other CollectiveAccess relationships for two reasons. Firstly, they don't carry relationship types. Secondly, these relationships are always single to multiple: a tour can have many tour stops, but a tour stop can never belong to more than one tour. Similarly an object can never belong to more than one lot. List items belong to one and only one list. The Maker refinery is used for these specific cases where "relationshipType" and other parameters are unnecessary.

Joiners
In some ways Joiners are the opposite of Splitters. An entityJoiner refinery is used when two or more parts of a name (located in different areas of the data source) need to be conjoined into a single record. The dateJoiner makes a single range out of two or more elements in the data source.

Getters
Getters are designed specifically for MYSQL data mappings. These refineries map the repeating source data through the linking table to the correct CollectiveAccess elements.


The entitySplitter creates an entity record or finds an exact match (on name) and creates a relationship. Breaks up parts of a name, sets type and other paramaters. NOTE: because the entitySplitter creates new records the full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)

Refinery Refinery parameter Parameter notes Example
entitySplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
entitySplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found {"relationshipType": "^10"} or {"relationshipType": "author"}
entitySplitter entityType Accepts a constant list item idno from the list entity_types or a reference to the location in the data source where the type can be found {"entityType": "person"}
entitySplitter attributes Sets or maps metadata for the entity record by referencing the metadataElement code and the location in the data source where the data values can be found {
   "attributes": {
       "address": {
           "address1": "^24",
           "address2": "^25",
           "city": "^26",
           "stateprovince": "^27",
           "postalcode": "^28",
           "country": "^29"
       }
   }

}

entitySplitter attributes:idno To map source data to idnos in an entitySplitter, see the 'attributes' parameter above. An exception exists for when idnos are set to be auto-generated. To create auto-generated idnos within an entitySplitter, use the following syntax. "attributes": {"idno":"%"}
entitySplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"creator"}
entitySplitter entityTypeDefault Sets the default entity type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list entity_types {"entityTypeDefault":"individual"}
entitySplitter interstitial Sets metadata for the Relationship record (between the target of the mapping and the related entity via the splitter) {
   "interstitial": {
       "relationshipDate": "^4"
   }

}

entitySplitter relatedEntities This allows you to create and/or relate additional entities to the entity being mapped. For example, if you are running an Object mapping and using an entitySplitter to generate related Individuals, but you also want to create entity records for each individual's affiliation, use this setting. "Name" is the name of the entity, which will be automatically split into pieces and imported. If you want to explicitly map pieces of a name (surname, forename) you can omit "name" and use "forename", "middlename", "surname", etc. "type", "attributes," and "relationshipType" operate just as they would in a regular splitter. {"relatedEntities": [{"type":"ind", "name": "^3", "attributes":{}, "relationshipType":"related"}]}
entitySplitter nonPreferredLabels Maps source data cells to ca_entities.nonpreferred_labels of the entity being generated or matched by the entitySplitter "nonPreferredLabels": [{"forename": "^5", "surname":"^6"}]

The collectionSplitter creates a collection record or finds an exact match (on name) and creates a flat relationship. The collectionSplitter can only be used for flat (regular) relationships, not hierarchical relationships. For hierarchical relationships, use the collectionHierarchyBuilder refinery. NOTE: because the collectionSplitter creates new records the full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)

Refinery Refinery parameter Parameter notes Example
collectionSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
collectionSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. Note (for object data): if the relationship type matches that set as the hierarchy control, the object will be pulled in as a "child" element in the collection hierarchy {"relationshipType": "part_of"}
collectionSplitter collectionType Accepts a constant list item idno from the list collection_types or a reference to the location in the data source where the type can be found {"collectionType": "box"}
collectionSplitter attributes Sets or maps metadata for the collection record by referencing the metadataElement code and the location in the data source where the data values can be found {
   "attributes": {
       "collectionDateSet": {
           "collectionDate": "^12"
       }
   }

}

collectionSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"part_of"}
collectionSplitter collectionTypeDefault Sets the default collection type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list collection_types {"collectionTypeDefault":"series"}
collectionSplitter parents Maps or builds the parent levels above the record laterally related to the imported data. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules .
{
   "parents": [
       {
           "idno": "^/inm:SeriesNo",
           "name": "^/inm:SeriesTitle",
           "type": "series",
           "attributes": { "ca_collections.description": "^7"}
       },
       {
           "idno": "^/inm:CollectionNo",
           "name": "^/inm:CollectionTitle",
           "type": "collection",
           "rules": [
               {
                   "trigger": "^/inm:Status = 'in progress'",
                   "actions": [
                       {
                           "action": "SET",
                           "target": "ca_collections.status",
                           "value": "edit"
                       }
                   ]
               }
           ]
       }
   ]

}

collectionSplitter interstitial Sets metadata for the Relationship record (between the target of the mapping and the related entity via the splitter) {
   "interstitial": {
       "relationshipDate": "^4"
   }

}

The collectionHierarchyBuilder is just like the collectionSplitter except for one key difference: instead of creating flat records it creates a hierarchical relationship between the import data and the parent record(s) created by the refinery. Like the collectionSplitter the collectionHierarchyBuilder first looks to make an exact match (on name) and if none are found it creates the necessary record(s). NOTE: because the collectionHierarchyBuilder creates new records the full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)

Refinery Refinery parameter Parameter notes Example
collectionHierarchyBuilder parents Maps the parent levels that should be built or matched hierarchical above the imported data. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules .
{
   "parents": [
       {
           "idno": "^/inm:SeriesNo",
           "name": "^/inm:SeriesTitle",
           "type": "series",
           "attributes": { "ca_collections.description": "^7"}
       },
       {
           "idno": "^/inm:CollectionNo",
           "name": "^/inm:CollectionTitle",
           "type": "collection",
           "rules": [
               {
                   "trigger": "^/inm:Status = 'in progress'",
                   "actions": [
                       {
                           "action": "SET",
                           "target": "ca_collections.status",
                           "value": "edit"
                       }
                   ]
               }
           ]
       }
   ]

}

The placeSplitter creates a place record or finds an exact match (on name) and creates a relationship. NOTE: because the placeSplitter creates new records the full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)

Refinery Refinery parameter Parameter notes Example
placeSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
placeSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. {"relationshipType": "location"}
placeSplitter placeType Accepts a constant list item idno from the list place_types or a reference to the location in the data source where the type can be found {"placeType": "country"}
placeSplitter attributes Sets or maps metadata for the place record by referencing the metadataElement code and the location in the data source where the data values can be found {
   "attributes": {
       "placeNote": "^12"
   }

}

placeSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"location"}
placeSplitter placeTypeDefault Sets the default place type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list place_types {"placeTypeDefault":"country"}
placeSplitter hierarchy Identifies the list code in the place_hierarchies list for the relevant place hierarchy. {"hierarchy": "dc"}
placeSplitter interstitial Sets metadata for the Relationship record (between the target of the mapping and the related entity via the splitter) {
   "interstitial": {
       "relationshipDate": "^4"
   }

}

The measurementsSplitter formats data values that are mapped to an element of the datatype Length or Weight. NOTE: the measurementsSplitter does not create new records (it only maps data) and as a result the full container paths must not be specified in the attributes/elements parameter (i.e. just use subElement_code or measurementsWidth)

Refinery Refinery parameter Parameter notes Example
measurementsSplitter delimiter Sets the value of the delimiter to break on, separating measurement values {"delimiter": "x"}
measurementsSplitter units set to value of the measurement unit {"units": "in"}
measurementsSplitter elements maps the components of the dimensions to specific metadata elements {
   "elements": [
       {
           "quantityElement": "measurementWidth",
           "typeElement": "measurementsType",
           "type": "width"
       },
       {
           "quantityElement": "measurementHeight",
           "typeElement": "measurementsType2",
           "type": "height"
       }
   ]

}
Note: the typeElement and type sub-components are optional and should only be used in measurement containers that include a type drop-down.

measurementsSplitter attributes maps the other non-measurement elements that may be in the same container. Values here are set for all measurements being split. {
   "attributes": {
       "notes": "^1"
   }

}

The listitemSplitter creates a list item or finds an exact match (on name) and creates a relationship. NOTE: the listitemSplitter creates new records and as a result full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)

Refinery Refinery parameter Parameter notes Example
listItemSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
listItemSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. {"relationshipType": "location"}
listItemSplitter listItemType Accepts a constant list item idno from the list or a reference to the location in the data source where the type can be found. {"listItemType": "concept"}
listItemSplitter attributes Sets or maps metadata for the list value by referencing the metadataElement code and the location in the data source where the data values can be found. You usually don't set attributes for a list item, but you can here if you need to. {
   "attributes": {
       "listItemNote": "^12"
   }

}

listItemSplitter parents Maps or builds the parent levels above the record laterally related to the imported data. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules .
{
   "parents": [
       {
           "idno": "^12",
           "name": "^14",
           "type": "concept",
           "attributes": { "ca_list_items.description": "^7"}
       },
       {
           "idno": "^16",
           "name": "^17",
           "type": "guide",
           "rules": [
               {
                   "trigger": "^3= 'in progress'",
                   "actions": [
                       {
                           "action": "SET",
                           "target": "ca_list_items.status",
                           "value": "edit"
                       }
                   ]
               }
           ]
       }
   ]

}

listItemSplitter list Enter the list_code for the list that the item should be added to. This is mandatory - if you forget to set it or set it to a list_code that doesn't exist the mapping will fail.) {"list": "list_code"}
listItemSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"concept"}
listItemSplitter listItemTypeDefault Sets the default list item type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list list_item_types {"listItemTypeDefault":"concept"}

The tourStopSplitter creates a tour stop or finds an exact match (on name) and creates a relationship. NOTE: the tourStopSplitter creates new records and as a result full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)

Refinery Refinery parameter Parameter notes Example
tourStopSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
tourStopSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. {"relationshipType": "location"}
tourStopSplitter tourStopType Accepts a constant list item idno from the list tour_stop_types or a reference to the location in the data source where the type can be found. {"tourStopType": "main_stop"}
tourStopSplitter attributes Sets or maps metadata for the tour stop record by referencing the metadataElement code and the location in the data source where the data values can be found. {
   "attributes": {
       "stopDescription": "^11"
   }

}

tourStopSplitter tour Identifies the tour to add the stop to. {"tour": "tour_code"}
tourStopSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"location"}
tourStopSplitter tourStopTypeDefault Sets the default tour stop type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list tour_stop_types {"tourStopTypeDefault":"main_stop"}

The storageLocationSplitter creates a new entry in the storage locations hierarchical list.

Refinery Refinery parameter Parameter notes Example
storageLocationSplitter hierarchicalStorageLocationTypes Sets the storage location types used to label each level in a numerically expressed hierarchy {
   "hierarchicalStorageLocationTypes": [
       "room",
       "rack",
       "cabinet"
   ]

}

storageLocationSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter":";"}
storageLocationSplitter hierarchicalDelimiter Specifies the delimiter to on which to break when designating hierarchicalStorageLocationTypes {"hierarchicalDelimiter":"."}
storageLocationSplitter parents Maps or builds the parent levels above the record laterally related to the imported data. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules .
{
   "parents": [
       {
           "idno": "^10",
           "name": "^12",
           "type": "cabinet",
           "attributes": { "ca_storage_location.description": "^7"}
       },
       {
           "idno": "^14",
           "name": "^16",
           "type": "room",
           "rules": [
               {
                   "trigger": "^18= 'in progress'",
                   "actions": [
                       {
                           "action": "SET",
                           "target": "ca_storage_locations.status",
                           "value": "edit"
                       }
                   ]
               }
           ]
       }
   ]

}

storageLocationSplitter interstitial Sets metadata for the Relationship record (between the target of the mapping and the related entity via the splitter) {
   "interstitial": {
       "relationshipDate": "^4"
   }

}

The loanSplitter creates a new loan-in or loan-out record.

Refinery Refinery parameter Parameter notes Example
loanSplitter loanType Accepts a constant list item from the list loan_types {"loanType":"out"}
loanSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. Note for object data: if the relationship type matches that which is set as the hierarchy control, the object will be pulled in as a "child" element in the loan hierarchy {"relationshipType":"part_of"}
loanSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter":"."}
loanSplitter attributes Sets or maps metadata for the loan record by referencing the metadataElement code and the location in the data source where the data values can be found. {
   "attributes": {
       "loanDate": "^11"
   }

}

loanSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess system {"relationshipTypeDefault":"part_of"}
loanSplitter loanTypeDefault Sets the default loan type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list loan_types. {"loanTypeDefault":"in"}
loanSplitter interstitial Sets metadata for the Relationship record (between the target of the mapping and the related entity via the splitter) {
   "interstitial": {
       "relationshipDate": "^4"
   }

}

The tourMaker creates a tour parent in a tour stop mapping.

Refinery Refinery parameter Parameter notes Example
tourMaker tourType Accepts a constant list item idno from the list tour_types or a reference to the location in the data source where the type can be found. {"tourType": "full_tour"}
tourMaker attributes Sets or maps metadata for the tour record by referencing the metadataElement code or the location in the data source where the data values can be found. {
   "attributes": {
       "tour_code": "^1"
   }

}

tourMaker tourTypeDefault Sets the default tour type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list tour_types {"tourTypeDefault": "full_tour"}


The dateJoiner merges two separate data sources into one data range in a single field in CollectiveAccess.

Refinery Refinery parameter Parameter notes Example
dateJoiner mode Determines how dateJoiner joins date values together. Two-column range (aka "range") is the default if mode is not specified. Options are: multiColumnDate, multiColumnRange, range {"mode": "multiColumnDate"}
dateJoiner month Maps the month value for the date from the data source. (For Multi-column date) {"month": "^4"}
dateJoiner day Maps the day value for the date from the data source. (For Multi-column date) {"day": "^5"}
dateJoiner year Maps the year value for the date from the data source. (For Multi-column date) {"year": "^6"}
dateJoiner startDay Maps the day value for the start date from the data source. (For Multi-column range) {"startDay": "^4"}
dateJoiner startMonth Maps the month value for the start date from the data source. (For Multi-column range) {"startMonth": "^5"}
dateJoiner startYear Maps the year value for the start date from the data source. (For Multi-column range) {"startYear": "^6"}
dateJoiner endDay Maps the day value for the end date from the data source. (For Multi-column range) {"endDay": "^7"}
dateJoiner endMonth Maps the month value for the end date from the data source. (For Multi-column range) {"endMonth": "^8"}
dateJoiner endYear Maps the year value for the end date from the data source. (For Multi-column range) {"endYear": "^9"}
dateJoiner expression Date expression (For Two-column range) {"expression" : "^dateExpression"}
dateJoiner start Maps the date from the data source that is the beginning of the conjoined date range. (For Two-column range) {"start" : "^dateBegin"}
dateJoiner end Maps the date from the data source that is the end of the conjoined date range. (For Two-column range) {"end": "^dateEnd"}

The entityJoiner merges data from two or more data sources (i.e. two columns on a spreadsheet) to make a single entity record. This refinery should be used when last and first names, for example, are in two different locations. NOTE: because the entityJoiner creates new records the full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)

Refinery Refinery parameter Parameter notes Example
entityJoiner entityType Accepts a constant list item idno from the list entity_types or a reference to the location in the data source where the type can be found {"entityType": "person"}
entityJoiner entityTypeDefault Sets the default entity type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list entity_types. {"entityTypeDefault": "person"}
entityJoiner forename Accepts a constant value for the forename or a reference to the location in the data source where the forename can be found. {"forename":"^3"}
entityJoiner surname Accepts a constant value for the surname or a reference to the location in the data source where the surname can be found. {"surname": "^2"}
entityJoiner other_forenames Accepts a constant value for the entity's other forenames or a reference to the location in the data source where the other forenames can be found. {"other_forenames": "^10"}
entityJoiner middlename Accepts a constant value for the middlename or a reference to the location in the data source where the middlename can be found. {"surname": "^12"}
entityJoiner displayname Accepts a constant value for the displayname or a reference to the location in the data source where the displayname can be found. {"displayname": "^14"}
entityJoiner prefix Accepts a constant value for the prefix or a reference to the location in the data source where the prefix can be found. {"prefix": "^14"}
entityJoiner suffix Accepts a constant value for the suffix or a reference to the location in the data source where the suffix can be found. {"suffix": "^14"}
entityJoiner attributes Sets or maps metadata for the entity record by referencing the metadataElement code and the location in the data source where the data values can be found. {
   "attributes": {
       "agentDateSet": {
           "agentDate": "^12"
       }
   }

}

entityJoiner nonpreferred_labels List of non-preferred label values or references to locations in the data source where nonpreferred label values can be found. Use the split value for a label to indicate a value that should be split into entity label components before import. {
   "nonpreferred_labels": [
       {
           "forename": "^5",
           "surname": "^6"
       }
   ]

}
OR
{

   "nonpreferred_labels": [
       {
           "split": "^4"
       }
   ]

}

entityJoiner relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found {"relationshipType": "^10"}
entityJoiner relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess system. {"relationshipTypeDefault": "author"}
entityJoiner skipIfValue Skip if imported value is in the specified list of values. {"skipIfValue": "unknown"}
entityJoiner relatedEntities This allows you to create and/or relate additional entities to the entity being mapped. For example, if you are running an Object mapping and using an entityJoiner to generate related Individuals, but you also want to create entity records for each individual's affiliation, use this setting. "Name" is the name of the entity, which will be automatically split into pieces and imported. If you want to explicitly map pieces of a name (surname, forename) you can omit "name" and use "forename", "middlename", "surname", etc. "type", "attributes," and "relationshipType" operate just as they would in a regular splitter. {"relatedEntities": [{"type":"ind", "name": "^3", "attributes":{}, "relationshipType":"related"}]}
entityJoiner interstitial Sets metadata for the Relationship record (between the target of the mapping and the related entity via the splitter) {
   "interstitial": {
       "relationshipDate": "^4"
   }

}


The ATRelatedGetter works with the design of the ArchivistToolkit MYSQL database. It maps the repeating source data through the linking table to the correct CollectiveAccess elements.

Refinery Refinery parameter Parameter notes Example
ATRelatedGetter table Archivists Toolkit table. {"table":"ArchDescriptionRepeatingData"}
ATRelatedGetter discriminator Discriminator value. {"discriminator":"note"}
ATRelatedGetter key Archivists Toolkit key field. {"key":"DigitalObjects.digitalObjectId"}
ATRelatedGetter map Map of related table field values to CollectiveAccess element. {
   "map": {
       "descriptionText": "^noteContent",
       "descriptionSource": "repository"
   }

}

Options

Options allow you to set additional formatting and conditionals on data during import. Some Options are designed to actually set parameters on the mapping behavior, such as the skip options. skipGroupIfEmpty, for example, allows you to prevent the import of certain fields, depending on the presence of data in another related field. Other Options simply format data, such as formatWithTemplate, suffix, and convertNewlinesToHTML.

Option Description Parameter notes Example
skipGroupIfEmpty If the data value corresponding to this mapping is empty, skip the mappings for the other data in the group set to a non-zero value {"skipGroupIfEmpty": 1}
skipGroupIfValue If the data value corresponding to this mapping contains the set value, skip the mapping for the data and the mappings for the other data in the group arbitrary value {"skipGroupIfValue": ["n/a"]}
skipGroupIfNotValue If the data value corresponding to this mapping does not contain the set value, skip the mapping for the data and the mappings for the other data in the group arbitrary value {"skipGroupIfNotValue": ["n/a"]}
skipGroupIfExpression If the expression yields true, skip the mapping for the data and the mappings for the other data in the group arbitrary value {"skipGroupIfExpression":"^/inm:FileNo <> \"\""}
skipRowIfEmpty If the data value corresponding to this mapping is empty, do not import the row set to a non-zero value {"skipRowIfEmpty": 1}
skipRowIfValue If the data value corresponding to this row contains the set value, do not import the entire row arbitrary value {"skipRowIfValue": ["n/a"]}
skipRowIfNotValue If the data value corresponding to this row does not contains the set value, do not import the entire row arbitrary value {"skipRowIfNotValue": ["n/a"]}
refineries Select the refinery that preforms the correct function to alter your data source as it maps to CollectiveAccess make a selection from the available refineries dateJoiner
original_values Return-separated list of values from the data source to be replaced. For example photo is used in the data source, but photograph is used in CollectiveAccess. data source values sound recording
replacement_values Return-separated list of CollectiveAccess list item idnos that correspond to the mapped values from the original data source. For example sound recording (entered in the Original values column) maps to audio_digital, which is entered here in the Replacement values column. CollectiveAccess list item idnos audio_digital
default Value to use if data source value is blank. CollectiveAccess list item idnos {"default": "mixed"}
delimiter Delimiter to split repeating values on. delimiter value {"delimiter": ";"}
restrictToTypes Restricts the the mapping to only records of the designated type. For example the Duration field is only applicable to objects of the type moving_image and not photograph. CollectiveAccess list item idnos {
   "restrictToTypes": [
       "photograph",
       "other",
       "mixed",
       "text"
   ]

}

formatWithTemplate Formats a field to include words and data values via a set template. text and data source references {"formatWithTemplate": "Painting #^15 created by ^2"}
suffix Appends a text value to the end of the data value. arbitrary text {"suffix": " tons"}
excludeToTypes Not implemented. This possible option would be the inverse of restrictToTypes. {
   "excludeToTypes": [
       "photograph",
       "text"
   ]

}

maxLength The maximum length, in characters, to allow. Values exceeding this length will be truncated to the maximum. {"maxLength":100}
errorPolicy Determines how errors are handled for the mapping. "Stop" will halt the entire import on any error for this mapping. ignore; stop ignore
convertNewlinesToHTML Convert newlines to HTML <BR/> tags in imported text. Value should be 0 or 1. Default is 0 – don't convert text. {"convertNewlinesToHTML":"1"}

Original Values and Replacement Values

In some cases, you may wish for the mapping to find certain values in your source data and replace them with new values. In the Original Value column, state all values that you wish to have replaced. Then, in the Replacement Value column, set their replacements. You can add multiple values to a single cell, so long as the replacement value matched the original value line by line.

This feature can be used to correct common misspellings in the source data, or to simply normalize names and terms, or to conform to element codes for fixed list values in CollectiveAccess.

Transform Values Using Worksheet

Using the Original and Replacement columns is sufficient for transforming a small range of values. But for really large transformation dictionaries, use the option "transformValuesUsingWorksheet" instead.

You can use this option to reference a list of values in a separate worksheet within the Excel mapping document. The formatting of the sheet should place original values in the first column, and replacement values in the second column.

The "transformValuesUsingWorksheet" mapping option takes a worksheet name and will load the first and second columns of that sheet as the original and replacement values respectively. When this option is set, any values in the "original values" and "replacement values" columns of the mapping worksheet are ignored, even if the "transformValuesUsingWorksheet" worksheet is empty or does not exist.

You refer to the sheet by name. So the options syntax in your Excel mapping doc would look like this:

{"transformValuesUsingWorksheet":"Worksheet Title"}

Again, if you need to transform a small number of values, simply use the Original/Replacement columns on the mapping worksheet. If you need to transform a really large list, create an additional worksheet, give it a name, add your original/replacement values in the first two columns, and set the transformValuesUsingWorksheet option.

Rules

Rules allow you to set record-level conditionals in your mapping with target actions triggered by true or false outcomes. With Rules, you can manipulate the migration of specific data and/or set metadata based on expression statements. For example, let's say you want to skip a record if a certain element in your data source is exactly equal to a specific value. Rules allows you to set a target action, such as "SKIP," when a match is triggered.

Rules rely on a two part operation outlined in the import mapping. The first component is is called "Rule triggers" and it is an expression statement that results in a quantity that is evaluated by the data importer. The second part is the "Rule actions" that are performed based on the outcome of the expression.

Let's walk through an example. For an in depth look at writing "Rule triggers" via Expressions, read more here.

For our example, we are going to skip all records with the phrase "do not use" in the description. To do so we write an expression to match "do not use" in the required field, and then set the action to execute when the expression is true to be "SKIP." For the sake of this example we're importing an Excel spreadsheet and the description is in column 5:

This is how the rule should look in the import mapping:

Rule.png

Set "Rule" as your rule type and add the following to the Rule triggers column:

(^5 =~ /do not use/)

Where ^5 references column 5 and =~ invokes the regular expression operator. In the "actions" column is a simple reference to the action:

SKIP

Note that you can potentially add several actions to a single rule trigger by separating the actions with returns. For now, the only possible action is "SKIP", which skips the entire record rather than importing it. A "SET" action, which causes the setting of a field in the import to a specific value, is planned for development soon.

Running a mapping

To run the import from the command line, follow these instructions.

Before you begin it’s a good idea to make an area for your import mappings and data that’s easily accessible without an inconveniently-long file path. For the sake of this example our import material will live in a Providence directory at:

/support/project/mappings and
/support/project/data

You'll also want to back-up your database

mysqldump -u#name -p#password project > ~/project_date.dump

The import is executed through caUtils. To see all utilities ask for help after cd-ing into support

cd /path_to_Providence/support
bin/caUtils help

To get further details about the load-import-mapping utility:

bin/caUtils help load-import-mapping

You’ll see that the only load-import-mapping command is the one that designates the mapping file. To load the mapping:

bin/caUtils load-import-mapping --file=project/mappings/mapping1.xlsx

Next you’ll be using the utility import-data. As you’ll see from

bin/caUtils help import-data

there are several options that allow you to designate the format, data source, log preferences, etc. To run the import:

bin/caUtils import-data --format=XLSX --mapping=mapping1 --source=project/data/Data.xlsx --log=project/log

With the PHP ncurses extension installed a display will provide moving status indicators including import progress and recent errors.

To modify your import and rerun the utility, simply restore your database

mysql -u#name -p#password project < ~/project_date.dump

and start the process again!

Common Problems

This section is intended to collect the most common mapping mistakes (and their solutions).

ca_object_lots - When mapping data the the Lots table, remember that Lots require more than just a label and type. They also have a non-optional "lot_status_id" that is a value taken from the "object lot statuses" list. It must be set to a valid value or the lot will fail to insert, and will throw an error message. So be sure set a constant value to ca_object_lots.lot_status_id. You can take this value from the list "object lot statuses" under Manage --> Lists & Vocabularies. Also, remember that id numbers for ca_object_lots don't map to the normal idno but rather ca_object_lots.idno_stub.

Groups - When mapping to to more than one metadata element that are inside of single Container, don't forget to use the Group column to tie each mapping to the same container. You can do this by assigning an arbitrary "group name" to each mapping.

Open/LibreOffice Spreadsheets - If you are loading data from a spreadsheet created by Open/LibreOffice there is a problem with the date detector and you should save the file as an ODF Spreadsheet (*.ods). You can still use the XLS / XLSX importers with these spreadsheets. The data mapping works fine in XLSX format.

ca_entities.preferred_labels - When running a mapping for which ca_entities is the target table, and the source data for entity names is formatted as "first name last name" (i.e. Jane Doe), intuitively you might assume that the target must simply be set to ca_entities.preferred_labels. However, an additional step is required. If the former were set as-is, the mapping would not parse the forename and surname. Surname then would be blank, and the mapping would get skipped. Therefore, an entitySplitter is actually still required in these instances:

1. Set the target ca_table.element to "ca_entities.preferred_labels"
2. Set the refinery to entitySplitter.
3. Set the refinery parameters to:
{"entityType": "entity_type", "skipIfValue": ["unknown"]}

i_sphinx

Namespaces

Variants
Actions
Navigation
Tools
User
Personal tools