Data Importer/en
<languages/> [Available for v1.4]
Contents
Overview
With the release of CollectiveAccess v1.4, users will be able to create their own mappings and migrate their source data directly from the command line or the Providence user interface (under "Import > Data") into a configuration of Providence. Running a data import involves seven basic steps, all of which are outlined in detail below.
1 - A user must create a mapping document that will serve as a crosswalk between the source data and the ultimate destination in CollectiveAccess.
2 - Before running an import, it is absolutely crucial to create a backup of the database in its current form by executing a data dump.
3 - When you have taken this pre-caution and your mapping document is complete it is now time to run the import from either the command line or the graphical user interface.
4 - When the migration has ran, check the data out in CollectiveAccess and look for errors or points of inconsistencies.
5 - Revise your mapping accordingly
6 - Load the data dump so that the system returns to its pre-import state (and thereby clears out the data that you are about to re-run.)
7 - Run the import again. That's it!
Supported input formats
Currently: XLSX, XLS, MYSQL, Filemaker XML and Inmagic XML
Planned imminently: tab delimited, CSV, MARC, OAI-PMH/DC,
Creating a mapping
To create a mapping, first download the Excel-based import mapping template available here. (File:Data Import Mapping template.xlsx). Once all of the mappings and settings have been entered into the template it can be loaded directly into CollectiveAccess (see Running a mapping below) and the page on Creating and Running a Mapping.
This mapping template operates under two basic assumption about your data. 1 - that each row in a data set corresponds to a single record, and 2 - that each column corresponds to a single metadata element.
Below we will go through each column of the mapping template. In the first column, you must set the Rule Type for each for each column of data in your data set. Refer to the table below for descriptions of each Rule Type.
Rule types
Rule type | Description |
Mapping | Maps a data source column or table.field to a CollectiveAccess metadata element. Mappings can carry refineries (see below). |
SKIP | Use SKIP to ignore a data source column or table.field. |
Constant | Set a data source column or table.field to an arbitrary constant value. Include the chosen value in the Source column on the mapping spreadsheet. Matches on CollectiveAccess list item idno. |
Setting | Sets preferences for the mapping (see below). |
Settings
Setting | Description | Parameter notes | Example | Currently supported? |
name | Human readable name of the mapping | Arbitrary text | My mapping | Implemented |
code | Alphanumeric code of the mapping | Arbitrary, no special characters or spaces | my_mapping | Implemented |
inputTypes | Sets types of source (input) data that can be handled by this import mapping. Values are format codes defined by the various DataReader plugins. You can specify multiple formats by specifying their format codes in a semicolon delimited list. Currently supported format codes: XLSX, MYSQL. Soon: CSV, TAB, MARC. If you omit this setting then the mapping is assumed to be valid for any data type which, of course, is very unlikely. | file type | XLSX;CSV;TAB | Implemented |
table | Sets the table for the imported data | Corresponds to the CollectiveAccess basic tables | ca_objects | Implemented |
type | Type to set all imported records to. If import includes a mapping to type_id, that will be privileged and the type setting will be ignored. | CollectiveAccess list item idno | posters | Implemented |
numInitialRowsToSkip | The number of rows at the top of the data set to skip. Use this setting to skip over column headers in spreadsheets and similar data. | numeric value | 2 | Implemented |
existingRecordPolicy | Determines how existing records in the CollectiveAccess system are checked for and handled for the mapping. Also determines how records created by the mapping are merged with other instances (idno and/or preferred label) in the data source. | none; skip_on_idno; merge_on_idno; overwrite_on_idno; skip_on_preferred_labels; merge_on_preferred_labels; overwrite_on_preferred_labels; skip_on_idno_and_preferred_labels; merge_on_idno_and_preferred_labels; overwrite_on_idno_and_preferred_labels | none | Implemented |
errorPolicy | Determines how errors are handled for the import. "Stop" will halt the entire import on any error. | ignore; stop | ignore | Implemented |
archiveMapping | Set to yes to save the mapping spreadsheet; no to delete it from the server after import | yes; no | yes | Pending |
archiveDataSets | Set to yes to save the data spreadsheet or no to delete it from the server after import | yes; no | yes | Pending |
Source
The Source column is used to set precisely which element from the data source is to be mapped or skipped. You can also set a constant data value, rather than a mapping, by setting the rule type to "Constant" and the Source column as the value or list item idno from your CollectiveAccess configuration.
Spreadsheets
If you wish to map from Column B of an Excel spreadsheet, you would list the Source as 2. (A = 1, B = 2, C = 3, and so on.) For this mapping, Column B of your source data would be pulled. If on the other hand, you wish to skip this column, you would set the Rule Type to Skip and the source value to 2.
XML
Set the Source column to the name of the XML tag, proceeded with a forward slash (i.e. /Sponsoring_Department or /inm:ContactName)
MARC
The Source value for MARC supports fields and indicators.
i.e.
100/a (field=100; no indicators)
100/a/x (indicator 1=x)
100/a/xy (indicator 1=x; indicator 2=y). For subfields, format the import text with the option formatWithTemplate, i.e. {"formatWithTemplate": "^245/a ^245/b ^245/f ^245/g ^245/h ^245/k ^245/n ^245/p ^245/s"}
CA table.element
You declared the data source in the previous column. Now, it's time to set the destination. Using the CA table.element column, set where you wish the Source data to be mapped to in CollectiveAccess. If you are setting the Source to Skip, of course, you do not need to complete this step. If you are mapping data or applying a constant value, you do need to set the destination. This is accomplished by writing the ca_table.element_code.
CA Table corresponds to the CollectiveAccess basic tables, while element_code is simply the unique code you assigned to a particular metadata element in your CA configuration.
For example, to map a Title column from your source data into CollectiveAccess, you may wish to set the CA table.element as:
ca_objects.preferred_labels
This would map the data from your Source declaration to the Title field in an Object record in CollectiveAccess.
Group
In many cases, distinct lines in a data set will map into their corresponding metadata elements that happen to be bundled together inside of a single container. For example, a common container is Date, wherein there are actually two metadata elements - one for the date itself, and the other a drop-down menu to declare the date's type (Date Created, Date Accessioned, etc.)
Let's say in your source data there is one column that contains date values, while the next column over contains the date types.
If the corresponding metadata elements in CA are bundled into a container, you must tell this to the mapping document by placing these Source elements into a group. Otherwise, the date value would be mapped to one container, while the date type would be mapped to another container (and each would be missing their counterpart!)
Declaring a Group is simple. Just assign a name to each line that is to be mapped into a single container.
If Source "2" is mapping to ca_objects_date.date_value, and Source "3" is mapping to ca_objects_date.date_type, then simply give each line the group name "Date." This will tell the mapping that these two lines are going to a single container - and won't create a whole new container for each.
Refineries
At present all of the existing refineries fall into one of four camps: Splitters, Makers, Joiners and Getters. Each framework is designed to take a specific data format and transform it via a specific behavior as it is imported into CollectiveAccess.
Splitters
Splitter refineries can either create records, match data to existing records (following a mapping's existingRecordPolicy) or break a single string of source data into several metadata elements in CollectiveAccess. Splitters for relationships are used when several parameters are required, such as setting a record type and setting a relationship type. Using the entitySplitter, a name in a single location (i.e. column) in a data source can be parsed (into first, middle, last, prefix, suffix, et al.) within the new record. Similarly the measurementSplitter breaks up, for example, a list of dimensions into to a CollectiveAccess container of sub-elements. "Splitter" also implies that multiple data elements, delimited in a single location, can be "split" into unique records related to the imported record.
Makers
Maker refineries are used to create CollectiveAccess tour/tour stop, object lot/object and list/list item pairings. These relationships are different than other CollectiveAccess relationships for two reasons. Firstly, they don't carry relationship types. Secondly, these relationships are always single to multiple: a tour can have many tour stops, but a tour stop can never belong to more than one tour. Similarly an object can never belong to more than one lot. List items belong to one and only one list. The Maker refinery is used for these specific cases where "relationshipType" and other parameters are unnecessary.
Joiners
In some ways Joiners are the opposite of Splitters. An entityJoiner refinery is used when two or more parts of a name (located in different areas of the data source) need to be conjoined into a single record. The dateJoiner makes a single range out of two or more elements in the data source.
Getters
Getters are designed specifically for MYSQL data mappings. These refineries map the repeating source data through the linking table to the correct CollectiveAccess elements.
The entitySplitter creates an entity record or finds an exact match (on name) and creates a relationship. Breaks up parts of a name, sets type and other paramaters. NOTE: because the entitySplitter creates new records the full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)
Refinery | Refinery parameter | Parameter notes | Example |
entitySplitter | delimiter | Sets the value of the delimiter to break on, separating data source values | "delimiter": ";" |
entitySplitter | relationshipType | Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found | "relationshipType": "^10" or "relationshipType": "author" |
entitySplitter | entityType | Accepts a constant list item idno from the list entity_types or a reference to the location in the data source where the type can be found | "entityType": "person" |
entitySplitter | attributes | Sets or maps metadata for the entity record by referencing the metadataElement code and the location in the data source where the data values can be found | "attributes": {"agentDateSet": { "agentDate": "^12"}} |
entitySplitter | relationshipTypeDefault | Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system | "relationshipTypeDefault":"creator" |
entitySplitter | entityTypeDefault | Sets the default entity type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list entity_types | "entityTypeDefault":"individual" |
entitySplitter | interstitial | Sets metadata for the Relationship record (between the target of the mapping and the related entity via the splitter) | "interstitial": {"relationshipDate": "^4"} |
The collectionSplitter creates a collection record or finds an exact match (on name) and creates a relationship. NOTE: because the collectionSplitter creates new records the full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)
Refinery | Refinery parameter | Parameter notes | Example |
collectionSplitter | delimiter | Sets the value of the delimiter to break on, separating data source values | "delimiter": ";" |
collectionSplitter | relationshipType | Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. Note (for object data): if the relationship type matches that set as the hierarchy control, the object will be pulled in as a "child" element in the collection hierarchy | "relationshipType": "part_of" |
collectionSplitter | collectionType | Accepts a constant list item idno from the list collection_types or a reference to the location in the data source where the type can be found | "collectionType": "box" |
collectionSplitter | attributes | Sets or maps metadata for the collection record by referencing the metadataElement code and the location in the data source where the data values can be found | "attributes": {"collectionDateSet": { "collectionDate": "^12"}} |
collectionSplitter | parents | Collection parents to create, if required | |
collectionSplitter | relationshipTypeDefault | Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system | "relationshipTypeDefault":"part_of" |
collectionSplitter | collectionTypeDefault | Sets the default collection type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list collection_types | "collectionTypeDefault":"series" |
The placeSplitter creates a place record or finds an exact match (on name) and creates a relationship. NOTE: because the placeSplitter creates new records the full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)
Refinery | Refinery parameter | Parameter notes | Example |
placeSplitter | delimiter | Sets the value of the delimiter to break on, separating data source values | "delimiter": ";" |
placeSplitter | relationshipType | Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. | "relationshipType": "location" |
placeSplitter | placeType | Accepts a constant list item idno from the list place_types or a reference to the location in the data source where the type can be found | "placeType": "country" |
placeSplitter | attributes | Sets or maps metadata for the place record by referencing the metadataElement code and the location in the data source where the data values can be found | "attributes": {"placeNote": "^12"} |
placeSplitter | relationshipTypeDefault | Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system | "relationshipTypeDefault":"location" |
placeSplitter | placeTypeDefault | Sets the default place type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list place_types | "placeTypeDefault":"country" |
placeSplitter | hierarchy | Identifies the list code in the place_hierarchies list for the relevant place hierarchy. | "hierarchy": "dc" |
The measurementsSplitter formats data values that are mapped to an element of the datatype Length or Weight. NOTE: the measurementsSplitter does not create new records (it only maps data) and as a result the full container paths must not be specified in the attributes/elements parameter (i.e. just use subElement_code or measurementsWidth)
Refinery | Refinery parameter | Parameter notes | Example |
measurementsSplitter | delimiter | Sets the value of the delimiter to break on, separating measurement values | "delimiter": "x" |
measurementsSplitter | units | set to value of the measurement unit | "units": "in" |
measurementsSplitter | elements | maps the components of the dimensions to specific metadata elements | "elements": [{"quantityElement":"measurementWidth", "typeElement": "measurementsType", "type":"width"},{"quantityElement":"measurementHeight", "typeElement": "measurementsType2","type":"height"}] OR "elements": [{"quantityElement":"measurementsWidth"},{"quantityElement":"measurementsHeight"}] |
measurementsSplitter | attributes | maps the other non-measurement elements that may be in the same container. Values here are set for all measurements being split. | "attributes": {"notes":"^1"} |
The listitemSplitter creates a list item or finds an exact match (on name) and creates a relationship. NOTE: the listitemSplitter creates new records and as a result full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)
Refinery | Refinery parameter | Parameter notes | Example |
listItemSplitter | delimiter | Sets the value of the delimiter to break on, separating data source values | "delimiter": ";" |
listItemSplitter | relationshipType | Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. | "relationshipType": "location" |
listItemSplitter | listItemType | Accepts a constant list item idno from the list or a reference to the location in the data source where the type can be found. | "listItemType": "concept" |
listItemSplitter | attributes | Sets or maps metadata for the list value by referencing the metadataElement code and the location in the data source where the data values can be found. You usually don't set attributes for a list item, but you can here if you need to. | "attributes": {"listItemNote": "^12"} |
listItemSplitter | list | Enter the list_code for the list that the item should be added to. This is mandatory - if you forget to set it or set it to a list_code that doesn't exist the mapping will fail.) | "list": "list_code" |
listItemSplitter | relationshipTypeDefault | Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system | "relationshipTypeDefault":"concept" |
listItemSplitter | listItemTypeDefault | Sets the default list item type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list list_item_types | "listItemTypeDefault":"concept" |
The tourStopSplitter creates a tour stop or finds an exact match (on name) and creates a relationship. NOTE: the tourStopSplitter creates new records and as a result full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)
Refinery | Refinery parameter | Parameter notes | Example |
tourStopSplitter | delimiter | Sets the value of the delimiter to break on, separating data source values | "delimiter": ";" |
tourStopSplitter | relationshipType | Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. | "relationshipType": "location" |
tourStopSplitter | tourStopType | Accepts a constant list item idno from the list tour_stop_types or a reference to the location in the data source where the type can be found. | "tourStopType": "main_stop" |
tourStopSplitter | attributes | Sets or maps metadata for the tour stop record by referencing the metadataElement code and the location in the data source where the data values can be found. | "attributes": {"stopDescription": "^11"} |
tourStopSplitter | tour | Identifies the tour to add the stop to. | "tour": "tour_code" |
tourStopSplitter | relationshipTypeDefault | Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system | "relationshipTypeDefault":"location" |
tourStopSplitter | tourStopTypeDefault | Sets the default tour stop type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list tour_stop_types | "tourStopTypeDefault":"main_stop" |
The storageLocationSplitter creates a new entry in the storage locations hierarchical list.
Refinery | Refinery parameter | Parameter notes | Example |
storageLocationSplitter | hierarchicalStorageLocationTypes | Sets the storage location types used to label each level in a numerically expressed hierarchy | "hierarchicalStorageLocationTypes": ["room", "rack", "cabinet"] |
storageLocationSplitter | delimiter | Sets the value of the delimiter to break on, separating data source values | "delimiter":";" |
storageLocationSplitter | hierarchicalDelimiter | Specifies the delimiter to on which to break when designating hierarchicalStorageLocationTypes | "hierarchicalDelimiter":"." |
The loanSplitter creates a new loan-in or loan-out record.
Refinery | Refinery parameter | Parameter notes | Example |
loanSplitter | loanType | Accepts a constant list item from the list loan_types | "loanType":"out" |
loanSplitter | relationshipType | Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. Note for object data: if the relationship type matches that which is set as the hierarchy control, the object will be pulled in as a "child" element in the loan hierarchy | "relationshipType":"part_of" |
loanSplitter | delimiter | Sets the value of the delimiter to break on, separating data source values | "delimiter":"." |
loanSplitter | attributes | Sets or maps metadata for the loan record by referencing the metadataElement code and the location in the data source where the data values can be found. | "attributes":"dates"{"date_value":"^4", "date_type":"end"} |
loanSplitter | relationshipTypeDefault | Sets the default relationship type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess system | "relationshipTypeDefault":"part_of" |
loanSplitter | loanTypeDefault | Sets the default loan type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list loan_types. | "loanTypeDefault":"in" |
The tourMaker creates a tour parent in a tour stop mapping.
Refinery | Refinery parameter | Parameter notes | Example |
tourMaker | tourType | Accepts a constant list item idno from the list tour_types or a reference to the location in the data source where the type can be found. | "tourType": "full_tour" |
tourMaker | attributes | Sets or maps metadata for the tour record by referencing the metadataElement code or the location in the data source where the data values can be found. | "attributes": {"tour_code": "^1"} |
tourMaker | tourTypeDefault | Sets the default tour type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list tour_types | "tourTypeDefault": "full_tour" |
The dateJoiner merges two separate data sources into one data range in a single field in CollectiveAccess.
Refinery | Refinery parameter | Parameter notes | Example |
dateJoiner | mode | Determines how dateJoiner joins date values together. Two-column range (aka "range") is the default if mode is not specified. Options are: multiColumnDate, multiColumnRange, range | "mode": "multiColumnDate" |
dateJoiner | month | Maps the month value for the date from the data source. (For Multi-column date) | "month": "^4" |
dateJoiner | day | Maps the day value for the date from the data source. (For Multi-column date) | "day": "^5" |
dateJoiner | year | Maps the year value for the date from the data source. (For Multi-column date) | "year": "^6" |
dateJoiner | startDay | Maps the day value for the start date from the data source. (For Multi-column range) | "startDay": "^4" |
dateJoiner | startMonth | Maps the month value for the start date from the data source. (For Multi-column range) | "startMonth": "^5" |
dateJoiner | startYear | Maps the year value for the start date from the data source. (For Multi-column range) | "startYear": "^6" |
dateJoiner | endDay | Maps the day value for the end date from the data source. (For Multi-column range) | "endDay": "^7" |
dateJoiner | endMonth | Maps the month value for the end date from the data source. (For Multi-column range) | "endMonth": "^8" |
dateJoiner | endYear | Maps the year value for the end date from the data source. (For Multi-column range) | "endYear": "^9" |
dateJoiner | expression | Date expression (For Two-column range) | "expression" : "^dateExpression" |
dateJoiner | start | Maps the date from the data source that is the beginning of the conjoined date range. (For Two-column range) | "start" : "^dateBegin" |
dateJoiner | end | Maps the date from the data source that is the end of the conjoined date range. (For Two-column range) | "end": "^dateEnd" |
The entityJoiner merges data from two or more data sources (i.e. two columns on a spreadsheet) to make a single entity record. This refinery should be used when last and first names, for example, are in two different locations. NOTE: because the entityJoiner creates new records the full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)
Refinery | Refinery parameter | Parameter notes | Example |
entityJoiner | entityType | Accepts a constant list item idno from the list entity_types or a reference to the location in the data source where the type can be found | "entityType": "person" |
entityJoiner | entityTypeDefault | Sets the default entity type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list entity_types. | "entityTypeDefault": "person" |
entityJoiner | forename | Accepts a constant value for the forename or a reference to the location in the data source where the forename can be found. | "forename":"^3" |
entityJoiner | surname | Accepts a constant value for the surname or a reference to the location in the data source where the surname can be found. | "surname": "^2" |
entityJoiner | other_forenames | Accepts a constant value for the entity's other forenames or a reference to the location in the data source where the other forenames can be found. | "other_forenames": "^10" |
entityJoiner | middlename | Accepts a constant value for the middlename or a reference to the location in the data source where the middlename can be found. | "surname": "^12" |
entityJoiner | displayname | Accepts a constant value for the displayname or a reference to the location in the data source where the displayname can be found. | "displayname": "^14" |
entityJoiner | prefix | Accepts a constant value for the prefix or a reference to the location in the data source where the prefix can be found. | "prefix": "^14" |
entityJoiner | suffix | Accepts a constant value for the suffix or a reference to the location in the data source where the suffix can be found. | "suffix": "^14" |
entityJoiner | attributes | Sets or maps metadata for the entity record by referencing the metadataElement code and the location in the data source where the data values can be found. | "attributes": {"agentDateSet": { "agentDate": "^12"}} |
entityJoiner | nonpreferred_labels | List of non-preferred label values or references to locations in the data source where nonpreferred label values can be found. Use the split value for a label to indicate a value that should be split into entity label components before import. | "nonpreferred_labels":[{"forename":"^5", "surname": "^6"}] OR "nonpreferred_labels":[{"split": "^4"}] |
entityJoiner | relationshipType | Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found | "relationshipType": "^10" or "relationshipType": "author" |
entityJoiner | relationshipTypeDefault | Sets the default relationship type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess system. | "relationshipTypeDefault": "author" |
entityJoiner | skipIfValue | Skip if imported value is in the specified list of values. | "skipIfValue": "unknown" |
The ATRelatedGetter works with the design of the ArchivistToolkit MYSQL database. It maps the repeating source data through the linking table to the correct CollectiveAccess elements.
Refinery | Refinery parameter | Parameter notes | Example |
ATRelatedGetter | table | Archivists Toolkit table. | "table":"ArchDescriptionRepeatingData" |
ATRelatedGetter | discriminator | Discriminator value. | "discriminator":"note" |
ATRelatedGetter | key | Archivists Toolkit key field. | "key":"DigitalObjects.digitalObjectId" |
ATRelatedGetter | map | Map of related table field values to CollectiveAccess element. | "map":{"descriptionText": "^noteContent", "descriptionSource": "repository"} |
Options
Options allow you to set additional formatting and conditionals on data during import. Some Options are designed to actually set parameters on the mapping behavior, such as the skip options. skipGroupIfEmpty, for example, allows you to prevent the import of certain fields, depending on the presence of data in another related field. Other Options simply format data, such as formatWithTemplate, suffix, and convertNewlinesToHTML.
Option | Description | Parameter notes | Example |
skipGroupIfEmpty | If the data value corresponding to this mapping is empty, skip the mappings for the other data in the group | set to a non-zero value | "skipGroupIfEmpty": 1 |
skipGroupIfValue | If the data value corresponding to this mapping contains the set value, skip the mapping for the data and the mappings for the other data in the group | arbitrary value | "skipGroupIfValue": ["n/a"] |
skipGroupIfNotValue | If the data value corresponding to this mapping does not contain the set value, skip the mapping for the data and the mappings for the other data in the group | arbitrary value | "skipGroupIfNotValue": ["n/a"] |
skipRowIfEmpty | If the data value corresponding to this mapping is empty, do not import the row | set to a non-zero value | "skipRowIfEmpty": 1 |
skipRowIfValue | If the data value corresponding to this row contains the set value, do not import the entire row | arbitrary value | "skipRowIfValue": ["n/a"] |
skipRowIfNotValue | If the data value corresponding to this row does not contains the set value, do not import the entire row | arbitrary value | "skipRowIfNotValue": ["n/a"] |
refineries | Select the refinery that preforms the correct function to alter your data source as it maps to CollectiveAccess | make a selection from the available refineries | dateJoiner |
original_values | Return-separated list of values from the data source to be replaced. For example photo is used in the data source, but photograph is used in CollectiveAccess. | data source values | sound recording |
replacement_values | Return-separated list of CollectiveAccess list item idnos that correspond to the mapped values from the original data source. For example sound recording (entered in the Original values column) maps to audio_digital, which is entered here in the Replacement values column. | CollectiveAccess list item idnos | audio_digital |
default | Value to use if data source value is blank. | CollectiveAccess list item idnos | "default":"mixed" |
delimiter | Delimiter to split repeating values on. | delimiter value | "delimiter":";" |
restrictToTypes | Restricts the the mapping to only records of the designated type. For example the Duration field is only applicable to objects of the type moving_image and not photograph. | CollectiveAccess list item idnos | "restrictToTypes":["photograph", "other", "mixed", "text"] |
formatWithTemplate | Formats a field to include words and data values via a set template. | text and data source references | "formatWithTemplate": "Painting #^15 created by ^2" |
suffix | Appends a text value to the end of the data value. | arbitrary text | "suffix": " tons" |
excludeToTypes | Not implemented. This possible option would be the inverse of restrictToTypes. | "excludeToTypes":["photograph", "text"] | |
maxLength | The maximum length, in characters, to allow. Values exceeding this length will be truncated to the maximum. | "maxLength":100 | |
errorPolicy | Determines how errors are handled for the mapping. "Stop" will halt the entire import on any error for this mapping. | ignore; stop | ignore |
convertNewlinesToHTML | Convert newlines to HTML <BR/> tags in imported text. | Value should be 0 or 1. Default is 0 – don't convert text. | "convertNewlinesToHTML":"1" |
Original Values and Replacement Values
In some cases, you may wish for the mapping to find certain values in your source data and replace them with new values. In the Original Value column, state all values that you wish to have replaced. Then, in the Replacement Value column, set their replacements. You can add multiple values to a single cell, so long as the replacement value matched the original value line by line.
This feature can be used to correct common misspellings in the source data, or to simply normalize names and terms, or to conform to element codes for fixed list values in CollectiveAccess.
Running a mapping
To run the import from the command line, follow these instructions.
Before you begin it’s a good idea to make an area for your import mappings and data that’s easily accessible without an inconveniently-long file path. For the sake of this example our import material will live in a Providence directory at:
/support/project/mappings
and
/support/project/data
You'll also want to back-up your database
mysqldump -u#name -p#password project > ~/project_date.dump
The import is executed through caUtils. To see all utilities ask for help after cd-ing into support
cd /path_to_Providence/support
bin/caUtils help
To get further details about the load-import-mapping utility:
bin/caUtils help load-import-mapping
You’ll see that the only load-import-mapping command is the one that designates the mapping file. To load the mapping:
bin/caUtils load-import-mapping --file=project/mappings/mapping1.xlsx
Next you’ll be using the utility import-data. As you’ll see from
bin/caUtils help import-data
there are several options that allow you to designate the format, data source, log preferences, etc. To run the import:
bin/caUtils import-data --format=XLSX --mapping=mapping1 --source=project/data/Data.xlsx --log=project/log
With the PHP ncurses extension installed a display will provide moving status indicators including import progress and recent errors.
To modify your import and rerun the utility, simply restore your database
mysql -u#name -p#password project < ~/project_date.dump
and start the process again!
Common Problems
This section is intended to collect the most common mapping mistakes (and their solutions).
ca_object_lots - When mapping data the the Lots table, remember that Lots require more than just a label and type. They also have a non-optional "lot_status_id" that is a value taken from the "object lot statuses" list. It must be set to a valid value or the lot will fail to insert, and will throw an error message. So be sure set a constant value to ca_object_lots.lot_status_id. You can take this value from the list "object lot statuses" under Manage --> Lists & Vocabularies. Also, remember that id numbers for ca_object_lots don't map to the normal idno but rather ca_object_lots.idno_stub.
Groups - When mapping to to more than one metadata element that are inside of single Container, don't forget to use the Group column to tie each mapping to the same container. You can do this by assigning an arbitrary "group name" to each mapping.
Open/LibreOffice Spreadsheets - If you are loading data from a spreadsheet created by Open/LibreOffice there is a problem with the date detector and you should save the file as an ODF Spreadsheet (*.ods). You can still use the XLS / XLSX importers with these spreadsheets. The data mapping works fine in XLSX format.