Cookbook Chapter 3: Data Import

From CollectiveAccess Documentation
Jump to: navigation, search

Data import

Importing a constant value

Problem
You want a list called "Language" in your CollectiveAccess system to be set as "English" for all records brought in through your import.
Solution
In the Rule Type column for that mapping row, chose Constant. In the Source column set the value, English, by using the unique list code for that item (i.e. "eng").
See also
Data Import: Creating and Running a Mapping (Settings and Constants)
Data Importer (Rule Types)

Grouping data

Problem
You want to map several fields to placements inside a single container.
Solution
Select an arbitrary group name and use it in the "Group" column of your mapping. Any row that includes that name will be linked inside a single placement of a container.
See also
Data Importer (Groups)
Data Importer (Common Problems)

Conditionally skipping data

Problem 1
You want to prevent import of the value "n/a" which has been input instead of data in some cases.
Solution 1
Use the "skipGroupIfValue" option. In this case, {"skipGroupIfValue": ["n/a"]}
Problem 2
You want to skip a whole row of data if a specific value is used in a particular column.
Solution 2
Use the "skipRowIfValue" option. Alternately, use the "skipRowIfNotValue" option.
Problem 3
You want to skip any values that may be in a "Date Type" column if there is no "Date" data.
Solution 3
Create a Group that binds the two columns and use the "skipGroupIfEmpty" option for the "Date" data.
See also
Data Importer (Groups)
Data Importer (Options)

Formatting a Title

Problem
You want to create titles for the records you're importing based on a set format.
Solution
Use the "formatWithTemplate" option. For example: {"formatWithTemplate": "Oral History #^15 with Interviewee ^12"} where ^15 and ^12 are references to columns in the data source where identifiers and entity names can be found.
See also
Data Importer (Options)
Data Import: Creating and Running a Mapping (Options)

Importing a list

Problem
You want to import a list, but the values in your data don't 100% match your values in CollectiveAccess.
Solution
Use the "Original values" and "Replacement values" columns in your mapping. For example if your data source includes "Y" and "N" but your CollectiveAccess list codes are "yes" and "no," simply input those values on your mapping with a line breaks (returns) between each value per column.
See also
Original Values and Replacement Values

Importing to yes_no_checkboxes

Problem
You want to import Yes/No values to a list element rendered as a yes_no_checkbox.
Solution
Unlike lists rendered in other visual formats, the yes_no_checkbox treats the first value in the list as “checked” and the second as “not checked.” Therefore the yes_no_checkbox relies on the order of the list items to determine the visual state of the control, so list sorting matters. Be sure to change the sorting for your Yes/No list to “by value” and then change the item_value of “yes” to “0_yes” and “no” to “1_no” so it would sort with “yes” first. This will ensure that "Yes" values in your source data transform to "checks" in the target element. Of course, use original/replacement values to transform source data to list value item codes, if necessary.

Foreign Key Constraint in Object Lots

Problem
Your object lot import failed. You may have received the error: Could not insert new record Cannot add or update a child row: a foreign key constraint fails (`project`.`ca_object_lots`, CONSTRAINT `fk_ca_object_lots_lot_status_id` FOREIGN KEY (`lot_status_id`) REFERENCES `ca_list_items` (`item_id`))
Solution
Object lots have a non-optional "lot_status_id" that is a value taken from the "object lot statuses" list. It must be set to a valid value or a constant value, mapping to ca_object_lots.lot_status_id.
See also
Data Importer (Common Problems)

Mapping Object Lot Id Numbers

Problem
You want to map object lot id numbers (idnos).
Solution
Id numbers for ca_object_lots don't map to the normal ca_object_lots.idno but rather ca_object_lots.idno_stub.
See also
Data Importer (Common Problems)


Mapping serial ID numbers from multipart_id_numbering.conf

Problem
You want to import a set of data that needs to be automatically numbered according to your settings in multipart_id_numbering.conf.
Solution
Set your mapping as follows:

Rule Type: Constant Source: % Target: ca_table.idno (as in ca_objects.idno, ca_object_lots.idno_stub)

If the idno has more than one component, you can use more than one "%" placeholder (%.%)

Relating Lot records to Objects in an Objects Mapping

Problem
You are importing Object records and wish to relate them to their respective Lot record in the mapping.
Solution
Unlike Entities, Lists, Collections, and other Primary Types, you do not use a Splitter to relate Lot records to Objects in a mapping. Rather, set the source to the Lot id and the ca_table.element to: "ca_objects.lot_id". This works because an Object can be associated with one and only one Lot.

Importing Variable Relationship Types

Problem
You want to define a relationship type in a refinery parameter, but you realize that there is more than one type in your source column or field.
Solution
Instead of writing "relationshipType":"creator" or something else singular, use "relationshipType":"^2" in which the caret is followed by the data source from which you wish to draw relationship types (in this case "2" is just an example).
See Also
Data Import: Creating and Running a Mapping (Parameters)

Creating One Entity Record from Separate Sources

Problem
An entity's name is split up into two different columns in a source spreadsheet, but you want to merge both to create a single entity record
Solution
Use the entityJoiner refinery in your mapping, being sure to include full container paths in the attributes parameter (since you'll be creating a new record). Parameters include entityType, entityTypeDefault, forename, surname, other_forenames, middlename, display name, prefix, suffix, attributes, nonpreferred_labels, relationshipType, relationshipTypeDefault, and skipIfValue.
See Also
Data Import: Creating and Running a Mapping (Parameters)

Building a Hierarchy Above Related Records

Problem
You're trying to import related collections using the collectionSplitter, but you want to build a hierarchy above those records through a parameter.
Solution
Use the collectionSplitter refinery with the parameter "Parents." This will build parent levels above the record that is laterally related to the imported data. In other words, if you're importing items that are laterally related to files, and you then need to build series above the files you're creating via the collectionSplitter, you would use the "parents" parameter. "Parents" includes several sub-parameters, including idno, name, type, attributes, and rules. A Parents parameter may look like this:
{
   "parents": [
       {
           "idno": "^/inm:SeriesNo",
           "name": "^/inm:SeriesTitle",
           "type": "series",
           "attributes": { "ca_collections.description": "^7"}
       },
       {
           "idno": "^/inm:CollectionNo",
           "name": "^/inm:CollectionTitle",
           "type": "collection",
           "rules": [
               {
                   "trigger": "^/inm:Status = 'in progress'",
                   "actions": [
                       {
                           "action": "SET",
                           "target": "ca_collections.status",
                           "value": "edit"
                       }
                   ]
               }
           ]
       }
   ]
}

Note: the above example uses XML as its source.

See Also
Data Import: Creating and Running a Mapping (Parameters)

Building a Hierarchy with Variable Amount of Levels

Problem
You are importing storage locations from an Excel document and each level is in a different column but sometimes the hierarchy spans 5 columns (Building A | Floor 2 | Room A | Cabinet A9 | Drawer 29), while other times it's only 3 columns deep (Building A | Floor 3 | Open Storage Area). For the case of 3 columns you don't want to import 2 blank levels, but rather would like to treat "Open Storage Area" as the subject of the mapping (as Drawer 29 is for the 5 column example). The value of this approach (beyond handling the blank levels) is that the subject level will be the target of the general mapping. This allows for the mapping of other relationships (i.e. the objects stored at the location) to whatever the "lowest" level happens to be.
Solution
Use the useParentAsSubject option along with a storageLocationHierarchyBuilder refinery. In this example, the last level before the first blank level will be the target for the objectSplitter. Make sure to map the storageLocationHierarchyBuilder to ca_storage_locations.parent_id, rather than just ca_storage_locations.

ParentSubject.png

See Also
Data Import: Creating and Running a Mapping (Parameters)

Importing a Hierarchy Within the ca_collections Table

Problem
You want to build a collections hierarchy when importing to the table ca_collections.
Solution
Use the collectionHierarchyBuilder refinery with the parameter "parents." This will map parent levels above the imported data. It can be used to map more than one level, for example a series above a file and a collection above a series, all at once. For example:
{
   "parents": [
       {
           "idno": "^/inm:SeriesNo",
           "name": "^/inm:SeriesTitle",
           "type": "series",
           "attributes": { "ca_collections.description": "^7"}
       },
       {
           "idno": "^/inm:CollectionNo",
           "name": "^/inm:CollectionTitle",
           "type": "collection",
           "rules": [
               {
                   "trigger": "^/inm:Status = 'in progress'",
                   "actions": [
                       {
                           "action": "SET",
                           "target": "ca_collections.status",
                           "value": "edit"
                       }
                   ]
               }
           ]
       }
   ]
}

The parent parameter includes several sub-parameters, as you can see above, such as idno, name, type, attributes, and rules.

See Also
Data Import: Creating and Running a Mapping (Parameters)

Using Rules

Problem
You want to conditionally skip data whenever a certain element appears in the data source. Any time a record's description says "do not use," for example, you want to skip that entire record.
Solution
Use "Rules" to set an action that will be triggered by the presence of a certain value. To do this, use expression statements to create the trigger. For example, if you wish to skip a record containing the phrase "do not use," you must first create the expression statement that denotes "do not use" and indicates that it is to be found in the "description" source. In this case, you could use a regular expression operator for "do not use": =~/do not use/. This will return the text "do not use" as true. Then, to complete the expression statement, add the variable (let's say that "description" is column 5 in an excel spreadsheet). The expression would then be: (^5=~/do not use/). Once the rule trigger is set, you can set the resultant action - in this case, "SKIP." The rule, then would be:
Rule Triggers: (^5=~/do not use/)
Rule Action: SKIP
See Also
Rules
Expressions

Including Metadata in a Parameter

Problem
You are using an entitySplitter, and you want to use the parameter to import address information about the entity
Solution
Use parameter "attributes" and use the following syntax to include metadata within a container:
"attributes":{"address":{"address1":"^24", "address2":"^25","city":"^26", "stateprovince":"^27", "postalcode":"^28", "country":"^29"}}}

in which "address" is the container, all subelements are contained within internal curly brackets, and carets are followed by the source column number.

See Also
Data Import: Creating and Running a Mapping (Parameters)

Mapping Measurements from one Source to Separate Fields

Problem
All of the dimensions data in your source are in the same column, but you want to map them to separate fields in CollectiveAccess.
Solution
Use the measurementsSplitter refinery to divide the dimensions into fields of the dataType Length or Weight. Use the "delimiter" refinery parameter to separate the measurement values, use "units" to specify the unit of measurement, use "elements" to map the components of the dimensions to their respective fields, and use "attributes" to include any other elements (such as a notes field) that may be in a measurements container.
"elements": [{"quantityElement":"measurementWidth", "typeElement": "measurementsType",
"type":"width"},{"quantityElement":"measurementHeight", "typeElement":
"measurementsType2","type":"height"}] OR "elements": [{"quantityElement":"measurementsWidth"},
{"quantityElement":"measurementsHeight"}]
See Also
Data Importer: Refineries

Setting a Unit Specifier for Mapping Dimensions

Problem
You are mapping dimensions data but the unit specifier is not set within each data cell, but rather declared in a data column header.
Solution
Use the suffix formatting under "Options" in your mapping to set the unit specifier for all Dimensions in the source column:
{"suffix": "cm"}
{"suffix": "in"}

If some of the cells are empty, be sure to set a default value so that you don't get an error in which the "length" attribute type doesn't recognize a suffix attached to nothing:

{"suffix": "cm", "default":"0"}

Using a Hierarchical Delimiter for Storage Locations

Problem
The storage locations in your source are expressed only with numbers, 4.2.1 where 4 indicates a room, 2 indicates a rack, and 1 indicates a cabinet.
Solution
Use the storageLocationSplitter refinery with two key parameters that work in tandem: "hierarchicalStorageLocationTypes" and "hierarchicalDelimiter." The hierarchicalStorageLocationTypes adds labels to the numbers in order so that you know what they mean, and the hierarchicalDelimiter tells those labels where to go (as opposed to the regular "delimiter" parameter which would create new records on each delimiter.) In this example, the parameter would be expressed: {"hierarchicalStorageLocationTypes" : ["room", "rack", "cabinet"], "hierarchicalDelimiter":"."}
See also
Data Importer (Refineries)

Importing XML

Problem
You need to import data in an XML file.
Solution
Format-specific data reader code plugins are required to import XML formats. As of version 1.4 two XML formats are supported:
  1. FMPDSORESULT (Filemaker Pro XML data export format)
  2. InMagic XML (Export format for the InMagic archival application)
If you're working with FMPDSORESULT or InMagic XML, set the mapping document's inputType to "FMPDSO" or "Inmagic" respectively and format your source data as /xml_tag in place of <xml_tag>.
If you need to work with some other XML-based format, you'll need to develop a data reader plugin for it. For most formats you can start by copying the FMPDSORESULT plugin (in app/lib/ca/Import/DataReaders/FMPDSOResultReader.php) to a new file in app/lib/ca/Import/DataReaders/ with the name of the new format + "Reader.php" Then change the class name and specifics in the copy to align with your new format.
If you need assistance implementing a new import format please post your request on the support forum at [1]

Importing MARC

Problem
You are importing a MARC database, rather than XLSX or XLS.
Solution
Set the mapping document's inputType to "MARC" and format your source data by MARC Rule and Subfield as "rule/subfield" (ex. 035/a) and ignore indicators, if you choose.
If you do need to use MARC indicators, you append them after the sub-field and another '/'.
Example:
100/a (no indicators)
100/a/x (indicator 1=x)
100/a/xy (indicator 1=x; indicator 2=y)
A concrete example:
MARC:
245 18$aThe ... annual report to the Governor.
The Import mapping source would be:
245/a/18 (as in rule/subfield/indicator1indicator2).

Mapping a MARC element with multiple sub-fields

Problem
What about mapping MARC elements that contain multiple sub-fields?
Solution
Sub-fields are denoted by the "$" sign, which can be ignored in the mapping document. Use display formatting to map a MARC element with multiple sub-fields to a single metadata element.
For example:
245 10$aTrade Union Fellowship Program :$b[announcement].
Here, the source is set to 245/a, and the following format is set in options:
{"formatWithTemplate": "^245/a  ^245/b"}

Importing Indented List Items

Problem
You are trying to import a hierarchical list from an Excel spreadsheet that uses indentations (empty cells) to display the hierarchy:

Indent.png

Solution
Use the listItemIndentedHierarchyBuilder refinery. You can use this to import the list on its own, as a vocabulary, or as metadata attached to Objects, etc. The parameters for this refinery include "levels" (to indicate source columns), "levelTypes" (to define hierarchy levels), "mode" (either "returnData" or "processOnly"). An example JSON for the sample above would be:
{"list": "categories", "levels":["^1", "^2", "^3"], "levelTypes":["concept", "concept", "concept"], "mode": "processOnly"}
See also
Data Importer (Refineries)

Values Importing Improperly from Excel

Problem
You're importing data from an Excel spreadsheet; the document looks normal, but when it's imported text fields seem to render as dates.
Solution
There is hidden formatting in your Excel spreadsheet; this is a common problem and can be responsible for a variety of import errors. Open the file in Excel, select all cells, and then select "Clear -> Formats" from the "Edit" menu. Save, and import the new copy of the file.

sphinx

Namespaces

Variants
Actions
Navigation
Tools
User
Personal tools