Data Importer/fr

From CollectiveAccess Documentation
Revision as of 14:04, 14 October 2013 by FuzzyBot (talk | contribs) (Updating to match new version of source page)
[Available in v1.4]

Summary

With the release of CollectiveAccess version 1.4, users can create their own mappings and migrate their data sources directly from the command line or from the Providence user interface (under the "Import > Data" menu). Running a data import involves seven steps, detailed below.

1 - Create a mapping document that links the data source to the data destination in CollectiveAccess.

2 - Before running an import, always back up the database by running a database dump.

3 - Once the backup is done and your mapping document is ready, run the import either from the command line or from the graphical interface.

4 - When the migration has finished, check the data in CollectiveAccess for errors or inconsistencies.

5 - Correct your mapping document according to the errors you found.

6 - Reload the database dump so that the system returns to its pre-import state (and thereby overwrites the data you are about to import again).

7 - Run the import again.
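
Steps 2 and 6 can be scripted. The sketch below assumes a MySQL database and the standard mysqldump and mysql command-line clients; the database name, user name and dump file name are placeholders, not values from this documentation. The two builder functions only assemble the command lines, and restore() shows one way to feed the dump back in.

```python
import subprocess
from typing import List


def dump_command(db_name: str, user: str, dump_file: str) -> List[str]:
    """Build the mysqldump call for step 2 (backup before the import)."""
    # --password with no value makes the client prompt for the password.
    return ["mysqldump", f"--user={user}", "--password",
            f"--result-file={dump_file}", db_name]


def restore_command(db_name: str, user: str) -> List[str]:
    """Build the mysql call for step 6; the dump is supplied on stdin."""
    return ["mysql", f"--user={user}", "--password", db_name]


def restore(db_name: str, user: str, dump_file: str) -> None:
    """Reload the dump so the database returns to its pre-import state."""
    with open(dump_file, "rb") as handle:
        subprocess.run(restore_command(db_name, user), stdin=handle, check=True)
```

Running the dump is then a matter of passing dump_command(...) to subprocess.run with check=True before each import attempt.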

Supported input formats

Currently: XLSX, XLS, MYSQL, Filemaker XML and Inmagic XML

Coming soon: tab-delimited, CSV, MARC, OAI-PMH/DC

Creating a mapping

To create a mapping, start by downloading the Excel-based import mapping template available here (File:Data Import Mapping template.xlsx). Once all of the mappings have been entered into the template, it can be loaded directly into CollectiveAccess (see "Running a mapping" below and the page "Creating and running a mapping").

The mapping file works on two assumptions about your data: 1 - each row in the data source corresponds to a single record, and 2 - each column corresponds to a single metadata element.

Below we go through each column of the mapping file. In the first column, you must set the Rule type for each column of data in your source. See the table below for a description of each rule.

Rule types

Rule type | Description
Mapping | Maps a data source column (or table.field) to a CollectiveAccess metadata element. Mappings can use Refineries, see below.
SKIP | Use SKIP to ignore a data source column or table.field.
Constant | Sets a data source column (or table.field) to an arbitrary constant value. Enter the value in the Source column of the mapping file. For list-based elements, the value is matched against the idno of a CollectiveAccess list item.
Setting | Sets preferences for the mapping (see below).

Settings

Setting | Description | Notes | Example | Currently supported?
name | Name of the mapping | Arbitrary text | My mapping | Implemented
code | Alphanumeric code for the mapping | Arbitrary; no special characters or spaces | my_mapping | Implemented
inputTypes | Sets which data source types (formats) this mapping handles. Values are the format codes supported by the various DataReader plugins. You can specify multiple formats by listing their codes separated by semicolons. Currently supported codes: XLSX, MYSQL. Coming soon: CSV, TAB, MARC. If you leave this setting blank, the mapping is assumed to be valid for all data sources, which is very unlikely. | File type | XLSX;CSV;TAB | Implemented
table | Sets the table that the imported data goes into | Corresponds to the tables of the CollectiveAccess database | ca_objects | Implemented
type | Type to set for the imported records. If the import includes a mapping to the type_id field, that mapping takes precedence and the type setting is ignored. | CollectiveAccess list item identifier | posters | Implemented
numInitialRowsToSkip | The number of rows to skip at the top of the file. Use this setting to skip the column-header rows at the top of a spreadsheet data file. | Numeric value | 2 | Implemented
existingRecordPolicy | Determines how records already present in CollectiveAccess are checked for and handled by the mapping. Also determines how records created by the mapping are subsequently merged with other instances (matched on idno and/or preferred labels) in the data source. | none; skip_on_idno; merge_on_idno; overwrite_on_idno; skip_on_preferred_labels; merge_on_preferred_labels; overwrite_on_preferred_labels; skip_on_idno_and_preferred_labels; merge_on_idno_and_preferred_labels; overwrite_on_idno_and_preferred_labels | none | Implemented
errorPolicy | Determines how errors are handled for this import. "stop" halts the import at the first error. | ignore; stop | ignore | Implemented
archiveMapping | Set to "yes" to save the mapping in the database, or "no" to delete it after the import | yes; no | yes | Coming soon
archiveDataSets | Set to "yes" to save the source data, or "no" to delete it from the server after the import | yes; no | yes | Coming soon
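
As an illustration of the settings above, the following sketch checks a set of Setting rows before an import is attempted. It is not part of CollectiveAccess: the function names are hypothetical, and treating the underscore as the only non-alphanumeric character allowed in code is an assumption based on the example value.

```python
import re
from typing import Dict, List

EXISTING_RECORD_POLICIES = {
    "none",
    "skip_on_idno", "merge_on_idno", "overwrite_on_idno",
    "skip_on_preferred_labels", "merge_on_preferred_labels",
    "overwrite_on_preferred_labels",
    "skip_on_idno_and_preferred_labels", "merge_on_idno_and_preferred_labels",
    "overwrite_on_idno_and_preferred_labels",
}
ERROR_POLICIES = {"ignore", "stop"}


def parse_input_types(value: str) -> List[str]:
    """Split a semicolon-separated inputTypes value into format codes."""
    return [code.strip() for code in value.split(";") if code.strip()]


def check_settings(settings: Dict[str, str]) -> List[str]:
    """Return a list of problems found in the settings (empty if none)."""
    problems = []
    code = settings.get("code", "")
    # Assumption: letters, digits and underscores are the allowed characters.
    if code and not re.fullmatch(r"[A-Za-z0-9_]+", code):
        problems.append(f"code contains spaces or special characters: {code!r}")
    policy = settings.get("existingRecordPolicy")
    if policy is not None and policy not in EXISTING_RECORD_POLICIES:
        problems.append(f"unknown existingRecordPolicy: {policy!r}")
    error_policy = settings.get("errorPolicy")
    if error_policy is not None and error_policy not in ERROR_POLICIES:
        problems.append(f"unknown errorPolicy: {error_policy!r}")
    return problems
```

For example, check_settings({"code": "my mapping", "errorPolicy": "halt"}) reports two problems, while a mapping with code my_mapping and errorPolicy stop passes.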

Source

The Source column is used to specify exactly which element of the data source should be mapped or skipped. You can also set a constant value rather than a mapping, by setting the rule type to "Constant" and the Source column to the value (e.g. "good condition" for a text element) or to the list item idno (e.g. "code_bon_etat" for a list-based element) that matches your CollectiveAccess configuration.

Spreadsheets
To map column B of an Excel spreadsheet, set the Source to the number 2 (A = 1, B = 2, C = 3, and so on). With this mapping, the contents of column B will be pulled in. If, on the other hand, you want to skip this column, set the Rule type to "Skip" and the Source column to 2.
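
The letter-to-number rule for spreadsheet columns can be computed rather than counted by hand. The helper below is a hypothetical convenience, not a CollectiveAccess function; it assumes that columns beyond Z continue in the usual spreadsheet order (AA = 27, AB = 28, and so on).

```python
def column_to_source(letters: str) -> int:
    """Convert a spreadsheet column letter (A, B, ..., Z, AA, ...) to its Source number."""
    number = 0
    for char in letters.strip().upper():
        if not "A" <= char <= "Z":
            raise ValueError(f"not a column letter: {letters!r}")
        # Treat the letters as digits of a base-26 number where A is 1.
        number = number * 26 + (ord(char) - ord("A") + 1)
    if number == 0:
        raise ValueError("empty column reference")
    return number
```

column_to_source("B") gives 2, the value to enter in the Source column for column B.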

XML
Set the Source column to the name of the XML tag, preceded by a slash (for example /Sponsoring_Department or /inm:ContactName)

MARC
The Source value for MARC files supports fields and indicators. For example: 100/a (field=100; no indicator), 100/a/x (indicator 1=x), 100/a/xy (indicator 1=x; indicator 2=y). For subfields, format the text to import with the formatWithTemplate option, for example {"formatWithTemplate": "^245/a ^245/b ^245/f ^245/g ^245/h ^245/k ^245/n ^245/p ^245/s"}
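
To make the MARC notation concrete, here is a small sketch that takes a Source value such as 100/a/xy apart. The parser and its names are illustrative only, and reading the second part as a subfield code is an assumption; the importer's own parsing is not shown in this documentation.

```python
from typing import NamedTuple, Optional


class MarcSource(NamedTuple):
    field: str
    subfield: str
    indicator1: Optional[str]
    indicator2: Optional[str]


def parse_marc_source(source: str) -> MarcSource:
    """Split a value like "100/a/xy" into field, subfield and indicators."""
    parts = source.strip().split("/")
    if len(parts) not in (2, 3) or not parts[0] or not parts[1]:
        raise ValueError(f"unexpected MARC source: {source!r}")
    indicators = parts[2] if len(parts) == 3 else ""
    if len(indicators) > 2:
        raise ValueError(f"at most two indicators are expected: {source!r}")
    # The first character is indicator 1, the second is indicator 2.
    indicator1 = indicators[0] if len(indicators) >= 1 else None
    indicator2 = indicators[1] if len(indicators) == 2 else None
    return MarcSource(parts[0], parts[1], indicator1, indicator2)
```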

CA table.element

You declared the data source in the previous column. Now it's time to set the destination. Using the CA table.element column, set where you wish the Source data to be mapped to in CollectiveAccess. If the rule type is Skip, you do not need to complete this step. If you are mapping data or applying a constant value, you do need to set the destination. This is done by entering the ca_table.element_code.

CA Table corresponds to the CollectiveAccess basic tables, while element_code is simply the unique code you assigned to a particular metadata element in your CA configuration.

For example, to map a title column from your data source into CollectiveAccess, set the table.element as follows:

ca_objects.preferred_labels

This maps the data from the specified source field to the Title field of an Object record in CollectiveAccess.

Group

In many cases, distinct lines in a data set will map into their corresponding metadata elements that happen to be bundled together inside of a single container. For example, a common container is Date, wherein there are actually two metadata elements - one for the date itself, and the other a drop-down menu to declare the date's type (Date Created, Date Accessioned, etc.)

Suppose that in your data source one column contains date values, while the column next to it contains the date types.

If the corresponding metadata elements in CA are bundled into a container, you must tell this to the mapping document by placing these Source elements into a group. Otherwise, the date value would be mapped to one container, while the date type would be mapped to another container (and each would be missing their counterpart!)

Declaring a Group is simple. Just assign the same group name to every row that should be mapped into a single container.

If Source "2" corresponds to
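
The effect of the Group column can be pictured with a short sketch. The mapping rows, element codes and group name below are hypothetical examples, and the function only illustrates the idea of collecting rows that share a group name; it is not the importer's implementation.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical mapping rows: Source column, CA table.element and Group.
rows = [
    {"source": "2", "element": "ca_objects.date.date_value", "group": "date"},
    {"source": "3", "element": "ca_objects.date.date_type", "group": "date"},
    {"source": "4", "element": "ca_objects.description", "group": ""},
]


def rows_by_group(mapping_rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Collect the elements of rows sharing a group name; ungrouped rows stand alone."""
    grouped = defaultdict(list)
    for index, row in enumerate(mapping_rows):
        key = row["group"] or f"_ungrouped_{index}"
        grouped[key].append(row["element"])
    return dict(grouped)
```

Here the date value and the date type land under the same key, which mirrors how grouped rows end up in one container instead of two.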

Refineries

There are currently four types of refineries in CA: Splitters, Makers, Joiners and Getters. Each refinery is designed to take a specific data format and transform it through a predefined behavior as it is imported into CollectiveAccess.

Splitters
Splitter refineries can either create records, match data to existing records (following a mapping's existingRecordPolicy) or break a single string of source data into several metadata elements in CollectiveAccess. Splitters for relationships are used when several parameters are required, such as setting a record type and setting a relationship type. Using the entitySplitter, a name in a single location (i.e. column) in a data source can be parsed (into first, middle, last, prefix, suffix, et al.) within the new record. Similarly, the measurementsSplitter breaks up, for example, a list of dimensions into a CollectiveAccess container of sub-elements. "Splitter" also implies that multiple data elements, delimited in a single location, can be "split" into unique records related to the imported record.

Makers
Maker refineries are used to create CollectiveAccess tour/tour stop, object lot/object and list/list item pairings. These relationships are different than other CollectiveAccess relationships for two reasons. Firstly, they don't carry relationship types. Secondly, these relationships are always single to multiple: a tour can have many tour stops, but a tour stop can never belong to more than one tour. Similarly an object can never belong to more than one lot. List items belong to one and only one list. The Maker refinery is used for these specific cases where "relationshipType" and other parameters are unnecessary.

Joiners
In some ways Joiners are the opposite of Splitters. An entityJoiner refinery is used when two or more parts of a name (located in different areas of the data source) need to be conjoined into a single record. The dateJoiner makes a single range out of two or more elements in the data source.

Getters
Getters are designed specifically for MYSQL data mappings. These refineries map the repeating source data through the linking table to the correct CollectiveAccess elements.


The entitySplitter creates an entity record or finds an exact match (on name) and creates a relationship. It breaks up the parts of a name and sets the type and other parameters. NOTE: because the entitySplitter creates new records, the full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)

Refinery Refinery parameter Parameter notes Example
entitySplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
entitySplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found {"relationshipType": "^10"} or {"relationshipType": "author"}
entitySplitter entityType Accepts a constant list item idno from the list entity_types or a reference to the location in the data source where the type can be found {"entityType": "person"}
entitySplitter attributes Sets or maps metadata for the entity record by referencing the metadataElement code and the location in the data source where the data values can be found {
   "attributes": {
       "address": {
           "address1": "^24",
           "address2": "^25",
           "city": "^26",
           "stateprovince": "^27",
           "postalcode": "^28",
           "country": "^29"
       }
   }
}

entitySplitter attributes:idno To map source data to idnos in an entitySplitter, see the 'attributes' parameter above. An exception exists for when idnos are set to be auto-generated. To create auto-generated idnos within an entitySplitter, use the following syntax. "attributes": {"idno":"%"}
entitySplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"creator"}
entitySplitter entityTypeDefault Sets the default entity type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list entity_types {"entityTypeDefault":"individual"}
entitySplitter interstitial Sets metadata for the Relationship record (between the target of the mapping and the related entity via the splitter) {
   "interstitial": {
       "relationshipDate": "^4"
   }
}

entitySplitter relatedEntities This allows you to create and/or relate additional entities to the entity being mapped. For example, if you are running an Object mapping and using an entitySplitter to generate related Individuals, but you also want to create entity records for each individual's affiliation, use this setting. "Name" is the name of the entity, which will be automatically split into pieces and imported. If you want to explicitly map pieces of a name (surname, forename) you can omit "name" and use "forename", "middlename", "surname", etc. "type", "attributes," and "relationshipType" operate just as they would in a regular splitter. {"relatedEntities": [{"type":"ind", "name": "^3", "attributes":{}, "relationshipType":"related"}]}
entitySplitter nonPreferredLabels Maps source data cells to ca_entities.nonpreferred_labels of the entity being generated or matched by the entitySplitter "nonPreferredLabels": [{"forename": "^5", "surname":"^6"}]
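
Two of the behaviours described in this table, splitting on a delimiter and falling back to a default type, can be sketched as follows. This is a simplified illustration with hypothetical function names and type codes; it does not reproduce how CollectiveAccess parses names or looks up types.

```python
from typing import List, Set


def split_values(cell: str, delimiter: str) -> List[str]:
    """Break one source cell into separate values on the delimiter."""
    return [part.strip() for part in cell.split(delimiter) if part.strip()]


def pick_type(source_value: str, known_types: Set[str], default: str) -> str:
    """Use the source value when it matches a known type code, else the default."""
    value = source_value.strip()
    return value if value in known_types else default


# Hypothetical relationship type codes configured in the system.
relationship_types = {"author", "creator"}
```

With {"delimiter": ";"} a cell containing "Smith, John; Doe, Jane" yields two names, and with {"relationshipTypeDefault": "creator"} an empty or unrecognised type value falls back to creator.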

The collectionSplitter creates a collection record or finds an exact match (on name) and creates a flat relationship. The collectionSplitter can only be used for flat (regular) relationships, not hierarchical relationships. For hierarchical relationships, use the collectionHierarchyBuilder refinery. NOTE: because the collectionSplitter creates new records the full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)

Refinery Refinery parameter Parameter notes Example
collectionSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
collectionSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. Note (for object data): if the relationship type matches that set as the hierarchy control, the object will be pulled in as a "child" element in the collection hierarchy {"relationshipType": "part_of"}
collectionSplitter collectionType Accepts a constant list item idno from the list collection_types or a reference to the location in the data source where the type can be found {"collectionType": "box"}
collectionSplitter attributes Sets or maps metadata for the collection record by referencing the metadataElement code and the location in the data source where the data values can be found {
   "attributes": {
       "collectionDateSet": {
           "collectionDate": "^12"
       }
   }
}

collectionSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"part_of"}
collectionSplitter collectionTypeDefault Sets the default collection type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list collection_types {"collectionTypeDefault":"series"}
collectionSplitter parents Maps or builds the parent levels above the record laterally related to the imported data. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules.
{
   "parents": [
       {
           "idno": "^/inm:SeriesNo",
           "name": "^/inm:SeriesTitle",
           "type": "series",
           "attributes": { "ca_collections.description": "^7"}
       },
       {
           "idno": "^/inm:CollectionNo",
           "name": "^/inm:CollectionTitle",
           "type": "collection",
           "rules": [
               {
                   "trigger": "^/inm:Status = 'in progress'",
                   "actions": [
                       {
                           "action": "SET",
                           "target": "ca_collections.status",
                           "value": "edit"
                       }
                   ]
               }
           ]
       }
   ]
}

collectionSplitter interstitial Sets metadata for the Relationship record (between the target of the mapping and the related entity via the splitter) {
   "interstitial": {
       "relationshipDate": "^4"
   }
}

The collectionHierarchyBuilder is just like the collectionSplitter except for one key difference: instead of creating flat records it creates a hierarchical relationship between the import data and the parent record(s) created by the refinery. Like the collectionSplitter the collectionHierarchyBuilder first looks to make an exact match (on name) and if none are found it creates the necessary record(s). NOTE: because the collectionHierarchyBuilder creates new records the full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)

Refinery Refinery parameter Parameter notes Example
collectionHierarchyBuilder parents Maps the parent levels that should be built or matched hierarchically above the imported data. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules.
{
   "parents": [
       {
           "idno": "^/inm:SeriesNo",
           "name": "^/inm:SeriesTitle",
           "type": "series",
           "attributes": { "ca_collections.description": "^7"}
       },
       {
           "idno": "^/inm:CollectionNo",
           "name": "^/inm:CollectionTitle",
           "type": "collection",
           "rules": [
               {
                   "trigger": "^/inm:Status = 'in progress'",
                   "actions": [
                       {
                           "action": "SET",
                           "target": "ca_collections.status",
                           "value": "edit"
                       }
                   ]
               }
           ]
       }
   ]
}
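
To show what the "^" references in a parents list stand for, the sketch below fills them in from one source record. The record values and the resolve_levels helper are hypothetical, and the levels are simply processed in the order they are listed; the importer's actual resolution logic is not described here.

```python
from typing import Dict, List

parents = [
    {"idno": "^/inm:SeriesNo", "name": "^/inm:SeriesTitle", "type": "series"},
    {"idno": "^/inm:CollectionNo", "name": "^/inm:CollectionTitle", "type": "collection"},
]

# Hypothetical source record, keyed by XML tag path.
record = {
    "/inm:SeriesNo": "S-01",
    "/inm:SeriesTitle": "Correspondence",
    "/inm:CollectionNo": "C-1",
    "/inm:CollectionTitle": "Family papers",
}


def resolve_levels(levels: List[Dict[str, str]], source: Dict[str, str]) -> List[Dict[str, str]]:
    """Replace each "^reference" with the matching source value; keep constants as-is."""
    resolved = []
    for level in levels:
        entry = {}
        for key, value in level.items():
            if value.startswith("^"):
                entry[key] = source.get(value[1:], "")
            else:
                entry[key] = value
        resolved.append(entry)
    return resolved
```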

The placeSplitter creates a place record or finds an exact match (on name) and creates a relationship. NOTE: because the placeSplitter creates new records the full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)

Refinery Refinery parameter Parameter notes Example
placeSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
placeSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. {"relationshipType": "location"}
placeSplitter placeType Accepts a constant list item idno from the list place_types or a reference to the location in the data source where the type can be found {"placeType": "country"}
placeSplitter attributes Sets or maps metadata for the place record by referencing the metadataElement code and the location in the data source where the data values can be found {
   "attributes": {
       "placeNote": "^12"
   }
}

placeSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"location"}
placeSplitter placeTypeDefault Sets the default place type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list place_types {"placeTypeDefault":"country"}
placeSplitter hierarchy Identifies the list code in the place_hierarchies list for the relevant place hierarchy. {"hierarchy": "dc"}
placeSplitter interstitial Sets metadata for the Relationship record (between the target of the mapping and the related entity via the splitter) {
   "interstitial": {
       "relationshipDate": "^4"
   }
}

The measurementsSplitter formats data values that are mapped to an element of the datatype Length or Weight. NOTE: the measurementsSplitter does not create new records (it only maps data) and as a result the full container paths must not be specified in the attributes/elements parameter (i.e. just use subElement_code or measurementsWidth)

Refinery Refinery parameter Parameter notes Example
measurementsSplitter delimiter Sets the value of the delimiter to break on, separating measurement values {"delimiter": "x"}
measurementsSplitter units Sets the value of the measurement unit {"units": "in"}
measurementsSplitter elements maps the components of the dimensions to specific metadata elements {
   "elements": [
       {
           "quantityElement": "measurementWidth",
           "typeElement": "measurementsType",
           "type": "width"
       },
       {
           "quantityElement": "measurementHeight",
           "typeElement": "measurementsType2",
           "type": "height"
       }
   ]
}
Note: the typeElement and type sub-components are optional and should only be used in measurement containers that include a type drop-down.

measurementsSplitter attributes maps the other non-measurement elements that may be in the same container. Values here are set for all measurements being split. {
   "attributes": {
       "notes": "^1"
   }
}
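
The following sketch illustrates how the delimiter, units and elements parameters fit together for a value such as "10 x 20". It is a simplified model with a hypothetical function name; in particular, appending the unit to each quantity is an assumption about the resulting value format.

```python
from typing import Dict, List

elements = [
    {"quantityElement": "measurementWidth", "typeElement": "measurementsType", "type": "width"},
    {"quantityElement": "measurementHeight", "typeElement": "measurementsType2", "type": "height"},
]


def split_measurements(cell: str, delimiter: str, units: str,
                       element_map: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Pair each delimited quantity with the element it should be stored in."""
    parts = [part.strip() for part in cell.split(delimiter)]
    values = []
    for part, element in zip(parts, element_map):
        entry = {element["quantityElement"]: f"{part} {units}"}
        # typeElement and type are optional, so only set them when both exist.
        if "typeElement" in element and "type" in element:
            entry[element["typeElement"]] = element["type"]
        values.append(entry)
    return values
```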

The listItemSplitter creates a list item or finds an exact match (on name) and creates a relationship. NOTE: the listItemSplitter creates new records and as a result full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)

Refinery Refinery parameter Parameter notes Example
listItemSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
listItemSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. {"relationshipType": "location"}
listItemSplitter listItemType Accepts a constant list item idno from the list or a reference to the location in the data source where the type can be found. {"listItemType": "concept"}
listItemSplitter attributes Sets or maps metadata for the list value by referencing the metadataElement code and the location in the data source where the data values can be found. You usually don't set attributes for a list item, but you can here if you need to. {
   "attributes": {
       "listItemNote": "^12"
   }
}

listItemSplitter parents Maps or builds the parent levels above the record laterally related to the imported data. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules.
{
   "parents": [
       {
           "idno": "^12",
           "name": "^14",
           "type": "concept",
           "attributes": { "ca_list_items.description": "^7"}
       },
       {
           "idno": "^16",
           "name": "^17",
           "type": "guide",
           "rules": [
               {
                   "trigger": "^3= 'in progress'",
                   "actions": [
                       {
                           "action": "SET",
                           "target": "ca_list_items.status",
                           "value": "edit"
                       }
                   ]
               }
           ]
       }
   ]
}

listItemSplitter list Enter the list_code for the list that the item should be added to. This is mandatory: if you forget to set it, or set it to a list_code that doesn't exist, the mapping will fail. {"list": "list_code"}
listItemSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"concept"}
listItemSplitter listItemTypeDefault Sets the default list item type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list list_item_types {"listItemTypeDefault":"concept"}

The tourStopSplitter creates a tour stop or finds an exact match (on name) and creates a relationship. NOTE: the tourStopSplitter creates new records and as a result full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code)

Refinery Refinery parameter Parameter notes Example
tourStopSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter": ";"}
tourStopSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. {"relationshipType": "location"}
tourStopSplitter tourStopType Accepts a constant list item idno from the list tour_stop_types or a reference to the location in the data source where the type can be found. {"tourStopType": "main_stop"}
tourStopSplitter attributes Sets or maps metadata for the tour stop record by referencing the metadataElement code and the location in the data source where the data values can be found. {
   "attributes": {
       "stopDescription": "^11"
   }
}

tourStopSplitter tour Identifies the tour to add the stop to. {"tour": "tour_code"}
tourStopSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values don't match any values in the CollectiveAccess system {"relationshipTypeDefault":"location"}
tourStopSplitter tourStopTypeDefault Sets the default tour stop type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list tour_stop_types {"tourStopTypeDefault":"main_stop"}

The storageLocationSplitter creates a new entry in the storage locations hierarchical list.

Refinery Refinery parameter Parameter notes Example
storageLocationSplitter hierarchicalStorageLocationTypes Sets the storage location types used to label each level in a numerically expressed hierarchy {
   "hierarchicalStorageLocationTypes": [
       "room",
       "rack",
       "cabinet"
   ]
}

storageLocationSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter":";"}
storageLocationSplitter hierarchicalDelimiter Specifies the delimiter on which to break when designating hierarchicalStorageLocationTypes {"hierarchicalDelimiter":"."}
storageLocationSplitter parents Maps or builds the parent levels above the record laterally related to the imported data. The parent parameter has several sub-parameters including:

idno: maps the level-specific idno

name: maps the level-specific preferred_label

type: maps the level-specific record type (must match the item idno exactly)

attributes: maps the (optional) level-specific metadata. Includes the metadataElement code and the data source.

rules: maps any (optional) level-specific rules.
{
   "parents": [
       {
           "idno": "^10",
           "name": "^12",
           "type": "cabinet",
           "attributes": { "ca_storage_locations.description": "^7"}
       },
       {
           "idno": "^14",
           "name": "^16",
           "type": "room",
           "rules": [
               {
                   "trigger": "^18= 'in progress'",
                   "actions": [
                       {
                           "action": "SET",
                           "target": "ca_storage_locations.status",
                           "value": "edit"
                       }
                   ]
               }
           ]
       }
   ]
}

storageLocationSplitter interstitial Sets metadata for the Relationship record (between the target of the mapping and the related entity via the splitter) {
   "interstitial": {
       "relationshipDate": "^4"
   }
}
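
How delimiter, hierarchicalDelimiter and hierarchicalStorageLocationTypes relate can be seen in a short sketch. The sample cell value and the function name are hypothetical, and the sketch only labels each level; it does not create or match any records.

```python
from typing import List, Tuple


def split_locations(cell: str, delimiter: str, hierarchical_delimiter: str,
                    level_types: List[str]) -> List[List[Tuple[str, str]]]:
    """Split a cell into locations, then label each level of every location."""
    locations = []
    for value in cell.split(delimiter):
        value = value.strip()
        if not value:
            continue
        levels = [level.strip() for level in value.split(hierarchical_delimiter)]
        # Pair the first level with the first type, the second with the second, etc.
        locations.append(list(zip(level_types, levels)))
    return locations
```

With the types room, rack and cabinet, the value "1.4.2" reads as room 1, rack 4, cabinet 2.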

The loanSplitter creates a new loan-in or loan-out record.

Refinery Refinery parameter Parameter notes Example
loanSplitter loanType Accepts a constant list item from the list loan_types {"loanType":"out"}
loanSplitter relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found. Note for object data: if the relationship type matches that which is set as the hierarchy control, the object will be pulled in as a "child" element in the loan hierarchy {"relationshipType":"part_of"}
loanSplitter delimiter Sets the value of the delimiter to break on, separating data source values {"delimiter":"."}
loanSplitter attributes Sets or maps metadata for the loan record by referencing the metadataElement code and the location in the data source where the data values can be found. {
   "attributes": {
       "loanDate": "^11"
   }
}

loanSplitter relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess system {"relationshipTypeDefault":"part_of"}
loanSplitter loanTypeDefault Sets the default loan type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list loan_types. {"loanTypeDefault":"in"}
loanSplitter interstitial Sets metadata for the Relationship record (between the target of the mapping and the related entity via the splitter) {
   "interstitial": {
       "relationshipDate": "^4"
   }
}

The tourMaker creates a parent tour within a mapping that creates tour stops.

Refinery Refinery parameter Parameter notes Example
tourMaker tourType Accepts a constant list item idno from the list tour_types or a reference to the location in the data source where the type can be found. {"tourType": "full_tour"}
tourMaker attributes Sets or maps metadata for the tour record by referencing the metadataElement code or the location in the data source where the data values can be found. {"attributes": {"tour_code": "^1"}}
tourMaker tourTypeDefault Sets the default tour type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list tour_types {"tourTypeDefault": "full_tour"}


The dateJoiner merges two or more separate values from the data source into one date or date range in a single field in CollectiveAccess.

Refinery Refinery parameter Parameter notes Example
dateJoiner mode Determines how dateJoiner joins date values together. Two-column range (aka "range") is the default if mode is not specified. Options are: multiColumnDate, multiColumnRange, range {"mode": "multiColumnDate"}
dateJoiner month Maps the month value for the date from the data source. (For Multi-column date) {"month": "^4"}
dateJoiner day Maps the day value for the date from the data source. (For Multi-column date) {"day": "^5"}
dateJoiner year Maps the year value for the date from the data source. (For Multi-column date) {"year": "^6"}
dateJoiner startDay Maps the day value for the start date from the data source. (For Multi-column range) {"startDay": "^4"}
dateJoiner startMonth Maps the month value for the start date from the data source. (For Multi-column range) {"startMonth": "^5"}
dateJoiner startYear Maps the year value for the start date from the data source. (For Multi-column range) {"startYear": "^6"}
dateJoiner endDay Maps the day value for the end date from the data source. (For Multi-column range) {"endDay": "^7"}
dateJoiner endMonth Maps the month value for the end date from the data source. (For Multi-column range) {"endMonth": "^8"}
dateJoiner endYear Maps the year value for the end date from the data source. (For Multi-column range) {"endYear": "^9"}
dateJoiner expression Date expression (For Two-column range) {"expression" : "^dateExpression"}
dateJoiner start Maps the date from the data source that is the beginning of the conjoined date range. (For Two-column range) {"start" : "^dateBegin"}
dateJoiner end Maps the date from the data source that is the end of the conjoined date range. (For Two-column range) {"end": "^dateEnd"}
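
The three modes in the table above can be pictured with a small Python sketch. This is only an illustration of the idea, not the importer's code: the helper names `resolve` and `join_date`, the row-as-dictionary layout, and the output formats (year-month-day with hyphens, " - " between range ends) are all assumptions made for the example.

```python
def resolve(ref, row):
    """Return the row value for a "^name" reference, or the constant itself."""
    if isinstance(ref, str) and ref.startswith("^"):
        return row.get(ref[1:], "")
    return ref


def join_parts(config, row, keys):
    """Join the non-empty date parts named by keys with hyphens (assumed format)."""
    parts = [resolve(config.get(key), row) for key in keys]
    return "-".join(str(part) for part in parts if part)


def join_date(config, row):
    """Build one date or date range string from a dateJoiner-style config."""
    mode = config.get("mode", "range")  # two-column range is the documented default
    if mode == "multiColumnDate":
        return join_parts(config, row, ("year", "month", "day"))
    if mode == "multiColumnRange":
        start = join_parts(config, row, ("startYear", "startMonth", "startDay"))
        end = join_parts(config, row, ("endYear", "endMonth", "endDay"))
    elif mode == "range":
        start = resolve(config.get("start"), row)
        end = resolve(config.get("end"), row)
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Hypothetical separator; the real importer may format ranges differently.
    return " - ".join(str(part) for part in (start, end) if part)


row = {"4": "07", "5": "14", "6": "1989", "dateBegin": "1950", "dateEnd": "1960"}
single = join_date({"mode": "multiColumnDate", "month": "^4", "day": "^5", "year": "^6"}, row)
two_column = join_date({"start": "^dateBegin", "end": "^dateEnd"}, row)
```

Here `single` combines columns 4, 5 and 6 into one date, while `two_column` falls back to the default range mode because no mode is given.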

The entityJoiner merges data from two or more data sources (e.g. two columns on a spreadsheet) to make a single entity record. This refinery should be used when, for example, last and first names are in two different locations. NOTE: because the entityJoiner creates new records, the full container paths must be specified in the attributes parameter (i.e. ca_table.container_code.subElement_code).

Refinery Refinery parameter Parameter notes Example
entityJoiner entityType Accepts a constant list item idno from the list entity_types or a reference to the location in the data source where the type can be found {"entityType": "person"}
entityJoiner entityTypeDefault Sets the default entity type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess list entity_types. {"entityTypeDefault": "person"}
entityJoiner forename Accepts a constant value for the forename or a reference to the location in the data source where the forename can be found. {"forename":"^3"}
entityJoiner surname Accepts a constant value for the surname or a reference to the location in the data source where the surname can be found. {"surname": "^2"}
entityJoiner other_forenames Accepts a constant value for the entity's other forenames or a reference to the location in the data source where the other forenames can be found. {"other_forenames": "^10"}
entityJoiner middlename Accepts a constant value for the middlename or a reference to the location in the data source where the middlename can be found. {"middlename": "^12"}
entityJoiner displayname Accepts a constant value for the displayname or a reference to the location in the data source where the displayname can be found. {"displayname": "^14"}
entityJoiner prefix Accepts a constant value for the prefix or a reference to the location in the data source where the prefix can be found. {"prefix": "^14"}
entityJoiner suffix Accepts a constant value for the suffix or a reference to the location in the data source where the suffix can be found. {"suffix": "^14"}
entityJoiner attributes Sets or maps metadata for the entity record by referencing the metadataElement code and the location in the data source where the data values can be found. {"attributes": {"agentDateSet": {"agentDate": "^12"}}}
entityJoiner nonpreferred_labels List of non-preferred label values or references to locations in the data source where nonpreferred label values can be found. Use the split value for a label to indicate a value that should be split into entity label components before import. {"nonpreferred_labels": [{"forename": "^5", "surname": "^6"}]} OR {"nonpreferred_labels": [{"split": "^4"}]}
entityJoiner relationshipType Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found {"relationshipType": "^10"}
entityJoiner relationshipTypeDefault Sets the default relationship type that will be used if none are defined or if the data source values do not match any values in the CollectiveAccess system. {"relationshipTypeDefault": "author"}
entityJoiner skipIfValue Skip if imported value is in the specified list of values. {"skipIfValue": "unknown"}
entityJoiner relatedEntities Creates and/or relates additional entities to the entity being mapped. For example, if you are running an Object mapping and using an entityJoiner to generate related Individuals, but you also want to create entity records for each individual's affiliation, use this setting. "name" is the name of the entity, which will be automatically split into pieces and imported. If you want to explicitly map pieces of a name (surname, forename) you can omit "name" and use "forename", "middlename", "surname", etc. "type", "attributes", and "relationshipType" operate just as they would in a regular splitter. {"relatedEntities": [{"type":"ind", "name": "^3", "attributes":{}, "relationshipType":"related"}]}
entityJoiner interstitial Sets metadata for the Relationship record (between the target of the mapping and the related entity via the splitter) {"interstitial": {"relationshipDate": "^4"}}


The ATRelatedGetter is designed around the structure of the Archivists' Toolkit MySQL database. It maps the repeating source data through the linking table to the correct CollectiveAccess elements.

Refinery Refinery parameter Parameter notes Example
ATRelatedGetter table Archivists Toolkit table. {"table":"ArchDescriptionRepeatingData"}
ATRelatedGetter discriminator Discriminator value. {"discriminator":"note"}
ATRelatedGetter key Archivists Toolkit key field. {"key":"DigitalObjects.digitalObjectId"}
ATRelatedGetter map Map of related table field values to CollectiveAccess element. {"map": {"descriptionText": "^noteContent", "descriptionSource": "repository"}}

Options

Options allow you to set additional formatting and conditionals on data during import. Some Options are designed to actually set parameters on the mapping behavior, such as the skip options. skipGroupIfEmpty, for example, allows you to prevent the import of certain fields, depending on the presence of data in another related field. Other Options simply format data, such as formatWithTemplate, suffix, and convertNewlinesToHTML.

Option Description Parameter notes Example
skipGroupIfEmpty If the data value corresponding to this mapping is empty, skip the mappings for the other data in the group set to a non-zero value {"skipGroupIfEmpty": 1}
skipGroupIfValue If the data value corresponding to this mapping contains the set value, skip the mapping for the data and the mappings for the other data in the group arbitrary value {"skipGroupIfValue": ["n/a"]}
skipGroupIfNotValue If the data value corresponding to this mapping does not contain the set value, skip the mapping for the data and the mappings for the other data in the group arbitrary value {"skipGroupIfNotValue": ["n/a"]}
skipGroupIfExpression If the expression yields true, skip the mapping for the data and the mappings for the other data in the group arbitrary value {"skipGroupIfExpression":"^/inm:FileNo <> \"\""}
skipRowIfEmpty If the data value corresponding to this mapping is empty, do not import the row set to a non-zero value {"skipRowIfEmpty": 1}
skipRowIfValue If the data value corresponding to this row contains the set value, do not import the entire row arbitrary value {"skipRowIfValue": ["n/a"]}
skipRowIfNotValue If the data value corresponding to this row does not contain the set value, do not import the entire row arbitrary value {"skipRowIfNotValue": ["n/a"]}
refineries Select the refinery that performs the correct function to alter your data source as it maps to CollectiveAccess make a selection from the available refineries dateJoiner
original_values Return-separated list of values from the data source to be replaced. For example photo is used in the data source, but photograph is used in CollectiveAccess. data source values sound recording
replacement_values Return-separated list of CollectiveAccess list item idnos that correspond to the mapped values from the original data source. For example sound recording (entered in the Original values column) maps to audio_digital, which is entered here in the Replacement values column. CollectiveAccess list item idnos audio_digital
default Value to use if data source value is blank. CollectiveAccess list item idnos {"default": "mixed"}
delimiter Delimiter to split repeating values on. delimiter value {"delimiter": ";"}
restrictToTypes Restricts the mapping to only records of the designated type. For example, the Duration field is only applicable to objects of the type moving_image and not photograph. CollectiveAccess list item idnos {"restrictToTypes": ["photograph", "other", "mixed", "text"]}
formatWithTemplate Formats a field to include words and data values via a set template. text and data source references {"formatWithTemplate": "Painting #^15 created by ^2"}
suffix Appends a text value to the end of the data value. arbitrary text {"suffix": " tons"}
excludeToTypes Not implemented. This possible option would be the inverse of restrictToTypes. {"excludeToTypes": ["photograph", "text"]}
maxLength The maximum length, in characters, to allow. Values exceeding this length will be truncated to the maximum. {"maxLength":100}
errorPolicy Determines how errors are handled for the mapping. "Stop" will halt the entire import on any error for this mapping. ignore; stop ignore
convertNewlinesToHTML Convert newlines to HTML <BR/> tags in imported text. Value should be 0 or 1. Default is 0 – don't convert text. {"convertNewlinesToHTML":"1"}
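
To make the formatting options more concrete, here is a rough Python sketch of how default, formatWithTemplate, suffix, maxLength and convertNewlinesToHTML could act on one value. It is not the importer's implementation: the function names, the order in which options are applied, and the row-as-dictionary layout are assumptions chosen for the example.

```python
import re


def format_with_template(template, row):
    """Replace each ^N placeholder with the matching value from the source row."""
    return re.sub(r"\^(\w+)", lambda m: str(row.get(m.group(1), "")), template)


def apply_options(value, options, row):
    """Apply a few formatting options to one value (order is an assumption)."""
    if value == "" and "default" in options:
        value = options["default"]  # used only when the source value is blank
    if "formatWithTemplate" in options:
        value = format_with_template(options["formatWithTemplate"], row)
    if "suffix" in options:
        value += options["suffix"]
    if "maxLength" in options:
        value = value[: int(options["maxLength"])]  # truncate, do not reject
    if str(options.get("convertNewlinesToHTML", 0)) == "1":
        value = value.replace("\n", "<br/>")
    return value


row = {"2": "Jane Doe", "15": "42"}
titled = apply_options("", {"formatWithTemplate": "Painting #^15 created by ^2"}, row)
weight = apply_options("12", {"suffix": " tons"}, row)
```

With the sample row, `titled` fills both placeholders from columns 15 and 2, and `weight` simply gains the suffix.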

Original Values and Replacement Values

In some cases, you may wish for the mapping to find certain values in your source data and replace them with new values. In the Original Value column, list all values that you wish to have replaced. Then, in the Replacement Value column, set their replacements. You can add multiple values to a single cell, so long as each replacement value matches its original value line by line.

This feature can be used to correct common misspellings in the source data, or to simply normalize names and terms, or to conform to element codes for fixed list values in CollectiveAccess.
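
The line-by-line pairing can be illustrated with a short Python sketch. It assumes exact, case-sensitive matching and uses hypothetical helper names; the importer's real matching rules may differ.

```python
def build_replacements(original_cell, replacement_cell):
    """Pair each line of the Original cell with the same line of the Replacement cell."""
    originals = [line.strip() for line in original_cell.splitlines()]
    replacements = [line.strip() for line in replacement_cell.splitlines()]
    if len(originals) != len(replacements):
        raise ValueError("each original value needs a replacement on the same line")
    return dict(zip(originals, replacements))


def replace_value(value, table):
    """Return the replacement for value, or value unchanged if it is not listed."""
    return table.get(value, value)


table = build_replacements("photo\nsound recording", "photograph\naudio_digital")
converted = [replace_value(v, table) for v in ["photo", "sound recording", "map"]]
```

Values that are not listed in the Original cell, such as "map" here, pass through unchanged in this sketch.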

Transform Values Using Worksheet

Using the Original and Replacement columns is sufficient for transforming a small range of values. But for really large transformation dictionaries, use the option "transformValuesUsingWorksheet" instead.

You can use this option to reference a list of values in a separate worksheet within the Excel mapping document. The formatting of the sheet should place original values in the first column, and replacement values in the second column.

The "transformValuesUsingWorksheet" mapping option takes a worksheet name and will load the first and second columns of that sheet as the original and replacement values respectively. When this option is set, any values in the "original values" and "replacement values" columns of the mapping worksheet are ignored, even if the "transformValuesUsingWorksheet" worksheet is empty or does not exist.

You refer to the sheet by name. So the options syntax in your Excel mapping doc would look like this:

{"transformValuesUsingWorksheet":"Worksheet Title"}

Again, if you need to transform a small number of values, simply use the Original/Replacement columns on the mapping worksheet. If you need to transform a really large list, create an additional worksheet, give it a name, add your original/replacement values in the first two columns, and set the transformValuesUsingWorksheet option.

Rules

Rules allow you to set record-level conditionals in your mapping with target actions triggered by true or false outcomes. With Rules, you can manipulate the migration of specific data and/or set metadata based on expression statements. For example, let's say you want to skip a record if a certain element in your data source is exactly equal to a specific value. Rules allows you to set a target action, such as "SKIP," when a match is triggered.

Rules rely on a two-part operation outlined in the import mapping. The first component is called "Rule triggers": an expression statement whose result is evaluated by the data importer. The second part is the "Rule actions" that are performed based on the outcome of the expression.

Let's walk through an example. For an in-depth look at writing "Rule triggers" via Expressions, read more here.

For our example, we are going to skip all records with the phrase "do not use" in the description. To do so we write an expression to match "do not use" in the required field, and then set the action to execute when the expression is true to be "SKIP." For the sake of this example we're importing an Excel spreadsheet and the description is in column 5:

This is how the rule should look in the import mapping:

Rule.png

Set "Rule" as your rule type and add the following to the Rule triggers column:

(^5 =~ /do not use/)

Here ^5 references column 5 and =~ is the regular expression match operator. The "Rule actions" column contains a simple reference to the action:

SKIP

Note that you can potentially add several actions to a single rule trigger by separating the actions with returns. For now, the only possible action is "SKIP", which skips the entire record rather than importing it. A "SET" action, which causes the setting of a field in the import to a specific value, is planned for development soon.
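
The effect of the example rule can be approximated in Python as follows. This is a sketch of the behaviour only: it uses Python's `re` module rather than the importer's own expression parser, and the function name and row layout are assumptions.

```python
import re


def should_skip(row, column, pattern):
    """Return True when the value in the given column matches the pattern."""
    return re.search(pattern, row.get(column, "")) is not None


rows = [
    {"1": "2001.1", "5": "Oil on canvas"},
    {"1": "2001.2", "5": "Duplicate record, do not use"},
]

# Rough equivalent of the trigger (^5 =~ /do not use/) with the action SKIP.
imported = [row for row in rows if not should_skip(row, "5", r"do not use")]
```

Only the first row survives, because the second row's column 5 contains the phrase "do not use".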

Running an import

To run an import from the command line, follow these instructions.

Before you begin, it is best to put your mappings and data somewhere that is easy to reach without typing a long path. For the purposes of this example, our import files live in a directory inside Providence:

/support/project/mappings and
/support/project/data

You will also want to make a backup before running the import.

mysqldump -u#name -p#password project > ~/project_date.dump

The import is run using caUtils. To see all available tasks, call the command's help from the support directory.

cd /path_to_Providence/support
bin/caUtils help

For more information on the load-import-mapping utility:

bin/caUtils help load-import-mapping

You will see that load-import-mapping only needs to be told which mapping file to use. To load the mapping file:

bin/caUtils load-import-mapping --file=project/mappings/mapping1.xlsx

Next, use the import-data utility. As you can see with

bin/caUtils help import-data

there are several options that let you specify the format, the data source, logging preferences, and so on. To run the import:

bin/caUtils import-data --format=XLSX --mapping=mapping1 --source=project/data/Data.xlsx --log=project/log

The PHP ncurses extension will display import status indicators, including import progress and recent errors.

To revise your import and rerun the utility, simply restore the database

mysql -u#name -p#password project < ~/project_date.dump

and start the process again!
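
Because the load and import steps are repeated on every iteration, it can help to script them. The Python sketch below only assembles the two caUtils commands shown above and runs them from the support directory. The function names are hypothetical, the paths are the example paths from this page, and the mapping code is passed separately because it is not necessarily derived from the file name.

```python
import subprocess


def import_cycle_commands(mapping_file, mapping_code, source, log_dir, fmt="XLSX"):
    """Build the load-import-mapping and import-data command lines."""
    load = ["bin/caUtils", "load-import-mapping", f"--file={mapping_file}"]
    run = [
        "bin/caUtils",
        "import-data",
        f"--format={fmt}",
        f"--mapping={mapping_code}",
        f"--source={source}",
        f"--log={log_dir}",
    ]
    return [load, run]


def run_all(commands, support_dir):
    """Run each command from the Providence support directory, stopping on failure."""
    for command in commands:
        subprocess.run(command, cwd=support_dir, check=True)


commands = import_cycle_commands(
    "project/mappings/mapping1.xlsx",
    "mapping1",
    "project/data/Data.xlsx",
    "project/log",
)
# run_all(commands, "/path_to_Providence/support")  # uncomment on a real installation
```

The database backup and restore steps are left out of the sketch on purpose, since they involve credentials that are better typed by hand.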

Common problems

This section collects the most common errors (and their solutions).

ca_object_lots - When mapping to the Lots table, remember that a lot needs more than just a type and a label. Lots also have a mandatory "lot_status_id", which takes a value from the lot types list (Manage > Lists and Vocabularies). It must be set to a valid value, otherwise the import will fail and return an error message. So be sure to define a constant value for ca_object_lots.lot_status_id. You can look this value up in the "lot types" list via Manage > Lists and Vocabularies. Note also that identifiers for the ca_object_lots table do not go in the usual idno field but in ca_object_lots.idno_stub instead.

Groups - When you map to more than one metadata element within a single container, remember to use the Group column to tie them all to one overall container. You can do this by giving each of the elements an identical group name.

Open/LibreOffice spreadsheets - If you load data from a spreadsheet created in Open/LibreOffice, there is a problem with date detection, so you should save the file in ODF spreadsheet format (*.ods). You can use the XLS/XLSX import with these spreadsheets. The mapping itself works fine in XLSX format.

{"entityType": "entity_type", "skipIfValue": ["unknown"]}