Difference between revisions of "Media import"

From CollectiveAccess Documentation

Revision as of 17:03, 23 September 2015

Version 1.5 and above

Import target

When importing media, you are creating a type of record called a "media representation." However, these records are almost always associated with an Object record. In other cases, representations are associated with other primary tables, like entities or occurrences. Use "Import Target" to set which type of record the media representations should be associated with.

Directory to import

Check the inspector to find out what server directory is associated with the import. By default it is located in your installation's /import folder. You can change the directory in app.conf.

You'll see a hierarchy browser reflecting the import directory like this.

Import directory.png

There are two directory settings:

Setting Description
Include all sub-directories Using the above examples, if I were to import "foo" and check this setting, the sub-directories "ack", "bar", and "meow" would also be imported. Otherwise, only files directly stored in "foo" would be imported.
Delete media after import If this setting is checked, media will be deleted from the import directory after it is uploaded to CollectiveAccess.
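The effect of "Include all sub-directories" can be illustrated with a short Python sketch. This is not CollectiveAccess code; the function name `files_to_import` and its parameter are hypothetical and exist only to show the difference between the two behaviours.

```python
import os


def files_to_import(directory, include_subdirectories=False):
    """List the files an import of `directory` would pick up (illustration only)."""
    if include_subdirectories:
        found = []
        # Walk the chosen directory and every directory beneath it.
        for root, _dirs, names in os.walk(directory):
            found.extend(os.path.join(root, name) for name in names)
        return sorted(found)
    # Otherwise only files stored directly in the chosen directory count.
    return sorted(entry.path for entry in os.scandir(directory) if entry.is_file())
```

With the setting unchecked, only files stored directly in "foo" are returned; with it checked, files in its sub-directories are returned as well.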

Import Mode

Setting Description
Import all media, matching with existing records where possible This setting will match media with existing records in cases where the media filename matches the record's idno. In all other cases, new records will be generated for each media file.
Import only media that can be matched with existing records This setting will import only the media that can be matched with an existing record, by finding matches between filename and idno. All other media files will be skipped.
Import all media, creating new records for each This setting will import all media in the chosen directory filepath. Brand new records will be created for each media file.
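The three import modes can be summarised as a small decision function. The sketch below is an illustration, not the importer's actual code: the names `ImportMode` and `plan_import` are hypothetical, and it assumes that a "match" means the filename without its extension equals an existing idno.

```python
import os
from enum import Enum


class ImportMode(Enum):
    # Labels copied from the Import Mode menu; the enum itself is hypothetical.
    MATCH_WHERE_POSSIBLE = "Import all media, matching with existing records where possible"
    MATCH_ONLY = "Import only media that can be matched with existing records"
    ALWAYS_CREATE = "Import all media, creating new records for each"


def plan_import(mode, filename, existing_idnos):
    """Return 'attach', 'create', or 'skip' for a single media file."""
    # Assumption: matching compares the filename (minus extension) to the idno.
    stem = os.path.splitext(filename)[0]
    matched = stem in existing_idnos
    if mode is ImportMode.ALWAYS_CREATE:
        return "create"   # a brand new record for every file
    if matched:
        return "attach"   # media joins the existing record
    if mode is ImportMode.MATCH_WHERE_POSSIBLE:
        return "create"   # unmatched files still get new records
    return "skip"         # MATCH_ONLY: unmatched files are ignored
```

For example, with an existing record "M100", the file "M100.jpg" is attached under the first two modes, while an unmatched "M200.jpg" is created under the first mode and skipped under the second.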


Setting Description
Type used for newly created object This menu will be populated with the import target's type list. If the target is objects, you can select the specific object type here. The same is true for all other primary tables (entities, occurrences, etc.)
Type used for newly created object representations Use this to set the media representation type. Default values are "front" and "back."


This menu allows you to associate imported records with a set.

Setting Description
Add imported media to set This setting is followed by a drop-down menu populated by the names of existing sets. Use this setting to add all imported media to an existing Set.
Create set This setting allows you to create a new set and add all imported media records to it. The setting includes a field to input the title of the new set.
Do not associate imported media with a set Use this setting if none of the imported media should be associated with a Set.

Object identifier

Setting Description
Set object identifier to If creating new records, use this setting to manually set the target records' identifier.
Set object identifier to file name This setting will take the media filename as the record identifier. For example, my_image.jpeg will be used as the Identifier for the object record associated with this image.
Set object identifier to file name without extension This will create (or match) on filename, but without the extension. In other words, my_image.jpeg simply becomes "my_image". This setting is particularly useful when matching on existing records that may have identifiers that match filenames, but don't include file extensions.
Set object identifier to directory and file name If this setting is used, idnos will be set using not only the filename, but also the import directory path. For example, if "my_image.jpeg" is stored in a folder called "Media", the record's idno becomes /Media/my_image.jpeg.
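The three filename-based identifier settings differ only in how much of the path they keep. The following Python sketch reproduces the examples from the table; the function `object_identifier`, its setting strings, and the `/import` root are assumptions made for illustration, not part of CollectiveAccess.

```python
import posixpath


def object_identifier(path, setting, import_root="/import"):
    """Derive an idno from a media path under one of the filename-based settings."""
    name = posixpath.basename(path)
    if setting == "file name":
        return name                                # my_image.jpeg
    if setting == "file name without extension":
        return posixpath.splitext(name)[0]         # my_image
    if setting == "directory and file name":
        # Keep the path relative to the (assumed) import root.
        return "/" + posixpath.relpath(path, import_root)
    raise ValueError(f"unknown setting: {setting!r}")
```

Called with "/import/Media/my_image.jpeg", the three settings give "my_image.jpeg", "my_image", and "/Media/my_image.jpeg" respectively.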

Status & access

This menu allows you to set the "status" and "access" fields for both the "import target" record and the representation record.

Setting Description
Set object status to Sets the import target's status (new, completed, editing in progress, etc.). If the target is Objects, this is the object status; if Entities, the entity status.
Set object access to Set import target access value to "accessible to public" or "not accessible to public."
Set representation status to Sets the media representation's status (new, completed, editing in progress, etc.).
Set representation access to Set media representation access value to "accessible to public" or "not accessible to public."

Advanced Options

The media importer includes several advanced options as well.


By default, matching occurs on filename. This setting allows you to set matching on directory name, or directory name, then filename. Additionally, you can limit the matching by type.

Object representation identifier

This setting is similar to the import target identifier setting, only it applies specifically to the media representation record, rather than the import target record.


Some projects have a very structured way of assigning filenames to media. For projects with Entity authorities, it's not uncommon for a media filename to include the entity authority's identifier, if that media happens to depict the entity. For example, let's say I have a photograph of Martha Graham and in my system, her entity idno is 12345. I might name that image 12345.jpeg, to indicate that the image depicts Martha Graham. If this is the case, I can use the "relationships" setting to ensure that the object record associated with the imported image is also related to the entity record for Martha Graham (12345). Using this setting, you can select not only the related tables, but also the relationship type. In this case, I might choose "depicted."

Skip Files

With this setting, you may use Perl-compatible regular expressions (http://www.pcre.org/pcre.txt) to filter out files in the media directory that you wish to skip. You may also simply list out the filenames of those you wish to skip, one per line.


Miscellaneous

Setting Description
Allow duplicate media? By default, duplicate media files will be skipped. Use this setting if you wish to override this.
Log level This setting allows you to control the level of detail in the log. The log can capture errors, warnings, alerts, informational messages, and debugging messages. Use the latter for the most comprehensive log.


Media, which is often an important component of a record, can be tedious to upload file by file. Fortunately, CollectiveAccess offers a Batch Import tool to simplify the upload of large sets of files, whether you wish to match them to existing records or create new records based on the media. In this section, we will look at each element on the Media Batch-Import screen. To use this tool, click on "Import" in the Global Navigation Bar (see below):

Global nav media.png

Before you begin importing media, you must carefully define your settings. Begin by choosing a directory to import from the top field on the screen. Click on the grey arrow to the right of each directory name to display a list of the folder's contents. If the folder includes subdirectories containing media you wish to import, be sure to check "Include all sub-directories" at the bottom of the field. This option will remain checked the next time you perform an import unless you choose to uncheck it. Directory to import.png

Next, you must choose an Import Mode. In the second field on the screen, you will see a drop-down list with three options: "Import all media, matching with existing records where possible," "Import only media that can be matched with existing records," or "Import all media, creating new records for each." Let's take a look at each of these options.

1.) Import all media, matching with existing records where possible: Choose this option if you know that you already have some records in your system that correspond with the filenames of media in your batch. CollectiveAccess will search for these matches and thereby avoid information duplication. Any files that do not match an existing record will be turned into new records, and you will be responsible for fleshing out their contents.

2.) Import only media that can be matched with existing records: Choose this option if you do not wish to create any records beyond those already in place. CollectiveAccess will import files that match existing object records, and no others.

3.) Import all media, creating new records for each: Choose this option if you have no pre-existing records in the database, or if you know that none of the pre-existing records match the media you're importing. You will then have barebones records for each file in your import, and you will need to complete those records by hand. Import mode.png

If you choose to import media that will be matched with existing records where possible, you will need to specify the matching criteria. What aspects of your file naming system match up with your record identifiers? In some cases, the file names themselves may include the relevant idno to match media with its appropriate record. For example, you may have a record with the idno "M100" and a file with the name "M100.jpg". Under these circumstances, you would choose the option Match Using File Name. In other cases, you might have a directory title that contains the idno or preferred label for the relevant record, but the individual files within the directory are simply titled img 1, img 2, img 3, etc. In a case such as this, you would choose the option Match Using Directory Name, which is also the setting to use when matching multiple media representations to a single record. There might also be instances in which you have a combination of these circumstances, in which case you could choose Match Using Directory Name, Then File Name.
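The three matching options can be thought of as different lists of candidate values to compare against record identifiers. The Python sketch below is a simplified illustration: the function names are hypothetical, it compares against idnos only (not preferred labels), and it assumes the filename is compared without its extension.

```python
import posixpath


def match_candidates(path, mode):
    """Return the values to compare against record idnos, in order of preference."""
    directory = posixpath.basename(posixpath.dirname(path))
    stem = posixpath.splitext(posixpath.basename(path))[0]
    if mode == "file name":
        return [stem]
    if mode == "directory name":
        return [directory]
    if mode == "directory name, then file name":
        return [directory, stem]
    raise ValueError(f"unknown matching mode: {mode!r}")


def find_record(path, mode, existing_idnos):
    """Return the first idno that matches, or None if nothing matches."""
    for candidate in match_candidates(path, mode):
        if candidate in existing_idnos:
            return candidate
    return None
```

Under "directory name", every file in a folder named "M100" resolves to the same record, which is how several representations end up attached to a single object.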

Matching dropdown.png

The next field in the batch-import interface asks you to specify Type. Every record in CollectiveAccess has a Primary Type, and certain types of records are further divided into subtypes. A batched media import will apply to Object records, so you must choose the subtype that applies to your media. For example, are they all photographs? Documents? Postcards? Please note that in order to perform a batch-import, you must be sure that all of the media you are importing are of the same type. In other words, group the files together in a systematic way before beginning to upload them. The Type field on the Batch-Import screen has two drop-down lists:

1.) Type used for newly created objects: As discussed above, you must choose the type of object record to be used for your media. Some systems will have more nuanced choices than others, but make sure all of the media you're importing in one batch is of the same type.

2.) Type used for newly created object representations: This drop-down list has three options: Front, Back, or Default. Choose front or back when you are importing a batch of media that could have multiple representations. A batch of images of postcards, for example, may have views of either the front or back of the object in question. Otherwise, choose default. Type field.png

The next field on the screen allows you to add your media to a set or create a new set concurrently with the import. This option saves time, and, as discussed in Batch Editing, allows you to quickly make necessary changes to all of the records associated with your import in one fell swoop. Grouping your imported files into a set can also be helpful in that it simplifies the process of double-checking your work. However, if you do not wish to associate your imported media with a set, you can simply select "Do not associate imported media with a set." Set field.png

If you are importing media that doesn't already have a matching record and you are seeking to create new records for the uploaded files, you will need to set object identifiers for your imported media. Depending on your collection's content standard, you may want to either use the filename (or directory name, depending on which is more descriptive) as an object identifier, or you may choose to create a new identifier that corresponds with the standard you've established. Batch media identifier.png

You can also choose access and status settings before beginning your import. For example, you can set your object status to "Complete," "In editing," "Needs review," or "New" to remind yourself (or other catalogers) to revisit the associated records if necessary. If your archive has a public component, you can choose whether the newly imported media will be accessible to the public. This feature allows you to import media more cautiously, particularly if you are working with a front-end component.

Status and access field.png

The next Media Import option is slightly more complex, and is necessary in some, but not all cases. This feature gives you the opportunity to connect your newly imported batch of media to an existing record through a relationship rather than as strict representations of that record. For example, a theatre archive may want to connect images from a given production to the work record for that production. At the same time, each image should have its own object record. With this tool, if you have an entire directory of images from one production, and the directory name matches the production record identifier, then an import of that directory will create new records for each file in the directory, each of which is already linked via relationship to the appropriate production.

Import relationships.png

As you can see, in addition to occurrence relationships, you can create relationships (based on matching a file name/directory to a record identifier) with entities, collections, and places.

Finally, you may be working with a directory that includes files you do not wish to import. As a protection against this, the Media Batch Importer includes a "Skip Files" tool that can be employed in several ways. If you know the exact names of the files, you can simply type them in here (one per line). If you know part of the name, or if there are several with a similar name, you can use an asterisk (*) as a wildcard. If this is not sufficient, you can also employ regular expressions by enclosing values in forward slashes (/).
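The three kinds of skip rule (exact name, asterisk wildcard, and regular expression in forward slashes) can be sketched as follows. This is an illustration of the rules as described above, not the importer's implementation: the function `should_skip` is hypothetical, it assumes a wildcard rule must match the whole filename, and it uses Python's `re` module, whose syntax is close to but not identical to PCRE.

```python
import re


def should_skip(filename, skip_rules):
    """Return True if any skip rule (one per line in the UI) matches the filename."""
    for rule in skip_rules:
        rule = rule.strip()
        if not rule:
            continue
        if len(rule) >= 2 and rule.startswith("/") and rule.endswith("/"):
            # /.../ : treat the enclosed text as a regular expression.
            if re.search(rule[1:-1], filename):
                return True
        elif "*" in rule:
            # Asterisk wildcard: '*' stands for any run of characters.
            pattern = ".*".join(re.escape(part) for part in rule.split("*"))
            if re.fullmatch(pattern, filename):
                return True
        elif rule == filename:
            # Plain line: exact filename.
            return True
    return False
```

For instance, the rules "Thumbs.db", "draft_*", and "/\.tmp$/" would skip "Thumbs.db", "draft_01.jpg", and "scan.tmp", but not "M100.jpg".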

When you have set each field to your satisfaction, it's time to execute the batch import. As with any operation that involves a large set of data, exercise caution - you don't want to pepper your database with bad data or faulty records. Click the "Execute media import" button at the bottom (or top) of the screen. Before the operation is completed, you will be asked to confirm your choice, and you will be able to decide how you'd like to receive information about the import's progress. You can choose to have the import process in the background as you work, you can opt to receive an email when the import is complete, or you can request to have a text message sent to you once the process is completed. This last option, of course, is only available if you've included your SMS number in your user profile (see Access_Roles). All of these options are available because a large directory can take a significant amount of time to import. Allowing the process to occur in the background or choosing to receive an alert once it is completed gives you the freedom to pursue other tasks while it's happening.

Import media confirm.png

CollectiveAccess 1.4 also includes an option to upload a data mapping and import a batch of data through the user interface. This is the other option in the "Import" drop-down on your global navigation bar. For more information on using this feature, please see Data_Import.

