Difference between revisions of "BagIt"

From CollectiveAccess Documentation

Revision as of 18:59, 6 November 2018

See also: BagIt support notes

About BagIt

BagIt is a standard for storage and transfer of arbitrarily structured digital content. As defined by the BagIt standard, a "bag" consists of a "payload" of one or more content files and "tags" (metadata files) documenting the bag. Every bag includes a data directory containing the payload and a tag file that provides a manifest of all files in the payload, with a checksum for each.
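To make the layout concrete, here is a minimal Python sketch that writes a bag by hand: payload files under data/, a bagit.txt declaration and a manifest-sha256.txt listing a checksum per payload file. The function name make_minimal_bag and the sample file names are hypothetical, and this is not how CollectiveAccess itself builds bags; it only illustrates the structure described above.

```python
import hashlib
from pathlib import Path


def make_minimal_bag(bag_dir: Path, payload: dict[str, bytes]) -> None:
    """Write payload files under data/ plus a bag declaration and a manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_name, content in sorted(payload.items()):
        target = data_dir / rel_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest paths are relative to the bag root and use forward slashes.
        manifest_lines.append(f"{digest}  data/{rel_name}")
    # The bag declaration marks the directory as a bag.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # The payload manifest lists one checksum line per payload file.
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
```

Calling make_minimal_bag(Path("example_bag"), {"ead.xml": b"<ead/>"}) would produce example_bag/bagit.txt, example_bag/manifest-sha256.txt and example_bag/data/ead.xml.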

Its support for flexible payloads and the required inclusion of arbitrary metadata and checksums for data verification make BagIt well-suited for use in archival and digital preservation contexts.
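The verification step that the checksums enable can be sketched as follows. This assumes a bag with a SHA-256 payload manifest named manifest-sha256.txt; the function name verify_bag is hypothetical and the sketch checks only the payload manifest, not tag manifests or the bag declaration.

```python
import hashlib
from pathlib import Path


def verify_bag(bag_dir: Path) -> list[str]:
    """Return the manifest paths that are missing or fail their checksum."""
    failures = []
    manifest = bag_dir / "manifest-sha256.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each line is "<checksum> <path relative to the bag root>".
        expected, rel_path = line.split(maxsplit=1)
        file_path = bag_dir / rel_path
        if not file_path.is_file():
            failures.append(rel_path)
            continue
        actual = hashlib.sha256(file_path.read_bytes()).hexdigest()
        if actual != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty list means every payload file listed in the manifest is present and unchanged.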

BagIt support in CollectiveAccess

CollectiveAccess supports generation of BagIt files for any record, set of records or record hierarchy. Representation media may be included and optionally filtered on representation type, relationship type (where available), primary/non-primary status and/or version. Media attached to records using "media" metadata elements may be included and optionally filtered by metadata element, media version and other metadata values when the media element is part of a container. Files attached using "file" metadata elements may be included and optionally filtered by metadata element and other metadata values when the file element is part of a container.

Metadata from included records may be exported using any available export mapping.

When exporting a hierarchy of records, files included in the BagIt “payload” may be structured in a directory structure mirroring the record hierarchy.
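As an illustration of a payload mirroring a record hierarchy, the sketch below maps a nested set of records to paths under data/. The Record class, its idno, files and children fields, and the sample identifiers are hypothetical stand-ins, not the CollectiveAccess data model or its actual directory naming scheme.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Record:
    # Hypothetical stand-in for a record; not the real CollectiveAccess model.
    idno: str
    files: list[str] = field(default_factory=list)
    children: list["Record"] = field(default_factory=list)


def payload_paths(record: Record, parent: PurePosixPath = PurePosixPath("data")) -> list[str]:
    """List payload paths so that each record becomes a directory under its parent."""
    here = parent / record.idno
    paths = [str(here / name) for name in record.files]
    for child in record.children:
        # Recurse so child records nest inside their parent's directory.
        paths.extend(payload_paths(child, here))
    return paths


collection = Record(
    "COLL.1",
    files=["finding_aid.xml"],
    children=[Record("SER.1", files=["a.tif"], children=[Record("ITEM.1", files=["b.tif"])])],
)
print(payload_paths(collection))
```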

All CollectiveAccess BagIt output will be serialized as either ZIP or Gzip'ed TAR files.
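The two serializations can be sketched with the Python standard library as follows. The function name serialize_bag and the "zip" and "tgz" format labels are assumptions made for this example; they are not CollectiveAccess settings.

```python
import tarfile
import zipfile
from pathlib import Path


def serialize_bag(bag_dir: Path, fmt: str = "zip") -> Path:
    """Pack a bag directory into a ZIP or gzipped TAR file next to it."""
    if fmt == "zip":
        archive = bag_dir.parent / (bag_dir.name + ".zip")
        files = sorted(p for p in bag_dir.rglob("*") if p.is_file())
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in files:
                # Keep the bag directory itself as the top-level archive entry.
                zf.write(path, arcname=path.relative_to(bag_dir.parent).as_posix())
    elif fmt == "tgz":
        archive = bag_dir.parent / (bag_dir.name + ".tar.gz")
        with tarfile.open(archive, "w:gz") as tf:
            tf.add(bag_dir, arcname=bag_dir.name)
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
    return archive
```

Either way the archive unpacks to a single top-level directory containing the bag.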

BagIt output may be created automatically on creation or change to a CollectiveAccess record, or manually during an export of selected records.

CollectiveAccess will support transmission of BagIt output to remote targets. Target types will include:

  • As a direct download from within CollectiveAccess to a user's local machine.
  • Locally mounted file systems (e.g. a local directory on the server, or a file server mounted on the server)
  • A remote file store such as Dropbox, Amazon S3, SFTP, LOCKSS or GoogleAPI.

Configuration

The BagIt workflow in CollectiveAccess is configured in the file external_exports.conf located in /app/conf. This file contains settings for BagIt targets, outputs, options and more.

Here we'll walk through each part of the file and the parameters for each setting. To start we must configure a target. Multiple targets may be configured within a single CollectiveAccess system. Let's begin by setting up a custom BagIt export that packages an EAD XML finding aid along with its related media assets. First, under targets, we set the preliminary details:

targets = {
    ead_collections = {  
    
        label = EAD BagIt export to server,
        table = ca_collections,
        restrictToTypes = [],
        destination = {
            type = sftp,        
            hostname = 192.168.6.4,
            user = seth,
            password = a_password_goes_here,
            path = /data/exports   
        },

Within targets we've created a rule set called ead_collections. This is an arbitrary name for the configuration (that should contain no spaces or special characters). The name for the export that catalogers will see in CollectiveAccess is set via label (here we've called it "EAD BagIt export to server").

Next we set the CollectiveAccess table that is targeted by the export, in this case ca_collections. If desired we can also restrict the BagIt option so that it only appears for certain types within the ca_collections table using the restrictToTypes setting. This can also be left blank as shown above.

In destination we set where the "bag" will be exported. Destination types that will be supported include sftp, path, dropbox and amazons3. In our example we've configured an sftp connection and have included the hostname, user, password and path for that connection.
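A quick sanity check of a destination block can be sketched in Python. The check_destination helper is hypothetical and is not part of CollectiveAccess; it knows only the four type names listed above and the four sftp settings used in the example, and it makes no assumption about which settings the other types require.

```python
SUPPORTED_TYPES = {"sftp", "path", "dropbox", "amazons3"}
# Settings used by the sftp example above; other types are not checked here.
SFTP_KEYS = {"hostname", "user", "password", "path"}


def check_destination(dest: dict[str, str]) -> list[str]:
    """Return a list of problems found in a destination block (empty if none)."""
    problems = []
    dest_type = dest.get("type")
    if dest_type not in SUPPORTED_TYPES:
        problems.append(f"unsupported destination type: {dest_type!r}")
    elif dest_type == "sftp":
        missing = sorted(SFTP_KEYS - set(dest))
        if missing:
            problems.append("missing sftp settings: " + ", ".join(missing))
    return problems


example = {
    "type": "sftp",
    "hostname": "192.168.6.4",
    "user": "seth",
    "password": "a_password_goes_here",
    "path": "/data/exports",
}
print(check_destination(example))
```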

triggers = { save, periodic = 1d },

Once the basic target settings have been configured we must determine when the "bags" will be generated. Above we've set the export to occur when the record has been saved, with a daily export frequency.
