BagIt support notes

From CollectiveAccess Documentation

BagIt support requirements and implementation notes

About BagIt

BagIt is a standard for the storage and transfer of arbitrarily structured digital content. As defined by the BagIt standard, a "bag" consists of a "payload" of one or more content files and "tags" (metadata files) documenting the bag. Every bag includes a data directory containing the payload and a tag file that provides a manifest listing every payload file along with its checksum.
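To make the layout concrete, the following is a minimal Python sketch of writing a bag by hand. It is not CollectiveAccess code: the function name `make_bag` and the sample file names are hypothetical, and SHA-256 is an assumed choice of checksum algorithm. The `bagit.txt` declaration, the `data/` payload directory and the `manifest-<algorithm>.txt` naming follow the BagIt specification (RFC 8493).

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, files: dict[str, bytes]) -> Path:
    """Write a minimal bag: bagit.txt, a data/ payload and a SHA-256 manifest.

    `files` maps payload-relative paths (forward slashes) to file contents.
    This helper is illustrative only and is not part of CollectiveAccess.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for rel_name, content in sorted(files.items()):
        target = data_dir / rel_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest entries are "<checksum> <path relative to the bag root>".
        manifest_lines.append(f"{digest}  data/{rel_name}")

    # Bag declaration tag file required by the specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = make_bag(Path(tmp) / "example_bag", {"object_1/image.jpg": b"fake image bytes"})
        for path in sorted(bag.rglob("*")):
            print(path.relative_to(bag).as_posix())
```

The sketch omits optional tag files such as `bag-info.txt`, which is where exported descriptive metadata about the bag would usually be summarized.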

Its support for flexible payloads and arbitrary metadata, together with the required checksums for data verification, makes BagIt well suited for use in archival and digital preservation contexts.
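As an illustration of the data verification that the checksums enable, here is a small Python sketch that re-hashes each payload file and compares it with the manifest. The function name `find_checksum_failures` is hypothetical, and the sketch assumes a single manifest with plain (not percent-encoded) file paths.

```python
import hashlib
from pathlib import Path


def find_checksum_failures(bag_dir: Path, algorithm: str = "sha256") -> list[str]:
    """Return manifest paths that are missing or whose checksum has changed.

    Assumes a manifest named manifest-<algorithm>.txt in the bag root.
    Illustrative only; a full validator would also check bagit.txt,
    tag manifests and files present in data/ but absent from the manifest.
    """
    failures = []
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each line is "<checksum><whitespace><path>"; the path may contain spaces.
        expected, rel_path = line.split(maxsplit=1)
        payload_file = bag_dir / rel_path
        if not payload_file.is_file():
            failures.append(rel_path)
            continue
        actual = hashlib.new(algorithm, payload_file.read_bytes()).hexdigest()
        if actual != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty result means every file listed in the manifest is present and unchanged.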

BagIt support in CollectiveAccess

CollectiveAccess will support generation of BagIt files for any record, set of records or record hierarchy. Representation media may be included and optionally filtered on representation type, relationship type (where available), primary/non-primary status and/or version. Media attached to records using "media" metadata elements may be included and optionally filtered by metadata element, media version and other metadata values when the media element is part of a container. Files attached using "file" metadata elements may be included and optionally filtered by metadata element and other metadata values when the file element is part of a container.
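The representation filtering described above could be modelled roughly as follows. This is a Python sketch under stated assumptions, not the CollectiveAccess data model: the `Representation` class, its field names and the `select_files` function are all hypothetical, and the version names ("original", "medium") are example values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Representation:
    """Hypothetical stand-in for a representation attached to a record."""
    rep_type: str                     # e.g. "front", "back"
    relationship_type: Optional[str]  # None where relationship types are unavailable
    is_primary: bool
    versions: dict                    # version name -> file path


def select_files(reps, *, rep_types=None, relationship_types=None,
                 primary_only=False, version="original"):
    """Return the file paths to place in the payload for the given filters.

    A filter set to None is not applied. Representations lacking the
    requested version are skipped.
    """
    selected = []
    for rep in reps:
        if rep_types is not None and rep.rep_type not in rep_types:
            continue
        if relationship_types is not None and rep.relationship_type not in relationship_types:
            continue
        if primary_only and not rep.is_primary:
            continue
        path = rep.versions.get(version)
        if path is not None:
            selected.append(path)
    return selected
```

For example, `select_files(reps, primary_only=True, version="medium")` would pick only the medium-sized derivative of each primary representation.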

Metadata from included records may be exported using any available export mapping.

When exporting a hierarchy of records, files included in the BagIt "payload" may be arranged in a directory structure mirroring the record hierarchy.
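One way to derive such a directory structure is to walk from each record up to the root of its hierarchy and use the record identifiers as directory names. The Python sketch below assumes a hypothetical in-memory representation in which each record has an `idno` and a `parent_id`; neither the function name `payload_dir` nor the sanitizing rule comes from CollectiveAccess.

```python
import re
from pathlib import PurePosixPath


def payload_dir(record_id, records) -> PurePosixPath:
    """Return the payload directory for a record, mirroring its ancestry.

    `records` maps record id -> {"idno": str, "parent_id": id or None}.
    This structure is an assumption made for illustration.
    """
    parts = []
    seen = set()
    current = record_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at record {current!r}")
        seen.add(current)
        record = records[current]
        # Simplistic sanitizing: replace anything unsafe in a directory name.
        parts.append(re.sub(r"[^A-Za-z0-9._-]", "_", record["idno"]))
        current = record["parent_id"]
    # Ancestors were collected leaf-first, so reverse them under data/.
    return PurePosixPath("data", *reversed(parts))


if __name__ == "__main__":
    example = {
        1: {"idno": "collection_A", "parent_id": None},
        2: {"idno": "series_1", "parent_id": 1},
        3: {"idno": "2014.1/3", "parent_id": 2},
    }
    print(payload_dir(3, example))
```

A real implementation would also need to handle sibling records whose sanitized identifiers collide.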

All CollectiveAccess BagIt output will be serialized as either ZIP or gzip-compressed TAR files.
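Both serializations can be produced from a bag directory with standard tooling. The Python sketch below uses `shutil.make_archive` from the standard library; the function name `serialize_bag` and the format labels "zip" and "tgz" are hypothetical. The archive is built so that the bag directory is its single top-level entry.

```python
import shutil
from pathlib import Path


def serialize_bag(bag_dir: Path, output_dir: Path, fmt: str = "zip") -> Path:
    """Pack a bag directory into a ZIP or gzip-compressed TAR archive.

    `fmt` is "zip" or "tgz" (labels chosen for this sketch). The archive is
    named after the bag directory and written to `output_dir`, which must not
    be located inside `bag_dir`.
    """
    formats = {"zip": "zip", "tgz": "gztar"}
    if fmt not in formats:
        raise ValueError(f"unsupported serialization: {fmt!r}")
    output_dir.mkdir(parents=True, exist_ok=True)
    archive = shutil.make_archive(
        base_name=str(output_dir / bag_dir.name),
        format=formats[fmt],
        root_dir=str(bag_dir.parent),  # archive paths are relative to this directory
        base_dir=bag_dir.name,         # only the bag directory is included
    )
    return Path(archive)
```

Calling `serialize_bag(Path("bags/object_42"), Path("out"), "tgz")` would write `out/object_42.tar.gz`.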

BagIt output may be created automatically when a CollectiveAccess record is created or changed, or manually during an export of selected records.

CollectiveAccess will support transmission of BagIt output to local and remote targets. Target types will include:

  • Locally mounted file systems (e.g., a local directory on the server, or a file server mounted on the server)
  • A remote file store or service such as Dropbox, Amazon S3, SFTP, LOCKSS or GoogleAPI
  • Possibly also Fedora, in a BagIt-like structure (still an open question)
  • A direct download from within CollectiveAccess to a user's local machine
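
For the simplest target type, a locally mounted file system, delivery amounts to copying the serialized bag into place. The Python sketch below covers only that case; the function name `deliver_to_directory` and the ".part" suffix are hypothetical. It copies to a temporary name first and then renames, so a partially written archive never appears under its final name.

```python
import shutil
from pathlib import Path


def deliver_to_directory(archive: Path, target_dir: Path) -> Path:
    """Copy a serialized bag into a locally mounted target directory.

    Illustrative only. Remote targets (SFTP, Amazon S3, Dropbox, etc.)
    would each need their own client and are not covered here.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    final_path = target_dir / archive.name
    temp_path = target_dir / (archive.name + ".part")
    shutil.copy2(archive, temp_path)  # copies contents and file metadata
    temp_path.replace(final_path)     # rename within the same directory
    return final_path
```

Delivering a bag whose archive name already exists in the target overwrites the earlier copy, which matches the use case of updating a bag when its record changes.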


Example use cases:

  1. When creating an object record in CollectiveAccess, automatically deposit a bag for each record created. The bag contains all representations attached to the object as well as exported metadata for the object and each representation. Bags are written to a remote "preservation" server via SFTP and are automatically updated whenever the object record or its representations are modified.
  2. Search for objects in a CollectiveAccess system, create a set and export the set as a bag with metadata for each object. All attached representations are also included in the bag as medium-sized JPEGs (not the original high-resolution files). The bag is exported via SFTP to another server.
  3. Export a bag for a CollectiveAccess entity record, including metadata and attached files, and download it for sharing via email.
  4. Generate a bag from a hierarchy of CollectiveAccess object records. The bag contains media and metadata from each object in a directory structure that mirrors the original hierarchy. Transmit this bag to a Dropbox account for storage and dissemination.
