BagIt support notes

From CollectiveAccess Documentation
Revision as of 17:38, 25 June 2018 by Julia (talk | contribs) (About BagIt)

BagIt support requirements and implementation notes

About BagIt

BagIt is a standard for storing and transferring arbitrarily structured digital content. As defined by the standard, a "bag" consists of a "payload" of one or more content files and "tags" (metadata files) that document the bag. The required tag files include a manifest that lists every payload file along with its checksum.
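To make the layout concrete, here is a minimal Python sketch that writes a bag by hand: payload files under data/, a bagit.txt declaration and a SHA-256 manifest. The function name write_minimal_bag and the sample file names are hypothetical. This is an illustration of the bag structure only, not the CollectiveAccess implementation.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Mapping


def write_minimal_bag(bag_dir: Path, payload: Mapping[str, bytes]) -> None:
    """Write payload files under data/ plus a declaration and a manifest.

    `payload` maps a relative file name (using "/" separators) to its content.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_name, content in sorted(payload.items()):
        target = data_dir / rel_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest paths are relative to the bag root.
        manifest_lines.append(f"{digest}  data/{rel_name}")
    # Bag declaration: marks the directory as a bag.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # Payload manifest: one "checksum  path" line per payload file.
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "example_bag"  # hypothetical bag name
        write_minimal_bag(bag, {"image.jpg": b"fake image bytes",
                                "metadata.xml": b"<record/>"})
        for path in sorted(bag.rglob("*")):
            print(path.relative_to(bag).as_posix())
```

A production bag would normally also carry descriptive tags (for example in bag-info.txt); they are omitted here to keep the sketch short.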

BagIt's support for flexible payloads, arbitrary metadata and checksum-based data verification makes it well suited to archival and digital preservation contexts.
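The checksum verification mentioned above can be sketched as follows. The example assumes a bag directory containing a manifest-sha256.txt file whose lines have the form "checksum path"; the function name find_corrupt_files is hypothetical. It reads each file fully into memory, which is fine for a sketch but would need streaming for large media.

```python
import hashlib
from pathlib import Path
from typing import List


def find_corrupt_files(bag_dir: Path) -> List[str]:
    """Return manifest paths that are missing or whose SHA-256 differs."""
    problems = []
    manifest = bag_dir / "manifest-sha256.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each line is "<hex digest> <path relative to the bag root>".
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            problems.append(rel_path)
            continue
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != expected.lower():
            problems.append(rel_path)
    return problems
```

An empty return value means every file listed in the manifest is present and unchanged. Files that exist in the payload but are not listed in the manifest are not detected by this sketch.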

BagIt support in CollectiveAccess

CollectiveAccess will support generation of BagIt files for any record, set of records or record hierarchy. Representation media may be included and optionally filtered by representation type, relationship type (where available), primary/non-primary status and/or version. Media attached to records using "media" metadata elements may be included and optionally filtered by metadata element, media version and, when the media element is part of a container, other metadata values. Files attached using "file" metadata elements may be included and optionally filtered by metadata element and, when the file element is part of a container, other metadata values.

Metadata from included records may be exported using any available export mapping.

When exporting a hierarchy of records, the files included in the bag may be arranged in a directory structure that mirrors the record hierarchy.
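One way to picture this mirroring is the Python sketch below, which maps a nested record tree to payload paths. The Record class, its idno and files fields, and the payload_paths function are hypothetical stand-ins, not CollectiveAccess data structures. Using the record identifier as the directory name is also an assumption.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Iterator, List


@dataclass
class Record:
    """Hypothetical stand-in for a record in a hierarchy."""
    idno: str
    files: List[str] = field(default_factory=list)
    children: List["Record"] = field(default_factory=list)


def payload_paths(record: Record,
                  parent: PurePosixPath = PurePosixPath("data")) -> Iterator[str]:
    """Yield one payload path per file, nesting directories like the hierarchy."""
    here = parent / record.idno
    for name in record.files:
        yield str(here / name)
    for child in record.children:
        # Each child record becomes a subdirectory of its parent's directory.
        yield from payload_paths(child, here)


if __name__ == "__main__":
    tree = Record("collection_1", ["finding_aid.pdf"], [
        Record("series_1", [], [Record("item_1", ["scan.tif"])]),
    ])
    for path in payload_paths(tree):
        print(path)
```

Records without files (such as series_1 here) produce no payload entries of their own but still contribute a directory level to their descendants' paths.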

All CollectiveAccess BagIt output will be serialized as either ZIP or gzip-compressed TAR files.
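As a rough illustration of the two serializations, the following Python sketch packs an existing bag directory with the standard library's shutil.make_archive. The function name serialize_bag and its parameters are hypothetical. Placing the bag directory at the top level of the archive is a choice made for this sketch, not something this page specifies.

```python
import shutil
from pathlib import Path


def serialize_bag(bag_dir: Path, out_dir: Path, fmt: str = "zip") -> Path:
    """Pack bag_dir into out_dir as a .zip ("zip") or .tar.gz ("gztar") file."""
    if fmt not in ("zip", "gztar"):
        raise ValueError("fmt must be 'zip' or 'gztar'")
    bag_dir = bag_dir.resolve()
    out_dir = out_dir.resolve()
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = shutil.make_archive(
        base_name=str(out_dir / bag_dir.name),  # archive path without extension
        format=fmt,
        root_dir=str(bag_dir.parent),           # archive paths are relative to this
        base_dir=bag_dir.name,                  # keep the bag directory as top level
    )
    return Path(archive)
```

Calling serialize_bag(bag, out, "gztar") on a directory named example_bag would produce example_bag.tar.gz in the output directory, and "zip" would produce example_bag.zip.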

BagIt output may be created automatically when a CollectiveAccess record is created or changed, or manually during an export of selected records.

CollectiveAccess will support transmission of BagIt output to remote targets. Target types will include:

  • Locally mounted file systems (e.g. a local directory on the server, or a file server mounted on the server)
  • Remote file stores such as Dropbox, Amazon S3, SFTP, LOCKSS or GoogleAPI
  • Possibly Fedora, in a BagIt-like structure (still an open question)
  • Direct download from within CollectiveAccess


Example use cases:

  • Automatically deposit a bag for each object record on creation. The bag contains all representations attached to the object, as well as exported metadata for the object and each representation. Bags are updated whenever the object record or its representations are modified. Bags are written out to a remote "preservation" server via SFTP.
  • Search for objects, create a set and export the set (via SFTP to another server) as a bag with metadata for each object and all attached representations as medium-sized JPEGs (not the original high-resolution files).
  • Export a bag for an entity record, including metadata and attached files, and download it for sharing via email.
  • Generate a bag from a hierarchy of object records that includes media and metadata from each object in a directory structure that mimics the original hierarchy, and transmit it to a Dropbox account for storage and dissemination.
