Finding aid aggregation

From LSTA Wiki
Revision as of 16:40, 29 August 2008 by Smcintyre (talk | contribs) (updating procedure)

Overview of Aggregation Process

EAD Central Index Ingestion

EAD files for all partners will be hosted as part of the Mountain West Digital Library (MWDL) system. The workflow for this process is illustrated in the diagram to the left and discussed below. All LSTA partners will follow the workflow on the right of this diagram, building EAD collections that reside on CONTENTdm servers. NEH-grant-funded partners in the Western Waters Digital Library will follow the workflow on the left, for EAD collections that reside on other Open Archives Initiative (OAI)-compatible servers. Each partner institution controls its own EAD collection, using tools and scripts developed by the technology team at the University of Utah.

The EAD collections for the LSTA-funded project are available locally by institution, and are also searchable and browsable centrally in the MWDL statewide index. Each EAD file exists in only one place: the local repository. When a user searches or browses the central index and clicks on an item of interest in the results, he or she is taken to the EAD file in the local repository. Users can also search locally, but a local search returns results only from the local institution. A link allows them to go to the central MWDL index at any time.

For more information about the process described below, please contact Sandra McIntyre (sandra.mcintyre@utah.edu), Nathan Pugh (nathan.pugh@utah.edu), or Debbie Rakhsha (debbie.rakhsha@utah.edu) at the University of Utah Marriott Library.

Uploading Your EAD Files to Your Institution's Repository

EAD files are uploaded to the CONTENTdm digital asset management server that hosts your collections as part of the Mountain West Digital Library network. This involves two steps: (1) extracting the values of certain EAD elements and mapping them to CONTENTdm fields; and (2) using CONTENTdm's Acquisition Station to upload the EAD file and metadata to the Mountain West Digital Library hub server. Both steps can be run on multiple EAD files at a time for batch processing.

1. Extracting the EAD Elements

An extraction script has been created to automate the first step. The 35 EAD elements chosen for extraction, the local CONTENTdm fields they correspond to, and the Dublin Core fields they correspond to are given in the EAD-CONTENTdm-Dublin Core Elements Assignments (Mapping Table).

The extraction script was written in VBScript by Nathan Pugh at the University of Utah, based on a VBScript created by Terry Reese at Oregon State University. When the script is double-clicked, it acts on every file with the extension ".xml" in the folder where the script itself is located. It reads each file in turn, queries the values of the selected EAD elements, and saves them as CONTENTdm fields in a tab-delimited text file. This tab-delimited file can then be used to upload the metadata to the CONTENTdm collection of your EAD files.
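To make the extraction step concrete, here is a minimal sketch of the same idea in Python. It is not the project's VBScript. The three field names and element paths are hypothetical stand-ins for the 35 entries in the Mapping Table, and the sketch assumes DTD-based EAD 2002 files with no XML namespace.

```python
import csv
import io
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical mapping: CONTENTdm field name -> path of the EAD element.
# The project's Mapping Table covers 35 elements; these three are illustrative only.
FIELD_QUERIES = {
    "Title": "./archdesc/did/unittitle",
    "Dates": "./archdesc/did/unitdate",
    "Repository": "./archdesc/did/repository/corpname",
}


def extract_fields(xml_text):
    """Return {field: text} for one EAD document; absent elements give ""."""
    root = ET.fromstring(xml_text)
    row = {}
    for field, path in FIELD_QUERIES.items():
        node = root.find(path)
        # itertext() also collects text inside nested markup such as <emph>
        text = "".join(node.itertext()) if node is not None else ""
        row[field] = " ".join(text.split())  # collapse runs of whitespace
    return row


def folder_to_tsv(folder):
    """Process every .xml file in `folder`; return tab-delimited text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Filename", *FIELD_QUERIES])
    for path in sorted(Path(folder).glob("*.xml")):
        row = extract_fields(path.read_text(encoding="utf-8"))
        writer.writerow([path.name, *row.values()])
    return out.getvalue()
```

Calling `folder_to_tsv(".")` would return one header row plus one row per EAD file in the current folder. Keeping the queries in a single dictionary mirrors the point made about local customization: adjusting the extraction to a partner's encoding means editing the paths, not the processing logic.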

Your IT staff can change the queries in the extraction script as needed to reflect the details of your encoding and/or the structure of your CONTENTdm fields.

2. Uploading EAD Files to CONTENTdm Server

CONTENTdm's Acquisition Station software is used to upload the EAD file and metadata to the Mountain West Digital Library hub server. This follows standard CONTENTdm procedures for importing and uploading multiple files. You can upload all the EAD files in a single directory at one time.

Display of Individual EAD Files in CONTENTdm

EAD files will be displayed within CONTENTdm's item viewer. As with all CONTENTdm collections, the item viewer displays a header and footer of the partner's choice, typically with the partner's logo and other branding related to the EAD collection.

Nathan Pugh has modified the CONTENTdm item viewer to bypass the usual display of metadata and go directly to the display of the EAD file itself. The display is produced by an XSL transform (XSLT), which uses an XSL stylesheet (template) to transform the XML in the EAD file into XHTML for viewing in a browser. For the initial demonstrations, we used one of the stylesheet combinations given by the EAD 2002 Cookbook site. For production, Nathan Pugh has created a new stylesheet for the specific needs of the partners in this project. The stylesheet transforms the elements recommended by the Stylesheet Subcommittee convened by Dan Davis. A separate default stylesheet will be released to transform the container lists. Partners may wish to modify this default styling of the container list to reflect their own organization of the collection.
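The idea of transforming EAD elements into XHTML can be illustrated with a short Python sketch. This is not XSLT and not the project stylesheet: it uses Python's standard XML library to show the same kind of element-to-element mapping. The choice of elements (title, date, scope note paragraphs) is an assumption made for illustration, as is the absence of an XML namespace in the EAD file.

```python
import xml.etree.ElementTree as ET


def _clean(node):
    """Text of an element and its descendants, with whitespace collapsed."""
    return " ".join("".join(node.itertext()).split())


def ead_to_xhtml(xml_text):
    """Build a very small XHTML page from a few elements of one EAD file."""
    ead = ET.fromstring(xml_text)
    html = ET.Element("html", {"xmlns": "http://www.w3.org/1999/xhtml"})
    body = ET.SubElement(html, "body")

    # <unittitle> becomes the page heading
    title = ead.find("./archdesc/did/unittitle")
    if title is not None:
        ET.SubElement(body, "h1").text = _clean(title)

    # <unitdate> becomes a paragraph under the heading
    date = ead.find("./archdesc/did/unitdate")
    if date is not None:
        ET.SubElement(body, "p").text = _clean(date)

    # each <p> in <scopecontent> becomes an XHTML paragraph
    for para in ead.findall("./archdesc/scopecontent/p"):
        ET.SubElement(body, "p").text = _clean(para)

    return ET.tostring(html, encoding="unicode")
```

An XSLT stylesheet expresses the same mapping declaratively, as templates matched against EAD elements, which is why partners can restyle the container list by editing the stylesheet without touching the viewer.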

Searching and Browsing

Institutional Search and Browse: Each CONTENTdm-based partner will be able to use standard CONTENTdm features to search and browse its own EAD files. In addition, partners may create special search and browse pages using CONTENTdm's Custom Query functions.

Central Search and Browse: The metadata from all uploaded EAD files will be harvested periodically and aggregated into the Mountain West Digital Library at http://mwdl.org. Search and browse pages within MWDL's interface will allow users to discover finding aids from all partners, or from any selected subset of the partners. Sandra McIntyre and Nathan Pugh will be creating interface mockups for both searching and browsing for consideration by the LSTA partners.
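The sketch below illustrates one way periodic harvesting and central search could fit together. It assumes the harvest uses OAI-PMH with unqualified Dublin Core, and that each record's `dc:identifier` holds the URL of the EAD file in the partner's repository; this page does not confirm either point for the CONTENTdm partners. The function names and the in-memory index are hypothetical. The sketch parses a response that has already been fetched and makes no network requests.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard OAI-PMH and Dublin Core namespaces
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url):
    """Build an OAI-PMH ListRecords request for Dublin Core metadata."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    return base_url + "?" + query


def parse_records(response_xml, partner):
    """Turn one ListRecords response into index entries tagged with the partner name."""
    root = ET.fromstring(response_xml)
    entries = []
    for record in root.iterfind(".//oai:record", NS):
        entries.append({
            "partner": partner,
            "title": record.findtext(".//dc:title", default="", namespaces=NS),
            # Assumed to point back to the EAD file in the local repository
            "url": record.findtext(".//dc:identifier", default="", namespaces=NS),
        })
    return entries


def search(index, term, partners=None):
    """Case-insensitive title search over all partners, or over a chosen subset."""
    term = term.lower()
    return [
        entry for entry in index
        if term in entry["title"].lower()
        and (partners is None or entry["partner"] in partners)
    ]
```

Because the central index stores only metadata and a link, a hit in `search` leads the user back to the single copy of the EAD file held by the partner institution.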