Finding aid aggregation

From LSTA Wiki
Revision as of 18:08, 29 August 2008 by Smcintyre (talk | contribs) (updating procedure)

Overview of Aggregation Process

EAD Central Index Ingestion

EAD files for all partners will be hosted as part of the Mountain West Digital Library (MWDL) system. The workflow for this process is illustrated in the diagram to the left and discussed below. All LSTA partners will follow the workflow on the right of this diagram, building EAD collections that reside on CONTENTdm servers. NEH-grant-funded partners in the Western Waters Digital Library will follow the workflow on the left, for EAD collections that reside on other Open Archives Initiative (OAI)-compatible servers. Each partner institution controls its own EAD collection, using tools and scripts developed by the technology team at the University of Utah.

The EAD collections for the LSTA-funded project are available locally by institution, and are also searchable and browsable centrally in the MWDL statewide index. Each EAD file exists in only one place: the local repository. When a user searches or browses the central index and clicks on an item of interest in the results, he or she is taken to the EAD file on the local repository. Users can also search locally, but a local search returns results only from the local institution. A link allows them to go to the central MWDL index at any time.

For more information about the process described below, please contact Sandra McIntyre (mailto:sandra.mcintyre@utah.edu), Nathan Pugh (mailto:nathan.pugh@utah.edu), or Debbie Rakhsha (mailto:debbie.rakhsha@utah.edu) at the University of Utah Marriott Library.

Uploading Your EAD Files to Your Institution's Repository

EAD files are uploaded to a CONTENTdm digital asset management server that hosts your collections as part of the Mountain West Digital Library network. This involves two steps: (1) extracting the values in certain EAD elements and mapping them to CONTENTdm fields; and (2) using CONTENTdm's Acquisition Station to upload the EAD file and metadata to the Mountain West Digital Library hub server. Both steps can be run on multiple EAD files at a time for batch processing.

1. Extracting the EAD Elements

An extraction script has been created to automate the first step. The 35 EAD elements chosen for extraction, the local CONTENTdm fields they correspond to, and the Dublin Core fields they correspond to are given in the EAD-CONTENTdm-Dublin Core Elements Assignments (Mapping Table).

The extraction script has been written in VBScript by Nathan Pugh at the University of Utah, based on a VBScript created by Terry Reese at Oregon State University. Nate has tailored this script for the LSTA project partners.

Summary

When the script is double-clicked, it acts on every file with the extension ".xml" in the same folder as the script. For each file, it queries the values of certain EAD elements and saves them as CONTENTdm fields in a tab-delimited text file. This tab-delimited file can then be used to upload the metadata to the CONTENTdm collection of your EAD files in Step 2 below.
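The behavior summarized above can be sketched in Python. This is not the project's VBScript, only a minimal illustration of the same idea. The field names and element paths in `FIELD_QUERIES` are hypothetical stand-ins for the 35 entries in the mapping table, and the header row and `Filename` column are assumptions about the output layout. The sketch also assumes EAD 2002 files that do not declare an XML namespace.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical subset of the mapping: CONTENTdm field name -> element path.
# The real mapping table has 35 entries; these three are illustrative only.
FIELD_QUERIES = {
    "Title": ".//archdesc/did/unittitle",
    "Dates": ".//archdesc/did/unitdate",
    "Abstract": ".//archdesc/did/abstract",
}


def extract_fields(xml_path):
    """Return one row of field values for a single EAD file."""
    root = ET.parse(xml_path).getroot()
    row = {"Filename": Path(xml_path).name}  # assumed column, not from the source
    for field, query in FIELD_QUERIES.items():
        node = root.find(query)
        # Gather nested text; leave the field empty if the element is absent.
        text = "".join(node.itertext()) if node is not None else ""
        row[field] = " ".join(text.split())  # collapse runs of whitespace
    return row


def extract_folder(folder, out_name="ead-to-cdm-extraction.txt"):
    """Process every .xml file in `folder` and write a tab-delimited file."""
    folder = Path(folder)
    rows = [extract_fields(p) for p in sorted(folder.glob("*.xml"))]
    with open(folder / out_name, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(
            out, fieldnames=["Filename", *FIELD_QUERIES], delimiter="\t"
        )
        writer.writeheader()  # assumed: the real script may not write a header
        writer.writerows(rows)
    return len(rows)  # number of files processed
```

Calling `extract_folder(".")` from the folder that holds the EAD files would process them all in one pass and return the file count, much as the alert in the real script reports it.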

Your IT staff can change the queries in the extraction script as needed to reflect the details of your encoding and/or the structure of your CONTENTdm fields.

Directions

  • Needed:
    • Extraction script downloaded to a local drive
    • Validated EAD files ready for uploading
  1. If you have not already done so, download the file ead-to-cdm-extraction.vbs.txt to a local drive. IMPORTANT: Change the name of the file to "ead-to-cdm-extraction.vbs" -- i.e., delete the ".txt" at the end of the filename. (The ".txt" extension was added to allow the transfer of the file over the Internet. Most systems will not allow an executable file to be transferred.)
  2. Move or copy the extraction script file into the same directory as the EAD files you want to process. All files that you want to process must be in one folder. Remove from this folder any files that you don't want to process at this time. Please note that the EAD files must already be validated and ready to upload.
  3. Double-click the extraction script file, ead-to-cdm-extraction.vbs, and wait for it to process all the EAD files in the folder. Depending on the number and size of your EAD files, this may take anywhere from a fraction of a second to 30 or more seconds. An alert will appear when the processing is done, giving the number of files processed successfully. Click the "OK" button in the alert window. Note that a new file, called ead-to-cdm-extraction.txt, has been created in the same folder (or, if you have already run the script before, the existing file has been modified). This is the tab-delimited text file that you will use in Step 2 below.
  • Exceptions:
    • What if the extraction script produces an error?
    • What if we have encoded our EAD elements slightly differently than the extraction script queries?
      • If, for example, you have encoded the browse subject terms source as "umabroad" instead of "UMAbroad", you may want to change this query in the extraction script file. There are instructions in the extraction script file for making modifications. Please consider any changes carefully, however. The extraction script has been set up for the Best Practices of the project, and, if it is changed in certain ways, the resulting records may not extract and upload the same way as other partners' records and therefore may not show up in searches in the MWDL central index.
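
The "umabroad" versus "UMAbroad" case matters because XML queries match attribute values case-sensitively. The following Python sketch illustrates the effect. It assumes the browse terms are encoded as `subject` elements with a `source` attribute; the element name and the sample terms are assumptions for illustration, since the source gives only the attribute values.

```python
import xml.etree.ElementTree as ET

# Illustrative sample data; not taken from any partner's finding aid.
SAMPLE = """<controlaccess>
  <subject source="UMAbroad">Water rights</subject>
  <subject source="umabroad">Mining</subject>
  <subject source="lcsh">Irrigation</subject>
</controlaccess>"""


def browse_terms(xml_text, source="UMAbroad"):
    """Return subject terms whose source attribute matches exactly."""
    root = ET.fromstring(xml_text)
    return [s.text for s in root.findall(f".//subject[@source='{source}']")]


def browse_terms_any_case(xml_text, source="UMAbroad"):
    """Variant that ignores case, as a local modification might."""
    root = ET.fromstring(xml_text)
    wanted = source.lower()
    return [
        s.text
        for s in root.iter("subject")
        if s.get("source", "").lower() == wanted
    ]
```

With the exact-match query, only the term encoded as "UMAbroad" is extracted and the "umabroad" term is silently skipped. The case-insensitive variant picks up both, but it departs from the shared query, which is the kind of change the caution above applies to.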

2. Uploading EAD Files to CONTENTdm Server

CONTENTdm's Acquisition Station software is used to upload the EAD file and metadata to the Mountain West Digital Library hub server. This follows standard CONTENTdm procedures for importing and uploading multiple files. You can upload all the EAD files in a single directory at one time.

Display of Individual EAD Files in CONTENTdm

The display of EAD files will be within CONTENTdm's item viewer. As with all CONTENTdm collections, the CONTENTdm item viewer displays a header and footer of the partner's choice, typically with the partner's logo and other branding related to the EAD collection.

Nathan Pugh has modified the CONTENTdm item viewer to bypass the usual display of metadata and go directly to the display of the EAD file itself. The display is done using an XSL transform (XSLT), which uses an XSL stylesheet (template) to transform the XML in the EAD file into XHTML for viewing in a browser. For the initial demonstrations, we used one of the stylesheet combinations given by the EAD 2002 Cookbook site. For production, Nate has created a new stylesheet for the specific needs of the partners in this project. The stylesheet transforms the elements recommended by the Stylesheet Subcommittee convened by Dan Davis. A separate default stylesheet will be released to transform the container lists. Partners may wish to modify this default styling of the container list to reflect their own organization of the collection.

Searching and Browsing

Institutional Search and Browse: Each CONTENTdm-based partner will be able to use standard CONTENTdm features to search and browse its own EAD files. In addition, partners may create special search and browse pages using CONTENTdm's Custom Query functions.

Central Search and Browse: The metadata from all uploaded EAD files will be harvested periodically and aggregated into the Mountain West Digital Library at http://mwdl.org. Search and browse pages within MWDL's interface will allow users to discover finding aids from all partners, or from any selected subset of the partners. Sandra McIntyre and Nathan Pugh will be creating interface mockups for both searching and browsing for consideration by the LSTA partners.
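
The source does not say which protocol the periodic harvest uses. Assuming it is OAI-PMH, the protocol behind the "OAI-compatible servers" mentioned in the overview, a harvester would issue a `ListRecords` request with the `oai_dc` metadata prefix and read Dublin Core fields from each returned record. The Python sketch below builds such a request URL and parses a response. The function names are hypothetical, any base URL or set name passed in is a placeholder, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard OAI-PMH and Dublin Core namespace URIs.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, set_spec=None):
    """Build an OAI-PMH ListRecords request for unqualified Dublin Core."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec  # optional: restrict the harvest to one set
    return f"{base_url}?{urlencode(params)}"


def parse_records(response_xml):
    """Yield (OAI identifier, title, link) for each record in a response."""
    root = ET.fromstring(response_xml)
    for record in root.iterfind(".//oai:record", NS):
        ident = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        # Assumed: the first dc:identifier holds the item's URL on the
        # local repository, which the central index would link back to.
        link = record.findtext(".//dc:identifier", default="", namespaces=NS)
        yield ident, title, link
```

Under these assumptions, the central index stores only the harvested metadata and the link, which is consistent with each EAD file residing solely on its local repository.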