Finding aid aggregation
Overview of Aggregation Process
EAD files for all partners will be hosted as part of the Mountain West Digital Library (MWDL) system. The workflow for this process is illustrated in the diagram to the left and discussed below. All LSTA partners will follow the workflow on the right of this diagram, building EAD collections that reside on CONTENTdm servers. NEH-grant-funded partners in the Western Waters Digital Library will follow the workflow on the left, for EAD collections that reside on other Open Archives Initiative (OAI)-compatible servers. Each partner institution controls its own EAD collection, using tools and scripts developed by the technology team at the University of Utah.
The EAD collections for the LSTA-funded project are available locally by institution, and they are also searchable and browsable centrally in the MWDL statewide index. Each EAD file exists in only one place: the local repository. When a user searches or browses the central index and clicks on an item of interest in the results, he or she is taken to the EAD file on the local repository. Users can also search locally, but a local search returns results only from the local institution. A link allows them to go to the central MWDL index at any time.
For more information about the process described below, please contact Sandra McIntyre (mailto:sandra.mcintyre@utah.edu), Nathan Pugh (mailto:nathan.pugh@utah.edu), or Debbie Rakhsha (mailto:debbie.rakhsha@utah.edu) at the University of Utah Marriott Library.
Uploading Your EAD Files to Your Institution's Repository
EAD files are uploaded to a CONTENTdm digital asset management server that hosts your collections as part of the Mountain West Digital Library network. This involves two steps:
- extracting the values in certain EAD elements and mapping them to CONTENTdm fields
- using CONTENTdm's Acquisition Station to upload the EAD file and metadata to the Mountain West Digital Library hub server.
Both steps can be done on multiple EAD files at a time for batch processing.
1. Extracting the EAD Elements
An extraction script has been created to automate the first step. The 35 EAD elements chosen for extraction, the local CONTENTdm fields they correspond to, and the Dublin Core fields they correspond to are given in the EAD-CONTENTdm-Dublin Core Elements Assignments (Mapping Table). The extraction script has been written in VBScript by Nathan Pugh at the University of Utah, based on a VBScript created by Terry Reese at Oregon State University. Nate has tailored this script for the LSTA project partners.
Summary of the Process
Use the extraction script when you have a batch of EAD files that are ready for uploading to the server. When the script is double-clicked, it acts on all files in the same folder as itself that have the extension ".xml". It automatically goes through each file and queries the values in certain EAD elements and saves them as CONTENTdm fields within a tab-delimited text file. This tab-delimited file can then be used to upload the extracted metadata to the CONTENTdm collection of your EAD files in Step 2 below.
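For readers who want to see the shape of this process outside VBScript, the following is a minimal Python sketch of the same idea, not the project's actual script. The three field names and element paths in `FIELD_PATHS` are hypothetical stand-ins for the 35 assignments in the Mapping Table, and the header row and the "File Name" column are assumptions about the output layout.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical mapping: CONTENTdm field name -> element path inside <ead>.
# The real assignments are in the project's Mapping Table.
FIELD_PATHS = {
    "Title": "archdesc/did/unittitle",
    "Dates": "archdesc/did/unitdate",
    "Repository": "archdesc/did/repository/corpname",
}


def strip_namespaces(root):
    """Drop any XML namespace prefix so plain paths like 'archdesc/did' match."""
    for el in root.iter():
        if isinstance(el.tag, str) and "}" in el.tag:
            el.tag = el.tag.split("}", 1)[1]


def extract_fields(xml_path):
    """Return one row (a dict) of field values for a single EAD file."""
    root = ET.parse(str(xml_path)).getroot()
    strip_namespaces(root)
    row = {}
    for field, path in FIELD_PATHS.items():
        el = root.find(path)
        text = "".join(el.itertext()) if el is not None else ""
        row[field] = " ".join(text.split())  # collapse whitespace and newlines
    row["File Name"] = Path(xml_path).name
    return row


def extract_folder(folder, out_name="ead-to-cdm-extraction.txt"):
    """Process every .xml file in the folder and write a tab-delimited file.

    Returns the number of files processed.
    """
    folder = Path(folder)
    rows = [extract_fields(p) for p in sorted(folder.glob("*.xml"))]
    columns = list(FIELD_PATHS) + ["File Name"]
    with open(folder / out_name, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `extract_folder("path/to/ead-files")` would write one row per EAD file. A file that is not well-formed XML raises `xml.etree.ElementTree.ParseError`, which is roughly analogous to the script error discussed under Exceptions.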
Directions
- If you have not already done so, download the file ead-to-cdm-extraction.vbs.txt to a local drive. IMPORTANT: Change the name of the file to "ead-to-cdm-extraction.vbs"; that is, delete the ".txt" at the end of the filename. (The ".txt" extension was added to allow the file to be transferred over the Internet. Most systems will not allow an executable file to be transferred.)
- Move or copy the extraction script file into the same directory as the EAD files you want to process. All files that you want to process must be in one folder. Remove from this folder any files that you don't want to process at this time. Please note that the EAD files must already be validated and ready to upload.
- Double-click the extraction script file, ead-to-cdm-extraction.vbs, and wait for it to process all the EAD files in the folder. Depending on the number and size of your EAD files, this may take anywhere from a fraction of a second to 30 or more seconds. An alert will appear when the processing is done, giving the number of files processed successfully. Click the "OK" button in the alert window. Note that a new file, called ead-to-cdm-extraction.txt, has been created in the same folder (or, if you have already run the script before, the existing file has been modified). This is the tab-delimited text file that you will use in Step 2 below.
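Before moving on to Step 2, it can help to confirm that the tab-delimited file is consistent. The sketch below is one possible check, written in Python rather than VBScript. It assumes the file is UTF-8 text and that every row should have the same number of fields as the first row; neither assumption is confirmed by the extraction script's documentation, so adjust the encoding if your output differs.

```python
import csv


def check_tab_file(path, encoding="utf-8"):
    """Return a list of problems; an empty list means the rows look consistent."""
    with open(path, newline="", encoding=encoding) as fh:
        rows = list(csv.reader(fh, delimiter="\t"))
    if not rows:
        return ["file is empty"]

    problems = []
    expected = len(rows[0])  # assumption: the first row sets the field count
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != expected:
            problems.append(
                f"row {number}: found {len(row)} field(s), expected {expected}"
            )
        elif not any(cell.strip() for cell in row):
            problems.append(f"row {number}: all fields are blank")
    return problems
```

An empty result from `check_tab_file("ead-to-cdm-extraction.txt")` suggests the file is at least structurally ready for upload; it says nothing about whether the extracted values themselves are correct.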
Exceptions
- What if the extraction script produces an error?
- At least one of your EAD files is not coded the right way. Please review the Best Practices Guidelines, particularly the guidelines regarding the elements to be extracted, given in EAD-CONTENTdm-Dublin Core Elements Assignments (Mapping Table). Then revise your EAD files and try again.
- What if we have encoded our EAD elements slightly differently than the extraction script queries?
- Your IT staff can change the queries in the extraction script as needed to reflect the details of your encoding and/or the structure of your CONTENTdm fields. If, for example, you have encoded the browse subject terms source as "umabroad" instead of "UMAbroad", you may want to change this query in the extraction script file. Or, if you want to rename the "Repository" field to "Holding Institution", you can change the field name accordingly. There are instructions in the extraction script file for making modifications. Warning: When making changes, please continue to conform to the UMA project Best Practices Guidelines. If the extraction script is changed to pull elements in non-standard ways, the resulting records may not show up in searches in the MWDL central index.
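As an illustration of the kind of query change described above, the following Python sketch collects browse subject terms while ignoring the letter case of the source value, so that "umabroad" and "UMAbroad" are treated alike. It is not taken from the extraction script. It assumes the source value is carried in a `source` attribute on `<subject>` elements, and the function name `browse_subjects` is hypothetical.

```python
import xml.etree.ElementTree as ET


def browse_subjects(ead_xml, source="UMAbroad"):
    """Return subject terms whose source attribute matches, ignoring case."""
    root = ET.fromstring(ead_xml)
    wanted = source.lower()
    terms = []
    for el in root.iter():
        local = el.tag.split("}", 1)[-1]  # tolerate a namespaced EAD file
        if local == "subject" and el.get("source", "").lower() == wanted:
            terms.append(" ".join("".join(el.itertext()).split()))
    return terms
```

Matching case-insensitively in the query is an alternative to re-encoding existing files, but the Best Practices Guidelines remain the reference for how the attribute should be written.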
2. Uploading EAD Files to CONTENTdm Server
CONTENTdm's Acquisition Station software is used to upload the EAD file and metadata to the Mountain West Digital Library hub server. This follows standard CONTENTdm procedures for importing and uploading multiple files. You can upload all the EAD files in a single directory at one time.
Summary
Summary goes here.
Directions
- The first step
- The second step
- The third step
Exceptions
- First question
- Answer
- Second question
- Answer
Display of Individual EAD Files in CONTENTdm
Summary
The display of EAD files is within CONTENTdm's item viewer. As with all CONTENTdm collections, the CONTENTdm item viewer displays a header and footer of the partner's choice, typically with the partner's logo and other branding related to the EAD collection. See a sample EAD file in the University of Utah's EAD collection.
Nathan Pugh has modified the CONTENTdm item viewer to bypass the usual display of metadata and instead to go directly to the display of the EAD file itself. The display is done using an XSL transform (XSLT), which uses an XSL stylesheet (template) to transform the XML in the EAD file into XHTML for viewing in a browser. Nate has created a stylesheet for the specific needs of the partners in this project. The stylesheet transforms the elements recommended by the Stylesheet Subcommittee convened by Dan Davis. A separate default stylesheet is being released to transform the container lists. Although the default container list stylesheet will transform most container lists, some partners may wish to modify this default styling to reflect their own organization of the collection.
Directions
- Browse your EAD collection in CONTENTdm by going to your Digital Collections page and selecting the EAD collection. You will see a results page listing the first 20 or so of your EAD files.
- Click any file in the results list to view it.
Exceptions
- My EAD file shows elements that ...
- Please check your encoding against the Best Practices Guidelines. To change an already-uploaded EAD file, see these directions.
- I don't like the formatting of my container list.
- Your IT staff can change the default container list stylesheet. The default stylesheet is designed to group hierarchically embedded container elements, <c0x>. Various partners on this project, as well as on the NWDA project, have created a variety of stylings for container lists.
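To make the idea of grouping hierarchically embedded `<c0x>` elements concrete, here is a small Python stand-in for what a container list stylesheet does. It is not the project's XSLT and does not reproduce its output. It assumes a non-namespaced EAD file, treats `<c>` and `<c01>` through `<c12>` as components, and shows only each component's `<unittitle>`.

```python
import re
import xml.etree.ElementTree as ET
from html import escape

# Matches the unnumbered <c> element and the numbered <c01> ... <c12> elements.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")


def render_components(parent):
    """Render the component children of `parent` as a nested HTML list."""
    items = []
    for child in parent:
        if not COMPONENT.match(child.tag):
            continue
        title_el = child.find("did/unittitle")
        title = ""
        if title_el is not None:
            title = " ".join("".join(title_el.itertext()).split())
        # Recurse so that c02 items nest inside their c01 parent, and so on.
        items.append(f"<li>{escape(title)}{render_components(child)}</li>")
    return f"<ul>{''.join(items)}</ul>" if items else ""


def render_container_list(ead_xml):
    """Return nested <ul> markup for the first <dsc> in the EAD file."""
    dsc = ET.fromstring(ead_xml).find(".//dsc")
    return render_components(dsc) if dsc is not None else ""
```

A partner-specific styling would typically differ in what it shows per component (box and folder numbers, dates, scope notes) rather than in this recursive grouping.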
Searching and Browsing
Institutional Search and Browse: Each CONTENTdm-based partner will be able to use standard CONTENTdm features to search and browse its own EAD files. In addition, partners may create special search and browse pages using CONTENTdm's Custom Query functions.
Central Search and Browse: The metadata from all uploaded EAD files will be harvested periodically and aggregated into the Mountain West Digital Library at http://mwdl.org. Search and browse pages within MWDL's interface will allow users to discover finding aids from all partners, or from any selected subset of the partners. Sandra McIntyre and Nathan Pugh will be creating interface mockups for both searching and browsing for consideration by the LSTA partners.
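The text above does not say how the periodic harvest is performed. If it uses OAI-PMH, the protocol behind the OAI-compatible servers mentioned in the overview, a harvest request and the handling of its response might look like the following Python sketch. The base URL and set name in the usage note are hypothetical, and only the identifier and the first Dublin Core title of each record are read.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, set_spec=None):
    """Build a ListRecords request for unqualified Dublin Core (oai_dc)."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(response_xml):
    """Return a list of {identifier, title} dicts from a ListRecords response."""
    root = ET.fromstring(response_xml)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        records.append({"identifier": identifier, "title": title})
    return records
```

For example, `list_records_url("https://example.org/oai", "ead")` produces a request URL for a hypothetical repository and set. A full harvester would also follow resumption tokens and handle deleted records, which this sketch omits.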