Metadata and Open Access Repositories
Michael S. Babinec, MLIS
Bibliographic Services, Northwestern University, Evanston, IL, USA
Holly Mercer, MLIS
Scholar Services, University of Kansas Libraries, Lawrence, KS, USA
Michael S. Babinec and Holly Mercer, Guest Editors
Metadata Quality in Digital Repositories: A Survey of the Current State of the Art
Jung-ran Park
ABSTRACT: This study presents the current state of research and practice on metadata quality through focus on the functional perspective on metadata quality, measurement and evaluation criteria coupled with mechanisms for improving metadata quality. Quality metadata reflect the degree to which the metadata in question perform the core bibliographic functions of discovery, use, provenance, currency, authentication and administration. The functional perspective is closely tied to the criteria and measurements used for assessing metadata quality. Accuracy, completeness and consistency are the most common criteria used in measuring metadata quality in literature. Guidelines embedded within a web form or template perform a valuable function in improving the quality of the metadata. Results of the study indicate a pressing need for the building of a common data model that is interoperable across digital repositories.
KEYWORDS: Metadata, quality control, metadata quality evaluation, completeness, accuracy, consistency, metadata guidelines, (semi) automatic metadata generation, digital repositories
Experiences in Deploying Metadata Analysis Tools for Institutional Repositories
David M. Nichols, Gordon W. Paynter, Chu-Hsiang Chan, David Bainbridge, Dana McKay, Michael B. Twidale, Ann Blandford
ABSTRACT: Current institutional repository software provides few tools to help metadata librarians understand and analyse their collections. In this paper, we compare and contrast metadata analysis tools that were developed simultaneously, but independently, at two New Zealand institutions during a period of national investment in research repositories: the Metadata Analysis Tool (MAT) at The University of Waikato, and the Kiwi Research Information Service (KRIS) at the National Library of New Zealand.
The tools have many similarities: they are convenient, online, on-demand services that harvest metadata using OAI-PMH, they were developed in response to feedback from repository administrators, and they both help pinpoint specific metadata errors as well as generating summary statistics. They also have significant differences: one is a dedicated tool while the other is part of a wider access tool; one gives a holistic view of the metadata while the other looks for specific problems; one seeks patterns in the data values while the other checks that those values conform to metadata standards.
Both tools work in a complementary manner to existing web-based administration tools. We have observed that discovery and correction of metadata errors can be quickly achieved by switching web browser views from the analysis tool to the repository interface, and back. We summarise the findings from both tools' deployment into a checklist of requirements for metadata analysis tools.
KEYWORDS: metadata quality, institutional repositories, evaluation
Name Authority Control in Institutional Repositories
Dorothea Salo
ABSTRACT: Neither the standards nor the software underlying institutional repositories anticipated performing name authority control on widely disparate metadata from highly unreliable sources. Without it, though, both machines and humans are stymied in their efforts to access and aggregate information by author. Many organizations are awakening to the problems and possibilities of name authority control, but without better coordination, their efforts will only confuse matters further. Local heuristics-based name-disambiguation software may help those repository managers who can implement it. For the time being, however, most repository managers can only control their own name lists as best they can after deposit while they advocate for better systems and services.
KEYWORDS: Institutional repositories, authority control
Study on the Use of Metadata for Digital Learning Objects in University Institutional Repositories (MODERI)
Gema Bueno-de-la-Fuente, Tony Hernández-Pérez, David Rodríguez-Mateos, Eva M. Méndez-Rodríguez, Bonifacio Martín-Galán
ABSTRACT: Metadata is a core issue for the creation of repositories. Different institutional repositories have chosen and use different metadata models, elements and values for describing the range of digital objects they store. Thus, this paper analyzes the current use of metadata describing those Learning Objects that some open higher educational institutions' repositories include in their collections. The goal of this work is to identify and analyze the different metadata models being used to describe educational features of those specific digital educational objects (such as audience, type of educational material, learning objectives, etc.).
Also discussed are the concept and typology of Learning Objects (LO) through their use in University Repositories. We will also examine the usefulness of specifically describing those learning objects, setting them apart from other kinds of documents included in the repository, mainly scholarly publications and research results of the Higher Education institution.
KEYWORDS: Institutional Repositories, Higher Education, Learning Objects, Metadata, Open Access, Dublin Core, OAI-PMH
University Scholarly Knowledge Inventory System: A Workflow System for Institutional Repositories
Anne Morrow, Allyson Mower
ABSTRACT: The University Scholarly Knowledge Inventory System (U-SKIS) provides workspace for institutional repository staff. U-SKIS tracks files, communications, and publishers' archiving policies to determine what may be added to a repository. A team at the University of Utah developed the system as part of a strategy to gather previously published peer-reviewed articles. As campus outreach programs developed, coordinators quickly amassed thousands of journal articles requiring copyright research and permission. This paper describes the creation of U-SKIS, addresses the educational role U-SKIS plays in the scholarly communication arena and explores the implications of implementing scalable workflow systems for other digital collections.
KEYWORDS: digital repositories; workflow management; copyright permissions; digital collections; collection development
Electronic Thesis and Dissertation Metadata Workflow at Oregon State University Libraries
Michael Boock and Sue Kunda
ABSTRACT: In July 2005, the Oregon State University Libraries began accepting electronic versions of student theses and dissertations into ScholarsArchive@OSU, the library's institutional repository. By January 2007, all Oregon State University graduate students were required to deposit their final research. This paper compares past processes and workflows for print theses and dissertations with the present workflow for electronic. We provide the rationale for changes and review the cost- and time-savings produced. We describe the changing roles of students, technicians and librarians in the metadata process as well as the value of students describing their own work.
KEYWORDS: ETDs, metadata, workflow, Oregon State University
Repository Metadata: Approaches and Challenges
John W. Chapman, David Reynolds, Sarah A. Shreeves
ABSTRACT: Many institutional repositories have pursued a mixed metadata environment, relying on description by multiple workflows. Strategies may include metadata converted from other systems, metadata elicited from the document creator or manager, and metadata created by library or repository staff. Additional editing or proofing may or may not occur. The mixed environment brings challenges of creation, management, and access. In this paper, repository efforts at three major universities are discussed. All three repositories run on the DSpace software package, and the opportunities and limitations of that system will be examined. The authors discuss local strategies in light of current thinking on metadata creation, user behavior, and the aggregation of heterogeneous metadata. The contrasts between the mission of each repository effort will show the importance of local customization, while the experience of all three institutions forms the basis for recommendations on strategies of benefit to a wide range of librarians and repository planners.
KEYWORDS: Metadata, DSpace, institutional repository, heterogeneity
Describing Digital Objects: A Tale of Compromises
Jessica Branco Colati, Robin Dean, Keith Maull
ABSTRACT: The Alliance Digital Repository (ADR) is a consortial digital repository service developed by Colorado Alliance of Research Libraries (Alliance). This paper details how a standard descriptive metadata policy for repository records developed, and how that policy is currently being implemented. All digital objects in the ADR are required to have MODS and OAI-Dublin Core metadata that conform to certain minimum requirements. To help members meet the requirements, Alliance staff and the ADR Metadata Working Group, using tools available in the Fedora/Fez repository environment, have developed a customized set of core ADR material type templates in XSD form.
KEYWORDS: Digital repositories, MODS, Dublin Core, metadata, Fedora, Fez, XSD, digital objects
Research Data and Repository Metadata: Policy and Technical Issues at the University of Sydney Library
Rowan Brownlee
ABSTRACT: The University of Sydney Library's repository contains research outputs primarily comprising traditional publication types. Many academics manage data collections within databases and spreadsheets using metadata dissimilar to the repository's Dublin Core schema. During 2007 and 2008 the author explored issues surrounding submission of a small range of research data collections and associated metadata. Native metadata structures were analysed and mapped to DC and scripts translated, packaged and transferred collections. This paper discusses metadata management and repository service levels and sustainability. It describes the Library's approach to defining service requirements and includes discussion of various metadata management options. It also describes related activities within the University of Sydney to develop eResearch services and to harmonise the roles and relationships of eResearch support service providers.
KEYWORDS: research data collections, metadata translation, batch processing, repository service levels, repository service sustainability, repository service policy development
Theoretical Considerations of Lifecycle Modeling: An Analysis of the Dryad Repository Demonstrating Automatic Metadata Propagation, Inheritance, and Value System Adoption
Jane Greenberg
ABSTRACT: The Dryad repository is for data supporting published research in the field of evolutionary biology and related disciplines. Dryad development team members seek a theoretical framework to aid communication about metadata issues and plans. This article explores lifecycle modeling as a theoretical framework for understanding metadata in the repository environment. A background discussion reviews the importance of theory, the status of a metadata theory, and lifecycle concepts. An analysis draws examples from the Dryad repository demonstrating automatic propagation, metadata inheritance, and value system adoption, and reports results from a faceted term mapping experiment that included 12 vocabularies and approximately 600 terms. The article also reports selected key findings from a recent survey on the data-sharing attitudes and behaviors of nearly 400 evolutionary biologists. The results confirm the applicability of lifecycle modeling to Dryad's metadata infrastructure. The article concludes that lifecycle modeling provides a theoretical framework that can enhance our understanding of metadata, aid communication about the topic of metadata in the repository environment, and potentially help sustain robust repository development.
KEYWORDS: metadata, metadata theory, repositories, Dryad, lifecycle modeling, automatic metadata propagation, metadata inheritance, value system adoption
Tyler O. Walters says, "In many ways, when libraries create institutional repositories (IRs), they are reinventing themselves. Traditionally, libraries have managed information produced by organizations—namely publishers—outside of their parent institutions. They select, acquire, organize, make accessible, promote, preserve, and instruct people about how to use these information resources. However, IR developers are primarily concerned with content generated internally—that is, with the intellectual output (usually in digital form) of their university communities."1 He goes on to describe cataloging and technical services units as the "inputting 'armies' ready to serve." However, it is just as likely that librarians and library staff outside cataloging departments will create metadata or otherwise catalog and describe the objects contained in these institutional or subject-based repositories. The authors in this issue include computer scientists, library and information science faculty, digital librarians, and also catalogers. They are all describing, organizing, distributing, and preserving the research and pedagogical works contained in these repositories, and many have developed tools to work with and analyze the metadata contained therein.
The theme of this special issue, "metadata and open access repositories," is at once broad and extremely specialized. It includes discussions of metadata to describe research data, electronic theses and dissertations, learning objects, and scholarly articles. Several case studies present workflows for acquiring, describing, and disseminating digital content in repositories. The authors discuss many well-known repository systems, including ContentDM, DSpace, EPrints, Fedora, Greenstone, Opus, XTF, and homegrown systems. While various "flavors" of Dublin Core are the most common metadata formats, presumably because Dublin Core is used in the Open Archives Initiative-Protocol for Metadata Harvesting (OAI-PMH), the authors identify other metadata formats and schemas, including METS, MODS, DIDL, MPEG21, Epicur, and ETD-MS.
Metadata quality is a common thread throughout this special issue. Resource discovery often occurs through Internet searching, or via cross-repository searching and harvested metadata. As Diane Hillmann has noted,2 semantic and syntactic errors, which are problematic locally, compound in a networked repository environment.
In "Metadata Quality in Digital Repositories: A Survey of the Current State of the Art," Jung-ran Park reviews the literature and shows how the issue of quality has been measured and evaluated across repositories and how it is reflected in scholarly publications. She argues for adopting a common metadata model to increase the quality of data across repositories.
David M. Nichols, Gordon W. Paynter, Chu-Hsiang Chan, David Bainbridge, Dana McKay, Michael B. Twidale, and Ann Blandford compare two tools, MAT and KRIS, designed to help librarians understand and manage metadata within collections. They conclude with a checklist of requirements they developed to evaluate metadata analysis tools.
Dorothea Salo presents an issue with which repository administrators and metadata librarians are all too familiar. Name authority control is lacking within most individual repositories, and certainly is applied inconsistently across repositories. Regardless of the repository software deployed, few tools exist to automate database cleanup, much less eliminate the need through better application of name authorities. Salo describes why lack of name authority control is problematic, and presents some potential relief.
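As a rough illustration of the kind of local, heuristics-based cleanup that Salo's abstract mentions, the sketch below groups author-name variants that share a normalized key. It is not Salo's method or any repository's built-in feature: the function names, the sample names, and the "surname plus first initial" rule are all assumptions chosen for brevity, and the rule will over-merge distinct people, so its output is only a list of candidates for human review.

```python
import unicodedata
from collections import defaultdict

# Hypothetical sample of author strings as they might appear after deposit.
SAMPLE_NAMES = [
    "Smith, John A.",
    "Smith, J.",
    "SMITH, John",
    "Müller, Anna",
    "Muller, A.",
    "Jones, Mary",
]


def name_key(name):
    """Reduce a 'Surname, Given' string to 'surname, first-initial'.

    Assumption: names are in inverted form with a comma; anything else
    simply ends up with an empty initial.
    """
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    surname, _, given = plain.partition(",")
    initial = given.strip()[:1].lower()
    return f"{surname.strip().lower()}, {initial}"


def group_variants(names):
    """Return only those keys that collect more than one distinct spelling."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return {key: found for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    for key, variants in group_variants(SAMPLE_NAMES).items():
        print(key, "->", variants)
```

A cataloger would still need to confirm each group against an authority file, since a shared surname and initial does not establish a shared identity.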
Gema Bueno-de-la-Fuente, Tony Hernández-Pérez, David Rodríguez-Mateos, Eva M. Méndez-Rodríguez, and Bonifacio Martín-Galán examine metadata harvested using OAI-PMH to determine how, if at all, learning objects are treated in institutional repositories. Their dataset includes forty-seven repositories from eighteen countries. Through their quantitative and qualitative analysis, they find that learning objects are often not consistently identified as such (for example, through use of the Dublin Core dc:type element), which could affect discovery and use.
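To make the dc:type point concrete, the following sketch checks whether simple Dublin Core records carry a dc:type value that marks them as learning objects. It is a minimal illustration and not the MODERI authors' procedure: the sample records, the wrapping `records` element, and the set of accepted type values are hypothetical, while the two namespace URIs are the standard ones for oai_dc and Dublin Core elements.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical values that this sketch accepts as marking a learning object.
LEARNING_OBJECT_TYPES = {"learning object", "learningobject"}

# Hypothetical records; the <records> wrapper is only a container for the
# example and is not part of OAI-PMH.
SAMPLE_RECORDS = f"""<records xmlns:oai_dc="{OAI_DC_NS}" xmlns:dc="{DC_NS}">
  <oai_dc:dc>
    <dc:title>Introductory Statistics Tutorial</dc:title>
    <dc:type>Learning Object</dc:type>
  </oai_dc:dc>
  <oai_dc:dc>
    <dc:title>Lecture Slides on Cell Biology</dc:title>
    <dc:type>Text</dc:type>
  </oai_dc:dc>
  <oai_dc:dc>
    <dc:title>Untyped Course Handout</dc:title>
  </oai_dc:dc>
</records>"""


def classify_records(xml_text):
    """Return (title, status) pairs describing each record's dc:type usage."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.findall(f"{{{OAI_DC_NS}}}dc"):
        title = record.findtext(f"{{{DC_NS}}}title", default="(no title)")
        types = [
            element.text.strip().lower()
            for element in record.findall(f"{{{DC_NS}}}type")
            if element.text
        ]
        if not types:
            status = "missing dc:type"
        elif any(value in LEARNING_OBJECT_TYPES for value in types):
            status = "identified as learning object"
        else:
            status = "typed, but not as learning object"
        results.append((title, status))
    return results


if __name__ == "__main__":
    for title, status in classify_records(SAMPLE_RECORDS):
        print(f"{title}: {status}")
```

In this sample only the first record would be discoverable by a harvester that filters on type, which is the kind of inconsistency the authors report.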
The case studies included here document the variety of methods employed to create and enhance metadata and otherwise manage workflows and repository maintenance.
"University Scholarly Knowledge Inventory System: A Workflow System for Institutional Repositories" describes an open source system developed by authors Anne Morrow and Allyson Mower to manage rights and permissions associated with published journal articles for the University of Utah's institutional repository submission service. The authors note that workflows for managing and distributing digital collections tend to be developed locally, but institutions may benefit from applying an integrated workflow structure similar to what is found in traditional library systems.
College and university libraries have long been stewards of their institutions' theses and dissertations, with cataloging departments providing access to these materials. The rise of electronic theses and dissertations (ETD) programs has altered descriptive practices and departmental workflows, but cataloging departments still often have responsibility for managing access to these resources. Michael Boock and Sue Kunda describe how Oregon State University is handling this transition. They include an analysis of time spent on handling print versus electronic theses and dissertations, and conclude that the electronic format reduces the staff time spent on processing and ultimately increases savings.
John W. Chapman, David Reynolds, and Sarah A. Shreeves discuss approaches to the creation and management of metadata for institutional repositories at three institutions: University of Minnesota, Johns Hopkins University, and University of Illinois at Urbana-Champaign. All three institutions are using DSpace software and opted to centralize metadata creation and item submissions (though not necessarily within their respective libraries). The authors describe the metadata approaches at each institution and conclude with a "from the trenches" discussion of how repository developers could better address metadata issues such as support for expanded metadata schemas and authority control.
Jessica Branco Colati, Robin Dean, and Keith Maull write about the repository under development by the Colorado Alliance of Research Libraries. Staff members from the Alliance were charged with developing a standards-based, interoperable digital repository that would support the variety of uses its member libraries requested. Each member library would have its own "front door." They selected Fedora as the repository system, with Fez as the front end, and METS and MODS metadata schemas. The article describes the choices they made, their rationale, and the implications of those choices. The authors include the XML schema developed for their Fedora repository.
The final two articles in this issue address new directions in building repositories to accommodate collections of research data.
The University of Sydney Library collaborates with researchers to develop methods for providing long-term access and preservation of research data collections. Rowan Brownlee's article addresses policies surrounding metadata for research collections, where native metadata formats often are very different from formats used in library-managed digital repositories. Using examples from the University, Brownlee considers options for addressing non-Dublin Core metadata in the University of Sydney's repository, and concludes with potential actions for working with research faculty to manage data.
In the final article, Jane Greenberg takes a theoretical approach to metadata issues surrounding research data. She explores the phenomena of propagation, inheritance, and value system adoption demonstrated by the metadata within Dryad, a repository of research data in evolutionary biology. She demonstrates that lifecycle modeling can guide decisions about metadata infrastructure in repositories to support the goals of data sharing and reuse.
1 Walters, Tyler O., "Reinventing the Library—How Repositories Are Causing Librarians to Rethink Their Professional Roles," portal: Libraries and the Academy 7, no.2 (Apr. 2007): 213-225.
2 Hillmann, Diane I., "Metadata Quality: From Evaluation to Augmentation," Cataloging and Classification Quarterly 46, no. 1 (2008): 65-80.
*Michael S. Babinec is assistant head of the Bibliographic Services Dept. at Northwestern University Library, 1970 Campus Drive, Evanston, IL 60208 (m-babinec(at)northwestern.edu).
Holly Mercer is head of Scholar Services at the University of Kansas, Anschutz Library, 1301 Hoch Auditoria Drive, room 320K, Lawrence, KS 66045-7537 (hmercer(at)ku.edu).
The co-editors wish to thank editor-in-chief Sandy Roe for patience and support, Jane Greenberg for encouraging us to guest co-edit, and Taylor and Francis for their accommodations for this theme issue in which open access factors so prominently.