Bibliographic Database Quality

Jeffrey Beall and Stephen Hearn
Introduction, Jeffrey Beall and Stephen Hearn, Guest Editors


Is it worth it? Management issues related to database quality
Janet Swan Hill

ABSTRACT: Management issues related to the quality of a library’s catalog and its source databases reflect the continual evolution of both. As catalogs seem about to mutate rather than to continue to evolve gradually, a review of how management thinking about database quality developed may help libraries to recognize which principles and circumstances remain valid, to assess which are no longer applicable, and to decide what actions to take. A persistent shortcoming in the decision-making process that needs to be addressed is the lack of serious research into user needs and benefits, and the actual impact on users of database quality decisions.

KEYWORDS: Database Quality; Database Management; Quality Control; Catalog Management; Catalog Evolution; Discovery Tools Evolution


Systematic identification of typographical errors in library catalogs
Terry Ballard

ABSTRACT: This article recounts the history of the creation of the database entitled Typographical Errors in Library Databases and describes the advances made in cooperative efforts to discover and fix typographical errors in bibliographic databases.

KEYWORDS: typographical errors; bibliographic databases


Making the pieces fit: Little Women, works, and the pursuit of quality
Allyson Carlyle, Sara Ranger, and Joel Summerlin

ABSTRACT: In current cataloging practice, the identification of an item as a member of a particular work set is accomplished by assigning a main entry heading, or main entry citation, in the bibliographic record representing that item. The main entry citation is normally comprised of a primary author name and the uniform title associated with the work. However, the quality of bibliographic records varies, and this means of identification is not universally used by catalogers. Thus, consistent identification and retrieval of records representing editions of works is not guaranteed. Research is reported that investigates the extent to which records that are members of a particular work set may be automatically identified as such.

KEYWORDS: Bibliographic records, OPACs, online catalogs, works, FRBR, cataloging quality


Metadata quality: From evaluation to augmentation
Diane Hillmann

ABSTRACT: The conversation about metadata quality has developed slowly in libraries, hindered by unexamined assumptions about metadata carrying over from experience in the MARC environment. In the wider world, discussions about functionality must drive discussions about how quality might be determined and ensured. Because the quality-enforcing structures present in the MARC world—mature standards, common documentation, and bibliographic utilities—are lacking in the metadata world, metadata practitioners desiring to improve the quality of metadata used in their libraries must develop and proliferate their own processes of evaluation and transformation to support essential interoperability. In this article, the author endeavors to describe how those processes might be established and sustained to support metadata quality improvement.

KEYWORDS: Metadata quality, metadata evaluation, metadata augmentation


Metadata Quality, Utility and the Semantic Web: The Case of Learning Resources and Achievement Standards
Stuart A. Sutton

ABSTRACT: This article explores metadata quality issues in the creation and encoding of mappings or correlations of educational resources to K-12 achievement standards and the deployment of the metadata generated on the Semantic Web. The discussion is framed in terms of quality indicia derived from empirical studies of metadata in the Web environment. A number of forces at work in determining the quality of correlations metadata are examined including the nature of the emerging Semantic Web metadata ecosystem itself, the reliance on string values in metadata to identify achievement standards, the growing complexity of the standards environment, and the misalignment in terms of granularity between resource and declared objectives.

KEYWORDS: Semantic Web, metadata quality, achievement standards, K-12 educational resources


The Perfect Bibliographic Record: Platonic Ideal, Rhetorical Strategy or Nonsense?
David Bade

ABSTRACT: Discussions of quality in library catalogs and bibliographic databases often refer to "the perfect record." This paper examines the usage of that phrase in the library literature, finding that its predominant use is as a rhetorical strategy for reducing the complex and context-dependent issue of quality to an absurdity, thus permitting the author to ignore or dismiss all issues of quality. Five documents in which the phrase is not used in this fashion are examined, and their value for understanding the inextricably intertwined values of quantity and quality is discussed. The author recommends rejecting both the rhetoric of "the perfect record" and satisfaction with "the imperfect record."

KEYWORDS: metadata quality, database quality, cataloging standards



Database quality is a simple expression for a complex, multifaceted concept. Attempts to analyze and measure database quality in library catalogs quickly come up against this complexity. Is quality a question of being error-free, or of completeness, or of currency? Is it measured by the word, or by the field, or by the record, or by the index, or for the catalog as a whole? How do issues of system functionality and performance factor into measures of database quality? Can one demonstrate a clear connection between database quality, however defined, and the catalog’s usefulness for its end users? Given that the standards for catalog data are in a constant state of flux, with new rules and new heading forms and new conventions replacing old ones, is maintaining a high-quality database even a realistic goal? What should the goal be? The authors contributing articles to this special issue of Cataloging & Classification Quarterly offer diverse perspectives on the question of what database quality is and should be.

Janet Swan Hill looks back over the last half century and offers an account of how discussions of database quality have progressed from an insular illusion that each library could fully control and possibly "perfect" its catalog, through a time when Library of Congress records and leadership promised a "gold standard" for quality, to a recognition that library data must prove its usefulness in an environment where the catalog draws from and is only one among many data sources. As comparisons between library catalogs and other data services become more constant, the need for guidance from studies of user needs is ever more pressing.

Terry Ballard offers a history of one element of the quality control effort: identifying and correcting typographical errors. As a key player in the development of a widely used tool for this work, he has an insider’s perspective on both the evolving logic of the "Typographical Errors in Library Databases" list and the organizational and communications approaches that have enabled it to flourish.

The relationship between data quality and the uses we want to make of data is the focus of the other four contributors. Allyson Carlyle, Sara Ranger, and Joel Summerlin examine the challenge of using the data in catalog records to associate records for different manifestations into sets representing single works, as defined by the Functional Requirements for Bibliographic Records (FRBR). Their study illustrates well the difficulties of finding relationships between records which were not built with those relationships in mind.

Diane Hillmann offers the perspective of a metadata manager working across multiple data sources. What is the focus of quality control when correcting individual records is not really a scalable option? New issues emerge, such as what kinds of batch data manipulations are appropriate, and how well the data’s own history of manipulation is recorded for future manipulators and applications. With Hillmann, we move beyond distinguishing the forest from the trees to distinguishing different kinds of forest and forest ecologies in the larger information landscape.

Stuart Sutton offers a case study in the challenge of getting metadata from different sources to interoperate effectively. His starting point is the desire of educators to find ways to correlate the learning resources available to them with the educational goals and standards often defined for their work by state agencies. Though one might imagine that the concepts and terminologies used by both educational publishers and their intended market would be in harmony, Sutton finds that diversity, both conceptual and terminological, is more commonly the case.

The issue closes with David Bade’s incisive analysis of the use of "the perfect record" as a straw man in past arguments about database quality. Too often, he says, the "perfect record" notion has been held up by writers as an abstruse ideal worshipped by an army of martinet catalogers. These writers’ intent is often to justify more pragmatic approaches to issues of data quality; but Bade demonstrates that this rhetorical device is hollow and misleading, and that a pragmatic understanding of data quality across all parties is much more the rule.

One theme uniting all the articles is that data is meant for use, and that our estimation of database quality depends not simply or even primarily on its adherence to certain rules and forms, but on how well it responds to the uses we wish to make of it. In some cases, the utility is fairly obvious—typographical errors can make records hard to find—while in other cases, as in Sutton’s comparison of learning resource metadata and educational goal statements, the discrepancies between metadata sets with a supposed natural affinity may not become apparent until users actually try to navigate between them. In an environment where diverse databases are expected to play well together, the challenges of database quality are less about standards creation and development and rule adherence, and more about social and organizational collaboration. Librarians have an unusual history of broad, collaborative efforts in managing data for the use of a large community. This experience can be of value as we move into the ever more complex environment of interoperable metadata management.

The co-editors would like to express their appreciation for the work of Sherry Vellucci on the initial development of this issue and its content, and for the invaluable assistance of general editor Sandy Roe.

Jeffrey Beall
Stephen Hearn


