Cataloging & Classification Quarterly
Volume 36, no. 1, 2003
Sandy Roe, News Editor
Welcome to the news column. Its purpose is to disseminate information on any aspect of cataloging and classification that may be of interest to the cataloging community. This column is not just intended for news items, but serves to document discussions of interest as well as news concerning you, your research efforts, and your organization. Please send any pertinent materials, notes, minutes, or reports to: Sandy Roe; Memorial Library; Minnesota State University, Mankato; Mankato, MN 56001-8419 (email: firstname.lastname@example.org). Phone: 507-389-2155. News columns will typically be available prior to publication in print from the CCQ website at http://www.catalogingandclassificationquarterly.com.
We would appreciate receiving items having to do with:
Research and Opinion
Abstracts or reports of on-going or unpublished research
Bibliographies of materials available on specific subjects
Analysis or description of new technologies
Call for papers
Comments or opinions on the art of cataloging
Notes, minutes, or summaries of meetings, etc. of interest to catalogers
Description of grants
Description of projects
Announcements of changes in personnel
Announcements of honors, offices, etc.
Summary of The Cataloger, the Public Services Librarian and Metadata: Can this marriage be saved?, presented at the Midwinter American Library Association Meeting, New Orleans, Jan. 20, 2002.
Want to see a lively presentation about primary library issues? Take a look at OCLC’s presentation, entitled The Cataloger, the Public Services Librarian and Metadata: Can this marriage be saved? It presents a humorous view of the issues between patrons, catalogers, public service and administration personnel. The content is full of plays on words, puns, and other humor that only a librarian can best appreciate. The premise is the troubled marriage of Marian and Phil, both librarians and married for ages, who appear on Dr. Sarah’s talk show.
The host, Dr. Sarah, is portrayed by Madeleine Lefebvre, University Librarian at Saint Mary’s University, Halifax, Nova Scotia. The troubled couple whose marriage is on the rocks are Marian, a cataloger in an academic library, and her husband of 30 years, Phil, a reference librarian at a large public library. Marian is played by Betsy Friesen, Monographs and Special Formats Cataloger for the Bio-Medical Library, University of Minnesota, Minneapolis, Minnesota, and Phil is played by James McPeak, Director of Mayfield Regional Library, Cuyahoga County (Ohio) Public Library System, Mayfield Village, Ohio.
When the program begins, Dr. Sarah admits to the audience that she does not have much hope for being able to resolve some of this couple’s issues. She notes that both Marian and Phil put a great deal of effort into their own personal lives, but not into the institution of their marriage. Each now works alone to try to fulfill the needs of their twin children, Pat & Ron. As Dr. Sarah interviews Phil and Marian, various aspects of their personalities and marital issues emerge.
Marian is described in stereotypic cataloger fashion as a recluse, usually staying in the back room to catalog the family’s documents and information, tirelessly creating metadata records, each having a wealth of detail. She lives by standards and rules that Phil does not understand, follow, or really care about.
Phil openly describes himself as an extrovert. He enjoys meeting people, asking them questions, and helping them find answers to their questions quickly and efficiently. Phil even carries his own notebook with hundreds of URLs written on Post-it® notes because he feels he needs to provide answers for the kids on the spot.
Then there are Phil’s parents, Ad & Min, who ignore Marian and rarely support her. Yet they brag about how many questions Phil answers and how many times he works with Pat & Ron, make sure he is well taken care of, and generally are so focused on Phil that they do not even notice their son’s marriage is in trouble.
The point of the whole presentation was to bring catalogers and public service librarians together and show them what their stereotypes and relationship to one another look like. That is, catalogers have a tendency to want everything to be perfect, which can take a very long time. They want detailed description, and they want to work on resources that have staying power. Reference librarians want to be able to satisfy the needs of the patron quickly and accurately. They look at some of the detail that catalogers create as unnecessary. Furthermore, the play reveals that the real detriment from the tension between catalogers and public service librarians is not to the library, but to the patrons. While Can this marriage be saved? did not present any real solutions, it did present a key issue in libraries today in a non-threatening manner and provided a great deal of food for thought.
A video recording of the event was made and is available through interlibrary loan. More information on obtaining the video through ILL and a sneak peek in Windows MediaPlayer format are available on OCLC’s Events and Conferences Video-on-demand Web site (http://www.oclc.org/events/videoondemand/metadata/).
Robert Bothmann, Special Formats Cataloger
University of Minnesota Libraries
Summaries and Reflections of Thesaurus Design for Semantic Information Management, a day-long seminar led by Prof. Bella Hass Weinberg in New York on April 16, 2002
Because I had the good fortune of being able to attend Bella Hass Weinberg’s thesaurus seminar, and because I know of others who, while unable to attend, were interested in hearing about it, it seemed worthwhile to share some information through this brief review. Weinberg’s views on the subject are of particular interest because she chaired the committee that revised the ANSI/NISO standard on thesaurus construction (Z39.19), which was published in 1994 and reaffirmed in 1998. I do not pretend to be an expert in this field, but I have tried my best faithfully to represent Dr. Weinberg’s views, and to distinguish my digressions from the actual content of her seminar.
As the seminar began, Weinberg took some time to discuss several recent neologisms relating to thesauri. She suggested, for example, that semantic information management (following the phraseology of Tim Berners-Lee in his article “The Semantic Web”) is essentially a synonym for vocabulary control; that ontology usually means classification scheme, but is sometimes used as a synonym for thesaurus or semantic network; and that taxonomy is generally synonymous with classification. She also drew a distinction between thesauri proper, where headings represent unitary concepts designed for post-coordination, and subject heading lists, where headings representing a combination of concepts are pre-coordinated in a fixed sequence. Subject heading lists such as LCSH are essential tools for managing information in a print environment, e.g., a card catalog, while true thesauri are often more useful in the online environment.
In order to establish descriptors (i.e., headings, preferred terms), thesaurus editors need to distinguish among homographs, and select a preferred term from among synonyms. Homographs, words that though spelled alike have different meanings, can be disambiguated through parenthetical qualifiers, as in “letters (alphabet)” versus “letters (correspondence).” Synonymy poses the opposite challenge (more than one term for a single concept), but is easily managed through cross-references. In a medical thesaurus, for example, one can simply have “Cancer. USE Neoplasms” to direct users to the preferred term.
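The two mechanisms just described, parenthetical qualifiers for homographs and USE references for synonyms, can be illustrated with a minimal Python sketch. It is only an illustration: the dictionaries, the `resolve` function, and the extra entry term "tumors" are assumptions made for this example and are not taken from the seminar or from any published thesaurus.

```python
# Hypothetical mini-thesaurus; the terms echo the examples in the text.
# Synonyms: each entry term points to one preferred term (a USE reference).
USE_REFERENCES = {
    "cancer": "Neoplasms",   # "Cancer. USE Neoplasms"
    "tumors": "Neoplasms",   # extra entry term, assumed for illustration
}

# Homographs: one spelling leads to several qualified descriptors.
HOMOGRAPHS = {
    "letters": ["letters (alphabet)", "letters (correspondence)"],
}


def resolve(term):
    """Return the descriptor(s) that a user's term leads to."""
    key = term.strip().lower()
    if key in USE_REFERENCES:
        # Synonym: follow the cross-reference to the preferred term.
        return [USE_REFERENCES[key]]
    if key in HOMOGRAPHS:
        # Homograph: offer every qualified form so the user can choose.
        return list(HOMOGRAPHS[key])
    # Unknown term: hand it back unchanged.
    return [term.strip()]
```

For example, `resolve("Cancer")` leads to the single descriptor "Neoplasms", while `resolve("letters")` returns both qualified forms, leaving the choice of meaning to the searcher.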
Other considerations include having to choose between singular and plural; among parts of speech (nouns are not the only parts of speech that can be included); and between home-language terms and their foreign-language equivalents. Consider the field of medicine where, even though the lungs come in pairs, researchers still describe themselves as studying "the lung" (i.e., in the singular). Different professions and interest groups are likely to make independent choices for preferred terms.
While achieving internal consistency is challenging enough in a single-language thesaurus, the difficulties are compounded when multiple languages are involved. The profound limitation of multi-lingual thesauri is conceptual. Because there is no one-to-one correspondence in the mapping of terms from one language to another, any attempt to fit them into parallel logical relationships will cause semantic distortion. Other challenges in the construction of multilingual thesauri are more technical in nature, and can be adequately managed given enough time and attention. For example, it occurs to me that since some languages can be written in more than one script, a thesaurus designer might need to choose a preferred script in which to represent the descriptor, and then cross-reference the alternatives. Display and sorting considerations (e.g., right-to-left orientation for Arabic; non-alphabetic ideographs for Chinese) add yet another challenging component, as is evident, to pick one of many sources, in MARBI Discussion Paper no. 108 (http://lcweb.loc.gov/marc/marbi/2001/2001-dp05.html).
The challenge in selecting thesaurus descriptors is largely one of determining a set of appropriate lexemes (that is to say, establishing the smallest units of a lexicon that can be understood on their own terms). Factors to consider in establishing a lexeme include the frequency with which its corresponding concept occurs (i.e., its relative importance to the lexicon) and the potential for false drops when searching for that concept. To illustrate this point, consider that a reader looking for information on “library school” might enter the search terms “library” and “school” into a bibliographic database, only to retrieve thousands of hits on school libraries. In this case, the reader could avoid the avalanche of false drops if the phrase “Library school” were considered a lexeme, and made searchable as a bound phrase.
A bound phrase such as “library school,” or, to pick another example, “grayish purplish blue” can be considered a lexeme if it is treated as such by thesaurus end-users in a given domain; the term “grayish purplish blue,” and many others like it, appear as descriptors in the Art and Architecture Thesaurus (AAT). This descriptor can be searched in the AAT as a single phrase for a specific color concept, whereas breaking up the terms (say, in a Boolean keyword search) could lead to innumerable false drops. A lexeme can thus consist of one word or multiple words, as long as it refers to a single clear and distinct concept, and as long as it is commonly understood this way by its intended readers.
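The difference between a Boolean keyword search and a bound-phrase search can be shown with a small Python sketch. The three titles and the function names below are invented for this example; a real bibliographic database would index far more than titles.

```python
import re

# Invented sample titles; only the first is about library schools.
TITLES = [
    "Directory of library school programs",
    "School library media centers",
    "Managing the public school library",
]


def tokens(text):
    """Lowercase a string and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def boolean_and(query_words, titles):
    """Keep titles containing every query word, in any order."""
    wanted = {w.lower() for w in query_words}
    return [t for t in titles if wanted <= set(tokens(t))]


def bound_phrase(phrase, titles):
    """Keep titles containing the words of the phrase adjacent and in order."""
    target = tokens(phrase)
    n = len(target)
    hits = []
    for title in titles:
        words = tokens(title)
        if any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            hits.append(title)
    return hits
```

With these sample titles, `boolean_and(["library", "school"], TITLES)` returns all three, two of which are false drops about school libraries, whereas `bound_phrase("library school", TITLES)` returns only the first.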
The decision about which terms get to be descriptors, and which become cross-references, does not need to come from professional indexers. There is a movement afoot in the programming and information science communities to produce end-user thesauri. According to this model, search transactions can be analyzed by humans or machines, and the most commonly used natural language terms found therein can then be selected as descriptors. This process develops a vocabulary based on user warrant.
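A machine analysis of search transactions along these lines might, in its simplest form, look like the Python sketch below. The search log, the candidate synonyms, and the function name are all hypothetical, and picking the most frequent query is only one possible reading of user warrant.

```python
from collections import Counter

# Hypothetical search log: one raw user query per entry.
SEARCH_LOG = [
    "heart attack",
    "Heart Attack",
    "myocardial infarction",
    "heart attack ",
    "coronary",
]


def choose_descriptor(candidates, log):
    """Pick the candidate that users typed most often as the descriptor.

    Returns (descriptor, cross_references); the remaining candidates
    become cross-references pointing to the descriptor.
    """
    # Normalize case and surrounding whitespace before counting.
    counts = Counter(query.strip().lower() for query in log)
    preferred = max(candidates, key=lambda c: counts[c.lower()])
    cross_refs = [c for c in candidates if c != preferred]
    return preferred, cross_refs
```

Here `choose_descriptor(["myocardial infarction", "heart attack"], SEARCH_LOG)` selects "heart attack" as the descriptor, because it appears three times in the log, and leaves "myocardial infarction" as a cross-reference.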
I shall now shift my focus from the descriptors themselves to Weinberg’s discussion of the semantic relationships in which they are embedded. Semantic relationships can be coded more precisely than is often the case in basic thesauri. Instead of the simple Narrower Term (NT), Broader Term (BT) relationships, for example, thesauri can make use of “relationship indicators” such as (1) Broader or Narrower Term Generic (BTG or NTG), (2) Broader or Narrower Term Partitive (BTP or NTP), and (3) Broader or Narrower Term Instance (BTI or NTI). These codes provide a means of more accurately describing the semantic relationships among descriptors. For example, “skull” is an NTP to “head,” because it stands in the relation of part to whole. “Renoir” is an NTI to “French painters” because the former is an instance of the latter. BTG and NTG are really the same as BT and NT, namely, broader and narrower terms representing class inclusion. The more finely coded relationships allow humans or even machines to draw inferences that are logically more precise, and the implications for artificial intelligence programming are not hard to see. Polyhierarchy, whereby a given term is allowed to have more than one broader term, can similarly help to clarify semantic relationships, such that “pianos,” for example, can be an NT of both “percussion instruments” and “stringed instruments.”
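To make the relationship indicators concrete, here is a minimal Python sketch that stores typed broader/narrower links and follows them upward. The data structure and function names are assumptions made for this example, and the link from "French painters" to "painters" is added only for illustration.

```python
# Each link is (narrower term, relationship indicator, broader term).
RELATIONS = [
    ("skull", "NTP", "head"),                      # part of a whole
    ("Renoir", "NTI", "French painters"),          # instance of a class
    ("French painters", "NTG", "painters"),        # assumed for illustration
    ("pianos", "NTG", "percussion instruments"),   # polyhierarchy: two
    ("pianos", "NTG", "stringed instruments"),     # broader terms
]

# The reciprocal indicator, seen from the narrower term's side.
INVERSE = {"NTG": "BTG", "NTP": "BTP", "NTI": "BTI"}


def broader_terms(term):
    """Return (indicator, broader term) pairs recorded for a term."""
    return sorted(
        (INVERSE[rel], broader)
        for narrower, rel, broader in RELATIONS
        if narrower == term
    )


def all_ancestors(term):
    """Follow broader-term links upward, whatever their type.

    Caution: mixing partitive links into such a chain does not always
    give a valid inference; this sketch does not check for that.
    """
    found = []
    stack = [term]
    while stack:
        current = stack.pop()
        for _, broader in broader_terms(current):
            if broader not in found:
                found.append(broader)
                stack.append(broader)
    return found
```

In this sketch `broader_terms("pianos")` returns two BTG links, one for each hierarchy, and `all_ancestors("Renoir")` climbs from the instance through "French painters" to "painters".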
The semantic relationships within a thesaurus impose a cohesive system on a multitude of terms. Disparate thesauri can also be integrated into a larger system. Diverse semantic systems supported by diverse groups can be reconciled through their incorporation into a larger thesaurus.
Vocabulary switching among different thesauri is one of the key challenges in semantic interoperability. An interesting example of vocabulary switching is the Unified Medical Language System (UMLS), sponsored by the National Library of Medicine. One of the UMLS knowledge sources is called the Metathesaurus. According to the UMLS website: “The Metathesaurus provides a uniform, integrated distribution format from over 60 biomedical vocabularies and classifications, and links many different names for the same concepts.” It should be pointed out that the UMLS mapping is done by humans, not by machines. See http://www.nlm.nih.gov/pubs/factsheets/umls.html.
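The general idea of switching between vocabularies through shared concepts can be sketched in a few lines of Python. Everything here is hypothetical: the vocabulary names, the concept identifiers, and the row layout are invented for illustration and do not reproduce the Metathesaurus distribution format.

```python
# Each row links one term in one vocabulary to a shared concept.
# Identifiers and vocabulary names are made up for this sketch.
ROWS = [
    ("C001", "VocabA", "Neoplasms"),
    ("C001", "VocabB", "Cancer"),
    ("C002", "VocabA", "Myocardial Infarction"),
    ("C002", "VocabB", "Heart attack"),
]


def switch(term, source, target, rows=ROWS):
    """Translate a term from the source vocabulary into the target one."""
    # Step 1: find the concept(s) the term names in the source vocabulary.
    concept_ids = {
        cid for cid, vocab, t in rows
        if vocab == source and t.lower() == term.lower()
    }
    # Step 2: list the target vocabulary's names for those concepts.
    return sorted(
        t for cid, vocab, t in rows
        if vocab == target and cid in concept_ids
    )
```

For instance, `switch("Cancer", "VocabB", "VocabA")` returns `["Neoplasms"]`. The hard part, deciding which names belong to the same concept, is exactly the mapping work that the text says is done by humans.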
Weinberg’s survey of thesaurus development can help to broaden our understanding of descriptors and the manifold ways they are selected and presented. The NISO guidelines on “relationship indicators” (such as “Broader Term Partitive”) shed light on the subtleties of vocabulary control. Looking toward the future, Weinberg points out that traditional thesaurus terms and relationships are now being incorporated into XML/RDF syntax for digital resources. For those of us working in the field of controlled vocabularies and subject analysis, there is a lot to be gained from Weinberg’s and her colleagues’ work, and a lot to look forward to from the developers of thesauri.
Acknowledgments: I would like to thank Bella Hass Weinberg for her many suggestions and her careful editing of multiple drafts of this summary. Needless to say, any remaining mistakes should be credited to me.
Daniel Lovins, Hebraica Catalog Librarian and Team Leader
Sterling Memorial Library, Yale University
Serials Management Systems: Optimizing Full-Text Control and Access, a conference presented by MINITEX Library Information Network, May 6, 2002, University of Minnesota, Twin Cities
This conference was organized to introduce to librarians in the MINITEX region, which encompasses Minnesota, North Dakota, and South Dakota, the benefits of serials management systems. These benefits include reduced staff time in populating electronic serials holdings lists, reduced cataloging workloads, and improved access to journal titles and university resources. The primary purpose of these systems is to organize the electronic journal holdings of an institution into one searchable, Web-accessible database, rather than requiring staff and patrons to search several different sources in order to find desired information.
MINITEX Director Bill DeJohn and MINITEX Electronic Resources Librarian Angi Faiks welcomed the group and introduced the first group of speakers. These were representatives from the vendors of three serials management systems: JournalWebCite, SerialsSolutions, and TDNet.
Ben Adams, co-founder of JournalWebCite, Inc. (http://www.JournalWebCite.com), began by describing his company and product. JournalWebCite gathers journal information from about 300 providers to create a customizable listing of a library’s journals. Providers include aggregators, which are companies that combine electronic journals from a variety of sources into one database, and publishers, which provide information about their own journals alone. Although this list is hosted on JournalWebCite’s server, library patrons will not notice that the list is maintained elsewhere. Also, libraries can customize fonts, colors, and other features of the company’s page to match the appearance of the library’s Web site.
The company can provide database overlap reports and collection development reports to assist librarians in determining which databases they might consider removing from their collection and which they should add.
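A database overlap report of the kind mentioned here can be thought of as a comparison of title sets. The Python sketch below shows one simple way such a comparison could be computed. It is not a description of how JournalWebCite produces its reports, and the database names and journal titles are invented.

```python
from itertools import combinations

# Invented holdings: database name -> set of journal titles it covers.
HOLDINGS = {
    "Database A": {"Journal One", "Journal Two", "Journal Three"},
    "Database B": {"Journal Two", "Journal Three"},
    "Database C": {"Journal Four"},
}


def overlap_report(holdings):
    """List, for every pair of databases, the titles they share."""
    report = []
    for (a, titles_a), (b, titles_b) in combinations(sorted(holdings.items()), 2):
        report.append((a, b, sorted(titles_a & titles_b)))
    return report


def fully_duplicated(holdings):
    """Databases whose every title is also covered by another database."""
    result = []
    for name, titles in holdings.items():
        others = set().union(*(t for n, t in holdings.items() if n != name))
        if titles and titles <= others:
            result.append(name)
    return sorted(result)
```

With these sample holdings, `fully_duplicated(HOLDINGS)` flags "Database B", since both of its titles are also in "Database A". A real cancellation decision would also have to weigh coverage dates, embargoes, and cost.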
Turning to search strategies, JournalWebCite now uses approximately twenty broad, generalized subject headings. The company plans soon to use Library of Congress Classification headings as subject access points.
The company offers two levels of its product. JournalList-Lite offers to each library a customized holdings list, customized format delivery (HTML, XML, CSV, Tab, SQL, and MARC), free setup, integration with any network environment, and a configuration tool designed to easily control customer data. JournalList-Standard offers several more features, including a search tool, custom subject lists, database overlap and journal overlap reporting, usage statistics, print holdings integration, and cost analysis. The Lite version is generally 70% of the cost of the standard version. Consortia discounts are available.
SerialsSolutions (http://www.SerialsSolutions.com) representative Chris Pierard spoke next. He began by describing some of the problems that libraries face when trying to locate full-text electronic journals, including the often lengthy amount of time it takes to find journal titles and the great amount of time librarians spend creating databases to inform patrons of available titles.
Pierard’s company provides a single, comprehensive electronic list of a library’s full-text, electronic format journals. These lists include holdings dates, duplicate holdings, and library-specific journals for certain databases. A library can incorporate local print holdings into the list and provide a journal-specific URL for a print journal that will lead the patron to the online public access catalog of the library. SerialsSolutions also offers a paper version of its service, which can be a valuable tool for evaluating acquisitions and is a simple way for patrons to browse the library’s holdings. SerialsSolutions covers more than 400 databases and these are updated monthly. Their lists exclude items that are not issued serially or not full-text. The company’s pricing is based on the number of full-text journal holdings tracked for the library. Peter McCracken, the company’s founder, insisted on upfront pricing.
Services that SerialsSolutions plans to offer in the future include HTTP searching, title-level subject searching, and full MARC records.
David Fritsch, Vice President of Israel-based TDNet (http://www2.tdnet.com), was the third vendor representative to speak. He identified four common problems with locating electronic journals, and then went on to elaborate on TDNet by presenting the group with a live demonstration. To set up service with the company, a library need only give journal titles. TDNet sorts and maintains the list. When searching, a patron can see the journal title, period of online coverage, full-text access, print holdings, and journal tables of contents. This last option, the T.O.C. feature, was unique among the presenting vendors and allows library users to create their own profiles to get customized table of contents lists.
TDNet librarians work with a template to create a Web page that is in a style similar to their library Web site or institution’s Web page specifications. One part of the template that cannot be changed is the TDNet logo, which must remain on the page in some way. Several librarians expressed concern about this issue. Libraries can link directly to articles that are in the public domain, virtually eliminating the search time needed to find the article. The site contains a “Jump to Page” feature, meaning a user can click from one page to just about any other page within TDNet’s site. Library users can search for information in various ways using free text, such as words from an article title or journal title. Librarians can tailor TDNet reports to meet different needs, such as statistics reporting and database usage reports.
TDNet pricing is based on several factors. The first consideration is the number of unique titles that the library carries. Additionally, pricing is based on what pieces of functionality and data the library chooses to utilize. Fritsch mentioned that the company would work with the library’s budget to determine pricing.
Future enhancements of TDNet include moving from weekly updating to immediate, library-controlled updating, and a MARC-record based system for the database.
In the afternoon, three librarians discussed the databases’ impacts on their libraries. The speakers were Donald Root, Assistant Chief of Collection Development at Free Library of Philadelphia; Allison Mays, Collection Development Librarian at Millsaps College in Jackson, Mississippi; and Greg Szczyrbak, Reference Librarian at York College of Pennsylvania.
The three librarians expressed the same kinds of praise for the systems: increased ease in finding titles, increased speed in locating journals, and decreased burden on catalogers and staff to create in-house lists of electronic serial titles. Mays suggested creating a small list to submit to a vendor as a trial to see how the vendor’s product operates.
The day ended with a question-and-answer session to give librarians a chance to clarify issues or address concerns. Topics discussed included compatibility with software designed to assist people with disabilities, the breakdown of library customers (academic/special/public), cost effectiveness studies, and remote access ability (each company’s product can be accessed through proxy servers). One concern expressed throughout the day was the inability of the companies to ensure that the journals listed as full-text are actually full-text. Because the serial management companies do not subscribe to the full-text databases themselves, they can only rely on the database producers to give them accurate database content information. Chris Pierard suggested that librarians continue to express their need for accurate descriptions of databases to the database aggregators themselves.
Serials management systems can greatly increase a library’s usage statistics by offering an efficient way for library users to find and use information. Costs are reduced by decreasing librarian workloads and possibly by canceling database subscriptions that could be redundant. Perhaps most importantly, the library’s image will be improved when administrators and students realize the vast array of journals that are available all from one site. All of the systems described above would be a benefit to any library’s collection, budget, and standing in its institution or community.
Dustin Larmore, Technical Services Librarian
Karl E. Mundt Library, Dakota State University
Meeting Minutes of the ALCTS Technical Services Directors of Large Research Libraries Discussion Group (“Big Heads”), held during the Annual American Library Association Meeting, Atlanta, GA, June 14, 2002
Welcome, introductions and announcements (Larry Alford, Chair)
Chair Larry Alford (UNC-Chapel Hill) welcomed the group and paid tribute to Judith Hopkins’ volunteer work of preparing the minutes and placing them and the round robin reports on the web.
Election of Vice-Chair/Chair-Elect
Arno Kastner (NYU) was elected unanimously.
Update on University of California’s Mellon-funded ejournal study
When do end users prefer digital content and when do they prefer hard copy? Brian Schottlaender (UC-San Diego) said the study was collecting empirical data on faculty and students’ use of digital journals. The University of California System developed criteria for a multi-institutional project that has been funded by the Mellon Foundation. They now have six months of data (he distributed a spreadsheet showing results from October 2001–March 2002). He urged that these preliminary results be accepted with caution. UC plans to publish a mid-project technical report in College and Research Libraries and a more philosophical report in Portal. The campuses of the University of California have a stable if not deteriorating infrastructure; libraries are crowded. There is a fairly substantial set of electronic journals on all the system’s campuses, about 7,000 at the beginning of the project; now the number is 50 percent larger. The system also has two storage facilities, one in northern California and the other in the southern part of the state. The southern one can hold 11,000,000 volumes yet it is beginning to be filled up. There is also unplanned redundancy between the two facilities.
Approximately 300 journals are being used in the study (five percent of the corpus) for each of which at least two print copies were available in the system. The control campuses made material available as usual while the experimental sites removed physical copies to remote storage and asked users to depend only on the electronic version unless they specifically expressed a need for the print version. UC used the following criteria to choose sample journals: availability of digital use data from the publisher; a mix of journals for some of which current issues were available in both forms and for some of which only electronic current issues were available; journals in various disciplines; journals with various physical characteristics such as lots of graphics, text in various languages, and articles with various lengths. Forty percent of the sample was from the physical sciences, forty percent from the life sciences, ten percent from the social sciences, and ten percent from the humanities. The research objectives are to discover the factors determining acceptability of digital over print in the journals themselves, in the characteristics of the users, and in the users’ technology environment. Another objective is to see whether or not the purpose for which a journal is used determines the acceptability of the digital version. They will gather six more months of quantitative data and will interview users to get a sense of their attitudes; this latter process is expected to last for the remainder of the year 2002.
Tentative quantitative conclusions: 1) print use is higher when print is on site; 2) print use is very low whether or not the print version is shelved on site; and 3) digital use is one or two orders of magnitude higher than print use, whether or not a print version is available. The findings are constant and stable across all disciplines.
Tentative qualitative conclusions: Content isn’t always available in digital form. There are three reasons for this: 1) a curatorial decision to omit matter such as advertisements, preliminary matter, indexes, etc.; 2) material is absent not by curatorial decision but capriciously; and 3) publisher takes content down.
There are two types of omitted data: 1) publishers stop making a particular title available electronically, or 2) they remove content from a particular title. These findings are making the University of California re-think licensing agreements.
Sally Sinn (NAL) asked if licensing arrangements let them know of omissions or if users let them know. Brian responded that the information came from users. Bob Wolven (Columbia) asked if the designation of experimental versus control campuses was by campus or title by title. It is title by title. Some institutions made it known that certain titles were being taken off shelves, others didn’t. In response to a question from Ann Okerson (Yale), Brian said you might be safe in making only electronic available to users as long as there was a print backup somewhere (not necessarily locally). Judi Nadler (U of Chicago) asked how the findings of the Outsell survey matched UC’s findings. She commented that faculty like to find things online and then to use them in print. Brian said their preliminary results paralleled the Outsell data; people want to use library data but not to come to the library. Larry Alford commented that at the University of North Carolina at Chapel Hill visitor data is higher than ever. Brian quoted someone as saying people are coming to the library but not to see us! Someone asked about the conclusions he is drawing about continuing availability of print; how long will publishers keep printing? Rosann Bazirjian (Penn State) asked if he could categorize the type of users who asked for material from storage: his impression was that it was largely faculty-driven, chiefly faculty from the social sciences who wanted students to study publications qua publications.
Information exchange on issues of long-term preservation of digital resources and the role of the Library in digital preservation efforts
Judi Nadler provided a framework for discussion. The challenge she presented was as follows:
As custodians of cultural heritage, libraries have served the role of repositories of traditional research resources and have been entrusted with their accessibility and long-term care. Extending from traditional resources to also include resources in electronic form, and assuming responsibility for creating, converting, and acquiring such resources, libraries have also assumed the responsibility for providing access to and ensuring maintenance of these resources over time.
Discovery metadata (cataloging) and traditional means of preservation ensure access to and long-term life of paper resources. The inherent fragility of digital resources requires more attention, often much sooner than resources on paper.
Also, in addition to discovery metadata, preservation metadata (technical, administrative, structural, and other) is required for the maintenance and long-term usability of electronic resources. Decisions regarding preservation metadata and the creation thereof must take place up front, as early in the process as possible. Standards for preservation metadata are still evolving. Understanding, documenting, and following the standards is crucial for future interoperability.
The relation between digital preservation and property rights is not clear yet, and sources of expertise must be identified and nurtured.
The investment in digital resources is high and the layout of related organizational structures varies. The more decentralized the responsibilities, the greater the need for ongoing communication and a shared decision-making process.
Ms. Nadler suggested these issues for information exchange. How equipped are we to:
1) develop a library knowledge base on issues of long-term care of digital resources
2) foster awareness and monitor development of standards
3) assess parameters of scope and scale (what level of resource commitment can the library make to the long-term care of its digital resources, whether converted, created, or acquired – selective, comprehensive, project-based, program level?)
4) assess breadth of commitment the library wants to or is expected to make
5) decide on the role of the Library -- as repository for University, as advisory to University
6) explore models relating to building a local repository, cooperating with others, identifying and using trusted repositories, some combination or all of the above
7) explore and weigh options for technical strategies
8) develop and implement models for organizational structures that best support the various aspects of our digital activities?
How are we set up to support these activities? Catherine Tierney (Stanford) asked whether there is something about these materials that is different enough to make us approach them differently. Judi Nadler replied that the difference is one of scale; selection is an issue as well (what content should be preserved?). What does preservation mean in this context? Jeffrey Horrell said that Harvard has been thinking about the distinction between enduring collections and ephemeral collections. Some things move between these two. Different criteria and assumptions are made for each type. See http://hul.harvard.edu/ldi/ for information on the Harvard Library Digital Initiative. Judi Nadler said the criterion they would like to promote at Chicago is: the materials they are responsible for creating are those they are responsible for maintaining. Duane Arenales (NLM) said there are legal issues; for materials born digital, decisions cannot be delayed as long as they can for materials born print. One has to work to determine that born-digital items are complete and are in a form that is transferable to other formats. Creators need a way to make it known that they are taking responsibility for maintaining such items and keeping them up-to-date. NLM is using an adaptation of the Dublin Core for its own publications. Larry Alford said that the University of North Carolina at Chapel Hill (UNC-CH) is creating catalog records for each item it is keeping (about 1200 so far in OCLC). Bob Wolven referred to efforts of the Digital Library Federation and OCLC to maintain registries but added that we still don’t have organizational models. Jeffrey Horrell (Harvard) said a number of us have been working with the Mellon Foundation. The economic and publisher/licensee issues are among the most complicated to develop models for. Sally Sinn asked how many are directly engaged in working with their institutions to do something about practices for digitizing materials.
Beacher Wiggins said the Library of Congress (LC) has received one hundred million dollars to look into this and to bring the many players together. The grant will require matching funds (either in cash or material). LC is using the first five million dollars of the grant to make contact with others: publishers, users, etc. LC doesn’t have anything to share yet but hopes to have a more comprehensive plan by the time of the Big Heads midwinter meeting in Philadelphia in January 2003.
Judi Nadler listed some of the types of help the University of Chicago would be willing to contribute. Cynthia Shelton (UCLA) said the California Digital Library is taking the lead in California to develop a model and to make decisions. Ann Okerson commented on the great variety of knowledge needed to make these decisions: university computing, library systems, catalogers, preservation, etc., all will need to be involved. She thought we do not consider end users enough and urged that the Mellon Foundation support a study of user needs in various disciplines. Publishers also need to be involved because we need them to produce material in formats that will be used. Duane Arenales commented that one difference between digital resources and traditional library resources is the dynamic nature of this material; digital publications change over time, and in fact the Washington Post online shows differences within one day. It was concluded that this is a topic we may want to continue to monitor and exchange information on.
University of Washington Licensing Metadata project/DLF NISO metadata meeting: E-Resource Management Metadata and Systems (Jim Stickman)
Jim Stickman (U of Washington) reported on continuing efforts by Tim Jewell, Head, Collection Management Services, University of Washington, and others to inventory data elements and functions in emerging systems that help librarians manage licensing and support of electronic resources.
At the Midwinter ALA meeting in 2001, Tim led a discussion sponsored by Big Heads that attracted some 40 librarians and led to further discussions of functions and data elements. Following this meeting, an informal steering group was formed that included Tim, Ivy Anderson (Harvard), Adam Chandler (Cornell), Sharon Farb (UCLA), Kim Parker (Yale), and Nathan Robertson (Johns Hopkins). This group worked with Pat Harris and Priscilla Caplan (NISO) and Dan Greenstein (then at DLF) to conduct a successful Workshop on Standards for Electronic Resource Management at a Digital Library Federation (DLF) meeting on May 10, 2002, which was attended by fifty librarians and representatives from a number of vendors and publishers.
One outcome of the workshop was general agreement that standards would be helpful to all parties. Subsequent discussions among the meeting organizers identified two complementary “tracks” for follow-up work to be undertaken in the near future. One track would aim at the development of a general “functional specification/best practice” document, while the other would focus on the areas where data is most likely to be exchanged over time, and therefore be most likely to benefit from formal standardization. The steering committee is developing a proposal to DLF requesting support for a project to foster the rapid development of improved tools for managing licensed e-resources – whether by individual libraries, consortia, or vendors.
The steering committee is providing an update and leading another discussion at a meeting later in the Atlanta conference. The steering committee seeks the continuing sponsorship of Big Heads through ALA’s 2003 annual meeting in Toronto. More information is available at the Web Hub for Developing Administrative Metadata for Electronic Resource Management at: http://www.library.cornell.edu/cts/elicensestudy/.
Jim noted that the University of Washington Libraries has recently begun working with Innovative Interfaces Inc. on the development of an electronic resources management module for the III automated library system.
Judi Nadler thanked Mr. Stickman and the group for doing this work. Digital tools for managing electronic resources are something many libraries need, hers included. She asked how receptive vendors are to developing such tools. Jim Stickman said that the vendor of the system that the University of Washington uses, Innovative Interfaces Inc. (III), is very interested in modifying that system to include more licensing data and to establish relationships among data, to provide better reporting and better catalog displays for use. The University of Washington is putting together a draft list of elements and various scenarios for III use. He said he hopes that other vendors will do something similar. Judi Nadler asked if this would be a tool that would become an intrinsic part of III or something that could be used by other vendors. Mr. Stickman could not say for certain but thought that III was thinking of the former.
Publication Pattern Initiative (Sally Sinn and Jean Hirons)
Sally Sinn, Chair of the CONSER Task Force on Publication Patterns and Holdings, summarized the aims of the initiative and its successes to date (see also http://www.loc.gov/acq/conser/patthold.html). She noted that we have the MARC21 holdings format, and vendors profess to be compliant with it, but not all of the pieces are in place yet. Suppose you are planning to move from original system A to vendor system B but the vendor cannot deal with bibliographic data from system A. That is analogous to the need for the establishment of a national system of publication pattern data that can migrate as libraries change integrated library systems.
The CONSER Task Force on Publication Patterns and Holdings seeded the publication pattern database by taking 40,000 Harvard records and attaching them to bibliographic records in OCLC; OCLC created an 891 field (Publication Pattern Data) for the data. What is the time and effort investment involved in this? It is the first real implementation of the MARC21 Format for Holdings Data.
The task force is working to keep systems vendors involved. What do vendors offer now when they profess to be compliant? How capable are they of outputting data? How able are they to accept input data? As of June 2002 the task force had completed a two-year pilot project to add patterns to the database, including four to five thousand records contributed by participants plus the Harvard records. The task force has submitted proposals to MARBI to add new coding to the holdings format and has two vendors now able to make use of the patterns from the OCLC records. The task force worked on the SCCTP (Serials Cataloging Cooperative Training Program) course on MARC holdings for serials and formulated a statement on what compliance with the holdings format is. See the CONSER Web site, http://lcweb.loc.gov/acq/conser/MHLDdefinition.html. Libraries need to use existing standards and to put pressure on vendors to support standardized application of MARC holdings.
The CONSER Publication Pattern Initiative has been wedded to print versions but needs to relate to digital as well. We may need to determine publisher intent for born-digital materials. We are in the infancy of what we need to do with publication pattern data. There is still much skepticism about the value of contributing to a database of publication patterns; the perception is that it incurs added cost and added processing time. Sally Sinn said it is not an onerous addition. There is also a perception that there is no long-term value to it. If we found that all vendors had implemented a standard method of transferring holdings data just as bibliographic data is transferable, wouldn’t that be worthwhile?
The pilot project is ending but the participants have agreed to continue contributing publication pattern data. If you believe that this is a valuable effort, pressure your vendors to become compliant. Jean Hirons (CONSER Coordinator, LC) put in a plea for more participation; the more records are input in OCLC the more pressure will be put on vendors. Participants must be able to work on OCLC even if they are not CONSER members; they would be given CONSER Enhance status. Duane Arenales asked about participation by subscription agents. Jean Hirons said someone from EBSCO is part of the group but none are yet using the holdings format. Duane Arenales said perhaps the format is too complicated for them. Judi Nadler said CONSER needs to provide cost figures for managers and to continue to focus on benefits. Sally Sinn asked how participation has been included in libraries’ work flow. We need to decide where data should ultimately reside. Perhaps we should put publication pattern data in a separate database linked to CONSER bibliographic records. Someone commented that when you catalog you have only the first issue and the pattern is not clear. There needs to be a way for non-catalogers to update the record a year or so later. Jean Hirons said use of the Bremer macro makes original input very easy; it takes only 2-3 minutes.
OCLC new interface migration plans (Glenn Patton)
Glenn Patton (OCLC) distributed copies of the new OCLC brochure and said it is available at http://www.oclc.org/connexion/brochure.pdf. Also available is a functionality list (http://www.oclc.org/connexion/features). The key points are: The initial version of the browser interface will become available on June 30, 2002. Once it is available you can try it out; all current authorizations and passwords will work. There is no software to download. Connexion contains all the current functionality of CORC, the CatExpress interface, and WebDewey and has been expanded to deal with all types of materials. The plan is for quarterly enhancement releases to begin fall 2002. There will also be a Windows client to Connexion, starting in the second quarter of 2003.
How does the implementation of Connexion affect current interfaces to cataloging? Passport will work until the end of 2003; CatME will work for the foreseeable future. Z39.50 access to cataloging will continue to be available.
The Connexion website has a functionality listing (10 pages) which allows you to compare what can be done in either Passport or CatME with the functionality available in the first version of the browser interface and Windows client. Various migration paths are possible (see p. 3 of brochure) but they generally fall into one of three groups: 1) migrate to Connexion now via your browser, 2) continue using Passport or CatME until the features that you need are available in Connexion, or 3) continue using Passport or CatME until the Windows client to Connexion is available.
The crucial factor is to examine your workflows in conjunction with the functionality list. He also advised that the more complex your workflows, the more likely you will want to wait until the Windows client is available with its macros feature. Another factor related to workflow is that the Save files now available will not be the same as the Save files in the browser interface; users will need to clean out Save files prior to migration.
Someone asked whether we need to think of this as institutional migration or individual migration. Glenn said that depends on workflows. Staff who do different types of work might be able to move at different times. Someone asked when current CJK functionality will move. Glenn said that current CJK and Arabic software will continue to function as they do until brought forward into the Windows interface; this involves Unicode implementation. While OCLC does use some Unicode now for CJK and Arabic and parts of CORC, mid-2003 is a possible date for full implementation of Unicode. Bob Wolven had two questions. First, would it be a fair statement to say that Passport would remain functional past Dec. 2003 if the need arose? Glenn answered that the probable answer is yes. Second, what are the plans for migration of internal units of OCLC that do cataloging, quality control, etc.? Glenn said his staff has moved to CatME; he said some high-volume activities will probably stay with Passport until the Windows client becomes available.
Cynthia Clark (NYPL) asked what kind of training will be provided or whether Connexion is expected to be intuitive. Glenn said there will be a tutorial offered along with the Windows interface; regional networks will provide training. Cynthia Shelton asked about pricing. Glenn said pricing models will continue to be the same as they are in Passport or CatME.
Functional Requirements for Bibliographic Records (FRBR): What is it and what does it mean for our operations? (Glenn Patton)
The Functional Requirements for Bibliographic Records, or FRBR (ISBN 3-598-11328-X), is the result of six years of work by an IFLA study group. The process started with an international conference in Stockholm in 1990 that looked at how the world of bibliographic data had changed in the last half century, including the growth of shared cataloging databases, the role of publishers and distributors in providing bibliographic data, the role of electronic publications, etc. This led to the recognition that these kinds of changes were stretching traditional practices of cataloging. While cataloging rules and practices had changed over the years to accommodate new types of materials, this change had not been done in a principled way. The rules were adjusted to deal with new situations rather than by establishing general principles. The IFLA study group was to look at what we do when we catalog, what kinds of information we record, how necessary that information is, etc. and to build a conceptual model of how bibliographic data works so that it could be used as the basis for a more principled look at cataloging use. The study group identified four tasks that users of all types (including library staff) perform: 1) find entities that correspond to the user’s stated search criteria, 2) identify an entity, 3) select an entity appropriate to the user’s needs, and 4) acquire or obtain access to the entity described.
The FRBR model defines three groups of entities and describes the relationships among them: the products of intellectual or artistic endeavor (works, expressions, manifestations, items); those responsible for the intellectual or artistic content (person or corporate body); and those that serve as the subjects of intellectual or artistic endeavor (concept, object, event, and place; persons and works can also be the subjects of works).
Past catalogs have tended to be flat sequences of individual items that did not show hierarchical relationships. The potential for the FRBR model is to assist in developing the relationships to enable users to deal with smaller result sets instead of having to deal with result sets of hundreds or thousands of bibliographic records.
Mr. Patton provided graphics which illustrated the four-level FRBR model. The first was based on a single work, Shakespeare's Hamlet (a work), which was realized through various expressions (a French translation and a German translation), each of which was embodied in various manifestations (Paris, 1946; Hamburg, 1834), and exemplified by different items (physical copies in different libraries).
The second illustrated a set of related works, all based on the novel Show Boat by Edna Ferber (with a pun relating her last name to FRBR). There was the novel itself (work), with a Polish translation (expression) and a specific edition of that translation (manifestation). There were also the 1936 motion picture directed by James Whale (work), the 1951 motion picture directed by George Sidney (work), and the Kern-Hammerstein musical of 1927 (work). The musical was realized through various expressions (a score for the vocal selections, a recording of selections, and the original cast recording of the 1946 revival), each of which was embodied in various manifestations (different publishers and dates).
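The four-level hierarchy in the Hamlet illustration can be sketched as a small nested data structure. This is only an illustrative sketch in Python, not part of FRBR itself or of Mr. Patton's presentation: the class and field names are assumptions, the library names are hypothetical placeholders, and the pairing of the Paris edition with the French translation and the Hamburg edition with the German translation is assumed from the order in which they were listed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    # A single physical copy; the holding library is a hypothetical placeholder.
    holding_library: str


@dataclass
class Manifestation:
    # A particular published embodiment of an expression.
    place: str
    year: int
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    # A realization of a work, such as a translation.
    description: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    # The abstract intellectual or artistic creation.
    title: str
    creator: str
    expressions: List[Expression] = field(default_factory=list)

    def count_items(self) -> int:
        # Count every physical copy beneath this work.
        return sum(len(m.items) for e in self.expressions for m in e.manifestations)


def build_hamlet() -> Work:
    # Edition/translation pairing and library names are assumptions.
    french = Expression(
        "French translation",
        [Manifestation("Paris", 1946, [Item("Library A"), Item("Library B")])],
    )
    german = Expression(
        "German translation",
        [Manifestation("Hamburg", 1834, [Item("Library C")])],
    )
    return Work("Hamlet", "Shakespeare, William", [french, german])


def outline(work: Work) -> List[str]:
    # Produce an indented outline, one line per entity, top level first.
    lines = [f"Work: {work.title}"]
    for e in work.expressions:
        lines.append(f"  Expression: {e.description}")
        for m in e.manifestations:
            lines.append(f"    Manifestation: {m.place}, {m.year}")
            for i in m.items:
                lines.append(f"      Item: copy held by {i.holding_library}")
    return lines


if __name__ == "__main__":
    print("\n".join(outline(build_hamlet())))
```

A display built this way lets a user start from one line for the work and open only the translation or edition of interest, rather than scanning every copy-level record.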
Judi Nadler asked what, conceptually, is the difference between FRBR and MULVER (multiple versions project)? In terms of the origin of FRBR Glenn didn’t think there was a direct relationship though some of the same people were involved in both projects. MULVER was interested in reproductions. Work is going on in the local system vendor community, particularly in Europe (funded by the European Union), on ways to organize and display records for users; OCLC plans to incorporate FRBR into its new version. Judi Nadler commented that MULVER was highly regarded conceptually but fell apart because of the perceived difficulty of applying it. How is FRBR different? Glenn responded that there is lots of interest from the vendor community and the cataloging rules community in FRBR; perhaps now is the time and MULVER was before its time.
Larry Alford said FRBR sounded like a powerful idea and model but that the implications for local operations seem pretty big. He asked if Glenn had any ideas about how we might do that. Glenn said he didn’t, though there is beginning to be a general movement for introducing the library community to FRBR, pointing to various programs at this conference. Glenn would like to hide the model from catalogers as much as possible, letting systems provide links or options among which catalogers could choose. Cynthia Shelton said catalogers know this stuff intellectually; they know about collocating works which systems display by date or language. Glenn said that was true to some degree but the experience at OCLC shows that there are lots of things we haven’t done as much in the past as we could have, e.g., use of uniform titles. It would be hard to go back to supply this information for older things or even for some categories of things. We have been more likely to use uniform titles for literary works than for scientific works.
Duane Arenales asked what effect Glenn thought the FRBR approach would have on MARC. Glenn said there is a discussion paper entitled “Dealing with FRBR expressions in MARC21” (http://lcweb.loc.gov/marc/marbi/2002/2002-dp08.html) which MARBI will consider at its meeting on June 15, 2002. He also mentioned Tom Delsey’s Functional Analysis of the MARC 21 Bibliographic and Holdings Formats (http://www.loc.gov/marc/marc-functional-analysis/) which contains an analysis of MARC in relation to the FRBR model. Do we need work and expression records in addition to manifestation records, or can they be derived from existing manifestation records? [For a research project on this latter topic see Dr. Edward O'Neill's presentation at the CCS Cataloging and Classification Research Discussion Group on Saturday, June 15, 2002. Notes on this presentation can be found at http://www.acsu.buffalo.edu/~ulcjh/FRBRoneill.html.]
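To make the question of deriving work-level groupings from existing manifestation records more concrete, here is a minimal sketch in Python. It is not Dr. O'Neill's method or any OCLC algorithm; it assumes, purely for illustration, that a work can be approximated by a normalized author plus uniform title, falling back to the transcribed title when no uniform title was supplied. The record layout, field names, and sample records (including the transcribed German title) are hypothetical.

```python
import re
from collections import defaultdict


def normalize(text):
    # Lowercase, drop punctuation, and collapse runs of spaces.
    cleaned = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def work_key(record):
    # Prefer the uniform title; fall back to the transcribed title.
    title = record.get("uniform_title") or record["title"]
    return (normalize(record["author"]), normalize(title))


def cluster_by_work(records):
    # Group flat manifestation-level records under an assumed work key.
    clusters = defaultdict(list)
    for record in records:
        clusters[work_key(record)].append(record)
    return dict(clusters)


# Hypothetical sample records, loosely based on the examples in these notes.
SAMPLE_RECORDS = [
    {"author": "Shakespeare, William", "title": "Hamlet",
     "uniform_title": "Hamlet", "imprint": "Paris, 1946"},
    {"author": "Shakespeare, William", "title": "Hamlet, Prinz von Dänemark",
     "uniform_title": "Hamlet", "imprint": "Hamburg, 1834"},
    {"author": "Ferber, Edna", "title": "Show Boat",
     "uniform_title": None, "imprint": "not given"},
]

if __name__ == "__main__":
    for key, group in cluster_by_work(SAMPLE_RECORDS).items():
        print(key, len(group))
```

The fallback to the transcribed title is where such a grouping would be weakest: as Glenn noted, uniform titles have not been applied consistently, so records that lack them would cluster only when their transcribed titles happen to match.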
Larry Alford urged people to look at the white paper by Carol Hixson and Jean Hirons at http://www.loc.gov/catdir/pcc/whitepapertrng.html.
Larry Alford asked Bob Wolven to briefly summarize issues relating to e-resources for discussion at a future meeting
Mr. Wolven mentioned several developing approaches for e-resource cataloging, including new rules for integrating resources, use of e-journal management vendors to supply cataloging and holdings data, and a CONSER proposal for a new kind of one-record approach to e-journals (http://lcweb.loc.gov/acq/conser/singleonline.html) which has some relationship to FRBR. He noted that these issues overlap with several items on the agenda, and concluded by asking several questions. Is the CONSER proposal a good thing or not? Are our cataloging staffing, training, and organizational models well suited to these developing approaches? What implications do these cataloging models have for our ability to support preservation and archiving of e-resources?
State University of New York at Buffalo
Stephen Smith, head of the Rapid Cataloging Team at the University of Illinois Library at Urbana-Champaign, has been awarded the 2002 LITA/Library Hi Tech Award. This award, sponsored by EMERALD and administered by the Library and Information Technology Association (LITA), was established in 1992 to recognize “outstanding achievement in communicating to educate practitioners within the library field in library and information technology.” Professor Smith won the award for his interactive CD-ROM tutorial for cataloging librarians and support staff, entitled The IOUG Introductory Authority File Workshop. It is designed to instruct all levels of employees in the art and science of searching the OCLC Name Authority File as well as to assist library staff in the identification and verification of the proper forms of name and title headings.