NASA/TM-2001-211049

Buckets: Smart Objects for Digital Libraries

Michael L. Nelson
Langley Research Center, Hampton, Virginia

August 2001

The NASA STI Program Office ... in Profile

Since its founding, NASA has been dedicated to the advancement of aeronautics and space science. The NASA Scientific and Technical Information (STI) Program Office plays a key part in helping NASA maintain this important role.

The NASA STI Program Office is operated by Langley Research Center, the lead center for NASA's scientific and technical information. The NASA STI Program Office provides access to the NASA STI Database, the largest collection of aeronautical and space science STI in the world. The Program Office is also NASA's institutional mechanism for disseminating the results of its research and development activities. These results are published by NASA in the NASA STI Report Series, which includes the following report types:

• TECHNICAL PUBLICATION. Reports of completed research or a major significant phase of research that present the results of NASA programs and include extensive data or theoretical analysis. Includes compilations of significant scientific and technical data and information deemed to be of continuing reference value. NASA counterpart of peer-reviewed formal professional papers, but having less stringent limitations on manuscript length and extent of graphic presentations.
• TECHNICAL MEMORANDUM. Scientific and technical findings that are preliminary or of specialized interest, e.g., quick release reports, working papers, and bibliographies that contain minimal annotation. Does not contain extensive analysis.
• CONTRACTOR REPORT. Scientific and technical findings by NASA-sponsored contractors and grantees.
• CONFERENCE PUBLICATION. Collected papers from scientific and technical conferences, symposia, seminars, or other meetings sponsored or co-sponsored by NASA.
• SPECIAL PUBLICATION. Scientific, technical, or historical information from NASA programs, projects, and missions, often concerned with subjects having substantial public interest.
• TECHNICAL TRANSLATION. English-language translations of foreign scientific and technical material pertinent to NASA's mission.

Specialized services that complement the STI Program Office's diverse offerings include creating custom thesauri, building customized databases, organizing and publishing research results ... even providing videos.

For more information about the NASA STI Program Office, see the following:

• Access the NASA STI Program Home Page at http://www.sti.nasa.gov
• E-mail your question via the Internet to [email protected]
• Fax your question to the NASA STI Help Desk at (301) 621-0134
• Phone the NASA STI Help Desk at (301) 621-0390
• Write to:
  NASA STI Help Desk
  NASA Center for AeroSpace Information
  7121 Standard Drive
  Hanover, MD 21076-1320

National Aeronautics and Space Administration
Langley Research Center
Hampton, Virginia 23681-2199

Available from:

NASA Center for AeroSpace Information (CASI)
7121 Standard Drive
Hanover, MD 21076-1320
(301) 621-0390

National Technical Information Service (NTIS)
5285 Port Royal Road
Springfield, VA 22161-2171
(703) 605-6000

ABSTRACT

BUCKETS: SMART OBJECTS FOR DIGITAL LIBRARIES

Michael L. Nelson
Old Dominion University, 2000
Director: Dr. Kurt Maly

Current discussion of digital libraries (DLs) is often dominated by the merits of the respective storage, search and retrieval functionality of archives, repositories, search engines, search interfaces and database systems. While these technologies are necessary for information management, the information content is more important than the systems used for its storage and retrieval.
Digital information should have the same long-term survivability prospects as traditional hardcopy information and should be protected to the extent possible from evolving search engine technologies and vendor vagaries in database management systems. Information content and information retrieval systems should progress on independent paths and make limited assumptions about the status or capabilities of the other. Digital information can achieve independence from archives and DL systems through the use of buckets. Buckets are an aggregative, intelligent construct for publishing in DLs. Buckets allow the decoupling of information content from information storage and retrieval. Buckets exist within the Smart Objects and Dumb Archives model for DLs in that many of the functionalities and responsibilities traditionally associated with archives are "pushed down" (making the archives "dumber") into the buckets (making them "smarter"). Some of the responsibilities imbued to buckets are the enforcement of their terms and conditions, and maintenance and display of their contents. These additional responsibilities come at the cost of storage overhead and increased complexity for the archived objects. However, tools have been developed to manage the complexity, and storage is cheap and getting cheaper; the potential benefits buckets offer DL applications appear to outweigh their costs.

We describe the motivation, design and implementation of buckets, as well as our experiences deploying buckets in two experimental DLs. We also introduce two modified forms of buckets: a "dumb archive" (DA) and the Bucket Communication Space (BCS). DA is a slightly modified bucket that performs simple set management functions. The BCS provides a well-known location for buckets to gain access to centralized bucket services, such as similarity matching, messaging and metadata conversion. We also discuss experiences learned from using buckets in the NCSTRL+ and Universal Pre-print Server (UPS) experimental digital libraries.
We conclude with comparisons to related work and discussion about possible areas for future work involving buckets.

ACKNOWLEDGMENTS

This dissertation was made possible through the assistance, encouragement and patience of many people. Foremost among these people are the members of my committee. Kurt Maly provided the direct advisement, insight and strategic vision necessary for the definition, refinement and wide adoption of the research results. Stewart Shen and Mohammad Zubair were constant supporters and more than occasionally devil's advocates during our weekly meetings. Frank Thames provided much of my initial motivation to pursue a Ph.D., and David Keyes' encouragement is the reason I chose to obtain it at Old Dominion University.

Many fellow students at Old Dominion University have positively affected my research. Xiaoming Liu, Mohamed Kholief, Shanmuganand Naidu, Ajoy Ranga, and Hesham Anan are among those that have made design or coding suggestions, developed supporting technologies, and ferreted out many bugs.

NASA Langley Research Center has provided me with the opportunity and resources to perform digital library research and development. These current and former NASA colleagues have provided technical, financial and moral support in the breadth of my digital library activities at Langley: David Bianco, Arleen Biser, David Cordner, Delwin Croom, Sandra Esler, Gretchen Gottlich, Nancy Kaplan, Mike Little, Ming-Hokng Maa, Mary McCaskill, Daniel Page, Steve Robbins, Joanne Rocker, George Roncaglia, and Melissa Tiffany.

A number of people outside Old Dominion University and Langley Research Center played significant roles in supporting the development and adoption of buckets. Among these people are: Herbert Van de Sompel (University of Ghent), Marcia Dreier (Air Force Research Laboratory) and Rick Luce (Los Alamos National Laboratory).
Finally, I would like to thank Rod Waid for the creation of the lovable "Phil" character that eventually evolved into our research group's mascot, and Danette Allen for her patience and support.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter

1. INTRODUCTION
2. MOTIVATION AND OBJECTIVES
   2.1 Why Digital Libraries?
       2.1.1 Digital Libraries vs. the World Wide Web
       2.1.2 Digital Libraries vs. Relational Database Management Systems
   2.2 Trends in Scientific and Technical Information Exchange
   2.3 Information Survivability
   2.4 Objectives and Design Goals
       2.4.1 Aggregation
       2.4.2 Intelligence
       2.4.3 Self-Sufficiency
       2.4.4 Mobility
       2.4.5 Heterogeneity
       2.4.6 Archive Independence
3. BUCKET ARCHITECTURE
   3.1 Overview
   3.2 Implementation
       3.2.1 Bucket Methods
       3.2.2 File Structure
       3.2.3 Terms and Conditions
       3.2.4 Internal Bucket Operation
       3.2.5 Metadata Extensions
   3.3 Discussion
       3.3.1 Bucket Preferences
       3.3.2 Systems Issues
4. DUMB ARCHIVES
   4.1 Overview
       4.1.1 The SODA DL Model
       4.1.2 Archive Design Space
       4.1.3 Publishing in the SODA Model
   4.2 Implementation
       4.2.1 Implemented Methods
       4.2.2 Changes from a Regular Bucket
   4.3 Discussion
       4.3.1 DA Examples
       4.3.2 DBM Implementation Notes
       4.3.3 Open Archives Initiative Dienst Subset Mapping
5. BUCKET COMMUNICATION SPACE
   5.1 Overview
       5.1.1 File Format Conversion
       5.1.2 Metadata Conversion
       5.1.3 Bucket Messaging
       5.1.4 Bucket Matching
   5.2 Implementation
       5.2.1 Implemented Methods
       5.2.2 Changes from a Regular Bucket
   5.3 Discussion
       5.3.1 Performance Considerations
       5.3.2 Current Limitations
6. BUCKET TESTBEDS
   6.1 NCSTRL+
       6.1.1 Dienst
       6.1.2 Clusters
   6.2 Universal Preprint Server
       6.2.1 Lightweight Buckets
       6.2.2 SFX Reference Linking in Buckets
7. RELATED WORK
   7.1 Aggregation
       7.1.1 Kahn/Wilensky Framework and Derivatives
       7.1.2 Multivalent Documents
       7.1.3 Open Doc and OLE
       7.1.4 Metaphoria
       7.1.5 VERS Encapsulated Objects
       7.1.6 Aurora
       7.1.7 Electronic Commerce
       7.1.8 Filesystems and File Formats
   7.2 Intelligence
   7.3 Archives
   7.4 Bucket Tools
8. FUTURE WORK
   8.1 Alternate Implementations
       8.1.1 Buckets
       8.1.2 Dumb Archives
       8.1.3 Bucket Communication Space
   8.2 Extended Functionality
       8.2.1 Pre-defined Packages and Elements
       8.2.2 XML Metadata
       8.2.3 More Intelligence
   8.3 Security, Authentication and Terms & Conditions
   8.4 New Applications
       8.4.1 Discipline-Specific Buckets
       8.4.2 Usage Analysis
       8.4.3 Software Reuse
9. RESULTS AND CONCLUSIONS

REFERENCES

APPENDICES

A. BUCKET VERSION HISTORY
B. BUCKET API
C. DA API
D. BCS API