JOURNAL OF ORGANIZATIONAL COMPUTING 6(1), 97-107 (1996)

Introduction to Network Publishing

Brewster Kahle
Richard Koman
Wide Area Information Servers, Inc.

Network publishing is the integration of computer networks and traditional publishing that creates a basis for a new mechanism for organized information sharing. Computer networks can now support interactive text applications across many countries. Publishers have been exploiting computer technology to speed printed publications to market. Using computer networks for the distribution of work takes this trend to the next logical step. Based on the experience of the WAIS™, World Wide Web, and Gopher systems on the Internet, this article will propose the technical rationale for network publishing and suggest some of the components of a successful commercial system.

electronic publishing, networking, client-server computing

1. INTRODUCTION

Very few generations see a change in how people communicate with each other. When a new communications technology develops, all sorts of things change: industries form, groups of companies shift, methods of learning and sharing information change. With the development of the printing press in the late fifteenth century, languages became regularized, people tapped the knowledge of the ancients, and, more importantly, authors were able to spread their words far and wide. New types of writing flowered, and literature was born. More recently, the telephone connected people in distant locations and allowed for the physical separation of offices and factories. Although only businesses had access to this technology in the late nineteenth century, by the 1930s even rural homes were connected. A similar ground-shifting technology is network publishing, the communication and distribution of information over wires. Network publishing has great potential to change--and improve--the flow of information.
Network publishing opens doors for widespread access to all kinds of information, for new breeds of publishers, and for new business opportunities for traditional publishers.

Correspondence and requests for reprints should be sent to Brewster Kahle, Wide Area Information Servers, Inc., 1056 Noe Street, San Francisco, CA 94114. E-mail: <[email protected], [email protected]>.

Network publishing allows for inexpensive reproduction, targeted transmission, and distributed control. These are the goals of the Wide Area Information Servers™ system (WAIS). As the phrase suggests, network publishing comes out of the idea of a convergence of publishing and networks. The publishing industry is following a trend toward computerization, in which all elements of production are digital until the work is printed. Computer networks, notably the Internet, are now fast, inexpensive, and highly distributed. This convergence of content and distribution lays the stage for network publishing. In the following sections, we will explore this convergence in more depth.

1.1 Publishing: Computerized Production

The publishing industry has undergone a series of technology shifts, most of which have come from the application of computer technology to such tasks as typesetting. The desktop publishing revolution--based on Apple Macintosh computers, the Adobe PostScript page description language, and graphics software--accelerated the process. To understand the impact of this (and how it relates to network publishing), consider how the print publishing process has changed. In any publishing operation, there are several pieces that must be integrated to create a page that can be printed. The content must be written, the pages must be designed, the words must be typeset, black and white photographs must be converted to halftones, color photographs must be separated into sets of film. Then all the pieces are put together, film is shot of each page, the films are stripped together in a signature, a plate is made from the film, and finally the piece is printed.

The Macintosh and PostScript changed all that forever; entire industries were replaced in the process. With authors writing on computers and designers creating pages on screen, typesetting became inseparable from design. It was simply sucked into the machine. As the technology improved, more and more of these processes--traditionally performed by skilled tradesmen--were pulled into the computer as well. With current technology, all "prepress" functions can be done on desktop computers. But eventually the work is output to film for stripping, platemaking, and printing on paper.^1

^1 There are many positive things about printed communications, but one problem is cost. Besides the cost of the substrate, paper must be warehoused, shipped, mailed, etc.

1.2 Computer Networks: Fast, Cheap, Distributed

Now that documents are created and stored on computers, a natural step is to distribute them in digital form. This requires a reliable digital transport that is inexpensive enough, fast enough, and widespread enough to support text and graphics. The Internet has recently achieved these goals, and it is starting to be used as a distribution mechanism for network publishers. The cost of digital communications has dropped, reflecting both the drop in cost in terminal equipment and the rise in demand. Whereas text and graphics can now be shipped over computer networks, audio and video transmissions are not yet cost-effective.

Although phone lines are used to transmit fax pages, the slow speed means that users tend to receive these documents in the background for later reading. Internet speeds, on the other hand, are 7 to 40 times faster. This is fast enough for users to browse, search, and scan text and business graphics. As the speeds go up, color graphics, audio, and video will be practical, but for now, network publishing is centered on text and simple graphics.

With an inexpensive system that is fast enough, the remaining question is: Does the network connect enough users? A recent study indicated that some 20 million people use Internet e-mail, and over a million users are interactively connected, with usage doubling every 9 months.^2 Whereas the academic and research communities make up the current base of Internet users, the greatest growth is in the international and commercial sectors.^3 For publishers that want to address the markets connected to the Internet, that is enough coverage to support the first businesses.

^2 Internet Society, CNRI, Reston, Virginia.
^3 Ibid.

Thus, the Internet is a viable distribution model for network publishing, and it (or other similar networks) will likely prosper. The history of the Internet--from Defense Department research project to education-and-research network to commercial backbone--is the story of the successful migration of a scalable technology.

1.3 New Kinds of Publishers

Network publishing will turn many kinds of organizations into international publishers. Everyone who is in the business of providing information to an audience, whether for free or for some payment, may become a network publisher: government agencies, corporations, libraries, individual writers and artists, as well as publishers of magazines, newspapers, and books. The first wave of network publishers--and this is already happening--is comprised of organizations who are looking for less-expensive ways of distributing free information. These include corporations, government agencies, university libraries, and catalog publishers. Sun Microsystems is an example of this kind of network publisher. Sun uses the Internet to distribute technical marketing materials at a much lower cost and greater timeliness than could be done by printing and mailing these materials.

A second wave is comprised of newspaper, magazine, journal, and book publishers who are "republishing" their content for networks. Dow Jones and Encyclopaedia Britannica are two "traditional" publishers intent on making money by network publishing. Because traditional publishers already have in place a system for gathering, editing, and presenting information, publishers can easily "repurpose" data collected for the primary business function. Although many publishers are using online services like CompuServe and America Online to publish electronically, these companies are publishing directly on the Internet in order to maintain the profitability of their network publishing business. "The main reason we are doing it ourselves is that you just can’t make any money licensing your content," Joseph J. Esposito, president of Encyclopaedia Britannica North America said in a New York Times article on the Britannica service [1]. "If you do believe that content is king, it’s rather unfortunate, that so many of the content providers have put themselves in a position where they’re held hostage to the online services."

The third wave will occur when works are created directly for this interactive environment. People will take advantage of the fact that anyone can be a publisher: all that is needed is a computer, a telephone, and something to say. Individuals will share ideas and work with others in a way not possible before. Network publishers will be able to find an audience and readers will be able to find compelling documents.

2. REQUIREMENTS OF NETWORK PUBLISHING

Information--in the form of newspapers, magazines, television, and telephone services--is flooding into people’s lives. Yet discussion of the information superhighway causes many people to say, "I can’t deal with the information I get now. What do I want with more information?" This "information anxiety" signals that navigation and information tools are not good enough yet. Readers need to be able to search and find the information they need without being exposed to information they don’t need, to browse and explore other information when they have the time and inclination, and stay up-to-date by having new information delivered. Finally, they will want all of these information retrieval techniques integrated into a single interface. These are the goals of WAIS as a network publishing system: easy and efficient navigation, the development of a wide and varied community of users, and the ability to publish both free and for-pay information.

2.1 Searching

The primary tool in network publishing is the ability to ask for information on a topic and quickly get a response of relevant documents. The methods should be intuitive, familiar, easy, and fast. This is currently available with WAIS.

2.2 Browsing

Playing around in the information is crucial because it lets us understand the breadth of the information. Browsing, shopping, and exploring are required before we even know what we can search for. In this way, we will find new topics that we didn’t know were interesting before. Browsing Internet resources is currently available through the Gopher and World Wide Web systems.

2.3 Updating

Staying up-to-date with our current interests can be difficult unless a steady stream of filtered information is automatically presented. If this is not done well enough, then we just won’t find the time to read it. This process is similar to newspaper production, in which a staff of reporters and editors filter and interpret information, present it in an appealing way on a page, and the daily paper is delivered to subscribers’ doorsteps. WAIS provides an infrastructure for agenting technology, which will deliver this ability to passively receive new information. See the Agents section below for a further discussion of agents.

2.4 Information Integration

Users will require seamless access to all information--personal, corporate, and published--ideally with the same tools. This breadth of information will probably not be in one library or database but rather include one’s own files, enterprise databases, and many outside sources. Searching must be easy and intuitive even though the mountains to search through are large and unorganized.

These four processes address users’ needs that are now primarily handled in print and telephone communications.

2.5 How Does Network Publishing Deliver these Goals?

The goal is to combine the right network tools into a coherent whole that can be used over a wide area by millions of users. Moving through gigabytes of information efficiently requires advanced PC products, digital networks, and a myriad of information sources to choose from. One system that is rapidly evolving toward that goal is WAIS. The pieces that make this possible are:

• Easy information navigation
• Client-server architecture
• Agent technology
• Security measures.

3. INFORMATION SERVER AS A TOOL FOR FINDING INFORMATION

The primary requirement of a network publishing system is that it let users find the most relevant references out of a collection of 10,000 to one million documents. For this we need "information servers"--smart servers that store large amounts of information--from 1 to 100 gigabytes--and support thousands of users. An information server needs as many clues as possible as to what the user likes and dislikes and should give as much feedback to the user about what it contains and what it can be used for.

Typically, users approach a database in three ways. First they may want to browse through the database, learning about its organization and scope. Then they will tend to search for information on several different topics. Finally, they will narrow to a very specific search. There are three principal ways that servers provide the searching capability: natural language searches, relevance feedback, and Boolean searches.^4

^4 Some commercial Boolean systems are Dialog and Mead Data. Boolean and natural language systems are available from Fulcrum, Personal Librarian Software, Conquest. Systems that include relevance feedback are available from Thinking Machines Corporation, Verity, WAIS Inc.

3.1 Natural Language

Writing a query should be no more difficult than asking a question. Natural language lets the user say, "I’m looking for this" or "What do you know about that?" Natural language queries can include any number of extraneous words, can be in question or statement form, and require no special syntax, case sensitivity, or mathematical symbols. This is the beginning of the dialog, a broad question answered by a number of possibly relevant documents.

3.2 Relevance Feedback

The conversation continues with relevance feedback when the user picks one of the returned documents, or a section of one document, and says, "I like that one--find me more like that one" [2]. This is a powerful mechanism for moving through large collections of information by finding documents that are "linked" to the current one. Rather than static hypertext links, however, these links can be dynamically created based on what the user has liked in the past.

3.3 Boolean and Fielded Searches

When the user has narrowed in on the desired information, it is often fruitful to do very specific searches using Boolean and fielded search parameters. A certain amount of training and sophistication is required to make good use of these techniques, but they can be quite powerful.

3.4 Future Searching Technologies

Future searching systems will handle multimedia, so that users can search on pictures as well as text, handle multiple languages, and learn from past user feedback.

4. CLIENT-SERVER TECHNOLOGY: THE POWER OF THE DESKTOP

Client-server frees users from the shackles of the mainframe. They can interact with servers using desktop computers, laptops, "personal digital assistants," and, maybe someday, home game machines. Client-server technology puts the control in the user’s hands, by exploiting the power of the user’s computer to provide more functionality and efficiency. The chief advantages of client-server for network publishing are graphical user interfaces (GUIs), integration with other applications, and advanced display modes.

4.1 Graphical User Interfaces

Graphical user interfaces typically feature icons and windows, thus hiding complexity and increasing ease-of-use and efficiency. In a client-server environment, the local computer controls the user experience, and the server provides fixed services. By contrast, in a remote windowing system, the server controls the entire user experience. Examples of this are America Online and Mosaic.

4.2 Ease of Use

Applications using icons and menus have been shown to be 35% faster to use than a similar character-based program. GUI users were less fatigued and were found to explore and teach themselves the capabilities of the application [3].

4.3 Integration with the Desktop

Client-server allows users to bring external information into other local programs such as word processors, spreadsheets, and image editors. It is also possible to add searching functions directly into these programs, so, for instance, a writer could search for a specific document, download it, display it, and edit it, all without leaving the word-processing program. Or a designer could search the network for an appropriate stock photo and add it to the design, all within one page-layout program.

4.4 The Future of Client-Server: Agent Technology

In the future, client-server technology will be exploited to develop agents that will serve as alter egos. Agents will be able to ponder indications of a user’s preferences and act accordingly. A user’s computer will know what its owner reads and doesn’t read, to whom messages are sent, and whose messages are ignored.^5

^5 Commercial systems include Apple’s Rosebud project which became AppleSearch (Apple Computer, Cupertino, California), and Relevant Personal Digital Newspaper by Ensemble Inc., Menlo Park, California.

Automating some of the information-collection tasks can help find relevant information from thousands to tens of thousands of sources. Given the power of desktop machines and a protocol that allows for machine automation, we have the pieces needed to create these searching automatons. On the Internet, there are literally thousands of information servers, so a system of robots would be useful.

Although the word "agents" suggests a human capability similar to a secretary or research assistant, the current technology is in its infancy. The precursors are present, however: a growing body of quality information, computer-to-computer protocols that can support agents, multitasking operating systems on the desktop, digital networks, and, most importantly, a discerning user population. Today experimental agents are starting to perform the following tasks:

• Ask many servers a question on behalf of a user and track the user’s actions in response to the answers.
• Scour the world (within a budget) to find new sources.
• Work 24 hours a day finding information.
• Format "personal newspapers" for users to read off-line on portable machines.
• Gossip with other clients to share information.

Organizations like Xerox’s Palo Alto Research Center, Massachusetts Institute of Technology, General Magic, and Ensemble are working to make these capabilities available in the next few years. By the time users start to use these automating processes, they may not even be aware of them.

5. NAVIGATING A SEA OF SERVERS: UBIQUITOUS PROTOCOLS

Although client-server has significant value within an organizational local area network (LAN), it really shines in a wide-area network like the Internet. When a good protocol is in place, network users can access multiple servers in a single search, talk to many different kinds of servers in the same way, access personal, organizational, and published (wide-area) information in an integrated fashion, use sophisticated clients, and use the network as a reference source with such tools as a directory of servers.^6 To provide all of this, the protocol must be flexible, extensible, standard, and, of course, good enough.

^6 Protocol committees that are working on relevant network standards include Internet Engineering Task Force, National Information Standards Organization, International Standards Organization, OSI Working Group on Library Applications. Document format standards are set by other groups and companies.

5.1 Flexible

The protocol should operate on all computers--from desktop personal computers to supercomputers. It should allow for searching of many data types--not only text but also maps, DNA structures, and other unusual data types. The protocol should support any search syntax. Finally, it should allow clients to gossip with one another about their discoveries.

5.2 Extensible

An extensible protocol can grow and add new features without going through a long standardization process.

5.3 Standard

The protocol should be based on nonproprietary, international standards, so that systems can come from different companies but still be interoperable.

5.4 Good Enough

To be good enough, the protocol must be able to handle the current set of needs and be able to retrieve any kind of data, including text, graphics, sound, and video.

6. SECURITY IN A WORLD-WIDE NETWORK

Security systems in a network publishing system restrict unwanted access to documents based on the user’s identity. Users ask the server for data, and they are allowed or refused access according to the commands of the information provider. Unlike services that "broadcast" files (like NetNews or CD-ROM distribution), in a network publishing system, documents are only copied when a user requests them. Thus, the publisher controls the distribution of the work. The primary security concerns in a network publishing environment are privacy, theft, and viruses.

6.1 Privacy

Network publishing brings up new issues of privacy because most users are unaware that their actions can be recorded. For instance, WAIS, Gopher, and World Wide Web all generate usage logs for the server. These logs tell the server administrators who searched for what information and what files were downloaded. Whereas these logs provide valuable information that helps information providers improve their services, it is also possible that information could be sold to third parties. Encryption can protect information during transmission by encoding messages so they can only be read by the intended recipient. But this is only part of the answer. The network publishing community also needs to develop rules of conduct for handling this information.

6.2 Theft

In this context, theft refers primarily to improperly reselling information owned by someone else, or giving away copies of something that is being sold. Network publishing systems are attempting to make it easy for users to act legally by making "pointers" to the original data. Someone who wanted to include an article from a for-pay online magazine could simply construct a pointer that would bring the user to the site of the legitimate publisher of that document.

Another concern is that someone who had not paid for access to a for-pay server would be able to break in, thus depriving the provider of income. This possibility is effectively handled by authentication procedures.

6.3 Viruses and Security Breaches

In the WAIS system, users do not actually log in to the server; rather, they search through a read-only application layer protocol (Z39.50^7). Thus, there is no risk of information on the server being modified or of the server being contaminated by viruses.

^7 Z39.50-1992 Information Retrieval Service and Protocol (ANSI/NISO).

6.4 Current Security Techniques

Securing information involves a balance of ease-of-use and protection. It would be prohibitively difficult for a user to remember different passwords for every server contacted; on the other hand, giving a user a single password for many information servers would invite abuses. Hardware encryption devices, like the proposed Clipper, are far more difficult to distribute than a software solution. Older security systems that allow two systems to communicate because they each know the same secret key are difficult to extend to a system where there are thousands of servers and millions of users. Two new systems have been developed to address these problems--public key and Kerberos--and both are starting to be used in the WAIS environment.

Public key technology offers all the right pieces: privacy, scalability, authentication, and digital signatures. The catch is it requires licensing from a private company. This has not stopped implementation, but it has slowed dissemination. Kerberos, from MIT, does not require licensing, but requires a hierarchy of authorities to validate connections. All in all, the public key and Kerberos technologies offer strong security measures for network publishing, but the dissemination of the infrastructure will take a while.

7. BILLING AND PAYMENT MODELS

Making it possible for small producers, as well as large, to be compensated for their work offers opportunity for continued growth in the Internet. Every company, every academic department, every family should be able to publish on this network. Some will want to be compensated. Facilitating this future industry is one of the goals of the WAIS system.

Collecting the customer usage information is technically not difficult, but many questions about how the business will develop remain unanswered. There are many possible billing structures: subscription, site-license subscriptions, pay-per-article, advertising-supported, and many others. Although print publishing broke up into separate industries--writing, publishing, printing, distributing, and retailing--the network publishing business might evolve differently. In the current stage, the goal is to reach a critical mass of quality services that readers are willing to pay for.

Current businesses that are making money on the Internet include connectivity providers, hardware providers, telecommunications companies, book publishers, and consultants. After the plumbing is done, then interactive information services such as magazines, games, and performance events can proceed on the networks. Some Internet information service providers (such as WAIS, Bunyip, and Pandora) are just starting to make money. These businesses typically take content from a publisher and retarget it for the networks. Some publishers (such as Encyclopaedia Britannica) operate the systems themselves, but niche service bureaus offer expertise and economies of scale.

How the customer will pay for information access is another open question. End-users might be billed for single subscriptions, but more likely connectivity providers (such as regional Internet providers and online services) will act as middlemen to centralize billing.

8. CONCLUSION

Weaving the network publishing elements together to make a usable system is the goal of WAIS. By incorporating a large number of services and users and an open protocol for future growth and compensation mechanisms, such a system can grow. So far the system has been useful on the Internet for search and retrieval, and WAIS resources have been blended into many other systems such as Gopher, World Wide Web, e-mail services, and others. By January 1994, over 100,000 users used WAIS and the number continues to grow.

Network publishing is not about saving trees or replacing books. It is about new relationships between publishers and readers, a fundamental shift in the way people obtain information, and new forms of literature that will spring from a people unleashed to create and publish in an inexpensive new medium.

REFERENCES

[1] J. Markoff, "Britannica’s 44 million words are going on line," New York Times, Feb. 8, 1994.
[2] G. Salton and M. McGill, Introduction to Modern Information Retrieval. New York: McGraw-Hill, 1983.
[3] D.L. Davidson, The Benefits of the Graphical User Interface. Temple, Barker and Sloane, 1990.
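
As an illustration of the search dialog described in Sections 3.1 and 3.2, the following is a minimal sketch of a natural language query followed by relevance feedback ("find me more like that one"). It is not the WAIS implementation: the ranking method (cosine similarity over raw word counts), the function names `search` and `more_like`, and the sample collection `DOCS` are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical sample collection, invented for this sketch.
DOCS = {
    "press": "the printing press spread books and literature",
    "fax": "fax pages travel slowly over phone lines",
    "typeset": "desktop publishing moved typesetting and printing onto computers",
    "kerberos": "kerberos validates network connections with a hierarchy of authorities",
}


def vectorize(text):
    """Turn free text into a bag of lowercase words (no special syntax needed)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(target, docs, exclude=None, top_k=3):
    """Return ids of the documents most similar to the target vector."""
    scored = []
    for doc_id, text in docs.items():
        if doc_id == exclude:
            continue
        score = cosine(target, vectorize(text))
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def search(query, docs, top_k=3):
    """Natural language query: extraneous words simply fail to match."""
    return rank(vectorize(query), docs, top_k=top_k)


def more_like(doc_id, docs, top_k=3):
    """Relevance feedback: use a liked document itself as the next query."""
    return rank(vectorize(docs[doc_id]), docs, exclude=doc_id, top_k=top_k)


if __name__ == "__main__":
    print(search("What do you know about printing?", DOCS))
    print(more_like("press", DOCS))
```

In this sketch the liked document becomes the query, so the "links" to related documents are computed at request time rather than stored as static hypertext links. A production system would presumably weight terms (for example, discounting very common words) rather than use raw counts.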