ebook img

Mini-indexes for literate programs PDF

15 Pages·0.1 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Mini-indexes for literate programs

Mini-Indexes for Literate Programs DonaldE.Knuth 4 9 ComputerScienceDepartment,StanfordUniversity,Stanford,CA94305-2140USA 9 1 n Abstract. Thispaperdescribeshowtoimplementadocumenta- a tiontechniquethathelpsreaderstounderstandlargeprogramsor J collectionsofprograms, byprovidinglocalindexestoalliden- 1 tifiers that are visible on every two-page spread. A detailed exampleisgivenforaprogramthatfindsallHamiltoniancircuits ] inanundirectedgraph. L P . s c [ Keywords: Literate programming, WEB, CWEB, C++, CTWILL, TEX,indexes,hypertext,Hamiltoniancircuits 1 v 2 0 1 1. Introduction 1 0 UsersofsystemslikeWEB[2],whichprovidesupportfor 4 9 structureddocumentationandliterateprogramming[5], / automatically get a printed index at the end of their s c programs,showingwhereeachidentifierisdefinedand : used. Such indexescan be extremelyhelpful, butthey v i canalsobecumbersome,especiallywhentheprogram X is long. An extreme example is provided by the list- r ing of TEX [3], where the index contains 32 pages of a detailedentriesinsmallprint. Readers of [3] can still find their way around the programquickly,however,because ...the right-hand pages of this book contain mini-indexesthatwillmakeitunnecessaryfor youtolookatthebigindexveryoften. Every identifierthatis usedsomewhereonapairof facingpagesislistedinafootnoteontheright- hand page, unless it is explicitly defined or declaredsomewhereontheleft-handorright- hand page you are reading. These footnote entriestellyouwhethertheidentifierisapro- cedureoramacrooraboolean,etc. [3] A similar idea is sometimes used in editions of liter- ary texts for foreign language students, where mini- dictionariesofunusualwordsappearoneachpage[10]; 1 thissavesthestudentfromspendingalotoftimesearch- ingbigdictionaries. Theideaofmini-indexeswasfirstsuggestedtothe author by Joe Weening, who prepared a brief mockup ofwhathethoughtmightbepossible[11]. Hisproposal was immediately appealing, so the author decided to implement it in a personal program called TWILL—a namesuggestedbythefactthatitwasatwo-passvariant ofthestandardprogramcalledWEAVE. TWILLwasused in September, 1985, to produce [3] and a companion book[4]. The original WEB system was a combination of TEX and Pascal. But the author’s favorite program- minglanguagenowadaysis CWEB[6], whichcombines TEX with C. (In fact, CWEB version 3.0 is fully com- patible with C++, although the author usually restricts himself to a personal subset that might be called C--.) OneoftheadvantagesofCWEBisthatitsupportscollec- tions of small program modules and libraries that can be combinedin manyways. A single CWEBsource file foo.wcangenerateseveraloutputfilesinadditiontothe Cprogramfoo.c;forexample,foo.wmightgeneratea headerfilefoo.hforusebyothermodulesthatwillbe loadedwiththeobjectcodefoo.o,anditmightgenerate atestprogramtestfoo.cthathelpsverifyportability. CWEBwasusedtocreatetheStanfordGraphBase,a collectionofaboutthreedozenpublic-domainprograms useful for the study of combinatorial algorithms [9]. These programs have recently been published in book form, again with mini-indexes [7]. The mini-indexes inthiscasewerepreparedwithCTWILL[8],atwo-pass variantofCWEAVE. The purpose of this paper is to explain the op- erations of TWILL and of its descendant, CTWILL. The conceptsareeasiesttounderstandwhentheyarerelated toadetailedexample,soacompleteCWEBprogramhas beenpreparedforillustrativepurposes. Section2ofthis paperexplainsthe exampleprogram; Sections 3 and4 explainhowCTWILLandTEXprocessit;andSection5 containsconcludingcomments. 2 2. An example TheCWEBprogramforwhichsamplemini-indexeshave been prepared especially for this paper is called HAM. It enumerates all Hamiltonian circuits of a graph, that is,allundirectedcyclesthatincludeeachvertexexactly once. For example, the program can determine that thereareexactly9862knight’stoursona6×6chess- board, ignoring symmetries of the board, in about 2.3 seconds on a SPARCstation 2. Since HAM may be in- teresting in its own right, it is presented in its entirety assortofa“sideshow”intheright-handcolumnsofthe pagesofthisarticleandonthefinal(left-hand)page. PleasetakeaquicklookatHAMnow,beforeread- ingfurther. Theprogramappearsinfivecolumns,each ofwhichwillbecalledaspreadbecauseitisanalogous to the two-page spreads in [3] and [7]. This arrange- mentgivesusfivemini-indexestolookatinsteadofjust two, so it makes HAM a decent example in spite of its relativelysmallsize. Ashorterprogramwouldn’tneed much of an index at all; a longer program would take toolongtoread. HAMisintendedforusewiththelibraryofroutines that comes with the Stanford GraphBase, so §1 of the programtellstheCpreprocessortoincludeheaderfiles gb_graph.handgb_save.h. Theseheaderfilesdefine the external functions and data types needed from the GraphBaselibrary. A brief introduction to GraphBase data structures will suffice for the interested reader to understand the fulldetailsofHAM.A graphis representedbycombin- ingthreekindsofstructrecordscalledGraph,Vertex, and Arc. If v points to a Vertex record, v~name is a string that names the vertex represented by v, and v~arcs points to the representation of the first arc em- anatingfrom that vertex.1 Ifa pointsto an Arcrecord that represents an arc from some vertex v to another vertexu,thena~tip pointstotheVertexrecordthatrep- resentsu;alsoa~next pointstotherepresentationofthe next arc from v, or a~next = L (i.e., NULL) if a is the last arc from v. Thus the following loop will printthe namesofallverticesadjacenttov: for(a=v~arcs; a; a=a~next) printf("%s\n",a~tip~name); An undirectededge between vertices u and v is repre- sentedbytwoarcs,onefromutovandonefromvtou. Finally, if g points to a Graph record, then g~n is the numberofverticesintheassociatedgraph,andtheVer- tex records representingthose vertices are in locations g~vertices +k,for0≤k<g~n. 1 ‘v~name’ is actually typed ‘v->name’ in a C or CWEB program; typographic sugar makes the program easiertoreadinprint. 3 A Vertex record also contains “utility fields” that can be exploited in different ways by different algo- rithms. TheactualCdeclarationsofthesefields,quoted from §8 and §9 of the program GB GRAPH [7], are as follows: typedefunion{ structvertex struct ∗V; /∗ pointertoVertex ∗/ structarc struct ∗A; /∗ pointertoArc ∗/ structgraph struct ∗G; /∗ pointertoGraph ∗/ char∗S; /∗ pointertostring ∗/ longI; /∗ integer ∗/ }util; typedefstructvertex struct { structarc struct ∗arcs; /∗ linkedlistofarcsoutofthisvertex ∗/ char∗name; /∗ stringidentifyingthisvertexsymbolically ∗/ utilu,v,w,x,y,z; /∗ multipurposefields ∗/ }Vertex; ProgramHAMusesthefirstfourutilityfieldsinor- der to do its word efficiently. Field u, for example, is treated as a long integerrepresentingthe degreeofthe vertex. Notice the definition of deg as a macro in §2; thismakesitpossibletorefertothedegreeofvasv~deg insteadofthemorecryptic‘v~u.I’actuallyseenbythe Ccompiler. Similarmacrosforutilityfieldsv,w,andx canbefoundin§4and§6. Thefirstmini-indexofHAM,whichcanbeseenbe- low §2 in the first columnofthe program,givescross- referencestoallidentifiersthatappearin§1or§2butare not defined there. For example, restore graph is men- tioned in one of the comments of §1; the mini-index tellsusthatitisafunction,thatitreturnsavalueoftype Graph∗,andthatitisdefinedin§4ofanotherCWEBpro- gramcalledGB SAVE.Themini-indexalsomentionsthat Vertex and arcs are defined in §9 of GB GRAPH (from which we quoted the relevant definitions above), and that fields next and tip of Arc records are defined in GB GRAPH§10,etc. One subtlety of this first mini-index is the entry for u, which tells us that u is a utility field defined in GB GRAPH§9. Theidentifieruactuallyappearstwicein §2,onceinthedefinitionofdeg andonceasavariable of type Vertex ∗. The mini-index refers only to the former,becausethelatterusageisdefinedin§2. Mini- indexes don’t mention identifiers defined within their ownspread. 4 Thesecondmini-index,below§5ofHAM,issimilar to thefirst. Noticethatitcontainstwoseparateentries for v, because the identifier v is used in two senses— both as a utility field (in the definition of taken) and as a variable (elsewhere). The C compiler will under- stand how to deal with constructions like ‘v~v.I = 0’, whichtheCpreprocessorexpandsfrom‘v~taken = 0’, buthumanreadersaresparedsuchtrouble. Noticetheentryfordeg inthissecondmini-index: It uses an equals sign instead of a colon, indicating that deg is a macro rather than a variable. A simi- lar notation was used in the first mini-index for cross- referencestotypedef’didentifierslikeVertex. Seealso the entry for not taken in the fourth mini-index: Here ‘not taken = macro ()’ indicates that not taken is a macrowitharguments. 3. The operation of CCTTWWIILLLL It would be nice to report that the program CTWILL producesthemini-indexesforHAMinacompletelyau- tomaticfashion,justasCWEAVEautomaticallyproduces ordinaryindexes. Butthatwouldbealie. The truthis thatCTWILLonlydoesabout99%oftheworkautomat- ically;theuserhastohelpitwiththehardparts. Whyisthisso? Well,inthefirstplace,CTWILLisn’t smartenoughtofigureoutthatthe‘u’inthedefinition of deg in §2 is not the same as the ‘u’ declared to be register Vertex ∗ in that same section. Indeed, a high degreeofartificialintelligencewouldberequiredbefore CTWILLcoulddeducethat. Inthesecondplace,CTWILLhasnoideawhatmini- indexentry to make forthe identifier k that appearsin §6. Novariablek is declaredanywhere! Indeed,users who write commentsinvolvingexpressionslike ‘f(x)’ mightormightnotbereferringtoidentifiersf and/orx in their programs; they must tell CTWILL when they are making“throwaway”referencesthatshouldnotbe indexed. CWEAVEdoesn’t have this problem because it indexesonlythedefinitions,nottheuses,ofsingle-letter identifiers. In the third place, CTWILL will not recognize au- tomatically that the vert parameterin the definition of not taken, §4, has no connection with the vert macro definedin§6. Afourthcomplication,whichdoesnotariseinHAM butdoesoccurin[3]and[7],isthatsectionsofaWEBor CWEBprogramcanbeusedmorethanonce. Thereforea singleidentifiermightactuallyrefertoseveraldifferent variables simultaneously. (See, for example, §652 in [3].) In general, when an identifier is defined or de- claredexactlyonce,andusedonlyinconnectionwithits 5 unique definition, CTWILL will have no problems with it. Butwhenanidentifierhasmorethanoneimplicitor explicit definition, CTWILL can only guess which defi- nitionwasmeant. Someidentifiers—especiallysingle- letter ones like x and y—are too useful to be confined toasinglesignificancethroughoutalargecollectionof programs. ThereforeCTWILLwasdesignedtoletusers providehintseasilywhenchoicesneedtobemade. The most important aspect of this design was to make CTWILL’sdefault actions easily predictable. The more “intelligence” we try to build into a system, the harder it is for us to control it. Therefore CTWILL has very simple rules for deciding what to put in mini- indexes. Each identifier has a unique current meaning, which consists of three parts: its type, and the pro- gram name and section number where it was defined. Atthebeginningofarun,CTWILLreadsanumberoffiles thatdefinetheinitialcurrentmeanings. Then,whenever CTWILL sees a C construction that implies a change of meaning—amacrodefinition, a variabledeclaration,a typedef, a function declaration, or the appearance of a label followed by a colon—it assigns a new current meaning as specified by the semantics of C. For ex- ample, when CTWILL sees ‘Graph ∗g’ in §2 of HAM, it changes the current meaning of g to ‘Graph ∗, §2’. ThesechangesoccurintheorderoftheCWEBsourcefile, not in the “tangled” order that is actually presented to theCcompiler. ThereforeCTWILLmakesnoattemptto nestdefinitionsaccordingtoblockstructure;everything it does is purely sequential. A variable declared in §5 and §10 will be assumed to have the meaning of §5 in §6,§7,§8,and§9. WheneverCTWILLchangesthecurrentmeaningof a variable, it outputs a record of that current meaning to an auxiliary file. For the CWEB program ham.w, this auxiliaryfileiscalledham.aux. Thefirstfewentriesof ham.auxare @$deg {ham}2 =macro@> @$argc {ham}2 \&{int}@> @$argv {ham}2 \&{char} ${*}[\,]$@> andthelastentryis @$d {ham}8 \&{register} \&{int}@> Ingeneraltheseentrieshavetheform @$ident {name}nn type@> whereident is anidentifier,name andnn are thepro- gramnameandsectionnumberwhereident isdefined, and type is a string of TEX commands to indicate its type. Inplaceof‘{name}nn’theentrymighthavethe 6 form"string"instead;thentheprogramnameandsec- tionnumberarereplacedbythestring.(Thismechanism leads, forexample,to the appearanceof<stdio.h>in HAM’s mini-index entries for printf.) Sometimes the type field says ‘\zip’. This situation doesn’t arise in HAM,nordoesitariseveryoftenin[7];butitoccurs,for example,whenapreprocessormacronamehasbeende- finedexternallyasinaMakefile,orwhenatypeisvery complicated,likeFILEin<stdio.h>. Insuchcasesthe mini-indexwillsimplysay‘FILE,<stdio.h>’,withno colonorequalssign. The user can explicitly change the current mean- ingbyspecifying@$ident {name}nn type@>anywhere in a CWEB program. This means that CTWILL’s default mechanismiseasilyoverridden. WhenCTWILLstartsprocessingaprogramfoo.w,it looksfirstforafilenamedfoo.auxthatmighthavebeen produced on a previous run. If foo.aux is present, it isreadin,andthe@$...@>commandsoffoo.auxgive current meanings to all identifiers defined in foo.w. Therefore CTWILL is able to know the meaning of an identifier before that identifier has been declared— assuming that CTWILL has been run successfully on foo.w at least once before, and assuming that the fi- naldefinitionoftheidentifieristheoneintendedatthe beginningoftheprogram. CTWILLalsolooksforanotherauxiliaryfile called foo.bux. Thisoneisnotoverwrittenoneachrun,soit canbemodifiedbytheuser. Thepurposeoffoo.buxis togiveinitialmeaningstoidentifiersthatarenotdefined infoo.aux. Forexample,ham.buxis afile containing thetwolines @i gb_graph.hux @i gb_save.hux whichtellCTWILLtoinputthefilesgb_graph.huxand gb_save.hux. The latter files contain definitions of identifiers that appear in the header files gb_graph.h andgb_save.h,whichHAMincludesin §1. Forexam- ple,oneofthelinesofgb_graph.huxis @$Vertex {GB\_GRAPH}9 =\&{struct}@> Thisline appearsalsoingb_graph.aux;itwas copied by hand, using a text editor, into gb_graph.hux, because Vertex is one of the identifiers defined in gb_graph.h. CTWILL also reads a file called system.bux, if it is present; that file contains global information that is always assumedto be in the backgroundas partofthe current environment. One of the lines in system.bux is,forexample, @$printf "<stdio.h>" \&{int} (\,)@> After system.bux, ham.aux, and ham.bux have been input, CTWILL will know initial current mean- ings of almost all identifiers that appear in HAM. The 7 onlyexceptionisk,foundin§6;itscurrentmeaningis \uninitialized,andiftheuserdoesnottakecorrec- tiveactionitsmini-indexentrywillcomeoutas k: ???,§0. Notice that d is declared in §4 of HAM and also in §8. Both of these declarations produce entries in ham.aux. SinceCTWILLreadsham.auxbeforelooking atthe sourcefile ham.w,andsince ham.auxis readse- quentially,thecurrentmeaningofdwillreferto§8atthe beginningofham.w. This causesno problem, because disneverusedinHAMexceptinthesectionswhereitis declared,henceitneverappearsinamini-index. WhenCTWILLprocesseseachsectionofaprogram, itmakesalistofallidentifiersusedinthatsection,ex- cept for reserved words. At the end of the section, it mini-outputs (that is, it outputs to the mini-index) the current meaning of each identifier on the list, unless thatcurrentmeaningreferstothecurrentsectionofthe program,orunlesstheuserintervenes. Theuserhastwowaystochangethemini-outputs, eitherbysuppressingthedefaultentriesorbyinserting replacemententries. First,theexplicitcommand @-ident@> tellsCTWILLnottoproducethestandardmini-outputfor ident inthecurrentsection. Second,theusercanspec- ifyoneormoretemporarymeaningsforanidentifier,all of which will be mini-outputat the end of the section. Temporarymeaningsdonotaffectanidentifier’scurrent meaning. Wheneveratleastonetemporarymeaningis mini-output,thecurrentmeaningwillbesuppressedjust as if the @-...@> command had been given. Tempo- rary meaningsare specified by means of the operation @%, whichtogglesastate switch affectingthe @$...@> command: At the beginning of a section, the switch is in “permanent” state, and @$...@> will change an identifier’s currentmeaning as described earlier. Each occurrence of @% changes the state from “permanent” to“temporary”orbackagain;in“temporary”statethe @$...@>commandspecifiesatemporarymeaningthat will be mini-output with no effect on the identifier’s permanent(current)meaning. Examples of these conventions will be given mo- mentarily, butfirst we should note one further interac- tionbetweenCTWILL’s@-and@$commands: IfCTWILL wouldnormallyassignanewcurrentmeaningtoident because of the semantics of C, and if the command @-ident@> has already appeared in the current sec- tion,CTWILLwillnotoverridethepresentmeaning,but CTWILLwilloutputthepresentmeaningtothe.auxfile. In particular, the user may have specified the present meaning with @$ident...@>; this allows user control overwhatgetsintothe.auxfile. 8 For example, here is a complete list of all com- mandsinsertedby the authorin orderto correctoren- hanceCTWILL’sdefaultmini-indexesforHAM: • Atthebeginningof§2, @-deg@> @$deg {ham}2 =\|u.\|I@> @%@$u {GB\_GRAPH}9 \&{util}@> tomakethedefinitionofdeg read‘u.I’insteadof just‘macro’andtomakethemini-indexrefertou asautilityfield. • Atthebeginningof§4, @-taken@> @-vert@> @$taken {ham}4 =\|v.\|I@> @%@$v {GB\_GRAPH}9 \&{util}@> @$v {ham}2 \&{register} \&{Vertex} $*$@> for similar reasons, and to suppress indexing of vert. Here the mini-index gets two “temporary” meaningsforv,oneofwhichhappenstocoincide withitspermanentmeaning. • Atthebeginningof§6, @-k@> @-t@> @-vert@> @-ark@> @$vert {ham}6 =\|w.\|V@> @$ark {ham}6 =\|x.\|A@> @%@$w {GB\_GRAPH}9 \&{util}@> @$x {GB\_GRAPH}9 \&{util}@> forsimilarreasons. That’sall. Thesecommandswerenotinsertedintotheprogramfile ham.w;theywereputintoanotherfilecalledham.chand introducedviaCWEB’s“changefile”feature[6]. Change files makeit easyto modify the effectivecontentsofa masterfilewithouttamperingwiththatfiledirectly. 4. Processing by TEX CTWILL writes a TEX file that includes mini-output at the end of each section. For example, the mini-output after§10ofHAMis \]{GB\_GRAPH}10 \\{next} \&{Arc} $*$ \[7 \\{advance} label \[6 \\{ark} =\|x.\|A \[2 \|{t} \&{register} \&{Vertex} $*$ \[4 \\{not\_taken} =macro (\,) \]{GB\_GRAPH}10 \\{tip} \&{Vertex} $*$ \[2 \|{v} \&{register} \&{Vertex} $*$ \[2 \|{a} \&{register} \&{Arc} $*$ 9 Here\[introducesaninternalreferencetoanothersec- tionofHAM;\]introducesanexternalreferencetosome other program; \\ typesets an identifier in text italics; \| typesets an identifier in math italics; \& typesets in boldface. A special debugging mode is available in which TEX will simply typeset all the mini-output at the end ofeachsection,insteadofmakingactualmini-indexes. This makes it easy for users to check that CTWILL is infactproducingtheinformationtheyreallywant. No- tice that mistakes in CTWILL’s output need not neces- sarily lead to mistakes in mini-indexes; for example, aspuriousreferencein §6to anidentifierdefinedin §5 willnotappearinamini-indexforaspreadthatincludes §5. ItisbesttomakesurethatCTWILL’soutputiscorrect beforelookingatactualmini-indexes. Thenunpleasant surpriseswon’toccurwhensectionsoftheprogramare movedfromonespreadtoanother. WhenTEXisfinallyaskedtotypesettherealmini- indexes, however, it has plenty of work to do. That’s when the fun begins. TEX’s main task, after format- ting the commentary and C code of each section, is to figure outwhetherthe currentsection fits into the cur- rentspread,and(ifitdoes)toupdatethemini-indexby mergingtogetherallentriesforthatspread. Consider, for example, what happens when TEX typesets §10 of HAM. This spread begins with §8, and TEX has already determined that §8 and §9 will fit to- gether in a single column. After typesetting the body of §10, TEX looks at the mini-index entries. If any of themreferto§8or§9,TEXwilltentativelyignorethem, because those sections are already part of the current spread. (In this case that situation doesn’t arise; but whenTEXprocessed§7,itdidsuppressentriesforvert andark,sincetheyreferredto§6.) TEXalsotentatively discardsmini-indexentries thatmatchotherentriesal- ready scheduled for the current spread. (In this case, everything is discarded except the entries for advance andark;theothers—next, t,not taken, v,anda—are duplicates of entries in the mini-output of §8 or §9.) Finally, TEX tentatively discards previously scheduled entries that refer to the current section. (In this case nothinghappens,becausenoentriesfrom§8or§9refer to§10.) Afterthiscalculation,TEXknowsthenumbernof mini-indexentriesthatwouldbeneededif§10wereto join the spread with §8 and §9. TEX divides n by the numberof columnsin the mini-index (here 2, but 3 in [3] and [7]), multiplies by the distance between mini- baselines(here9points),andaddstheresulttothetotal heightofthetypesettextforthecurrentspread(herethe heightof§8+§9+§10). Withafewminorrefinements forspacingbetweensectionsandfortheruledlinethat separatesthemini-indexfrom the restofthe text, TEX 10

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.