NISTIR 6219

Design and Development of a Dictionary Translator

John V. Messina
James A. St. Pierre
Michael R. McCaleb

U.S. DEPARTMENT OF COMMERCE
Technology Administration
National Institute of Standards and Technology
Gaithersburg, MD 20899

January 1999

U.S. DEPARTMENT OF COMMERCE
William M. Daley, Secretary

TECHNOLOGY ADMINISTRATION
Gary R. Bachula, Acting Under Secretary for Technology

NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY
Raymond G. Kammer, Director

U.S. DEPARTMENT OF ENERGY
Washington, D.C. 20858

Abstract: In support of the electronics industry's need for quick access to technical data associated with electronic components (such as is typically found in hard-copy "databooks"), the Electronic Information Technology Group of the National Institute of Standards and Technology (NIST) has been doing research related to the development of on-line databooks. One facet of this research involves the development of on-line data dictionaries that contain definitions of technical terms (e.g., rise time) that might be used (e.g., hyperlinked) within on-line databooks. Relating to on-line dictionaries, the Silicon Integration Initiative (Si2) consortium has defined an SGML (Standard Generalized Markup Language) DTD (Document Type Definition), which is referred to throughout the remainder of this paper as the CIDS DTD. This DTD enables dictionary data that conforms to the DTD to be viewed with an SGML browser.
However, because existing dictionary files do not conform to the CIDS DTD, a translator, called 2CIDS, and an intermediate file format have been developed to aid in the conversion of dictionary data files from their existing formats to files that comply with the CIDS DTD. This paper discusses the development of this translator and the intermediate file format. Both the translator and the intermediate file format represent elements of the infrastructure necessary to support electronic commerce of component information.

1. Background

1.1 Relationship between the NIST ECCI project and the Si2 ECIX project

NIST and the Silicon Integration Initiative (Si2 - formerly the CAD Framework Initiative (CFI)) consortium co-sponsored a workshop on the "Electronics Industry Components Library" in October of 1990. Since that time, both NIST and Si2 have had ongoing projects related to this topic. Today the NIST project addressing this topic is known as the Electronic Commerce of Component Information (ECCI) project, and the related Si2 project is known as the Electronic Component Information Exchange (ECIX) project.

One of the early activities that came out of Si2 (CFI at the time) is called the Pinnacles Component Information Standard (PCIS). NIST has supported the PCIS project, whose goal is to define a standard format for on-line databooks for electronic components. PCIS grew out of an industry need for quick access to technical data (typically found in hard-copy "databooks" distributed by electronic component manufacturers). The PCIS project began by trying to define a set of Standard Generalized Markup Language (SGML) elements (or tags) that would provide semantic information about electronic components. Once the electronics industry agrees on the set of markup tags for the PCIS, as defined within a Document Type Definition (DTD), component manufacturers will be able to create on-line databooks in an industry standard format.
This on-line databook capability is a key piece of the infrastructure necessary to support electronic commerce for the electronics industry. Initial demonstrations of actual component manufacturers' databooks, in PCIS format, were given at the 1998 Design Automation Conference.

Fig. 1. Relationship between ECCI (NIST) and ECIX (Si2) Projects.

1.2 Data Dictionaries

Other key pieces of infrastructure necessary to support electronic commerce for the electronics industry are on-line dictionaries to support on-line databooks. Early in the ECCI activity, industry requested NIST's help in addressing a significant issue: the terminology used to describe electronic components. The problem is that different companies often use different terminology to describe the same concept, or in some cases they use the same terminology to describe variations of a concept. This makes the task of comparing components from different manufacturers much more difficult.

In an effort to address this problem, NIST and Si2 have both been working with the International Electrotechnical Commission (IEC) to investigate methods for integrating the IEC dictionary with on-line databooks. In collaboration with the IEC, NIST developed software to convert the IEC 61360 [1] dictionary and schema into software objects, and associated software with a Graphical User Interface by which to browse these dictionary objects (see the box labeled "NIST Browser" in Fig. 1). At the same time, Si2 has been working with industry to define a standard dictionary format, known as the Component Information Dictionary Standard (CIDS), that is based on the IEC 61360 dictionary and schema. This standard dictionary format is specified with an SGML DTD. Consequently, data dictionaries stored in this format may be viewed with SGML browsers.
It has become clear that even if the electronic version of the IEC 61360 dictionary is available at low or zero cost (at the time of this writing, the cost of accessing this standard on-line has not been defined), there is still a legacy of dictionaries (within electronic component manufacturers) which must be addressed. The easiest solution would be to define a single industry consensus dictionary. However, while the industry may evolve to that in the future, today there are multiple dictionaries that must be accessible. One possible scenario is for the IEC dictionary to be used as a base dictionary; if company-specific terminology is required (to describe new or emerging technologies), the new terms would be submitted through a national committee to the IEC for possible inclusion in the IEC dictionary.

In light of these issues, the Si2 consortium requested NIST's help in developing a software program that could translate existing dictionaries from either the IEC or private companies into SGML documents compliant with the CIDS DTD. Such a software translator is an integral piece of the infrastructure necessary for electronic commerce within the electronics industry. Unfortunately, there is little commercial benefit or incentive for private industry to develop such a translator. For this reason, developing such a translator is a very appropriate task for NIST. John Messina of EEEL's Electricity Division has been working with Si2, the IEC, and others to develop such a translator, called 2CIDS. The remaining sections of this paper describe the design and development of this translator.

2. Input & Output Data Formats of the 2CIDS Translator

The 2CIDS translator is designed to convert any data dictionary that is in either name-value pair or Part 21 format into a standard dictionary format as described in the Component Information Dictionary Standard (CIDS).
Data dictionaries are currently stored either as a Part 21 file (see clause 2.1) based on a specific EXPRESS model or as a file in some independent proprietary format.

Note: EXPRESS is a data modeling language that is defined in ISO 10303-11 [2].

It was clear from the start that the 2CIDS translator would at the very least need to support translation from the Part 21 format, because the IEC selected the Part 21 format as the official distribution format of the dictionary. In addition, dictionaries maintained within companies and other standards organizations must also be considered. Rather than require companies to convert their existing dictionaries into the Part 21 format, a new generic data format was developed. This new data dictionary format, called Name-Value Pair (NVP), was designed to be a generic, easy-to-generate format into which any dictionary could be converted.

The above requirements meant that the 2CIDS translator had to be able to handle three separate data formats: Part 21, NVP, and CIDS SGML. Before describing the 2CIDS translator in detail, a basic understanding of these three different data formats is desirable. Below are brief descriptions of the three data formats along with a sample data set.

2.1 Part 21 File Format

A Part 21 file is a text file that conforms to ISO 10303-21, "Clear text encoding of the exchange structure" [3]. ISO 10303-21 is an ISO standard that describes the file format for storing data based on an EXPRESS schema. There are two levels of conformance to ISO 10303-21: syntactical conformance and schema conformance. To be syntactically conformant, a file must comply with ISO 10303-21. To be schema conformant, a file must be syntactically conformant and must also fulfill all the requirements and constraints specified by a corresponding EXPRESS schema. The names of the EXPRESS schemas, as well as the Part 21 file name and file description, are stored in the header section of the Part 21 file.
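As an illustration of the header section, the following minimal Python sketch reads the schema names and the file name from a Part 21 header. It is not part of the 2CIDS translator. The sample header, including the file name example_dictionary.p21 and the schema name EXAMPLE_DICTIONARY_SCHEMA, is hypothetical, and the function names are assumptions introduced here. The sketch relies only on the FILE_DESCRIPTION, FILE_NAME, and FILE_SCHEMA header entries defined by ISO 10303-21 and does not attempt full syntactic validation.

```python
import re

# Hypothetical Part 21 file; the file name and schema name are made up
# for illustration and do not come from the IEC dictionary.
SAMPLE_FILE = """ISO-10303-21;
HEADER;
FILE_DESCRIPTION(('hypothetical dictionary extract'),'1');
FILE_NAME('example_dictionary.p21','1998-05-21T00:00:00',(''),(''),'','','');
FILE_SCHEMA(('EXAMPLE_DICTIONARY_SCHEMA'));
ENDSEC;
DATA;
ENDSEC;
END-ISO-10303-21;
"""


def header_section(text):
    """Return the text between HEADER; and the first ENDSEC;."""
    match = re.search(r"HEADER;(.*?)ENDSEC;", text, re.DOTALL)
    if match is None:
        raise ValueError("no HEADER section found")
    return match.group(1)


def schema_names(text):
    """Return the schema names listed in the FILE_SCHEMA header entry."""
    match = re.search(
        r"FILE_SCHEMA\s*\(\s*\((.*?)\)\s*\)\s*;", header_section(text), re.DOTALL
    )
    if match is None:
        raise ValueError("no FILE_SCHEMA entry found")
    return re.findall(r"'([^']*)'", match.group(1))


def file_name(text):
    """Return the first parameter of the FILE_NAME header entry."""
    match = re.search(r"FILE_NAME\s*\(\s*'([^']*)'", header_section(text))
    if match is None:
        raise ValueError("no FILE_NAME entry found")
    return match.group(1)


if __name__ == "__main__":
    print(schema_names(SAMPLE_FILE))
    print(file_name(SAMPLE_FILE))
```

A translator would typically read these header entries first, since the schema name determines which EXPRESS declarations govern the entity instances that follow.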
The data section of a Part 21 file contains instances of the entities defined in the corresponding EXPRESS schema. Attribute values are specified within the entity instances in the order in which they occur within the EXPRESS schema.

Example: Fig. 3 is an example of a portion of a data section of a Part 21 file that is based on the EXPRESS entity declarations shown in Fig. 2.

Sample EXPRESS Schema

    ENTITY dates;
      date_of_original_definition: date_type;
      date_of_current_version: date_type;
      date_of_current_revision: OPTIONAL date_type;
    END_ENTITY;

    ENTITY item_names;
      preferred_name: pref_name_type;
      synonymous_names: SET OF syn_name_type;
      short_name: short_name_type;
      languages: OPTIONAL present_translations;
      icon: OPTIONAL graphics;
    END_ENTITY;

Fig. 2. Portion of an EXPRESS schema.

Example Part 21 Data

    #3=DATES('1998-05-21','1998-05-21',$);
    #552=DATES('1998-05-21','1998-05-21',$);
    #103=ITEM_NAMES(LABEL('reference temperature'),(),LABEL('T_ref'),$,$);
    #15000=ITEM_NAMES(LABEL('male IEC169-1(3.1.5)(1987)'),(),LABEL('OH'),$,$);

Fig. 3. Portion of a Part 21 data file.
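The positional correspondence between Fig. 2 and Fig. 3 can be sketched in Python as follows. This is a minimal illustration, not the 2CIDS implementation: the function names and the SCHEMA table are assumptions introduced here, with the attribute order copied from the EXPRESS declarations in Fig. 2. The sketch handles only simple, single-line instances like those in Fig. 3. It keeps each value as raw text and maps the Part 21 "$" marker (an unset optional attribute) to None.

```python
import re

# Attribute order copied from the EXPRESS declarations in Fig. 2.
SCHEMA = {
    "DATES": [
        "date_of_original_definition",
        "date_of_current_version",
        "date_of_current_revision",
    ],
    "ITEM_NAMES": [
        "preferred_name",
        "synonymous_names",
        "short_name",
        "languages",
        "icon",
    ],
}

# Matches one instance of the form  #<id>=<ENTITY>(<parameters>);
INSTANCE = re.compile(r"#(\d+)\s*=\s*([A-Z_0-9]+)\s*\((.*)\)\s*;\s*$", re.DOTALL)


def split_top_level(params):
    """Split a parameter list on commas outside quotes and parentheses."""
    parts, current, depth, in_string = [], [], 0, False
    for ch in params:
        if ch == "'":
            in_string = not in_string
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 0:
                parts.append("".join(current).strip())
                current = []
                continue
        current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_instance(line):
    """Return (id, entity name, {attribute name: raw value or None})."""
    match = INSTANCE.match(line.strip())
    if match is None:
        raise ValueError("not a simple entity instance: " + line)
    instance_id, entity, params = match.groups()
    values = split_top_level(params)
    names = SCHEMA[entity]
    if len(values) != len(names):
        raise ValueError("attribute count does not match schema for " + entity)
    # "$" marks an unset optional attribute in a Part 21 file.
    attributes = {n: (None if v == "$" else v) for n, v in zip(names, values)}
    return int(instance_id), entity, attributes


if __name__ == "__main__":
    print(parse_instance("#3=DATES('1998-05-21','1998-05-21',$);"))
```

Because attribute names never appear in the data section, a reader of the file needs the schema's attribute order to interpret each value, which is why the sketch carries that order in a lookup table.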
