
Preservation with PDF/A PDF

34 Pages·2017·1.17 MB·English


Preservation with PDF/A (2nd Edition)

Betsy A Fanning
AIIM

DPC Technology Watch Report 17-01
July 2017

Series editors on behalf of the DPC: Charles Beagrie Ltd.
Principal Investigator for the Series: Neil Beagrie

© Digital Preservation Coalition 2017, Betsy A Fanning 2017, and AIIM 2017, unless otherwise stated
ISSN: 2048-7916
DOI: http://dx.doi.org/10.7207/twr17-01

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior permission in writing from the publisher. The moral rights of the author have been asserted. First published in Great Britain in 2008 by the Digital Preservation Coalition. Second Edition 2017.

Foreword

The Digital Preservation Coalition (DPC) is an advocate and catalyst for digital preservation, ensuring our members can deliver resilient long-term access to digital content and services. It is a not-for-profit membership organization whose primary objective is to raise awareness of the importance of the preservation of digital material and the attendant strategic, cultural and technological issues. It supports its members through knowledge exchange, capacity building, assurance, advocacy and partnership. The DPC’s vision is to make our digital memory accessible tomorrow.

The DPC Technology Watch Reports identify, delineate, monitor and address topics that have a major bearing on ensuring our collected digital memory will be available tomorrow. They provide an advanced introduction in order to support those charged with ensuring a robust digital memory, and they are of general interest to a wide and international audience with interests in computing, information management, collections management and technology.
The reports are commissioned after consultation among DPC members about shared priorities and challenges; they are commissioned from experts; and they are thoroughly scrutinized by peers before being released. The authors are asked to provide reports that are informed, current, concise and balanced; that lower the barriers to participation in digital preservation; and that are of wide utility. The reports are a distinctive and lasting contribution to the dissemination of good practice in digital preservation.

This report was written by Betsy A Fanning. The report is published by the DPC in association with Charles Beagrie Ltd. Neil Beagrie, Director of Consultancy at Charles Beagrie Ltd, was commissioned to act as principal investigator for, and managing editor of, this Series in 2011. He has been further supported by an Editorial Board drawn from DPC members and peer reviewers who comment on text prior to release: William Kilbride (Chair), Janet Delve (University of Portsmouth), Marc Fresko (Inforesight), Sarah Higgins (University of Aberystwyth), Tim Keefe (Trinity College Dublin), and Dave Thompson (Wellcome Library).

Acknowledgements

Many subject experts and their organizations have contributed countless hours of work and time to develop the standards that this report describes. Standards work requires a unique type of person, one who is not only an expert in their field but also a person of patience, as standards development takes time. My appreciation goes to the many experts who have joined in this standards development effort. These include Stephen Levenson, who met me at an industry meeting and shared his ‘crazy’ idea, which eventually became Portable Document Format/Archive (PDF/A); many from Adobe Systems who willingly shared their knowledge and helped the committee form the requirements that make PDF/A an archival file format; and the archival community, who also shared their knowledge and helped to shape this standard.
Through the development of the PDF/A standard, two project leaders, Stephen Abrams formerly of Harvard University and Leonard Rosenthol of Adobe Systems, kept the project moving at a consistent pace, taking into consideration many differing points of view to develop this standard. The Association for Information and Image Management (AIIM, http://www.aiim.org) and the Association for Suppliers of Printing, Publishing and Converting Technologies (NPES, http://www.npes.org) jointly developed this standard. It was great to see these two organizations work together sharing knowledge and expertise.

I am sure there are many experts whom I should publicly acknowledge, but there is not time or space to do so adequately. I am particularly grateful to Tim Evans for his contribution to the case study, and to Sarah Higgins for the figures.

I would be negligent if I did not acknowledge my employer, AIIM, for supporting this standards project. I also want to give special thanks to Neil Beagrie of Charles Beagrie Ltd, and the staff of the DPC, for their input and wise counsel in the development of this report.

Betsy A Fanning
July 2017

Contents

1. Introduction
  1.1. Technology Watch Report Editions
  1.2. Overview
  1.3. Typical Uses for PDF/A
2. History and Features of PDF and PDF/A
  2.1. History of PDF
  2.2. What is PDF/A?
  2.3. PDF/A-1 (ISO 19005-1:2005)
  2.4. PDF/A-2 (ISO 19005-2:2011)
  2.5. PDF/A-3 (ISO 19005-3:2012)
  2.6. PDF/A-4 (ISO/CD 19005-4)
  2.7. Why was the Standard Drafted in Multiple Parts?
  2.8. How are Engineering and Dynamic Documents Archived?
3. Conformance and Conformance Levels
  3.1. Overview
  3.2. Level A Conformance
  3.3. Level B Conformance
  3.4. Level U Conformance
4. Metadata and Its Importance for Preservation
  4.1. Overview
  4.2. XMP
5. Challenges and Lessons Learned
  5.1. Appropriateness for Use and Reuse: The end user perspective
  5.2. Importance of Good Information and Preservation Management Practices
  5.3. User Creation of PDF/A
  5.4. Migration to PDF/A
  5.5. Assessing Quality in PDF/A Construction
  5.6. Electronic Signatures
  5.7. Preservation Implications of Embedded File Streams in PDF/A-3
  5.8. Fonts and Intellectual Property
  5.9. File Size and Image Compression
  5.10. Adoption
6. Future Development of the PDF Standard
7. Archaeology Data Service Case Study
8. Conclusions and Recommendations
  8.1. Conclusions
  8.2. Recommendations
9. Glossary
10. Further Reading
11. References
12. Appendix: Standards and Technical Guides

Abstract

This report discusses the digital document file format for long-term preservation that is known as PDF/Archive or PDF/A. As organizations have transitioned from a paper-centric to a digital document-centric way of operating, it has become necessary to develop a file format that will maintain the integrity of the information contained in the document and withstand the test of time.
These documents need to be available for generations to come, due to their value for multiple purposes. As work began on the standard, it was found that this self-contained preservation file format independent of technology would provide benefits to many organizations and in many circumstances. The need for this preservation file format continues to exist, and new uses of the standard continue to be identified. This report discusses the file format and why it was developed, along with some of the issues and concerns organizations should consider when choosing to use PDF/A as one of their long-term preservation file formats. It is an updated edition of the original Technology Watch Report 08-02, Preserving the Data Explosion: Using PDF, published in 2008.

Executive Summary

The focus of this report is the PDF/Archive (PDF/A) file format and standard. PDF/A is one of several file formats promoted in digital preservation. This report assesses the claims made for PDF/A and provides guidance on how the format’s potential for digital preservation might be achieved.

The report begins with the history of the Portable Document Format (PDF) to better understand how this special file format, PDF/A, came to be. It examines PDF from when it was first created through to when Adobe Systems provided the PDF specification to the International Organization for Standardization (ISO) to formalize it as an ISO Standard.

PDF/A versions of PDF have been developed as a family of ISO Standards with the specific aim of addressing preservation. PDF/A is a restricted form of PDF intended to be suitable for long-term preservation by removing some features that pose preservation risks. After the initial PDF/A-1 standard was developed, three other standards – PDF/A-2, PDF/A-3, and PDF/A-4 – were also developed to add functionality to the file format.
This functionality extended the PDF/A file by enabling native files and XML to be contained in the PDF/A file while maintaining the archival nature of the file.

It is relatively simple to create a PDF, or more accurately, to create a file that for all intents and purposes appears to be a PDF. The same is true for PDF/A, but for preservation purposes it is important to know how closely a file conforms to the detailed requirements defined in the standard. This report will discuss the conformance levels that were developed by the working group. It will also discuss the validation methods that were developed to ensure those conformance levels. Conformance to the standard is not a simple ‘yes/no’ binary state, in part because there are now four variants of PDF/A.

One question that is often asked is: ‘When should I use PDF/A, and which version should I use?’ This report attempts to answer that question and to provide some guidance about the strengths, weaknesses, opportunities and threats associated with each. There are several conditions that make it beneficial to use PDF/A-3 rather than PDF/A-1, and vice versa. The report discusses these conditions and reviews practical considerations to make the most effective use of the file format.

Though important, the standards and validation methods described in this report comprise only part of a digital preservation strategy. The selection of a file format – even one carefully developed to support preservation – is not a complete digital preservation solution. The choice of file format is a component of a wider technical and organizational infrastructure which comprises a comprehensive digital preservation solution.

This report provides sufficient information regarding the standard and its use to help readers use the file format better to ensure the integrity of digital information.
Through a member case study, it helps readers understand the practical issues involved and lessons learned, and to determine how best to implement PDF/A in their organization.

1. Introduction

1.1. Technology Watch Report Editions

This report is an updated edition of the original Technology Watch Report 08-02, Preserving the Data Explosion: Using PDF (Fanning, 2008). Since its original publication in 2008, when only PDF/A-1 (ISO 19005-1) was available, the International Organization for Standardization (ISO) has published two more parts of the PDF/A standard which have added features to the file format: PDF/A-2 (ISO 19005-2) and PDF/A-3 (ISO 19005-3). The development of a fourth part is underway. This new edition of the Technology Watch Report will examine all four parts of the PDF/A standard and provide guidance on the appropriate part to use. The report will also take a brief look at the future for the PDF/A standard.

1.2. Overview

PDF became a ubiquitous file format for exchanging electronic copies of page-based documents because of the many benefits it confers on users. Some of its benefits include:

• compatibility across all platforms;
• ability to create compact and small files for easy exchange;
• the ability to create PDF files from source documents;
• easy-to-create PDF files;
• ability to be viewed within most web browsers;
• rich metadata-containing files;
• ability to have other files embedded within it.

However, because the PDF format is feature rich, it can cause difficulties for specific uses such as long-term preservation.
And with the advantages of the PDF file format come some risks:

• any file type can be embedded;
• the primary document can be conformant as a static document, but the embedded files may not be static;
• embedded files may be infected by computer viruses;
• embedded files may have extended metadata requirements, may introduce unexpected dependencies or be subject to format obsolescence;
• embedded files may complicate matters relating to information security, data protection or the management of intellectual property rights.

PDF/A (A for Archive) versions of PDF have been developed as a family of ISO Standards with the specific aim of addressing preservation. PDF/A is a restricted form of PDF intended to be suitable for long-term preservation by removing features that pose preservation risks. PDF/A seeks to maximize:

• device independence;
• self-containment;
• self-documentation.

PDF/A places some restrictions to reduce preservation risks:

• all fonts must be embedded and the fonts must be legally embeddable for unlimited, universal rendering;
• audio and video content are forbidden;
• JavaScript and executable files are prohibited;
• colour spaces must be specified in a device-independent manner;
• encryption is not allowed;
• use of standards-based metadata is mandated.

These restrictions make the PDF/A format a good option for long-term archiving of electronic documents, providing any restricted content is not present or is not required and can be removed. However, users need to be aware of some other preservation challenges that remain and/or are in the process of being addressed. In particular, a variety of issues have contributed to uncertainty over PDF rendering and the impact this may have on long-term preservation (British Library, 2015).
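
As a rough illustration of the restrictions listed above, the following Python sketch scans the raw bytes of a file for a few PDF name tokens associated with restricted features (encryption, JavaScript, launch actions, and sound or movie annotations). The marker list and the function name scan_pdf_bytes are assumptions made for this illustration; they are not taken from ISO 19005, and neither a match nor its absence says anything definitive about conformance. Names held inside compressed object streams, for example, will be missed, so this is a first-pass triage aid and not a substitute for a purpose-built validator.

```python
import re

# Illustrative heuristics only: byte patterns for PDF name tokens that often
# accompany features PDF/A restricts. This list is an assumption of the
# sketch, not the set of tests defined by the standard.
RISK_MARKERS = {
    "encryption": rb"/Encrypt\b",
    "javascript": rb"/(?:JavaScript|JS)\b",
    "launch action": rb"/Launch\b",
    "sound or movie": rb"/(?:Sound|Movie)\b",
}


def scan_pdf_bytes(data: bytes) -> list[str]:
    """Return the sorted labels of all risk markers found in raw PDF bytes."""
    return sorted(
        label
        for label, pattern in RISK_MARKERS.items()
        if re.search(pattern, data)
    )


if __name__ == "__main__":
    # A tiny hand-made fragment, not a complete PDF file.
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj\n"
        b"trailer << /Encrypt 5 0 R >>\n"
    )
    for label in scan_pdf_bytes(sample):
        print("possible restricted feature:", label)
```

To screen a file on disk, the bytes can be read with `open(path, "rb").read()` and passed to `scan_pdf_bytes`; any file that reports a marker would then be a candidate for closer inspection with a full validator.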
The variable quality and support provided by some PDF-creating software and third-party viewers means institutions have faced challenges in converting files to PDF/A, validating the conformance of files to PDF/A, and fixing faults with the format, particularly when files have been received from a wide body of external organizations and individuals. A robust vendor-independent mechanism for assessing full compliance of PDF/A files with the standards and the conformance levels they claim in their internal metadata will go a long way to addressing this challenge. The recent development and release of the veraPDF tool, a purpose-built, open source, PDF/A file-format validator, is a major step forward (see Section 5.5).

1.3. Typical Uses for PDF/A

There are many reasons why an organization might choose to use PDF/A to preserve their digital documents, including:

• its standardized format for storing digital documents for long periods of time;
• it allows for digitally signed documents using the very latest digital signature software;
• it reliably displays special characters for mathematics and languages since all are embedded within the file;
• it displays correctly on any device as the author intended, including the reading order;
• platform independence;
• provision of fully searchable documents through Optical Character Recognition.

PDF/A can be used in many situations where we want to preserve information, such as:

• scanning documents for archives;
• migrating existing document files into archives;
• digital mailroom processing and retention of incoming and outgoing mail;
• compliance with regulations and addressing regulatory concerns, e.g. in the financial, healthcare, or pharmaceutical sectors;
• eBilling and eProcurement processes where documents need to be entered into a workflow and archived;
• preservation of office documents or official documents;
• preservation of academic reports and publications;
• open government and long-term access to information by citizens.

2. History and Features of PDF and PDF/A

2.1. History of PDF

PDF is a file format originated by Adobe Systems in the early 1990s for the primary purpose of exchanging documents. It was intended to make digital documents essentially similar to their paper equivalents by being authentic, reliable and easy to use. The PDF Reference (http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf) is an open specification that defines the features and functions for the PDF file format, and encompasses a number of specifications relating to different uses. Some of these have evolved over time, leading to different versions of the specification. Adobe made all the PDF Reference specifications freely available on their website (http://www.adobe.com/devnet/pdf/pdf_reference_archive.html) and allowed any software developer to use the specification in designing their own products. PDF quickly became a de facto standard.

As users became more proficient with digital documents, they began to request functionality that added new features to successive versions of the PDF specification, as shown in Table 1.

Table 1 – Features introduced in PDF

Feature | PDF 1.1 | PDF 1.2 | PDF 1.3 | PDF 1.4 | PDF 1.5 | PDF 1.6 | PDF 1.7
--- | --- | --- | --- | --- | --- | --- | ---
External links | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Article threads | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Security features | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Device-independent colour | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Notes | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Support for OPI (Open Prepress Interface) 1.3 | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Support for CMYK (colour model for cyan, magenta, yellow, and key black) | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Maintenance of spot colours in PDF | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Halftone functions could be included as well as overprint instructions | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
2-byte CID fonts | | | ✓ | ✓ | ✓ | ✓ | ✓
OPI 2.0 specifications | | | ✓ | ✓ | ✓ | ✓ | ✓
DeviceN, a new colour space to improve support for spot colours | | | ✓ | ✓ | ✓ | ✓ | ✓
Smooth shading, a technology that allows for efficient and very smooth blends (transitions from one colour or tint to another) | | | ✓ | ✓ | ✓ | ✓ | ✓
Annotations | | | ✓ | ✓ | ✓ | ✓ | ✓
Transparency support that allows text or images to be seen through | | | | ✓ | ✓ | ✓ | ✓
Improved security | | | | ✓ | ✓ | ✓ | ✓
Improved support for JavaScript | | | | ✓ | ✓ | ✓ | ✓
Improved compression techniques including object streams and JPEG2000 compression | | | | | ✓ | ✓ | ✓
Support for layers | | | | | ✓ | ✓ | ✓
Improved support for tagged PDF | | | | | ✓ | ✓ | ✓
Improved encryption algorithms | | | | | | ✓ | ✓
OpenType fonts embedded | | | | | | ✓ | ✓
Ability to embed files to be a container file format | | | | | | ✓ | ✓
Ability to embed 3D data | | | | | | ✓ | ✓
Improved support for commenting and security | | | | | | | ✓
3D support improvements | | | | | | | ✓

In 2000, Adobe Systems initiated the first of what would become several efforts to standardize subsets of the PDF Reference for specific purposes. The first subset to be introduced was for document exchange and became known as PDF/X. After this came numerous other ISO PDF standards (see Figure 1).
