Analysis of the Principles and Characteristics of PDF Information Expression

First, PDF overview

PDF (Portable Document Format) is a structured document format. Adobe, the well-known American developer of typesetting and image-processing software, released version 1.0 in 1993, together with its supporting product line, Adobe Acrobat 1.0. Adobe has since revised and extended the format: PDF 1.1 appeared in 1994 with Acrobat 2.0 (followed by Acrobat 2.1), and PDF 1.2 was released on November 27, 1996, with Acrobat upgraded to version 3.0. PDF was later adopted as an international standard by the International Organization for Standardization (ISO 32000-1, published in 2008).

1. Comparison of PDF and PS

PostScript (PS), Adobe's page description language, is also a de facto standard of the printing industry; it can describe sophisticated page layouts and dominates the printing field today. PDF evolved from PS, and the two have nearly the same capabilities and similar ways of describing pages. PDF uses the same imaging model as PS to represent text and graphics: like PS, PDF page description commands draw a page by painting selected regions. A painted region can be the outline of a character, an area bounded by lines and curves, or a sampled image (bitmap). It can be painted in any color, and any graphic on the page can be clipped to another shape. The page starts out completely blank; each instruction paints a new graphic onto it, and because paint is opaque, a new graphic covers whatever was painted there before.
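The opaque painting and clipping behavior described above can be illustrated with a toy model. This is a minimal sketch, not a renderer: the page is a hypothetical grid of character cells (printed top row first, ignoring PDF's bottom-left origin), and `blank_page`, `fill`, and `inside` are illustrative helpers, not part of any PDF library.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in character cells


def blank_page(width: int, height: int) -> List[List[str]]:
    """Return an empty page: every cell is unpainted ('.')."""
    return [["." for _ in range(width)] for _ in range(height)]


def inside(rect: Rect, x: int, y: int) -> bool:
    rx, ry, rw, rh = rect
    return rx <= x < rx + rw and ry <= y < ry + rh


def fill(page: List[List[str]], rect: Rect, color: str,
         clip: Optional[Rect] = None) -> None:
    """Paint `rect` opaquely, restricted to `clip` if one is given."""
    for y, row in enumerate(page):
        for x in range(len(row)):
            if inside(rect, x, y) and (clip is None or inside(clip, x, y)):
                row[x] = color  # opaque: the old value is simply replaced


page = blank_page(8, 4)
fill(page, (0, 0, 5, 3), "A")                     # first shape
fill(page, (3, 1, 5, 3), "B")                     # later shape covers part of A
fill(page, (0, 0, 8, 4), "C", clip=(6, 0, 2, 1))  # full-page fill, clipped
for row in page:
    print("".join(row))
```

The printed rows are `AAAAA.CC`, `AAABBBBB`, `AAABBBBB`, `...BBBBB`: shape B hides part of A because it was painted later, and the full-page fill C only reaches the two cells inside its clipping window.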

Even so, PDF differs from PS in several important ways:
(1) PDF files can contain interactive objects, such as hyperlinks and interactive forms, while PS files cannot.
(2) PDF is a file format, whereas PS is a programming language, so PDF can be processed more efficiently than PS.
(3) PDF's strictly defined structure allows an application to access any object randomly, while a PS file can only be processed sequentially from the beginning. For example, to reach page 100 of a PS file, an interpreter must first process the preceding 99 pages in order, whereas in a PDF file every page can be located equally quickly.
(4) PDF also stores font description information, such as font metrics, so that when a font is not available, it can be simulated (rather than simply substituted) to keep the document's appearance consistent.
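The contrast in point (3) between sequential and indexed access can be sketched with a toy file. This is a simplification under stated assumptions: the "file" is a hypothetical list of one-line page records, and the offset list plays the role of PDF's cross-reference table. A real PS interpreter must also execute the program to find page boundaries, which this sketch does not model.

```python
from typing import List, Tuple

# Hypothetical "file": 100 pages, one line of bytes per page.
pages = [f"page {n} content\n".encode("ascii") for n in range(1, 101)]
data = b"".join(pages)


def sequential_find(data: bytes, n: int) -> Tuple[bytes, int]:
    """PS-style: walk records from the start; return page n and how many were read."""
    start = 0
    for read in range(1, n + 1):
        end = data.index(b"\n", start) + 1
        if read == n:
            return data[start:end], read
        start = end
    raise IndexError(n)


def build_index(records: List[bytes]) -> List[int]:
    """Record the byte offset where each record starts, like an xref table."""
    offsets, pos = [], 0
    for record in records:
        offsets.append(pos)
        pos += len(record)
    return offsets


def indexed_find(data: bytes, offsets: List[int], n: int) -> bytes:
    """PDF-style: jump straight to the recorded offset of page n."""
    start = offsets[n - 1]
    return data[start:data.index(b"\n", start) + 1]


offsets = build_index(pages)
print(sequential_find(data, 100))        # (b'page 100 content\n', 100)
print(indexed_find(data, offsets, 100))  # b'page 100 content\n'
```

The sequential search has to read 100 records to reach the last page, while the indexed lookup reads only the one record at the stored offset.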

2. Characteristics of PDF

The characteristics of PDF can be summarized as follows:
(1) Transmissibility. PDF files can be encoded either in 7-bit ASCII or in binary, so they can be transmitted correctly across many kinds of network environments.
(2) Interactivity. PDF files can contain interactive objects such as interactive forms and hyperlinks.
(3) Support for sound and animation.
(4) Random access to page content, which speeds up operations on individual pages.
(5) Support for incremental updates, which makes small changes and corrections easy.
(6) Support for a variety of compression methods, which keeps files compact.
(7) Font independence. A PDF file can carry its own font description information, so the document still displays correctly when the user's system lacks a required font.
(8) Platform independence. PDF files are independent of both hardware and software platforms, which makes them well suited to exchanging information over networks without garbled text.
(9) Security control. PDF files support several levels of security control. This is essential for protecting the copyright of electronic publications, since the security level can be set according to each publication's requirements.
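Points (1) and (6) work together through stream filters: content can be compressed and then made 7-bit safe. The sketch below assumes two standard PDF filters, FlateDecode (zlib/deflate data) and ASCIIHexDecode (hex digits terminated by `>`); the helper names `encode_stream` and `decode_stream` are hypothetical.

```python
import zlib
from typing import Tuple


def encode_stream(content: bytes) -> Tuple[bytes, str]:
    """Compress with Flate, then make the result 7-bit safe with ASCIIHex."""
    compressed = zlib.compress(content)  # FlateDecode data is zlib/deflate
    ascii_safe = compressed.hex().upper().encode("ascii") + b">"  # '>' ends the hex data
    # Decoders apply the /Filter entries in the order listed, so the filter
    # applied last during encoding is listed first.
    return ascii_safe, "[/ASCIIHexDecode /FlateDecode]"


def decode_stream(encoded: bytes) -> bytes:
    """Undo the two filters in the order given by the /Filter array."""
    hex_digits = encoded.rstrip(b">").decode("ascii")
    return zlib.decompress(bytes.fromhex(hex_digits))


content = b"BT /F1 12 Tf 72 712 Td (Hello) Tj ET"
encoded, filters = encode_stream(content)
assert all(byte < 128 for byte in encoded)  # only 7-bit ASCII bytes
assert decode_stream(encoded) == content
print(filters)
```

Hex encoding doubles the size of the compressed data; PDF's ASCII85Decode filter is the more compact 7-bit option, turning every 4 bytes into 5 characters.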

Second, the structure of PDF

1. PDF file structure

The file structure of a PDF (that is, its physical structure) consists of four parts: a header, a body, a cross-reference table, and a trailer. See Figure 1.

The header appears on the first line of the file and indicates the version of the PDF specification that the file complies with (for example, %PDF-1.2).

The body consists of a series of PDF indirect objects (Indirect Object).

The cross-reference table is an index of the addresses (byte offsets) of the indirect objects, built so that any indirect object can be accessed randomly.

The trailer gives the address of the cross-reference table, identifies the root object of the document (the Catalog), and also stores security information such as encryption parameters.
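To make the four parts concrete, here is a minimal sketch of a writer for a one-page PDF. Assumptions: `build_pdf` is a hypothetical helper, the page uses the built-in Helvetica font, the text contains no parentheses or backslashes, and the version string `%PDF-1.2` is chosen only to match the version discussed above. The key detail is that the cross-reference table records the byte offset of each `N 0 obj` line, and `startxref` in the trailer records the offset of the table itself.

```python
def build_pdf(text: str) -> bytes:
    """Write header, body, cross-reference table, and trailer for one page."""
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",            # 1: root object
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",    # 2: page tree root
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",  # 3: page
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",     # 5: font
    ]
    out = bytearray(b"%PDF-1.2\n")                 # header: specification version
    offsets = []
    for num, body in enumerate(objects, start=1):  # body: indirect objects
        offsets.append(len(out))                   # byte offset for the xref entry
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"                # entry 0 heads the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off           # every entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos  # trailer points back at the xref
    return bytes(out)


with open("hello.pdf", "wb") as f:
    f.write(build_pdf("Hello"))
```

A reader starts at the end of the file: it reads `startxref`, jumps to the table, and from there can seek directly to any object, which is what makes random access possible.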

2. PDF document structure

The document structure of PDF is the logical organization of the file's content; it reflects the hierarchical relationships among the indirect objects in the body. The document structure is a tree, as shown in Figure 2. The root node of the tree is also the root object of the PDF file. The root node has four subtrees: the page tree (Pages Tree), the outline tree (Outline Tree), the article threads (Article Threads), and the named destinations (Named Destinations).
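Walking the page tree is a plain depth-first traversal. The sketch below also applies the attribute inheritance described in the next paragraph: a leaf Page takes Resources, MediaBox, CropBox, and Rotate from its nearest ancestor that sets them. As a simplifying assumption, nodes are nested Python dicts rather than indirect objects linked by references, and `flatten_pages` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional

# Attributes a Page may inherit from its Pages ancestors.
INHERITABLE = ("Resources", "MediaBox", "CropBox", "Rotate")


def flatten_pages(node: Dict[str, Any],
                  inherited: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
    """Return leaf Page nodes in document order with inherited attributes filled in."""
    defaults = dict(inherited or {})
    for key in INHERITABLE:
        if key in node:
            defaults[key] = node[key]  # a nearer node overrides a farther ancestor
    if node["Type"] == "Page":
        return [{**defaults, **node}]
    pages: List[Dict[str, Any]] = []
    for kid in node["Kids"]:           # Kids are visited in document order
        pages.extend(flatten_pages(kid, defaults))
    return pages


root = {
    "Type": "Pages", "MediaBox": [0, 0, 612, 792], "Rotate": 0,
    "Kids": [
        {"Type": "Page", "Contents": "p1"},
        {"Type": "Pages", "Rotate": 90, "Kids": [
            {"Type": "Page", "Contents": "p2"},
            {"Type": "Page", "Contents": "p3", "MediaBox": [0, 0, 595, 842]},
        ]},
    ],
}

for page in flatten_pages(root):
    print(page["Contents"], page["MediaBox"], page["Rotate"])
```

Pages p2 and p3 inherit `Rotate 90` from their intermediate parent, while p3's own MediaBox overrides the one set at the root.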

In the page tree, all page objects are leaf nodes, and each page inherits the attribute values of its parent nodes as defaults for the corresponding attributes. The outline tree organizes bookmarks (Bookmarks) according to the tree hierarchy; each bookmark is associated with a location on a specific page, so the user can navigate the document's content through its bookmarks. The article thread tree organizes article threads, and the article beads (Article Beads) within each thread, as a tree. The name tree establishes a correspondence between strings (names) and page areas. Each leaf node holds strings and their corresponding page areas, while non-leaf nodes serve only as an index that lets applications reach the leaf nodes quickly. The purpose of the name tree is to allow other objects in the PDF file to refer to a page area by a string name.
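A name-tree lookup follows the indexing idea described above: intermediate nodes carry a `Limits` pair (smallest and largest key below them), and leaves hold a sorted, flat `Names` list of key/value pairs. This minimal sketch assumes nested dicts instead of indirect objects and Python string comparison in place of PDF's byte-wise ordering; the tree contents and `name_tree_lookup` are hypothetical.

```python
from bisect import bisect_left
from typing import Any, Dict, Optional


def name_tree_lookup(node: Dict[str, Any], key: str) -> Optional[Any]:
    """Find `key` in a name tree; return its value or None."""
    # Intermediate nodes are only an index: follow the kid whose Limits
    # [least, greatest] range could contain the key.
    while "Kids" in node:
        for kid in node["Kids"]:
            least, greatest = kid["Limits"]
            if least <= key <= greatest:
                node = kid
                break
        else:
            return None  # the key falls outside every kid's range
    # Leaf: Names is a flat, sorted list [key1, value1, key2, value2, ...].
    names = node.get("Names", [])
    keys = names[0::2]
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return names[2 * i + 1]
    return None


tree = {"Kids": [
    {"Limits": ["chapter1", "chapter3"],
     "Names": ["chapter1", "page 1, top", "chapter2", "page 9, top",
               "chapter3", "page 20, top"]},
    {"Limits": ["figure1", "figure2"],
     "Names": ["figure1", "page 4, lower half", "figure2", "page 12, top"]},
]}
print(name_tree_lookup(tree, "chapter2"))  # page 9, top
print(name_tree_lookup(tree, "index"))     # None
```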

3. Resources in PDF

The page content of a PDF (text, graphics, images, and so on) is stored in the stream object referenced by the page object's Contents key (hereinafter, the content stream). Many of the basic objects used in a content stream, such as numbers and strings, appear as direct objects. Other objects, such as fonts, are represented by dictionary or stream objects. These cannot be written as direct objects, and indirect objects may not appear in a content stream (otherwise they could not be distinguished from the content data itself). These objects are therefore given separate names and referred to by those names in the content stream. Objects referred to by name in this way are called named resources (Named Resources).

The page object has a Resources key that lists all the resources used in its content stream and maps each resource name to its resource object. The named resources in PDF are: procedure sets (ProcSet), fonts (Font), color spaces (ColorSpace), external objects (XObject, including Image, Form, and PS fragments), extended graphics states (ExtGState), patterns (Pattern), and the User Extension List.

The non-named resources are Encoding, FontDescriptor, Halftone, Function, and CMap. Because non-named resources are used implicitly (they are referenced from other objects rather than from the content stream), they do not need names.
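The name-to-object mapping in the Resources dictionary can be sketched as a simple lookup. Assumptions: only three operators whose name operand refers to a resource are handled (`Tf` for fonts, `Do` for XObjects, `gs` for extended graphics states); the regular expression is a toy that would also match names inside string literals; and the resource objects are placeholder strings. `resolve_resources` is a hypothetical helper.

```python
import re
from typing import Any, Dict

# Operators whose name operand is looked up in a Resources sub-dictionary.
CATEGORY = {"Tf": "Font", "Do": "XObject", "gs": "ExtGState"}

# A name, an optional numeric operand (e.g. the size for Tf), then the operator.
USE = re.compile(r"/([A-Za-z0-9_.]+)\s+(?:[-+]?[\d.]+\s+)?(Tf|Do|gs)\b")


def resolve_resources(content: str,
                      resources: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Map each resource name used in `content` to its object in `resources`."""
    used = {}
    for name, op in USE.findall(content):
        category = CATEGORY[op]
        if name not in resources.get(category, {}):
            raise KeyError(f"/{name} (used by {op}) is missing from /Resources /{category}")
        used[name] = resources[category][name]
    return used


resources = {
    "Font": {"F1": "Helvetica font dictionary (obj 5)"},
    "XObject": {"Im1": "image stream (obj 7)"},
}
content = "BT /F1 12 Tf 72 700 Td (Hi) Tj ET q 100 0 0 100 72 500 cm /Im1 Do Q"
print(resolve_resources(content, resources))
```

The content stream itself only ever mentions `/F1` and `/Im1`; the page's Resources dictionary is what connects those names to the actual font and image objects.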

4. PDF page description instructions

PDF has 60 page description instructions, which describe a series of graphics objects on the page. These graphics objects fall roughly into four categories: path objects (Path Object), text objects (Text Object), image objects (Image Object), and external objects (External Object). They are the basic elements from which all pages are built.
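The four categories can be seen by tokenizing a short content stream and sorting its operators. This is a toy sketch under stated assumptions: the tokenizer handles only literal strings without nested parentheses, names, array brackets, numbers, and operators (no hex strings or inline image data), and the operator groupings follow the path/text/image/external split above, with graphics-state and color operators counted as "other". `classify_operators` is a hypothetical helper.

```python
import re
from collections import Counter

# Toy tokenizer: literal strings, names, array brackets, numbers, operators.
TOKEN = re.compile(r"""\([^)]*\)|/[^\s/\[\]()<>]+|\[|\]|[-+]?\d*\.?\d+|[A-Za-z'"*]+""")
OPERATOR = re.compile(r"""[A-Za-z'"*]+""")

CATEGORIES = {
    "path": {"m", "l", "c", "v", "y", "h", "re",
             "S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n", "W", "W*"},
    "text": {"BT", "ET", "Tc", "Tw", "Tz", "TL", "Tf", "Tr", "Ts",
             "Td", "TD", "Tm", "T*", "Tj", "TJ", "'", '"'},
    "image": {"BI", "ID", "EI"},  # inline images
    "external": {"Do"},           # XObjects: forms, images, PS fragments
}


def classify_operators(content: str) -> Counter:
    """Count the operators in `content` by graphics-object category."""
    counts: Counter = Counter()
    for token in TOKEN.findall(content):
        if not OPERATOR.fullmatch(token):
            continue  # an operand: number, name, string, or bracket
        category = next((c for c, ops in CATEGORIES.items() if token in ops), "other")
        counts[category] += 1
    return counts


content = ("q 1 0 0 RG 72 600 200 100 re S Q "         # stroke a red rectangle
           "BT /F1 12 Tf 72 700 Td (Hello, world) Tj ET "  # show a line of text
           "q 100 0 0 100 300 600 cm /Im1 Do Q")        # paint an external image
print(classify_operators(content))
```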

Third, PDF file generation

There are currently two ways to generate PDF files:

1. Generating a PDF by printing. A virtual PDF printer converts an application's text and graphics commands (such as GDI commands under Windows or QuickDraw commands on the Mac) into PDF commands and saves them in a PDF file, as shown in Figure 3. Once Adobe Acrobat's PDFWriter is installed, any application that can print should, in theory, be able to save its printed output as a PDF file. However, generating PDF files that contain Chinese text in this way still has many problems.

2. Converting from PS to PDF. This is the other way to generate a PDF: the application first prints its content to a PS file, and Adobe Acrobat Distiller then converts the PS file into a PDF file. See Figure 4.
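In the print-based route, a driver has to translate drawing calls into PDF operators, and one detail it must handle is the coordinate system: GDI's default mapping mode and QuickDraw measure y downward from the top, while PDF's default user space measures y upward from the bottom-left corner. The sketch below uses hypothetical tuple-based drawing calls (not real GDI or QuickDraw APIs), assumes coordinates are already in points, and emits the PDF path operators `m`, `l`, `re`, and `S`.

```python
from typing import List, Sequence


def to_pdf_ops(calls: Sequence[tuple], page_height: float) -> str:
    """Translate top-left-origin drawing calls into PDF path operators.

    Hypothetical call shapes: ("move", x, y), ("line", x, y), ("rect", x, y, w, h).
    """
    ops: List[str] = []
    for call in calls:
        kind = call[0]
        if kind == "move":
            _, x, y = call
            ops.append(f"{x} {page_height - y} m")
        elif kind == "line":
            _, x, y = call
            ops.append(f"{x} {page_height - y} l")
        elif kind == "rect":
            _, x, y, w, h = call
            # PDF's re operator takes the lower-left corner, so flip y and
            # move down by the rectangle's height.
            ops.append(f"{x} {page_height - y - h} {w} {h} re")
        else:
            raise ValueError(f"unsupported call: {kind!r}")
    ops.append("S")  # stroke everything in the current path
    return "\n".join(ops)


print(to_pdf_ops([("move", 72, 72), ("line", 300, 72), ("rect", 72, 100, 200, 50)], 792))
```

On a 792-point-high page, a point 72 points from the top becomes y = 720 in PDF, and a 50-point-high rectangle whose top edge is at 100 has its lower-left corner at y = 642.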

Both methods of generating PDFs have pros and cons. Generating a PDF by printing integrates tightly with the application, so to the user the PDF appears to come directly from the application; however, because of the limitations of the GDI and QuickDraw instruction sets themselves, it is difficult to produce high-precision PDF this way. Converting from PS to PDF adds an extra step, but because PS itself can describe pages with high precision, the resulting PDF can reach print-level quality and precision. Once a PDF file has been generated, the user can read and print it with Acrobat Reader, and can use Acrobat Exchange to add interactive features such as page thumbnails, hyperlinks, bookmarks (or tables of contents), and annotations. When PDFs are generated with Adobe's tools, Chinese support is currently problematic: for example, Chinese fonts cannot be downloaded, and the display of Chinese text depends on the operating system.

Fourth, the application of PDF in digital workflows and its prospects

Because PDF has many features suited to electronic publishing, it is increasingly used in modern digital workflows. Its applications fall into three cases: producing CD-ROM electronic publications, publishing information in combination with HTML, and using PDF alone to build home pages and publish information.

Producing CD-ROM electronic publications is currently the most widespread use of PDF. For example, the widely circulated "Golden Book House" CD and the "Encyclopedia of China" CD-ROM published by the Encyclopedia of China Publishing House were both produced in PDF, and both are examples of PDF applied successfully in digital workflows.

Because only a small number of WWW servers support PDF, using PDF alone to build home pages and publish information is not yet realistic. However, many WWW sites have begun to publish information using a mixture of HTML and PDF; for example, PDF can be embedded in an HTML frame so that the two are combined seamlessly. On sites that support PDF, users can read PDF and HTML content side by side and can read a PDF while it is still downloading. On sites that do not support PDF, users can read a PDF file only after it has been completely downloaded to the local machine. Many electronic magazines are already distributed on the Internet as PDF.

Agfa has now introduced ApogeePDF, a third-generation PDF workflow solution that is compatible with JDF and that further extends and simplifies the entire workflow. It offers a higher degree of automation, controllability, openness, and scalability, and is also easier to use. ApogeePDF supports both page-based workflows and workflows based on entire press sheets, making work more flexible, accommodating different working styles and production needs, and taking automation to a new level. By combining PDF and JDF, users can begin to integrate business and production processes, achieving truly integrated workflows and end-to-end automation. JDF is an open, extensible, XML-based job definition format that provides a flexible and comprehensive solution covering everything from the initial order to final delivery, and it is more complete and effective than any earlier form of job definition.

The emergence of PDF has had a huge impact on the traditional digital printing process. Traditional PS-centric printing will be challenged by PDF: PDF RIPs (Raster Image Processors) will gradually replace PS RIPs, truly realizing the idea of "produce once, use many times".