Portable Document Format

The Portable Document Format (PDF) is a platform-independent file format for documents, developed by Adobe Systems and published in 1993.

The aim was to create a file format for electronic documents that reproduces them faithfully, independent of the original application program, the operating system, or the hardware platform. A reader of a PDF file can view and print the document in the form the author defined. Typical conversion problems (such as altered pagination or wrong fonts) that arise when a document is exchanged between different programs are thus eliminated.

In addition to text, images, and graphics, a PDF file can also contain aids that make it easier to navigate within the document, such as a clickable table of contents and miniaturized page previews (thumbnails).

PDF is now widely used and is the format of choice for many electronic journals (e-journals). Numerous software products on the market can create PDF files.

Overview

Fundamentals and software

PDF documents containing text, images, and graphics, mixed or individually, can be generated with suitable programs (e.g. free programs such as PDFCreator and the office suites LibreOffice/OpenOffice.org, commercial ones such as Adobe Acrobat, or simply via the print dialog) and displayed with suitable readers (e.g. Evince, Ghostscript, Okular, Adobe Reader, Foxit, Preview). The creator of a PDF file can protect it against unwanted use in several ways by activating PDF's security mechanisms. Encryption is meant to prevent access by unauthorized persons. Depending on requirements, a password may be needed just to open the file, or copying content from the file or printing it may be disallowed. However, the protection mechanisms implemented in PDF are not reliable; simple forms of encryption in particular are easy to overcome.

In its early days, Adobe Reader was a paid product; only the free distribution of the software allowed PDF to spread to its current extent. For a long time PDF was a commercial (proprietary) but openly documented file format, described in Adobe's PDF Reference Manual. In early 2007 Adobe submitted it to the ISO standardization process, and with its publication on July 1, 2008, PDF version 1.7 became the open standard ISO 32000-1:2008.

ISO has standardized certain ways of using PDF to facilitate data exchange in prepress (PDF/X) and the long-term archiving of PDF files (e.g. PDF/A-1 in ISO 19005-1:2005).

Production and conversion

PDF is a vector-based page description language that allows the representation to be scaled freely. PDF files reproduce the layout produced by the creating program largely in its original form, independent of the printer and its settings. Where unconditional layout fidelity is required, this is one of the main differences between PDF and more abstract description and markup languages such as SGML or HTML.

To optimize display on output devices with a small screen, such as PDAs or mobile phones, structure tags (similar to HTML tags) can be stored in a PDF; they allow the page content to be reflowed correctly, inevitably at the cost of some layout fidelity. Such tags also enable reading software to present the document to visually impaired users and make it easier to convert the content into other formats.

A commonly used program for generating PDF files is Adobe Acrobat Distiller, which creates PDFs from PostScript files. Acrobat Distiller is available as a desktop product for Windows and Mac OS. Server versions, as well as the free Adobe Reader, also exist for other platforms. Using the free software Wine, Acrobat Distiller also runs under Linux. Office and DTP programs from other manufacturers generally offer direct PDF export and are available on a variety of platforms. Furthermore, pdfTeX can create a PDF file directly from LaTeX. Today numerous tools and programming libraries with different specializations can create PDF files in many ways, so PDF files can be created on almost any platform.

Adobe grants developers a partial right to develop their own applications for generating and processing PDF documents, but retains the copyright to the specifications. The PDF page description language can be viewed as an evolution of the graphics model of PostScript, which is also published. At the end of February 2007, Adobe announced that it would submit version 1.7 of the PDF specification for ISO standardization, working together with the American industry association AIIM, which holds the secretariat of ISO committee TC 171. Adobe had threatened Microsoft with antitrust lawsuits if Microsoft integrated the open PDF standard into its products.

Use and characteristics

A PDF file can reproduce documents from an original program very precisely, including all colors and raster and vector graphics. The same principle applies to fonts.

Document size

PDF documents can range from a single page to several hundred thousand pages. The page size is not limited by the format itself. Adobe Acrobat, however, has implementation-specific limits: up to version 3, 45 × 45 inches (about 1.14 m); up to version 6, 200 × 200 inches (5.08 m); and since version 7, 75,000 times that, i.e. 15,000,000 × 15,000,000 inches (381 km).
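The limits above can be checked with a little arithmetic. This sketch assumes that the version 7 limit is the 200-inch limit multiplied by a maximum UserUnit factor of 75,000 (UserUnit is the page entry introduced in PDF 1.6 that scales the default unit); it also shows that PDF's default user space unit is 1/72 inch.

```python
# Hedged sketch: convert Acrobat's page-size limits from inches to metres.
# Assumption: the Acrobat 7+ limit = 200 in x a maximum UserUnit of 75,000.
INCH_IN_M = 0.0254       # 1 inch = 25.4 mm exactly
POINTS_PER_INCH = 72     # default PDF user space unit is 1/72 inch

def inches_to_m(inches: float) -> float:
    return inches * INCH_IN_M

limits_in = {
    "up to Acrobat 3": 45,
    "up to Acrobat 6": 200,
    "Acrobat 7 and later": 200 * 75_000,
}

for version, inches in limits_in.items():
    print(f"{version}: {inches:,} in = {inches_to_m(inches):,.3f} m")

# 200 inches expressed in default user space units
print(200 * POINTS_PER_INCH)  # 14400
```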

Storage of information in the document

In PDF files, all information is stored as numbered objects. These include font information, character widths, the character encodings used (Mac, PC, ...), page descriptions, decoder parameters, crop boxes, individual bookmarks, color definitions, page sequences, bitmaps, forms, link targets, and anything else that can be stored in a PDF file. A hundred-page PDF file can easily contain 10,000 objects.
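To illustrate how a file is built from numbered objects, the following minimal sketch writes a one-page PDF by hand: five numbered objects (catalog, page tree, page, content stream, font), followed by a cross-reference table that records the byte offset of each object and a trailer pointing at the catalog. It is an illustration only, not how production libraries work; it assumes ASCII text and uses the standard font Helvetica so that no font has to be embedded.

```python
def build_minimal_pdf(text: str) -> bytes:
    """Assemble a one-page PDF from numbered objects plus an xref table."""
    # Escape the characters that are special inside PDF literal strings.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = f"BT /F1 24 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",                       # object 1
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",               # object 2
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "    # object 3
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length " + str(len(content)).encode() + b" >>\n"      # object 4
        b"stream\n" + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",  # object 5
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed for the xref table
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_start = len(out)
    # Every xref entry is exactly 20 bytes: 10-digit offset, 5-digit generation, type, EOL.
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode()
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_start}\n%%EOF\n"
    ).encode()
    return bytes(out)

pdf = build_minimal_pdf("Hello (PDF)")
```

Writing `pdf` to a file with a `.pdf` extension should yield a document that common viewers can open.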

PDF is based on the same graphics model as the page description language PostScript, but adds several features: in particular, interactive elements such as bookmarks, comments, and form fields, which can be programmed with JavaScript. The graphics model for page content was also extended beyond PostScript with features such as transparency and optional (switchable) content, called layers in the Acrobat user interface, as well as support for ICC profiles and OpenType fonts.

Fonts (with the exception of bitmap fonts) and vector graphics can be enlarged without loss of quality. Under these conditions, even large network diagrams and data models can be stored losslessly on a single PDF page.

Text passages, tables, and graphics (or excerpts of them) in PDF documents can easily be reused in other applications by copying and pasting, provided the creator of the document has permitted this. Text can be extracted not only for further processing in other applications, but also for searching or for other output media such as screen readers. Text search within a single document, or full-text search across a collection of PDF documents, makes it very easy to find specific content. This works even if the text is graphically distorted, for example set along a circle or curve.

Security of documents

A special feature of PDF is optional document protection with 40- or 128-bit encryption. By assigning a user password, the document can be made accessible only to a limited group of people. In addition, the author can use a separate owner password to define specific rights for the document, for example preventing users from modifying or printing it or from copying parts of its content. Even without knowing the owner password, however, these restrictions can be removed easily with various tools, especially if only an owner password is set and no password is required to open the document (i.e. no user password is set). The restrictions can also easily be circumvented by saving screenshots as bitmaps and then applying optical character recognition.
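The rights mentioned above are stored as bit flags in the `/P` entry of the encryption dictionary (standard security handler). Honoring them is up to the viewer, which is why tools can simply ignore them. The sketch below decodes four of the flags; bit positions are numbered from 1 as in the PDF specification, and the sample value -44 is only an illustration.

```python
# Bit positions (1-based, as numbered in the PDF specification) of some
# permission flags in the /P entry of the standard security handler.
PERMISSION_BITS = {
    "print": 3,     # print the document
    "modify": 4,    # modify contents
    "copy": 5,      # copy or extract text and graphics
    "annotate": 6,  # add or modify annotations, fill in form fields
}

def decode_permissions(p: int) -> dict[str, bool]:
    """Return which of the listed operations a /P value allows."""
    p &= 0xFFFFFFFF  # /P is a signed 32-bit integer; view it as unsigned
    return {name: bool((p >> (bit - 1)) & 1) for name, bit in PERMISSION_BITS.items()}

print(decode_permissions(-44))
# {'print': True, 'modify': False, 'copy': True, 'annotate': False}
```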

With suitable tools, rights can also be assigned that allow users to add notes, comments, and file attachments to PDF documents, or to save form entries. Originally these features could only be used with Adobe Acrobat; since version 7, however, the free Adobe Reader can also add notes and comments and fill in interactive form fields, provided the author has given the document the necessary permissions.

There are now also DRM-protected PDF files, which can be read with Adobe Digital Editions, among other programs.

Vulnerabilities caused by human error

PDF files can unintentionally contain sensitive information that is not directly visible but can be found by a text search. On the one hand, information may be hidden behind other objects or lie outside the visible page area; on the other hand, a PDF contains metadata that is displayed only when the corresponding dialogs are opened and therefore occasionally goes unnoticed. In particular, when redacting ("blacking out") text passages, it is not sufficient to cover the passage; it must be removed completely from the PDF. Avoiding unwanted information in a PDF is not always easy, especially if you did not create and edit the PDF (and the document it was generated from) entirely yourself. For redaction, it is important to use a tool that removes the content in question completely. It is equally important to check the metadata (in Acrobat under File / Properties). Acrobat Professional 8 offers extensive support here, in particular a special function for removing hidden information.

Case studies:

  • A report on the death of the Italian agent Nicola Calipari, who had freed the journalist Giuliana Sgrena from her hostage-takers in Iraq in March 2005 and was shot by U.S. soldiers shortly afterward. The published report had been censored, but the redacted passages could be recovered from the published file by copying the text and saving it to a new file.
  • The White House in Washington published George W. Bush's speech "Plan for Victory in Iraq". The file's metadata revealed the ghostwriter: Peter Feaver, professor of political science at Duke University in North Carolina, who had been advising the National Security Council since June 2005.
  • After the attack on the convoy of the Lebanese politician Rafik Hariri, a PDF published by the UN still contained previously deleted references to the names of Syrian officials suspected of being responsible for the assassination (see Mehlis report).
  • In 2007, during the Formula 1 espionage affair, confidential information about the Scuderia Ferrari car became public: in a PDF document submitted as evidence, its key technical values had only been covered with a black bar, while the text was still present and could be extracted.
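The failure mode in these cases can be reproduced with a toy example. In the hypothetical, uncompressed content stream below, the text is drawn first and a black rectangle is then painted over it; the rectangle hides the text visually, but the string is still in the file. The extraction function is a deliberate simplification (it only handles literal strings shown with `Tj` and does not unescape them), not a general text extractor.

```python
import re

# Hypothetical uncompressed content stream: text is drawn, then covered by a
# black rectangle ("0 g" sets the fill colour to black, "re f" fills a rectangle).
CONTENT = b"""BT /F1 12 Tf 72 700 Td (Confidential value: 42) Tj ET
0 g 70 695 200 18 re f"""

# A literal string in parentheses (allowing backslash escapes) followed by Tj.
SHOW_TEXT = re.compile(rb"\(((?:[^()\\]|\\.)*)\)\s*Tj")

def shown_strings(stream: bytes) -> list[str]:
    """Extract literal strings shown with the Tj operator (simplified)."""
    return [match.decode("latin-1") for match in SHOW_TEXT.findall(stream)]

print(shown_strings(CONTENT))  # ['Confidential value: 42']
```

Proper redaction therefore has to delete or replace the text operators themselves, not just add an opaque shape on top.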

Editing Documents

PDF was designed as an exchange format for finished documents. There are now a number of programs and Adobe Acrobat extensions that can edit PDF files. However, the format is not comparable with the file formats of word processing and graphics software and, apart from the comment and note functions, is suited to editing documents only to a limited extent. Within certain limits it is possible, for example, to correct typos. For graphic artists and designers in desktop publishing, the advantage lies in the integration of all elements needed to produce print output.

Depending on the individual case, PDF documents can be larger or smaller than the files of the original application. The size of a document depends on the type of data it contains, the efficiency of the program that created it, and whether fonts have been embedded. Fonts can be embedded completely, as a subset containing only the characters actually used in the document, or not at all. If a document is to be displayed reliably regardless of whether the fonts used are installed on the target platform, at least the characters actually used must be embedded.
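Embedded font subsets can be recognized by their name: by convention, the `BaseFont` name of a subset starts with a tag of exactly six uppercase letters followed by a plus sign (the PDF specification's example is `EOODIA+Poetica`). The check below is a minimal sketch of that convention; note that a name without such a tag does not tell you whether the full font is embedded, which depends on the font descriptor.

```python
import re

# A subset font's BaseFont name starts with six uppercase letters and "+".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")

def is_subset_font(base_font: str) -> bool:
    return bool(SUBSET_TAG.match(base_font))

for name in ["EOODIA+Poetica", "Helvetica", "ABC+Times"]:
    print(name, is_subset_font(name))
# EOODIA+Poetica True
# Helvetica False
# ABC+Times False
```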

Other properties

During its development, PDF has been adapted several times to the specific requirements of Internet use. Originally, a document had to be downloaded completely before it could be displayed. Now PDF documents can be linearized, so that parts of them can be displayed while the file is still loading. Since version 1.5 of the PDF specification, several objects can be combined into an object stream and then compressed as a whole. This yields significantly better compression, in particular for the numerous small objects that make up the document structure (image data and the actual page content descriptions were already compressed before).
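The following sketch shows the idea behind object streams: many small non-stream objects are concatenated, preceded by a header of "object number, offset" pairs, and compressed together with Flate (zlib). The annotation objects are hypothetical sample data, and a real file would additionally need a cross-reference stream so that readers can locate the packed objects.

```python
import zlib

def build_object_stream(objects: dict[int, bytes]) -> bytes:
    """Pack non-stream objects into one FlateDecode-compressed object stream."""
    header_parts, body = [], b""
    for number, obj in objects.items():
        # Each header pair is "object-number offset"; the offset is counted
        # from the first object, i.e. relative to the /First position.
        header_parts.append(f"{number} {len(body)}")
        body += obj + b"\n"
    header = (" ".join(header_parts) + "\n").encode("ascii")
    data = zlib.compress(header + body)
    dictionary = (
        f"<< /Type /ObjStm /N {len(objects)} /First {len(header)} "
        f"/Filter /FlateDecode /Length {len(data)} >>"
    ).encode("ascii")
    return dictionary + b"\nstream\n" + data + b"\nendstream"

# Hypothetical sample: 200 small, similar link annotations (objects 10 to 209).
objs = {
    n: b"<< /Type /Annot /Subtype /Link /Rect [0 0 10 10] /Border [0 0 0] >>"
    for n in range(10, 210)
}
raw_size = sum(len(o) for o in objs.values())
packed = build_object_stream(objs)
print(raw_size, len(packed))  # the packed stream is much smaller than the raw objects
```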

When paper documents are archived as PDF files, a mixed approach is preferred in order both to preserve the appearance of the original document as far as possible and to ensure searchability. By cleverly combining the image compression methods supported by PDF, very strong compression (typically 1:200) is achieved: the background (typically flat areas and gradients) and the text (sharp edges, but only a few colors) are each compressed with a different, specially suited method and then superimposed. The actual text is extracted by OCR and embedded invisibly.
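To give the 1:200 ratio a sense of scale, the sketch below computes the uncompressed size of one scanned A4 page. The scan settings (300 dpi, 24-bit color) are assumptions chosen for illustration, not figures from the text.

```python
def a4_scan_bytes(dpi: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed size of an A4 page (210 mm x 297 mm) scanned at `dpi`."""
    width_px = round(210 / 25.4 * dpi)   # 2480 px at 300 dpi
    height_px = round(297 / 25.4 * dpi)  # 3508 px at 300 dpi
    return width_px * height_px * bytes_per_pixel

raw = a4_scan_bytes(300)   # assumed: 300 dpi, 24-bit colour
compressed = raw / 200     # the typical 1:200 ratio quoted above
print(f"{raw / 1e6:.1f} MB -> {compressed / 1e3:.0f} kB")  # 26.1 MB -> 130 kB
```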

The properties of PDF files

One of PDF's strengths is that viewers (viewer and reader programs) are available for all common platforms, which makes platform-independent display of the content possible. With a suitable viewer, the contents of a PDF file are displayed on any hardware and software platform without graphical differences.

Subsequent editing of PDF files is difficult. Nevertheless, there are several programs that can extract individual pages or even change content.

Standard fonts

Fourteen fonts (the standard 14 fonts) are available by default in PDF readers and therefore do not need to be embedded in the PDF document (except in PDF/A documents):

  • Courier (regular, bold, oblique, bold oblique)
  • Helvetica (regular, bold, oblique, bold oblique)
  • Symbol
  • Times (roman, bold, italic, bold italic)
  • Zapf Dingbats
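The fourteen fonts are referenced by their PostScript names. The sketch below lists them and builds the kind of minimal, non-embedded Type 1 font dictionary found in many PDF 1.x files; note that since PDF 1.5 the specification deprecates this special treatment and recommends complete font descriptors, so treat this as an illustration rather than best practice.

```python
# PostScript names of the standard 14 fonts.
STANDARD_14 = frozenset({
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
})

def standard_font_dict(base_font: str) -> str:
    """Return a minimal, non-embedded Type 1 font dictionary for a standard font."""
    if base_font not in STANDARD_14:
        raise ValueError(f"{base_font!r} is not a standard 14 font and should be embedded")
    return f"<< /Type /Font /Subtype /Type1 /BaseFont /{base_font} >>"

print(standard_font_dict("Helvetica-Bold"))
# << /Type /Font /Subtype /Type1 /BaseFont /Helvetica-Bold >>
```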

PDF in operating systems

Apple's Mac OS X was the first operating system to use PDF as the standard format both for screen display and for printing. A PDF can be created from any program that has a print dialog. Because PDF is also used to generate the print data, PostScript can be printed on non-PostScript printers. Almost any kind of document that can be printed can be converted to PDF. This option is also available under GNU/Linux; the print dialog of the GNOME desktop environment, for example, natively offers PDF generation.

The free Adobe Reader / Acrobat Reader is available for Windows, Mac OS Classic and Mac OS X, GNU/Linux, and the Unix operating systems Solaris, HP-UX, and AIX. eComStation includes the integrated viewer Lucide. Since Windows 8, Windows also has an integrated PDF reader.

For Unix-based systems there is also Xpdf, which is limited to the most basic functions (on-screen display, searching the document, printing), as well as other programs (Okular and Evince) that are adapted to their respective desktop environments. These open-source programs can also be used to circumvent the supposed "security features" of PDF documents; for example, it is sometimes possible to print documents even though the author intended to deny the reader this option.

Page geometry

Information on page geometry in a PDF document is especially important in the printing industry. It describes the area in which a page's content corresponds to the trimmed final format, and where a processing program (for example for imposition, i.e. assembling several pages on a single sheet) can expect bleed. Modern DTP programs store this information in the PDF when exporting directly to PDF. If PDFs are produced via PostScript, these details are usually missing. For PostScript output from certain programs, Adobe Acrobat Distiller can derive the net page area from the crop marks, provided these are output as well.

MediaBox (media frame)

The MediaBox defines the size of the output medium of the PDF document. The document is not yet trimmed, and the box usually holds the PostScript page size set in the PDF generator. The MediaBox must always be the largest of all boxes, since it must enclose all the others, and it is the only box that must always be present in a PDF.

CropBox (crop frame)

The CropBox (sometimes called the mask frame) describes the area of a PDF page that is shown on screen or printed. It defaults to the values of the MediaBox.

BleedBox (bleed frame)

The BleedBox defines the final format plus the intended bleed. In the printing industry, a bleed of 3 to 5 mm per side is usually required. Typical examples are bleed images, i.e. images placed directly at the edge of the page that are cut off at the margin. For an A4 page delivered to a commercial printer with a 3 mm bleed, the BleedBox is therefore 210 mm + 6 mm = 216 mm wide and 297 mm + 6 mm = 303 mm high. It defaults to the values of the CropBox.
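PDF stores box coordinates in points (1/72 inch), so the A4 example translates into the numbers below. The layout, with the MediaBox equal to the BleedBox and the TrimBox inset by the bleed on every side, is an assumption for illustration; prepress software may add extra space for crop marks.

```python
MM_PER_POINT = 25.4 / 72  # PDF default user space unit: 1/72 inch

def mm_to_pt(mm: float) -> float:
    return mm / MM_PER_POINT

def prepress_boxes(trim_w_mm: float, trim_h_mm: float, bleed_mm: float) -> dict[str, list[float]]:
    """TrimBox and BleedBox in points; the MediaBox is assumed to equal the BleedBox."""
    b = mm_to_pt(bleed_mm)
    w, h = mm_to_pt(trim_w_mm), mm_to_pt(trim_h_mm)
    bleed_box = [0.0, 0.0, w + 2 * b, h + 2 * b]
    return {
        "MediaBox": list(bleed_box),
        "BleedBox": bleed_box,
        "TrimBox": [b, b, b + w, b + h],  # inset by the bleed on every side
    }

boxes = prepress_boxes(210, 297, 3)  # A4 with 3 mm bleed
llx, lly, urx, ury = boxes["BleedBox"]
print(round((urx - llx) * MM_PER_POINT, 3), round((ury - lly) * MM_PER_POINT, 3))  # 216.0 303.0
```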

TrimBox (trim frame)

The TrimBox defines the final format of the trimmed document, without bleed.

ArtBox (object frame)

The ArtBox (also called the object frame) describes the section of the page to be used when the PDF page is placed in another program, comparable to the bounding box used when importing an EPS file.
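The default rules described in this section can be summarized in a few lines: a missing CropBox falls back to the MediaBox, and a missing BleedBox, TrimBox, or ArtBox falls back to the CropBox. The sketch below applies those defaults and checks that every box lies within the MediaBox. It ignores the inheritance of boxes from the page tree, and the sample coordinates are hypothetical.

```python
Rect = tuple[float, float, float, float]  # (llx, lly, urx, ury) in points

def resolve_boxes(page: dict[str, Rect]) -> dict[str, Rect]:
    """Fill in missing page boxes with their default values."""
    if "MediaBox" not in page:
        raise ValueError("MediaBox is required")
    boxes = {"MediaBox": page["MediaBox"]}
    boxes["CropBox"] = page.get("CropBox", boxes["MediaBox"])
    for name in ("BleedBox", "TrimBox", "ArtBox"):
        boxes[name] = page.get(name, boxes["CropBox"])
    return boxes

def contains(outer: Rect, inner: Rect) -> bool:
    """True if `inner` lies completely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

page = {"MediaBox": (0, 0, 612.28, 858.9), "TrimBox": (8.5, 8.5, 603.78, 850.39)}
boxes = resolve_boxes(page)
print(boxes["BleedBox"])  # (0, 0, 612.28, 858.9), via CropBox -> MediaBox
print(all(contains(boxes["MediaBox"], box) for box in boxes.values()))  # True
```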

Form processes

In addition to text and graphic elements, PDF documents can contain interactive form elements for form processes. Complete forms can thus be combined in a single PDF document, and the data entered in the document can be returned to the publisher of the form in various ways.

  • Print and fill in: the printed form is filled in by hand and sent conventionally by mail or fax.
  • Fill in and print: the electronically completed document is printed and sent conventionally.
  • Fill in and send via HTTP: the electronically completed form is sent electronically via the web browser or from Adobe Acrobat.
  • Fill in and send by e-mail: the electronically completed form is sent by e-mail.
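When form data is sent back electronically, one format PDF forms can submit is FDF (Forms Data Format), a small PDF-like file listing field names (`/T`) and values (`/V`). The sketch below serializes a dictionary of values as a minimal FDF file; the field names are hypothetical, and it restricts itself to ASCII values because non-ASCII text would need PDFDocEncoding or UTF-16BE strings.

```python
def escape_pdf_string(value: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

def build_fdf(fields: dict[str, str]) -> bytes:
    """Serialize form field values as a minimal FDF file (ASCII only)."""
    entries = " ".join(
        f"<< /T ({escape_pdf_string(name)}) /V ({escape_pdf_string(value)}) >>"
        for name, value in fields.items()
    )
    text = (
        "%FDF-1.2\n"
        "1 0 obj\n"
        f"<< /FDF << /Fields [ {entries} ] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )
    return text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input

# Hypothetical field names and values
print(build_fdf({"name": "Jane Doe", "city": "Berlin"}).decode("ascii"))
```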

Converting other formats to PDF

XML

PDF documents can be created from XML data in two steps:

  • Transformation into the XSL-FO format with an XSLT stylesheet
  • Generation of the finished PDF document from the XSL-FO file by a PDF processor (for example Apache FOP or Altsoft Xml2PDF)

An easy-to-understand example is the transformation and formatting of invitation.xml into PDF (the example also shows the transformation into XHTML and WordML).
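The first step normally uses an XSLT stylesheet. Because the Python standard library has no XSLT processor, the sketch below substitutes ElementTree to produce equivalent XSL-FO output from a small, hypothetical invitation document (not the invitation.xml mentioned above). The page layout values are assumptions.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO)

def invitation_to_fo(xml_text: str) -> str:
    """Turn a small invitation XML document into an XSL-FO document."""
    data = ET.fromstring(xml_text)
    root = ET.Element(f"{{{FO}}}root")
    masters = ET.SubElement(root, f"{{{FO}}}layout-master-set")
    page = ET.SubElement(masters, f"{{{FO}}}simple-page-master",
                         {"master-name": "A4", "page-width": "210mm",
                          "page-height": "297mm", "margin": "20mm"})
    ET.SubElement(page, f"{{{FO}}}region-body")
    sequence = ET.SubElement(root, f"{{{FO}}}page-sequence", {"master-reference": "A4"})
    flow = ET.SubElement(sequence, f"{{{FO}}}flow", {"flow-name": "xsl-region-body"})
    # One fo:block per child element of the input, e.g. <to> and <event>.
    for child in data:
        block = ET.SubElement(flow, f"{{{FO}}}block", {"font-size": "14pt"})
        block.text = f"{child.tag}: {child.text}"
    return ET.tostring(root, encoding="unicode")

fo = invitation_to_fo("<invitation><to>Alice</to><event>Summer party</event></invitation>")
```

Saved as `invitation.fo`, the result could then be rendered by Apache FOP in the second step, e.g. with `fop -fo invitation.fo -pdf invitation.pdf`.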

HTML

PDF documents can be created from an HTML file with the HTMLDOC program. It has both a command-line interface and a graphical user interface, which makes it suitable both for direct use on a client and for server use, e.g. for generating PDF documents on the fly.

The Perl module HTML::HTMLDoc makes it easier for Perl developers to interface with the command-line program.

The wkhtmltopdf program can process several documents in one run, optionally including a table of contents. Because it uses the WebKit browser engine, it produces very high-quality documents, e.g. for documenting complete websites. wkhtmltopdf is a command-line program and can optionally be controlled via the wkhtmltopdf Perl module.
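Both tools are usually driven from scripts on servers. The sketch below only assembles the command lines (the file names are hypothetical), using HTMLDOC's `--webpage` mode with `-f` for the output file, and wkhtmltopdf's `toc` object to insert a table of contents before the pages; check the option syntax against the documentation of the installed versions. The lists can be passed to `subprocess.run(cmd, check=True)` to execute them.

```python
import shlex

def htmldoc_command(html_files: list[str], pdf_path: str) -> list[str]:
    """Command line for HTMLDOC in web-page mode."""
    return ["htmldoc", "--webpage", "-f", pdf_path, *html_files]

def wkhtmltopdf_command(html_files: list[str], pdf_path: str, toc: bool = True) -> list[str]:
    """Command line for wkhtmltopdf; the optional 'toc' object adds a table of contents."""
    return ["wkhtmltopdf", *(["toc"] if toc else []), *html_files, pdf_path]

# Hypothetical input and output files
print(shlex.join(wkhtmltopdf_command(["index.html", "about.html"], "site.pdf")))
# wkhtmltopdf toc index.html about.html site.pdf
```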

TeX

pdfTeX, developed by Hàn Thế Thành, extends the TeX typesetting system with the ability to create PDF directly from TeX sources.

DVI

From DVI files, PDF files can be created with the help of the driver dvipdfmx.

3D CAD data

3D PDF documents are PDF documents that contain either a U3D surface model or a PRC B-rep or surface model.

Since version 8.1, the free Adobe Reader can display, animate, section, and measure both formats. 3D PDF documents can also be opened and manipulated directly in Acrobat X and Adobe Reader X and in later releases. 3D PDF documents have existed since Acrobat 7. With Acrobat 3D V7, Acrobat 3D V8, or Acrobat 9 Pro Extended, 3D data from over 40 3D CAD formats (Dassault CATIA, PTC Pro/E, Siemens PLM NX, Siemens Solid Edge, Dassault SolidWorks, Autodesk Inventor/DWG/DXF/DWF/3DS, STEP, IGES, Parasolid, OBJ, VRML, and IFC) can be converted into 3D PDF files.

In 2010 Adobe handed over the development, distribution, and support of its 3D PDF conversion technology entirely to Tech Soft 3D and PROSTEP. The 3D conversion engine is to be developed further by Tech Soft 3D. From Acrobat X onward, the conversion is therefore available as a paid 3D PDF Converter plug-in from the company Tetra 4D, which has licensed the 3D conversion engine for Acrobat X from Tech Soft 3D. Adobe granted the 3D PDF server license to PROSTEP.

Some CAD programs, such as Allplan, now offer direct 3D PDF export. There are also editing programs on the market that can prepare and animate 3D PDF models, for example 3D Reviewer, which is included in Acrobat 9 Pro Extended and in the 3D PDF Converter plug-in from Tetra 4D. Other tools, such as SAP's Deep Exploration, QuadriSpace's Pages3D, and Dassault Systèmes' 3DVIA Composer, can also prepare and animate 3D CAD models and save them as 3D PDF. Besides the 3D visualization, other relevant information from a CAD model can be inserted into a PDF. For example, the final model of a sheet-metal part can be shown as a 3D visualization, while the data needed for manufacturing is included in a CAD-neutral format.

Video Formats

From Acrobat 9 onward, various video formats converted to Shockwave Flash (SWF) can be embedded in Adobe PDF files. This allows videos, animations, and applications to be used on many platforms. Thanks to the Adobe Flash support in Adobe Reader version 9, no additional media player is needed for playback. With the most recent patch for Adobe Acrobat 9 and Adobe Reader 9, Adobe changed the default behavior for 3D PDF: the built-in Flash Player was removed from both programs, and Flash content now uses the Flash Player installed in the operating system. Adobe Acrobat X and Adobe Reader X are not affected, since almost no attacks had been carried out against them. 3D data, SWF functions, and SWF video formats can be combined, so that parts of a model can be linked to SWF videos and functions.

Files from office applications

Many current software packages such as Microsoft Office, LibreOffice, OpenOffice.org or SoftMaker Office offer a PDF export.

Versions of the PDF format

Norms and standards

Since 1997, various ISO committees have been developing and adopting standards based on PDF. These define minimum requirements and restrictions based on particular PDF versions. In this context, Adobe Systems granted the relevant ISO committees the perpetual right to make the necessary specifications available for download.

Roughly speaking, these standards are based on the following PDF versions:
