What is PDF/A
PDF/A is a file format according to the ISO standard for the long-term archiving of electronic documents. It is a 'subset of PDF' that excludes PDF features which are not suited to long-term archiving.
There are different levels of PDF/A:
PDF/A-1b - Level B compliance in Part 1
PDF/A-1b has the objective of ensuring reliable reproduction of the visual appearance of the document.
PDF/A-1a - Level A compliance in Part 1
PDF/A-1a includes all the requirements of PDF/A-1b and additionally requires that document structure be included (also known as being “tagged”/“Tagged PDF”), with the objective of ensuring that document content can be searched and repurposed. PDF/A-1a also requires Unicode character maps.
PDF/A-2 - a new standard
PDF/A-2 is based on ISO 32000-1 (PDF 1.7) and is defined by ISO 19005-2:2011, published on June 20, 2011 under the formal name Document management – Electronic document file format for long-term preservation – Part 2: Use of ISO 32000-1 (PDF/A-2).
PDF/A-3 (ISO 19005-3:2012) was published in October 2012 and differs from PDF/A-2 in that it allows files of any format to be embedded, for example XML, Office formats, or raw binary data.
Important: long-term compatibility is guaranteed only for the PDF part of the collection. An organization that embeds other file formats gains the benefit of keeping those files together with the PDF, but accepts the risk that they may no longer be usable in 100 years.
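A PDF/A file declares its part (1, 2, 3) and conformance level (A, B, U) in its XMP metadata, using the `pdfaid:part` and `pdfaid:conformance` properties. The following Python sketch reads that declaration from an XMP packet that has already been extracted from the file. The function name `pdfa_claim` and the sample packet are illustrative assumptions, and the result is only what the file claims about itself, not a validation of compliance.

```python
import xml.etree.ElementTree as ET

PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def _prop(desc, name):
    """Read a pdfaid property written either as an attribute or a child element."""
    value = desc.get(f"{{{PDFAID_NS}}}{name}")
    if value is None:
        value = desc.findtext(f"{{{PDFAID_NS}}}{name}")
    return value.strip() if value else None


def pdfa_claim(xmp_xml):
    """Return (part, conformance) declared in the XMP packet, or None if absent."""
    root = ET.fromstring(xmp_xml)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        part = _prop(desc, "part")
        if part is not None:
            conformance = _prop(desc, "conformance") or ""
            return int(part), conformance.upper()
    return None


# Hypothetical packet for a file that declares itself as PDF/A-2u.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>2</pdfaid:part>
      <pdfaid:conformance>U</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(pdfa_claim(SAMPLE))  # (2, 'U')
```

Whether a file actually meets the declared level has to be checked with a dedicated PDF/A validator.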
PDF/A Minimum Requirements
Conditions for PDF/A compliance:
Audio and video content are forbidden.
All fonts must be embedded and also must be legally embeddable for unlimited, universal rendering. This also applies to the so-called PostScript standard fonts such as Times or Helvetica.
Color spaces must be specified in a device-independent manner.
Encryption is forbidden.
Use of standards-based metadata is mandated.
External content references are forbidden.
LZW and JPEG 2000 image compression are forbidden in PDF/A-1, but JPEG 2000 compression is allowed in PDF/A-2.
Transparent objects and layers (Optional Content Groups) are forbidden in PDF/A-1, but they are supported in PDF/A-2.
Provisions for digital signatures in accordance with the PAdES (PDF Advanced Electronic Signatures) standard are supported in PDF/A-2.
Embedded files are forbidden in PDF/A-1, but PDF/A-2 offers the possibility to embed PDF/A files, allowing archiving of sets of documents in a single file.
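The part-dependent rules in the list above can be summarized as a small lookup table. The Python sketch below is only an illustration of that list: the feature names and the `violations` helper are hypothetical, the table covers only PDF/A-1 and PDF/A-2, and it is not a substitute for a real PDF/A validator.

```python
# Features named in the list above, mapped to the PDF/A parts that allow them.
# An empty set means the feature is forbidden in both PDF/A-1 and PDF/A-2.
ALLOWED_IN = {
    "audio_video": set(),
    "encryption": set(),
    "external_references": set(),
    "lzw_compression": set(),
    "jpeg2000_compression": {2},
    "transparency": {2},
    "layers": {2},
    "embedded_pdfa_files": {2},
}


def violations(features, part):
    """Return the sorted list of used features that the given PDF/A part forbids."""
    if part not in (1, 2):
        raise ValueError("this sketch only covers PDF/A-1 and PDF/A-2")
    unknown = set(features) - set(ALLOWED_IN)
    if unknown:
        raise KeyError(f"unknown feature(s): {sorted(unknown)}")
    return sorted(f for f in features if part not in ALLOWED_IN[f])


used = {"transparency", "jpeg2000_compression", "encryption"}
print(violations(used, 1))  # ['encryption', 'jpeg2000_compression', 'transparency']
print(violations(used, 2))  # ['encryption']
```

A document that uses transparency or JPEG 2000 compression can therefore target PDF/A-2 but not PDF/A-1, while encryption rules it out for both.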
PDF/A Support in ABBYY Technology Products
PDF/A export (PDF/A-1b and PDF/A-1a) is available in the following ABBYY technology products:
FineReader Engine - OCR & Document Conversion
- FineReader Engine 12 Windows & Linux & Mac
- FineReader Engine 11 Windows & Linux & Mac
- FineReader Engine 10 Windows
- FineReader Engine 9.0 Windows
- FineReader Engine 9.0 Linux
- FineReader Engine 8.1 Windows
FlexiCapture Engine - Separation, Classification & Data Capture
- FlexiCapture SDK
- FlexiCapture Engine 12
- FlexiCapture Engine 11
- FlexiCapture Engine 10
- FlexiCapture Engine 9.0
- FlexiCapture Engine 8.0
FineReader Server - Solution for server-based processing and document capture
- FineReader Server 4.0
- Recognition Server 3.0
- Recognition Server 2.0
FlexiCapture - Solutions for Data Capture
- FlexiCapture 12 - all versions
- FlexiCapture 11 - all versions
- FlexiCapture 10 - all versions
- FlexiCapture 9.0 - all versions
- FlexiCapture 8 Professional
In addition to the common PDF and PDF/A-1 formats, FineReader Engine 11 now exports to PDF/A-2. The new options of the ISO standard format are:
Support of JPEG2000 compression to generate smaller files
A-2a – tagged PDF/A-2 with Unicode character maps
A-2u – untagged PDF/A-2 with the ability to extract text in Unicode
PDF/A-2 enables creation of smaller PDF files using JPEG2000 compression. For long-term archiving, this can help reduce used storage space and enable faster access when working on low bandwidth networks.
The general technical changes of PDF/A-2 are:
based on PDF 1.7 (ISO 32000-1)
highly efficient JPEG2000 compression allowed
support for transparency effects and layers
embedding of OpenType fonts
provisions for digital signatures in accordance with the PAdES (PDF Advanced Electronic Signatures) standard
possibility to embed PDF/A files in PDF/A-2, allowing archiving of sets of documents as individual documents in a single file
PDF/A-3 is an extension of the PDF/A-2 standard that allows embedding of PDF/A files as well as files in other formats, such as XML or Office formats. Long-term archiving and readability of the PDF/A part are still guaranteed, and the attachments can deliver additional benefits.
The PDF/A-3 extended container capabilities will make this format attractive in new areas, for example when a graphical representation of a document should be combined with some source data. The new e-invoice format defined by the Forum for Electronic Invoices Germany (FeRD) is based on PDF/A-3 and XML.
Since Release 3 of FineReader Engine 11, the API has been extended so that embedded files (attachments) can be extracted from a PDF and added to one.
ABBYY is a worldwide member of the PDF/A Competence Center and is committed to supporting PDF/A.