PDF to PDF/A Converter Evaluation

Posted in Research

  • Project Type: Internship
  • Deliverable: Evaluation of PDF to PDF/A Converter Software
  • Client: Florida Virtual Campus (f/k/a Florida Center for Library Automation)
  • Status: Completed

The findings from this internship project were presented as a poster at the iPRES 2012 conference, where the poster received the Best Poster Award. The conference proceedings are available here.

A full-length journal article on this project was published in May 2013 in the New Review of Information Networking. Click here for more information.

PDF/A is an ISO-standardized version of the Portable Document Format designed for the archiving and long-term preservation of electronic documents. In the spring of 2012, I evaluated several of the available PDF to PDF/A converter applications on behalf of the Florida Digital Archive (FDA), which was then maintained by the Florida Center for Library Automation. The ISO standards for PDF/A leave room for interpretation, and different software products can interpret them differently. When selecting a PDF to PDF/A converter, the reliability of its output in terms of PDF/A compliance must therefore be evaluated alongside its functionality.

The selection criteria were broadly applicable, although some requirements, such as Linux support and command-line operation, were specific to the FDA. Eight products were identified through the PDF/A Competence Center on the PDF Association website; after a thorough review of their product specifications, I selected three for in-depth evaluation.

The evaluation consisted of three phases:

  1. Preliminary validation testing of all products on a test suite from the 2009 Bavaria Report
  2. Preliminary conversion testing of all products on a small number of sample PDF files
  3. Full conversion testing on around 200 PDF files sampled from the archive

The full conversion testing of each product involved four steps: pre-conversion validation, conversion, self-revalidation of the output files, and cross-revalidation by the other two products. All three products performed automatic fixes during conversion, so thorough documentation and automated reporting of those fixes, and of any fatal errors, proved crucial. Not surprisingly, each product reported 100% compliance for its own output files, which is why cross-revalidation of the output PDF/A files by the other products was essential to accurately assessing the quality of the converted files.
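The self- versus cross-revalidation design can be pictured as a pass-rate matrix: each product's output files are checked by every product's validator. The Python sketch below is a minimal illustration under stated assumptions, not the FDA's actual tooling. The `Validator` callable type, the functions `revalidation_matrix` and `cross_pass_rate`, and the toy products "A", "B" and "C" are all hypothetical, and real use would require wrapping each product's own command-line validator.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a validator takes the path of an output file and
# returns True if it judges the file PDF/A-compliant. Wrapping a real
# product's validator behind this callable is an assumption.
Validator = Callable[[str], bool]


def revalidation_matrix(
    outputs: Dict[str, List[str]],
    validators: Dict[str, Validator],
) -> Dict[Tuple[str, str], float]:
    """Return the pass rate for every (producer, validator) pair.

    Entries where producer == validator are self-revalidation;
    all other entries are cross-revalidation.
    """
    rates: Dict[Tuple[str, str], float] = {}
    for producer, files in outputs.items():
        for checker, validate in validators.items():
            passed = sum(1 for path in files if validate(path))
            # Guard against a producer with no output files.
            rates[(producer, checker)] = passed / len(files) if files else 0.0
    return rates


def cross_pass_rate(rates: Dict[Tuple[str, str], float], producer: str) -> float:
    """Lowest pass rate a producer's output achieves under the *other* validators."""
    others = [r for (p, c), r in rates.items() if p == producer and c != producer]
    return min(others) if others else float("nan")


if __name__ == "__main__":
    # Toy validators keyed on file names; these are illustrative, not real results.
    validators: Dict[str, Validator] = {
        "A": lambda path: True,                # accepts everything
        "B": lambda path: "font" not in path,  # rejects files tagged with font issues
        "C": lambda path: "xmp" not in path,   # rejects files tagged with metadata issues
    }
    outputs = {
        "A": ["a1.pdf", "a2_font.pdf"],
        "B": ["b1.pdf", "b2_xmp.pdf"],
        "C": ["c1.pdf", "c2.pdf"],
    }
    rates = revalidation_matrix(outputs, validators)
    for producer in outputs:
        print(producer, "self:", rates[(producer, producer)],
              "worst cross:", cross_pass_rate(rates, producer))
```

In the toy run every product passes its own output, yet the worst cross-validation rate separates them, which mirrors the observation above: a product's agreement with itself says little, while disagreement between validators points to files worth inspecting by hand.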

The goal of the project was to allow the FDA to make an informed decision through a competitive evaluation. The FDA did purchase pdfaPilot based on the recommendation from this evaluation.

However, the goal of presenting the findings at iPRES 2012, and of the subsequent journal article, is not to rank or promote the software evaluated, but to document the FDA's evaluation process and present the results in a way that offers insight into the challenges and potential issues of this kind of evaluation.

Please see the 2-page abstract and the poster for more details.

iPRES 2012 Abstract   iPRES 2012 Poster

Special thanks go to my mentors, Carol Chou and Priscilla Caplan, whose support and guidance were crucial to the success of this project.