Case Study

Transforming High-Value Legal Content

The client, a British publisher specializing in legal publications, offers its content in a variety of formats, including books, journals, periodicals, loose-leaf publications, CD-ROMs, and online services.

To improve document management and make its content easier to retrieve, the client wanted to digitize its law books, journals, and loose-leaf publications. The project also required data capture and XML markup. The client's objectives were:

  • To digitize more than 3,000 books, journals, and loose-leaf publications every month, adding markup to help preserve important legal documents
  • To convert textual content from hard copy into an electronic format economically, using the title-specific XML naming conventions assigned to each title

Solution and Approach

The client approached Lumina Datamatics because of its proven expertise in accurate and cost-effective data conversion.
Lumina Datamatics has recently executed similar projects for some of the leading global publishing houses, government agencies, and academic institutions.

Lumina Datamatics implemented the following solution:

  • Data extraction from PDF/MS Word files
  • XML markup using the journal DTD

Lumina Datamatics adopted the following approach:

  • Data was captured with the highest accuracy using our in-house data extraction and verification tool
  • A toolset was developed that automated more than 90% of the XML markup, based on the client-specified DTD
  • The XML output was validated with an XML parser as a quality check
  • An accuracy rate of over 99.995% was strictly maintained
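
The auto-markup and validation steps above can be illustrated with a minimal Python sketch. This is not Lumina Datamatics' toolset: the element names (article, title, section, heading, para), the numbered-heading rule, and the required-element check are all hypothetical. Python's standard-library ElementTree checks only well-formedness; validating against the client's DTD would need a validating parser such as lxml.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical heuristic: a numbered heading such as "1. Formation" or "2.3 Remedies"
HEADING_RE = re.compile(r"^\d+(\.\d+)*\.?\s+[A-Z]")


def auto_markup(lines):
    """Wrap extracted text lines in hypothetical journal-style XML elements."""
    article = ET.Element("article")
    texts = [ln.strip() for ln in lines if ln.strip()]
    if not texts:
        return article
    # Assumption: the first non-empty line is the article title.
    ET.SubElement(article, "title").text = texts[0]
    section = None
    for text in texts[1:]:
        if HEADING_RE.match(text):
            # Start a new section whenever a numbered heading is detected.
            section = ET.SubElement(article, "section")
            ET.SubElement(section, "heading").text = text
        else:
            # Paragraphs go into the current section, or directly under the root.
            parent = section if section is not None else article
            ET.SubElement(parent, "para").text = text
    return article


def check_output(xml_string, required=("title",)):
    """Re-parse the XML (raises ET.ParseError if malformed) and list missing required children."""
    root = ET.fromstring(xml_string)
    return [tag for tag in required if root.find(tag) is None]


if __name__ == "__main__":
    extracted = [
        "Contract Law Review",
        "An introductory paragraph.",
        "1. Formation of Contracts",
        "Offer and acceptance are required.",
    ]
    xml_out = ET.tostring(auto_markup(extracted), encoding="unicode")
    print(xml_out)
    print(check_output(xml_out))  # → []
```

Each detected heading opens a new section element, and the final check returns an empty list when the output is well-formed and the required title element is present. A production pipeline would replace the hypothetical rules with the client's DTD and naming conventions.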


Lumina Datamatics accurately applied all of the client's parameters to the required format. By converting PDF/MS Word files into XML that parsed 100% against the appropriate DTD, it increased the value of the client's existing legal library content.