
Managing Back-Catalogue Conversion with ISO: A Comprehensive Project
Discover how ISO, the independent NGO with 163 national standards bodies, undertook the daunting task of converting over 30,000 standards into XML format in two years. Learn about the challenges, solutions, and the impact of this project on global standardization efforts.
Presentation Transcript
Beware of the laughing horse Managing a back-catalogue conversion Laurent Galichet and Holger Apel
Who is ISO and what do we do? Independent member-based NGO 163 national standards bodies Bring together experts to develop voluntary consensus-based international standards
ISO in numbers (2016) 21478 International Standards and standards-type documents published Nearly 1 million pages (EN and FR) 3555 technical bodies 1509 technical meetings in 45 countries 144 ISO staff coordinate the worldwide activities of ISO from Geneva
Why the XML project? Creation of a central repository of standards Improve speed to market also for national adoptions Broaden readership Reduce or avoid duplication of costs Streamline ISO production processes
The back catalogue project Convert over 30 000 standards (750 000 pages; EN, FR, RU, SP) into XML in two years Cornerstone of XML project
The Online Browsing Platform https://www.iso.org/obp
How do we mark up this thing? DTD choice NLM DTD chosen as base ISO was familiar with TEI Parallel project to build in-house production chain Tools assessed by ISO very familiar with NLM DTD
The solution Mulberry Technologies Customization of NLM DTD Metadata structure and elements Standards structure Standards-content elements Genesis of ISOSTS: ISO Standard Tag Set
Request for proposals Back-catalogue conversion project RFP launched April 2011 Two providers shortlisted SLAs and processes made difference Site visits by Top Management Selection of Innodata, with conversion team in Sri Lanka
The contract Two-year duration Quality criteria requested above industry levels at the time (98%) Negotiation for higher quality levels Pricing structure Project management structure
Price structure (service to be billed: unit, price)
Conversion processing, categorized by source type:
  PDF scanned: per page, $xx
  PDF character-based: per page, $xx
  MS Word: per page, $xx
  SGML: per page, $xx
Tables to CALS or XHTML, categorized by source type:
  PDF scanned: per Kbyte, $xx
  PDF character-based: per Kbyte, $xx
  MS Word / SGML: per Kbyte, $xx
Tables to PNG: per image, $xx
Equations to MathML, categorized by complexity level:
  Simple: per equation, $xx
  Moderate: per equation, $xx
  Complex: per equation, $xx
Equations to PNG: per image, $xx
Images for conversion: per image, $xx
Estimates for budget Pages: 750000 Tables: 40000 Equations: 15000 Images: 76000 Additional 25% contingency
Service level agreements For the Text: 99.995% For the Tags: 99.95% All material marked up according to coding instructions Processed data is fully-parsed to zero errors against ISOSTS and Schematron rules
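To make the two percentages concrete, here is a minimal sketch of the error budget they imply. It assumes accuracy is counted per character for text and per tag for mark-up; the slides do not say how the units were actually defined, and the function names are hypothetical.

```python
from fractions import Fraction

# SLA figures from the slide; exact fractions avoid floating-point drift.
TEXT_SLA = Fraction("99.995") / 100
TAG_SLA = Fraction("99.95") / 100


def max_errors(units: int, sla: Fraction) -> int:
    """Largest whole number of errors in `units` items that still meets the SLA."""
    return int(units * (1 - sla))


def meets_sla(units: int, errors: int, sla: Fraction) -> bool:
    """True if the observed accuracy is at or above the SLA."""
    return Fraction(units - errors, units) >= sla


# Assumed unit of measurement: characters for text, tags for mark-up.
print(max_errors(100_000, TEXT_SLA))  # errors tolerated in 100 000 characters
print(max_errors(10_000, TAG_SLA))    # errors tolerated in 10 000 tags
```

Under these assumptions the text SLA tolerates one error in 20 000 characters and the tag SLA one error in 2 000 tags.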
Set-up period Contractually 2 months Sample set of representative standards 12000 pages Iterative process: mark-up, check, refinement coding
End of set-up period ISO would confirm the final DTD and coding instructions The vendor would demonstrate ability to deliver the quantity and quality requested End of the set-up period, and prior to mass conversion, ISO could terminate the agreement if: The vendor had not demonstrated that it was capable of achieving the agreed upon quality requirements; The vendor and ISO were not able to reach mutual agreement on the operational arrangements with respect to the quality assurance (QA) methodologies to be applied during live production.
Milestones October-December 2011: set-up period (coding instructions and DTD finalized); January-December 2012: mass conversion, with 30% of back-catalogue converted by year end; January-December 2013: mass conversion, with full catalogue converted by year end
Theory is great "The extracted text content shall not require proofreading or editing after extraction." "It's digital to digital, what can go wrong? No need to check!"
In practice: set-up period Took 6 months 3 iterations of the full 12000 pages marked up, checked, feedback On-site visit to finalize coding instructions Meeting face-to-face invaluable Mass conversion to start May 1st 2012
2011 Xmas is early! HTML rendering Schematron rules Holger, you're the best!
Coding instructions http://www.iso.org/schema/isosts/
2011 It all begins (timeline, January to December 2011): ISOSTS development and refinement by Mulberry; Coding instructions (ISO/Innodata); ISOSTS kick-off, ISO and Mulberry; Tendering process for back catalog; Back catalog set-up period; XML schema selection; Innodata selected for back catalog; Elaboration of SLAs; I join...; ISOSTS v.0.6 lands
2012 Mass conversion (timeline, January to December 2012): ISOSTS development and refinement by Mulberry; Coding instructions by ISO and Innodata; Back catalog mass conversion; Online browsing platform; ISOSTS v1.0; Back catalog: start of mass conversion
Document management Initially batches of 625 pages from most recent First few batches contained only one large document Rethink of the batching Batches of short documents (less than 20 pages) Then up incrementally Chronologically back meant source: Word, cPDF, scanned PDF
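The rethink of the batching can be illustrated with a small sketch. It assumes a greedy fill that takes the shortest documents first and closes a batch when the next document would push it past the 625-page size mentioned on the slide; the `Doc` type, the function name and the greedy rule are illustrative assumptions, not the process ISO and the vendor actually used.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    ref: str    # hypothetical document reference
    pages: int  # page count of the source document


def make_batches(docs, batch_pages=625):
    """Group documents into batches, shortest documents first.

    A batch is closed when adding the next document would exceed
    `batch_pages`; a single oversized document gets a batch of its own.
    """
    batches, current, total = [], [], 0
    for doc in sorted(docs, key=lambda d: d.pages):
        if current and total + doc.pages > batch_pages:
            batches.append(current)
            current, total = [], 0
        current.append(doc)
        total += doc.pages
    if current:
        batches.append(current)
    return batches


docs = [Doc("a", 700), Doc("b", 10), Doc("c", 15), Doc("d", 600)]
for batch in make_batches(docs):
    print([doc.ref for doc in batch], sum(doc.pages for doc in batch))
```

Sorting by length means the early batches contain many short documents, so mark-up problems surface across a wide sample before the large documents arrive.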
Errors and batch rejection Contractually, unit of rejection is a batch Rejected batches get re-marked up in full Not all errors are equal Document flow and progress Let's be pragmatic and agile
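A rough sketch of why the unit of rejection matters: it compares the pages sent back for re-mark-up when a whole batch is rejected with the pages sent back when only the failing documents are returned. The batch contents and the function name are made up for illustration.

```python
def pages_to_redo(batch, failed, unit="batch"):
    """Pages sent back for re-mark-up.

    batch:  dict mapping a document reference to its page count
    failed: set of document references that failed QA
    unit:   "batch" rejects everything, "document" only the failures
    """
    if not failed:
        return 0
    if unit == "batch":
        return sum(batch.values())
    return sum(pages for ref, pages in batch.items() if ref in failed)


# Hypothetical batch: one short document fails QA.
batch = {"doc-a": 12, "doc-b": 300, "doc-c": 8}
failed = {"doc-c"}
print(pages_to_redo(batch, failed, unit="batch"))
print(pages_to_redo(batch, failed, unit="document"))
```

In this made-up case a single failing 8-page document sends all 320 pages back under batch-level rejection, which is the kind of cost that working at document level avoids.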
Full pace Couple of months to reach 20000 pages per month delivered No systematic or systemic errors Document flow, proofing, corrections at document level Target of 30% English language documents on course Considering sampling approach to proofing Life is great!
Scanned PDFs January 2013 batches of scanned PDFs Quality of source material Page deliveries dropped to below 5000 per month Required rethink of process More resources Responsiveness of vendor faultless
Final straight Final quarter 2013 50000 pages per month 98% converted in terms of number of documents 80% converted in terms of pages Large documents completed first quarter 2014
Lessons learnt Don't believe it when vendors say "100% quality and 100% automation" Don't accept it when your colleagues say "It's digital to digital, what can go wrong? No need to check!" Image/scanned PDFs are really, really tricky A prioritization of documents might be a good idea Don't forget potential future uses of your XML! Tags cost money And don't underestimate your content!
Lessons learnt Our estimates for the contract (approximate) and the actuals
Pages: estimated 750 000, actual 675 000
Tables: estimated 40 000, actual 156 000
Equations: estimated 15 000, actual 180 000
Images: estimated 76 000, actual 138 000
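The gap between the estimates and the actuals is easier to see as ratios. The short sketch below uses only the figures from this slide; the variable names are illustrative.

```python
# Contract estimates and actual counts, as given on the slide.
estimated = {"pages": 750_000, "tables": 40_000, "equations": 15_000, "images": 76_000}
actual = {"pages": 675_000, "tables": 156_000, "equations": 180_000, "images": 138_000}

# Ratio of actual to estimated volume for each content type.
ratios = {kind: round(actual[kind] / estimated[kind], 2) for kind in estimated}

for kind, ratio in ratios.items():
    print(f"{kind}: {ratio}x the estimate")
```

Pages came in at 0.9 times the estimate, while tables were 3.9 times and equations 12 times the estimated volume, far beyond the 25% contingency.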
One more lesson QA QA QA And get some cash for proofing at the extraction stage!
Ah, another one! It's not just a contract, it's a partnership
Conclusions Conversion completed 2014 Budget fully spent Quality criteria met All resources available on http://www.iso.org/schema/isosts/ Content available through online browsing platform https://www.iso.org/obp
Further thoughts ISOSTS JATS is comprehensive ISOSTS covers all the tagging needs to date for ISO Great to work with people who know their stuff! And everyone needs to be flexible including the tag set (sometimes!)
Thanks To Holger, Serge, Shannon, Claude, Caroline, Nicolas, Trevor, David, and the ISO Membership To Debbie, Tommie, Sadhik, Rizwan, Saumya and the whole team