<p dir="ltr">Calibration reports are essential for traceability and quality assurance in metrology. Yet they are commonly stored as scanned PDFs, which severely limits their digital usability. This paper introduces a modular optical character recognition (OCR) and natural language processing (NLP) pipeline that digitizes such reports and structures them into JSON. The pipeline integrates layout-aware text detection using PaddleOCR, robust table extraction via PP-StructureV2 with TableMaster, and domain-specific named-entity recognition using SciBERT fine-tuned on calibration-specific terminology. Tested on real-world calibration reports for single-axis and roundness measuring machines, the pipeline achieved a median processing time of 95 seconds per report, a 48% reduction compared with manual transcription. A Flask-based front end enables data verification, while a MongoDB database supports flexible querying and trend analysis. Together, these components deliver quantifiable improvements in processing speed, structured data quality, and traceability for metrology operations.</p>
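The abstract does not specify the JSON schema the pipeline emits, so the following is only a minimal sketch of what the final structuring step could look like: named-entity spans are checked against the OCR text, grouped by label, and combined with already-extracted tables into a JSON-serializable record. The `Entity` and `CalibrationRecord` shapes, the label names (`INSTRUMENT`, `UNCERTAINTY`), and the toy input are all hypothetical and are not taken from the paper; the real OCR, table, and SciBERT stages are assumed to have run upstream.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Entity:
    """A labeled span from an NER stage (hypothetical shape, not SciBERT's native output)."""
    label: str  # hypothetical labels, e.g. "INSTRUMENT", "UNCERTAINTY"
    text: str
    start: int  # character offset into the OCR text
    end: int


@dataclass
class CalibrationRecord:
    """Hypothetical JSON schema for one digitized report."""
    report_id: str
    entities: dict = field(default_factory=dict)  # label -> list of strings
    tables: list = field(default_factory=list)    # each table is a list of rows


def build_record(report_id, ocr_text, entities, tables):
    """Group entity strings by label and attach already-extracted tables."""
    record = CalibrationRecord(report_id=report_id, tables=tables)
    for ent in entities:
        # Reject spans whose offsets do not point at the claimed text.
        if ocr_text[ent.start:ent.end] != ent.text:
            raise ValueError(f"offset mismatch for {ent.label}: {ent.text!r}")
        record.entities.setdefault(ent.label, []).append(ent.text)
    return record


def to_json(record):
    """Serialize to a JSON string; asdict() recurses into nested lists and dicts."""
    return json.dumps(asdict(record), ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Toy input for illustration only; not data from the paper.
    text = "Instrument: roundness tester. Expanded uncertainty: 0.05 µm."

    def span(label, s):
        i = text.index(s)
        return Entity(label, s, i, i + len(s))

    ents = [span("INSTRUMENT", "roundness tester"), span("UNCERTAINTY", "0.05 µm")]
    table = [["Nominal (mm)", "Measured (mm)"], ["10.000", "10.001"]]
    rec = build_record("report-001", text, ents, [table])
    print(to_json(rec))
```

Checking offsets before grouping is one way to catch misaligned spans before they reach a verification front end. Because `asdict(rec)` is a plain dictionary, it could in principle be passed to a MongoDB driver such as PyMongo's `insert_one`; how the authors actually store records is not described in the abstract.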
Journal/Conference/Book title: IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI)