How to import existing OCR text (XML or TXT) into a PDF?

Hello!

Here is the scenario: I have an image-only PDF file (a scanned book) with 500 pages and 500 ALTO XML files, one per page, containing the OCR text for the corresponding page of that PDF. Those OCR XML files were exported from the original searchable OCR PDF. I don't have that source OCR PDF; it comes from a German library (StaBi Berlin). Unfortunately, they don't offer the OCR PDF for download directly. You can only download an image-only PDF of a book and, separately, the corresponding OCR XML files, or all of the OCR text in one TXT file. (If you don't believe me, see for yourself: see here. You can change the language to English in the bottom right corner.)

So now I am looking for a way to import those 500 XML files back into the corresponding pages of that image-only PDF, so that I end up with a searchable OCR PDF. Is there a way to do this with FineReader (or, if not, maybe with additional helper tools)?
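
To show what the XML files contain: as far as I can tell, each ALTO file stores every recognized word as a String element with CONTENT, HPOS, VPOS, WIDTH and HEIGHT attributes, and the Page element carries the page WIDTH and HEIGHT. So the information needed to place an invisible text layer is all there. Below is a minimal sketch in Python (standard library only) of reading one page and converting the word boxes to PDF coordinates. The function names read_alto_words and to_pdf_box and the sample data are made up for illustration. It assumes the Page element has WIDTH and HEIGHT and that the PDF page shows the full scan, and it does not write anything into the PDF. That step would still need a PDF tool or library.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace so the code works across ALTO schema versions."""
    return tag.rsplit("}", 1)[-1]


def read_alto_words(xml_text):
    """Return (page_width, page_height, words) for one ALTO page.

    Each word is a dict with its text and bounding box in ALTO units
    (origin at the top-left corner of the page).
    """
    root = ET.fromstring(xml_text)
    page_w = page_h = None
    words = []
    for el in root.iter():
        name = local_name(el.tag)
        if name == "Page" and page_w is None:
            # Assumption: the Page element carries WIDTH and HEIGHT.
            page_w = float(el.get("WIDTH"))
            page_h = float(el.get("HEIGHT"))
        elif name == "String":
            words.append({
                "text": el.get("CONTENT", ""),
                "x": float(el.get("HPOS", 0)),
                "y": float(el.get("VPOS", 0)),
                "w": float(el.get("WIDTH", 0)),
                "h": float(el.get("HEIGHT", 0)),
            })
    return page_w, page_h, words


def to_pdf_box(word, page_w, page_h, pdf_w, pdf_h):
    """Convert an ALTO word box to PDF points as (x, y_bottom, width, height).

    ALTO measures from the top-left corner, PDF from the bottom-left,
    so the vertical axis is flipped. Scaling by the page-size ratio avoids
    having to know the ALTO measurement unit.
    """
    sx = pdf_w / page_w
    sy = pdf_h / page_h
    x = word["x"] * sx
    y_bottom = pdf_h - (word["y"] + word["h"]) * sy
    return (x, y_bottom, word["w"] * sx, word["h"] * sy)


# Made-up sample page, only to demonstrate the functions above.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="Hallo" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="50"/>
            <SP/>
            <String CONTENT="Welt" HPOS="420" VPOS="200" WIDTH="250" HEIGHT="50"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

if __name__ == "__main__":
    page_w, page_h, words = read_alto_words(SAMPLE)
    for word in words:
        # 1000 x 1500 points is an arbitrary PDF page size for the demo.
        print(word["text"], to_pdf_box(word, page_w, page_h, 1000, 1500))
```

What I am missing is the last step: something that takes these words and boxes and writes them as hidden text onto the matching page of the image-only PDF.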

Best regards,
Minsutoreru
