OCR skill: one text output per PDF file

I want to OCR multiple PDF files in one transaction (via API calls). The result is a single output that combines all the PDF files.

Is it possible to get a separate OCR output (e.g. a text file) for each PDF file?

Or is it possible to see some indication in the text output file of where the next page of a PDF starts?

 

Comments

3 comments

  • Tatiana Dyu

    Hi Patrick, normally each file in the transaction is processed as a separate document unless you are using, for example, a splitter skill to merge documents. How exactly do you get a single output file? Do you use an Assembly activity in the Process Skill?

  • Patrick van Hemert

    I just created a basic OCR skill in which I selected the languages, outputs, etc. I just need all the text in a document.

    When I process a one-page PDF (using API calls), it takes much longer than processing a document with multiple pages. But when I process a multi-page PDF, I get only one text file returned for all pages.

     

  • Tatiana Dyu

    Hi Patrick, thank you for the clarification. It is expected behavior that one output file is generated for the whole multi-page PDF. However, I would like to investigate why processing a 1-page document takes so long. I will create a support ticket, and you will be contacted by the team.

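For readers with the same question about page boundaries, here is a minimal sketch of splitting one combined text output into one file per page on the client side. It assumes the text export separates pages with a form feed character (U+000C); nothing in this thread confirms that the OCR skill's text output does so, so check your own output first and pass a different separator if needed. The names `split_pages` and `write_pages` are hypothetical helpers, not part of any product API.

```python
from pathlib import Path

# Assumed page separator; not confirmed for this OCR skill's text output.
FORM_FEED = "\f"


def split_pages(text: str, separator: str = FORM_FEED) -> list[str]:
    """Split combined OCR text into per-page strings at the separator."""
    pages = text.split(separator)
    # A trailing separator leaves an empty final chunk; drop it.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


def write_pages(text: str, out_dir: Path, stem: str) -> list[Path]:
    """Write each page to <stem>_page<N>.txt and return the created paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, page in enumerate(split_pages(text), start=1):
        path = out_dir / f"{stem}_page{number}.txt"
        path.write_text(page, encoding="utf-8")
        paths.append(path)
    return paths
```

If the output contains no such separator, `split_pages` returns the whole text as a single page, so the result tells you quickly whether the assumption holds for your files.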
