Extract numeric data from invoices in the Georgian language [Answered]

Dear all,

We have a complex problem and no good solution so far. We currently use the desktop version of ABBYY FlexiCapture.

We have many invoices in Georgian. Because ABBYY does not support Georgian as a recognition language, we cannot extract the text.

However, can we still extract the numeric information? We can define regular expressions on a Character String element to find single values, but how can we extract rows from tables? Note that on some pages Pre-Recognize cannot detect the separator lines between rows and columns (the green lines in the capture).
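For the single values, the regular-expression idea could look roughly like the sketch below. It is only an illustration in plain Java, not FlexiCapture configuration: the class name NumericFieldExtractor is hypothetical, and it assumes amounts use a dot or a comma as the decimal separator and contain no thousands separators.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls numbers out of a line whose letters were not recognized.
final class NumericFieldExtractor {
    // One or more digits, optionally followed by a dot or comma and more digits.
    // Assumption: no thousands separators such as "1,250.50".
    private static final Pattern NUMBER = Pattern.compile("\\d+(?:[.,]\\d+)?");

    static List<BigDecimal> extract(String line) {
        List<BigDecimal> values = new ArrayList<>();
        Matcher m = NUMBER.matcher(line);
        while (m.find()) {
            // Normalize a decimal comma to a dot before parsing.
            values.add(new BigDecimal(m.group().replace(',', '.')));
        }
        return values;
    }

    public static void main(String[] args) {
        // Unrecognized Georgian text is represented here by question marks.
        System.out.println(extract("?? 1250,50 ?? 18 % ?? 225.09"));
    }
}
```

BigDecimal is used instead of double so that invoice amounts keep their exact decimal digits.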

We are thinking about using the lower-level ABBYY FineReader API to retrieve the detected items together with their positions (X, Y) in pixels. We could then associate items on the same row by comparing their Y values within a tolerance.
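A minimal sketch of that grouping step, assuming the recognized items are already available as text plus pixel coordinates. Items on the same table row share roughly the same vertical position, so the sketch clusters on Y and then orders each row by X. The Token and RowGrouper types and the tolerance value are hypothetical and not part of any ABBYY API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: groups recognized items into table rows by vertical position.
final class RowGrouper {
    // A recognized item with the pixel coordinates of its top-left corner.
    static final class Token {
        final String text;
        final int x;
        final int y;

        Token(String text, int x, int y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    static List<List<Token>> groupIntoRows(List<Token> tokens, int yTolerance) {
        List<Token> sorted = new ArrayList<>(tokens);
        sorted.sort(Comparator.comparingInt((Token t) -> t.y));

        List<List<Token>> rows = new ArrayList<>();
        List<Token> current = new ArrayList<>();
        int rowY = 0; // Y of the first token in the current row
        for (Token t : sorted) {
            // Start a new row when the token is too far below the row's first token.
            if (!current.isEmpty() && Math.abs(t.y - rowY) > yTolerance) {
                rows.add(current);
                current = new ArrayList<>();
            }
            if (current.isEmpty()) {
                rowY = t.y;
            }
            current.add(t);
        }
        if (!current.isEmpty()) {
            rows.add(current);
        }
        // Within each row, order the cells left to right.
        for (List<Token> row : rows) {
            row.sort(Comparator.comparingInt((Token t) -> t.x));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("12.50", 600, 100), new Token("3", 400, 102),
                new Token("7.00", 600, 149), new Token("1", 400, 151));
        for (List<Token> row : groupIntoRows(tokens, 5)) {
            StringBuilder sb = new StringBuilder();
            for (Token t : row) {
                sb.append(t.text).append(' ');
            }
            System.out.println(sb.toString().trim());
        }
    }
}
```

The tolerance would need tuning to the scan resolution and line height. Comparing against the first token of each row, rather than the previous token, keeps a slightly skewed page from merging neighbouring rows through gradual drift.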

Do you have a better idea?

If we use the ABBYY FineReader API, which programming language should we use, and how should we proceed? We prefer Java, as mentioned on this page:

However, we cannot find any jar files in our current installation. Do we need to install an additional component?

Thanks for your help.



1 comment

  • Oksana Serdyuk

    Based on your description, you can try either our offline solution (ABBYY FineReader Engine) or our online service (ABBYY Cloud OCR SDK). In both cases you can get the recognition result together with the character coordinates.
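As a rough illustration of working with character coordinates, the sketch below reads them from an XML export using only the standard Java XML parser. It assumes, without confirmation from this thread, that each character is a charParams element carrying l, t, r and b attributes for its bounding box, so the element and attribute names should be checked against the actual output. CharBoxReader and CharBox are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads per-character bounding boxes from an XML recognition result.
final class CharBoxReader {
    static final class CharBox {
        final String ch;
        final int left, top, right, bottom;

        CharBox(String ch, int left, int top, int right, int bottom) {
            this.ch = ch;
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }
    }

    static List<CharBox> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Assumed element name; adjust to the names used in the real export.
            NodeList nodes = doc.getElementsByTagName("charParams");
            List<CharBox> boxes = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                boxes.add(new CharBox(
                        e.getTextContent(),
                        Integer.parseInt(e.getAttribute("l")),
                        Integer.parseInt(e.getAttribute("t")),
                        Integer.parseInt(e.getAttribute("r")),
                        Integer.parseInt(e.getAttribute("b"))));
            }
            return boxes;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse recognition XML", ex);
        }
    }

    public static void main(String[] args) {
        String sample = "<page><charParams l=\"10\" t=\"20\" r=\"18\" b=\"34\">4</charParams></page>";
        for (CharBox box : read(sample)) {
            System.out.println(box.ch + " at (" + box.left + ", " + box.top + ")");
        }
    }
}
```

Consecutive digit characters with close coordinates can then be merged into numbers and passed on to whatever row-grouping logic is used.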

    As for Java, the current version of FineReader Engine should include the com.abbyy.FREngine.jar file in its distribution package. Cloud OCR SDK, as an online service, can be used in any environment that supports communication over the network.

    Please contact your regional sales team for more details about these products. They will help you choose between them and can provide you with a trial license if you want one.

