junk chars are detected

mpraining: Some junk characters are detected, e.g. "PINOT NOIR", which is the first line of the result for the attached image. Another example is "Joan d’Anguera". We need the text with such junk characters removed. Is there any option to avoid these characters?

Image

Comments

3 comments

  • Permanently deleted user

    The issue is not reproduced on our side. We recommend recognizing your image using the URL "http://cloud.ocrsdk.com/processImage?language=english,french&profile=textextraction&exportFormat=txt". In this case the result is:

    PINOT NOIR
    BURGUNDY
    A1020 Roblet-Monnot “Vieilles Vignes" 2010
    72
    Al 021 Paul Pernot et ses Fils 2008
    122
    Pommard-Noizons
    A1022 Domaine Antonin Guyon 2009
    Clos de la Chaume Gaufriot, Beaune
    A1023 Domaine Ardhuy 2009
    Gevrey-Chambertin
    U5
    172
    C10-24 Domaine de Lambrays Grand Cru 2009
    Clos des Lambrays, Morey
    260
    C1025 Camille Giroud Grand Cru 2008
    Chapelle-Chambertin
    430
    an 18% gratuity is included on all checks
    
  • Permanently deleted user

    Hello Anastasia, thanks for your feedback. I got it working better, but there is still one thing I do not understand. Please check the following entry from my result:

    A1022 Domaine Antonin Guyon 2009
    Clos de la Chaume Gaufriot, Beaune
    A1023 Domaine Ardhuy 2009
    Gevrey-Chambertin
    145
    172
    

    Here we actually expect something like this:

    A1022 Domaine Antonin Guyon 2009
    Clos de la Chaume Gaufriot, Beaune
    145
    A1023 Domaine Ardhuy 2009
    Gevrey-Chambertin
    172
    

    But the result is not right. Can you please check why this is happening? Otherwise my algorithm to detect this line will fail because of this OCR mistake. I also checked the XML format, but it is not suitable for us. I just expect the contents in the same order as in the image. Please check and help me.

  • Permanently deleted user

    The automatic analysis recognizes this picture as several separate areas, which is why the text order is not left to right and top to bottom. Unfortunately, it is currently impossible to export text in this order automatically. The only way to get this order is to sort the words by their coordinates on your side.
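
For anyone who wants to try the coordinate sort, here is a minimal Python sketch of one way it could be done. It assumes you have already pulled each word and its bounding box out of a coordinate-bearing export such as the XML one; the `Word` class, the `reading_order_lines` function, the `tolerance` parameter, and the sample coordinates are all hypothetical and not part of the OCR service. Words whose vertical centers are close are grouped into one row, then each row is sorted left to right, so a price recognized in a separate area ends up on the same row as its entry.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One recognized word with its bounding box in pixels (hypothetical structure)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def center_y(word: Word) -> float:
    """Vertical center of the word's bounding box."""
    return (word.top + word.bottom) / 2


def reading_order_lines(words: List[Word], tolerance: float = 0.5) -> List[str]:
    """Group words into rows by vertical position, then order each row left to right.

    A word joins the current row when its vertical center is within
    `tolerance` times the height of the row's first word; otherwise it
    starts a new row. The tolerance value is an assumption to tune per image.
    """
    rows: List[List[Word]] = []
    for word in sorted(words, key=center_y):
        if rows:
            first = rows[-1][0]
            height = first.bottom - first.top
            if abs(center_y(word) - center_y(first)) <= tolerance * height:
                rows[-1].append(word)
                continue
        rows.append([word])
    return [
        " ".join(w.text for w in sorted(row, key=lambda w: w.left))
        for row in rows
    ]


# Made-up coordinates: the left area is listed first, then the price column,
# mimicking the area-by-area order described above.
sample_words = [
    Word("A1022", 10, 100, 60, 120),
    Word("Guyon", 70, 100, 130, 120),
    Word("A1023", 10, 140, 60, 160),
    Word("Ardhuy", 70, 140, 140, 160),
    Word("145", 400, 102, 430, 122),
    Word("172", 400, 141, 430, 161),
]

for line in reading_order_lines(sample_words):
    print(line)
```

Note that this puts a price on the same output line as the text it is vertically aligned with. If a price sits between two text lines in the image, the tolerance may need adjusting, or the price column may need to be handled separately.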

