No table recognition (Answered)

Hello,

 

I have a problem with my parameters in FineReader Engine. The Engine doesn't detect tables in our files, only the text, but I can't find the problem in the parameters. We don't use a profile.

 

Our parameters are:

    private FREngine.DocumentProcessingParams setDPP() {
        FREngine.DocumentProcessingParams processingParams = engineLoader.Engine.CreateDocumentProcessingParams();

        engineLoader.Engine.MultiProcessingParams.MultiProcessingMode = FREngine.MultiProcessingModeEnum.MPM_Parallel;
        engineLoader.Engine.MultiProcessingParams.RecognitionProcessesCount = Environment.ProcessorCount;

        processingParams.PageProcessingParams.RecognizerParams.LowResolutionMode = true;
        processingParams.PageProcessingParams.RecognizerParams.OneWordPerLine = true;
        processingParams.PageProcessingParams.RecognizerParams.DetectLanguage = false;
        processingParams.PageProcessingParams.RecognizerParams.TextLanguage = fillLangDatabase();

        processingParams.PageProcessingParams.PageAnalysisParams.DetectPictures = false;
        processingParams.PageProcessingParams.PageAnalysisParams.DetectVectorGraphics = false;
        processingParams.PageProcessingParams.PageAnalysisParams.DetectVerticalEuropeanText = true;
        processingParams.PageProcessingParams.PageAnalysisParams.EnableTextExtractionMode = false;
        processingParams.PageProcessingParams.PageAnalysisParams.DetectTables = true;
        processingParams.PageProcessingParams.PageAnalysisParams.AggressiveTableDetection = true;

        processingParams.PageProcessingParams.PagePreprocessingParams.CorrectShadowsAndHighlights = FREngine.ThreeStatePropertyValueEnum.TSPV_Yes;

        processingParams.PageProcessingParams.ObjectsExtractionParams.RemoveGarbage = true;
        //processingParams.PageProcessingParams.ObjectsExtractionParams.EnableAggressiveTextExtraction = true;
        //processingParams.PageProcessingParams.ObjectsExtractionParams.DetectTextOnPictures = true;

        return processingParams;
    }

...

      Document.Process(setDPP());

 

Comments

6 comments

  • Permanently deleted user

    Hi

    My only suggestion is that it could have something to do with the OneWordPerLine parameter.
    What is the result when you set it to false?

    From the help:
    OneWordPerLine  VARIANT_BOOL
    This property set to TRUE tells ABBYY FineReader Engine to presume that no text line may contain more than one word, so the lines of text will be recognized as a single word. By default this property is FALSE.

    Otherwise, try narrowing it down by removing the properties one at a time; each removed property then falls back to its default value.

    Best regards
    Koen de Leijer


  • Permanently deleted user

    Please also try using only the DocumentConversion_Accuracy predefined profile, without any of your own settings. Is the table block detected? If not, please send your source image to SDK_Support@abbyy.com for testing on our side. Thank you!

  • Permanently deleted user

    Hello, everyone,

     

    Thank you for your help. Unfortunately, that didn't help either. The current situation is very curious:

    Original PDF File -> Tables detected

    PDF to BMP -> no tables are recognized in the BMP

    A section cropped from the BMP (with Paint) -> tables are recognized

    Is there a maximum pixel and file size for images?

     

    Br

    Tobi

  • Permanently deleted user

    This is possible because the resolution of your document may have changed during the conversion from PDF to BMP, and as a result you get different recognition results. You can try changing the resolution of your BMP file via the API: see Developer's Help → API Reference → Image-Related Objects → PrepareImageMode → the Resolution overwriting section. If you would like additional recommendations, please send your PDF and BMP files to your regional ABBYY Technical Support team for further investigation.
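
    To put the resolution difference in numbers, here is a minimal C# sketch (plain arithmetic, not part of the FREngine API) that compares the pixel dimensions of the same page at 100 dpi and 300 dpi. The A4 page size and the names ResolutionSketch and PixelsAt are assumptions made for illustration; the thread does not state the actual page size.

```csharp
using System;

static class ResolutionSketch
{
    // Pixel length of a physical dimension (in inches) at a given resolution.
    public static int PixelsAt(double inches, int dpi) =>
        (int)Math.Round(inches * dpi);

    static void Main()
    {
        // Assumption: an A4 page, 210 mm x 297 mm (25.4 mm per inch).
        double widthIn = 210 / 25.4;
        double heightIn = 297 / 25.4;

        foreach (int dpi in new[] { 100, 300 })
        {
            Console.WriteLine(
                $"{dpi} dpi: {PixelsAt(widthIn, dpi)} x {PixelsAt(heightIn, dpi)} px");
        }
    }
}
```

    At 100 dpi the page has only a third as many pixels per side as at 300 dpi, which could plausibly leave thin table separators and small text too coarse for reliable detection.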

  • Permanently deleted user

    That's it! Thanks for the solution. 

    But is there another way to create an image from the PDF? Or can I increase the resolution? I get an image with a resolution of ~100 dpi from a 300 dpi PDF.

    My code:

    FREngine.ImageModification imgMod = engineLoader.Engine.CreateImageModification();
    FREngine.IHandle hBitmap = ImgDocument.ColorImage.GetBitmap(imgMod);
    Bitmap image = Bitmap.FromHbitmap(hBitmap.Handle);
  • Permanently deleted user

    Hi Tobi,

    We have successfully downloaded your documents, thank you!

    You can convert your PDF file to an image format with the FREngine API: use the WriteToFile method of the Image object, as shown in the C# code snippet below:

    ...
    // Add image file to document
    document.AddImageFile( imagePath, null, null );
    document.Pages.Item(0).ImageDocument.ColorImage.WriteToFile(imagePath + ".bmp", FREngine.ImageFileFormatEnum.IFF_BmpColorUncompressed, null, null);

    You can choose any image format listed among the ImageFileFormatEnum values.

    It is also possible to change the resolution of the image during the image preprocessing stage, before recognition. You can do this via the FREngine API as well, using the additional PrepareImageMode settings:

    ...
    // Add image file to document
    document.AddImageFile( imagePath, null, null );
    FREngine.PrepareImageMode pim = engineLoader.Engine.CreatePrepareImageMode();
    pim.AutoOverwriteResolution = false;
    pim.OverwriteResolution = true;
    pim.XResolutionToOverwrite = 300;
    pim.YResolutionToOverwrite = 300;