
Calculating Confidence values (CC, WC, PC)

Hi,
We are using the API to add confidence values to the ALTO XML output. The API provides access to the 'total number of characters' and the 'total number of uncertain characters'. However, we are not sure how to calculate CC (Character Confidence), WC (Word Confidence) and PC (Page Confidence) from these values, or by any other means in RS. There are some online resources (e.g., https://github.com/altoxml/schema/issues/23), but I am unsure how reliable they are.
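One possible way to turn the two page-level counts into PC is sketched below. It is a minimal sketch under one assumption: the share of characters *not* flagged as uncertain is an acceptable proxy for page confidence. RS does not state this. The ALTO schema documents PC as a value between 0 (unsure) and 1 (sure), which gives PC = 1 - uncertain / total. The function name `pageConfidence` is hypothetical.

```javascript
// Hypothetical helper. ASSUMPTION: page confidence is approximated as the share of
// characters that were NOT flagged as uncertain (ALTO PC: 0 = unsure, 1 = sure).
function pageConfidence(totalCharacters, uncertainCharacters) {
    // No recognized characters means there is nothing to base a confidence on;
    // return null so the caller can omit the PC attribute instead of dividing by zero.
    if (totalCharacters <= 0) {
        return null;
    }
    var pc = 1 - uncertainCharacters / totalCharacters;
    // Round to three decimals to keep the XML attribute compact.
    return Math.round(pc * 1000) / 1000;
}
```

In an RS script, the inputs could come from the page statistics, e.g. `pageConfidence(this.Statistics.TotalCharacters, this.Statistics.UncertainCharacters)`. For a page with 1200 characters, 36 of them uncertain, this gives 0.97.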

I would appreciate any help you can provide.

Thanks!


Comments


    Anastasiya Nechaeva

    Hello SAU,

    To see how to get information about uncertain characters, you can use the following code sample.

    Place the following code snippet in the Document separation script, on the corresponding tab of your workflow settings.

    // Use the standard FileSystemObject to save the statistics
    var fso = new ActiveXObject("Scripting.FileSystemObject");
    // Specify your own file path; the folder must already exist (OpenTextFile creates the file, not the folder)
    var statisticFilePath = "C:\\Temp\\stat.txt";
    // OpenTextFile(path, iomode, create, format): 2 = ForWriting (overwrites the file on each run;
    // use 8 = ForAppending to keep earlier entries), true = create the file if missing, -1 = Unicode
    var statisticFile = fso.OpenTextFile( statisticFilePath, 2, true, -1 );
    statisticFile.WriteLine( "!!!!!! START !!!!!!" );
    statisticFile.WriteLine( "File: " + this.InputFileProperties.FileName + " PageIndex: " + this.PageIndex );
    // The page statistics already provide the total and uncertain character counts
    statisticFile.WriteLine( "Total characters count:" + this.Statistics.TotalCharacters + " Uncertain characters count:" + this.Statistics.UncertainCharacters );

    // Now calculate the same statistics manually.
    // Retrieve the TEXT blocks from the RecognizedPage object
    var blocks = this.TextBlocks;
    var totalCharactersCount = 0;
    var uncertainCharactersCount = 0;

    // Iterate over the text blocks
    for (var iBlock = 0; iBlock < blocks.Count; iBlock++)
    {
        var block = blocks.Item(iBlock);
        // Obtain the paragraphs of the text block
        var paragraphs = block.Paragraphs;
        // Iterate over the paragraphs
        for (var iPar = 0; iPar < paragraphs.Count; iPar++)
        {
            var paragraph = paragraphs.Item(iPar);
            // Iterate over the words of the paragraph
            for (var iWord = 0; iWord < paragraph.Words.Count; iWord++)
            {
                var word = paragraph.Words.Item(iWord);
                // Iterate over the characters of the word
                for (var iChar = 0; iChar < word.Characters.Count; iChar++)
                {
                    if (word.Characters.Item(iChar).IsSuspicious)
                    {
                        uncertainCharactersCount++;
                    }
                    totalCharactersCount++;
                }
            }
        }
    }
    // TODO: add the same logic for table blocks
    statisticFile.WriteLine( "Total characters count calculated for each word in TEXT blocks only:" + totalCharactersCount + " Uncertain characters count:" + uncertainCharactersCount );
    statisticFile.Close();
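WC and CC can be derived per word from the same `IsSuspicious` flags that the loop above reads. The ALTO schema documents CC as one number per character, from 0 (sure) to 9 (unsure), and WC as a value from 0 (unsure) to 1 (sure). The sketch below rests on several assumptions:

- A suspicious character maps to 9 and any other character maps to 0. The API only exposes a yes/no flag here, so a graded CC is not available from this data.
- WC is the share of non-suspicious characters in the word.
- CC is written as a space-separated list. Check this format against the ALTO version you target.

`wordConfidence` and `suspiciousFlagsOf` are hypothetical helper names.

```javascript
// Hypothetical helper. ASSUMPTIONS: suspicious character -> CC digit 9 (unsure),
// any other character -> 0 (sure); WC = share of non-suspicious characters (0..1).
function wordConfidence(suspiciousFlags) {
    var ccDigits = [];
    var sureCount = 0;
    for (var i = 0; i < suspiciousFlags.length; i++) {
        if (suspiciousFlags[i]) {
            ccDigits.push(9);
        } else {
            ccDigits.push(0);
            sureCount++;
        }
    }
    // An empty word has no characters to judge; return null so WC can be omitted.
    var wc = suspiciousFlags.length > 0
        ? Math.round(sureCount / suspiciousFlags.length * 100) / 100
        : null;
    return {
        CC: ccDigits.join(" "), // ASSUMPTION: space-separated, one digit per character
        WC: wc
    };
}

// Hypothetical adapter for the script's word loop: copies the IsSuspicious
// flags of one recognized word into a plain array.
function suspiciousFlagsOf(word) {
    var flags = [];
    for (var i = 0; i < word.Characters.Count; i++) {
        flags.push(word.Characters.Item(i).IsSuspicious);
    }
    return flags;
}
```

Inside the word loop, `var conf = wordConfidence(suspiciousFlagsOf(word));` gives `conf.CC` and `conf.WC`. For example, a four-character word whose third character is suspicious yields `CC` = `"0 0 9 0"` and `WC` = `0.75`. These values could then be written next to each word, e.g. to the statistics file.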

