This issue applies only to ABBYY FineReader Engine 10 and earlier.
Description
All curly quotes are recognized as normal straight quotes. How can I extract them?
Solution
By design, ABBYY FineReader Engine 10 recognizes curly quotes as straight quotes (single or double). This behavior changed in ABBYY FineReader Engine 11, which recognizes curly quotes as curly quotes.
To obtain curly quotes in version 10, iterate through the layout after recognition and replace every straight apostrophe (U+0027) with a right single quotation mark (U+2019, a curly quote). The sample below also sets the prefix and suffix letter sets of a copied English recognition language to the curly double quotes U+201C and U+201D.
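The substitution itself is a one-for-one character mapping. The following is a standalone Python sketch, not the FineReader Engine API: it shows which code points the decimal values passed to ChrW in the VB sample correspond to, and applies the same apostrophe replacement to an ordinary string. The function name `curl_apostrophes` is hypothetical.

```python
# Code points used in the VB sample; ChrW and Python's chr both take the
# decimal code point value.
STRAIGHT_APOSTROPHE = "\u0027"   # ' as output by FineReader Engine 10
RIGHT_SINGLE_QUOTE = chr(8217)   # U+2019, same character as ChrW(8217)
LEFT_DOUBLE_QUOTE = chr(8220)    # U+201C, same character as ChrW(8220)
RIGHT_DOUBLE_QUOTE = chr(8221)   # U+201D, same character as ChrW(8221)


def curl_apostrophes(text: str) -> str:
    """Replace every straight apostrophe with a right single quotation mark.

    Each character is replaced by exactly one character, so the result has
    the same length and every index still refers to the same position.
    """
    return text.replace(STRAIGHT_APOSTROPHE, RIGHT_SINGLE_QUOTE)


if __name__ == "__main__":
    print(curl_apostrophes("It's the engine's output"))
    print(hex(ord(RIGHT_SINGLE_QUOTE)))
```

Like the VB sample, this maps every straight apostrophe to U+2019, including one that opens a single-quoted phrase, and leaves straight double quotes (U+0022) untouched; telling opening quotes from closing ones would need additional context rules.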
Sample code (VB)
' Engine and FRDocument are assumed to be initialized elsewhere.
' Copy the predefined English language and set its prefix and suffix
' letter sets to the curly double quotes U+201C and U+201D.
Dim params As FREngine.PageProcessingParams
params = Engine.CreatePageProcessingParams()
Dim tl As FREngine.TextLanguage
tl = Engine.CreateTextLanguage()
tl.CopyFrom(Engine.PredefinedLanguages.FindLanguage("English").TextLanguage)
tl.LetterSet(FREngine.TextLanguageLetterSetEnum.TLLS_Prefixes) = ChrW(8220)
tl.LetterSet(FREngine.TextLanguageLetterSetEnum.TLLS_Suffixes) = ChrW(8221)
params.RecognizerParams.TextLanguage = tl

' Analyze and recognize the first page of the document into m_Layout
Dim m_ImageDoc As FREngine.ImageDocument
m_ImageDoc = FRDocument.Pages.Item(0).ImageDocument
Dim m_Layout As FREngine.Layout
m_Layout = Engine.CreateLayout()
Dim doc_info As FREngine.DocumentInfo
doc_info = Engine.CreateDocumentInfo()
Dim spfp As FREngine.SynthesisParamsForPage
spfp = Engine.CreateSynthesisParamsForPage()
Engine.AnalyzeAndRecognizePage(m_ImageDoc, params, spfp, m_Layout, doc_info)
' Replace every straight apostrophe (U+0027) with a right single quotation mark (U+2019)
Dim iblock As Integer
Dim iParag As Integer
Dim paragraph As FREngine.Paragraph
Dim paragraphText As String
Dim ichar As Integer
Dim charparams As FREngine.CharParams
charparams = Engine.CreateCharParams()
Dim Character As String
For iblock = 0 To m_Layout.Blocks.Count - 1
    ' Only text blocks contain paragraphs
    If m_Layout.Blocks.Item(iblock).Type = FREngine.BlockTypeEnum.BT_Text Then
        For iParag = 0 To m_Layout.Blocks.Item(iblock).GetAsTextBlock.Text.Paragraphs.Count - 1
            paragraph = m_Layout.Blocks.Item(iblock).GetAsTextBlock.Text.Paragraphs.Item(iParag)
            ' Snapshot of the paragraph text; positions stay aligned as long as
            ' each Remove/Insert pair replaces exactly one character
            paragraphText = paragraph.Text
            For ichar = 0 To paragraph.Length - 1
                ' Read the current character's formatting so the replacement keeps it
                paragraph.GetCharParams(ichar, charparams)
                Character = Mid(paragraphText, ichar + 1, 1)
                If Character = "'" Then
                    paragraph.Remove(ichar, -1)
                    paragraph.Insert(ichar, ChrW(8217), charparams)
                End If
            Next ichar
        Next iParag
    End If
Next iblock
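The loop replaces each apostrophe in place and passes the CharParams read for that position to Insert, so the new character keeps the original character's formatting, and because one character is swapped for one, the text snapshot taken before the loop stays aligned with the paragraph. The following Python sketch models that idea with a hypothetical paragraph represented as a list of (character, formatting) pairs. It does not use the FineReader Engine API; `replace_apostrophes` and the formatting dictionaries are illustrative stand-ins for the Paragraph object and CharParams.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a recognized paragraph: each character is paired
# with its formatting, playing the role of FREngine.CharParams.
Char = Tuple[str, Dict[str, object]]

STRAIGHT_APOSTROPHE = "'"
RIGHT_SINGLE_QUOTE = "\u2019"


def replace_apostrophes(paragraph: List[Char]) -> int:
    """Swap straight apostrophes for U+2019 in place, keeping each
    character's formatting. Returns the number of replacements."""
    # Snapshot of the text, like paragraphText = paragraph.Text in the VB loop
    text = "".join(ch for ch, _ in paragraph)
    count = 0
    for i in range(len(paragraph)):
        if text[i] == STRAIGHT_APOSTROPHE:
            params = paragraph[i][1]      # like GetCharParams(ichar, charparams)
            del paragraph[i]              # remove one character
            paragraph.insert(i, (RIGHT_SINGLE_QUOTE, params))  # insert one back
            count += 1
    return count


if __name__ == "__main__":
    para = [(c, {"bold": c == "'"}) for c in "don't"]
    n = replace_apostrophes(para)
    print(n, "".join(c for c, _ in para), para[3])
```

Because the paragraph length never changes, the indexes into the snapshot remain valid for the whole loop; a replacement that changed the length would require re-reading the text or adjusting the index.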