Curly quotes

This issue is only relevant for FineReader Engine 10 and below.

Description

All curly quotes are recognized as normal straight quotes. How can I get curly quotes in the recognition results?

Solution

Curly quotes (single and double) are recognized as straight quotes in ABBYY FineReader Engine 10 by design. This behavior was changed in ABBYY FineReader Engine 11, where curly quotes are recognized as curly quotes.

To obtain curly quotes, iterate over the layout after recognition and replace every straight apostrophe (character code U+0027) with a right single quotation mark (the curly quote, character code U+2019).

Sample code (VB)

' Copy the predefined English language and set its prefix and suffix
' letter sets to the curly double quotes (U+201C and U+201D).
Dim params As FREngine.PageProcessingParams
params = Engine.CreatePageProcessingParams()

Dim tl As FREngine.TextLanguage
tl = Engine.CreateTextLanguage()

tl.CopyFrom(Engine.PredefinedLanguages.FindLanguage("English").TextLanguage)
tl.LetterSet(FREngine.TextLanguageLetterSetEnum.TLLS_Prefixes) = ChrW(8220)
tl.LetterSet(FREngine.TextLanguageLetterSetEnum.TLLS_Suffixes) = ChrW(8221)
params.RecognizerParams.TextLanguage = tl

' Take the image of the first page of the document.
Dim m_ImageDoc As FREngine.ImageDocument
m_ImageDoc = FRDocument.Pages.Item(0).ImageDocument

' Create the objects that receive the recognition results.
Dim m_Layout As FREngine.Layout
m_Layout = Engine.CreateLayout
Dim doc_info As FREngine.DocumentInfo
doc_info = Engine.CreateDocumentInfo
Dim spfp As FREngine.SynthesisParamsForPage
spfp = Engine.CreateSynthesisParamsForPage

' Analyze and recognize the page; the results are written to m_Layout.
Engine.AnalyzeAndRecognizePage(m_ImageDoc, params, spfp, m_Layout, doc_info)

' Variables used while iterating over the recognized text.
Dim iblock As Integer
Dim iParag As Integer
Dim paragraph As FREngine.Paragraph
Dim Str As String
Dim ichar As Integer
Dim charparams As FREngine.CharParams
charparams = Engine.CreateCharParams()
Dim Character As String

' Walk through every paragraph of every text block.
For iblock = 0 To m_Layout.Blocks.Count - 1
    If m_Layout.Blocks.Item(iblock).Type = FREngine.BlockTypeEnum.BT_Text Then
        For iParag = 0 To m_Layout.Blocks.Item(iblock).GetAsTextBlock.Text.Paragraphs.Count - 1
            paragraph = m_Layout.Blocks.Item(iblock).GetAsTextBlock.Text.Paragraphs.Item(iParag)
            Str = paragraph.Text
            For ichar = 0 To paragraph.Length - 1
                ' Keep the character parameters so the replacement reuses them.
                paragraph.GetCharParams(ichar, charparams)
                Character = Mid(Str, ichar + 1, 1)
                If Character = "'" Then
                    ' Replace the straight apostrophe with U+2019.
                    paragraph.Remove(ichar, -1)
                    paragraph.Insert(ichar, ChrW(8217), charparams)
                End If
            Next ichar
        Next iParag
    End If
Next iblock
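
The sample above replaces every straight apostrophe with U+2019. If opening and closing marks need to be told apart, or if straight double quotes (U+0022) should be converted as well, the replacement character could be chosen from the character that precedes the quote. The following is only a sketch in VB.NET that works on plain strings and does not call FineReader Engine. The ToCurlyQuotes function and its opening-quote rule are hypothetical and not part of the Engine API.

```vb
Imports System
Imports System.Text

Module CurlyQuoteSketch
    ' Hypothetical helper (not a FineReader Engine call): converts straight
    ' quotes to curly ones, choosing the opening or closing form from the
    ' character that precedes the quote.
    Function ToCurlyQuotes(ByVal source As String) As String
        Dim result As New StringBuilder(source.Length)
        For i As Integer = 0 To source.Length - 1
            Dim c As Char = source(i)
            ' Assumption: a quote at the start of the text, after whitespace,
            ' or after an opening bracket is an opening quote.
            Dim opens As Boolean = (i = 0) OrElse
                Char.IsWhiteSpace(source(i - 1)) OrElse
                "([{".IndexOf(source(i - 1)) >= 0
            If c = "'"c Then
                ' U+2018 left single quote, U+2019 right single quote.
                result.Append(If(opens, ChrW(&H2018), ChrW(&H2019)))
            ElseIf c = """"c Then
                ' U+201C left double quote, U+201D right double quote.
                result.Append(If(opens, ChrW(&H201C), ChrW(&H201D)))
            Else
                result.Append(c)
            End If
        Next
        Return result.ToString()
    End Function
End Module
```

With this rule, an apostrophe inside a word such as don't becomes U+2019, which matches what the sample above inserts. Inside the recognition loop, the same decision could be made from the preceding character of paragraph.Text before inserting the replacement.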
