Paragraph capturing

Hello,

I am trying to capture an address in a document. I search for the zip code first with N{5}[\-]N{4}, then take the 2 lines closest to the zip code. It's not ideal, but it should do the trick for most documents. However, I can only get 1 line instead of 2. If I increase the line count to 3 or 4, I capture all of the lines. How do I make sure that it gets only the last 2 lines? Please help.

Thanks

Comments

10 comments

  • Katja Ovcharova
    Hello lliu,

    Because the element settings allow from 1 to 2 lines for CustomerAddress, FlexiCapture may take any reliable text within the search area (which now includes three text lines).
    If you need exactly the last 2 lines, try setting Min line count to 2. Setting the search area to the region below the CustomerName element plus the height of the next line ("FEEE") would probably also improve the result.
  • lliu
    Setting Customer Name first doesn't work because it can itself span multiple lines in this case. I tried setting Min line count to 2, but it just captures the Customer Name plus one line below it. Why is this happening? The closest-element setting is not doing its job. Is this a bug?
  • Katja Ovcharova
    lliu, do I understand correctly that you now have trouble capturing CustomerName? Or is the CustomerName string being captured into the CustomerAddress element? If so, you could try setting the search relations more accurately so that CustomerName is not included in the CustomerAddress region.
    Please note that the standard FlexiLayout distribution includes a sample named invoices, which describes the logic of accurate address searching for a typical invoice. This sample is also described in FlexiLayout Studio Help, section Tutorial->Sample 3. Please look into the documentation, as many useful tips can be found there.
    If you would like forum users to assist you with your FlexiLayout, please at least provide us with a real image (or a couple of images; any sensitive information can be replaced with some "fake" text) and specify what information remains unchanged across real documents and can be used as a reference to extract data. It would also be good to know which FlexiCapture version you are currently using.
  • lliu
    Version: 11.0.2.1435

    I am trying to capture the address and the customer name. Nothing is static; the name can be 2 lines long, so I am using a paragraph for the name as well. That means I should get the address first, and then anything above the address is the customer name. I found the zip code but can't get the 2 lines closest to it. I also tried capturing the street number in the address, but it is possible that the customer name contains a number as well. The one thing I don't understand is why it is not getting the 2 lines closest to the zip code when that's what I specified, if it's not a bug.
  • Katja Ovcharova
    lliu,

    No, it's not a bug. It seems that FlexiLayout Studio has trouble capturing the CustomerName element (e.g. the relations that are set are not enough to identify it reliably), and this affects the CustomerAddress search quality.

    I set up a flexible description with your sample image without creating a CustomerName element, only kwZip, and FLS successfully captured the two lines nearest to the ZIP into CustomerAddress (see attachment). Please note that I additionally excluded the ZIP code region from the CustomerAddress search area.

    So you can try setting the CustomerAddress parameters without reference to CustomerName. If you want to get CustomerName too, please try setting it up more carefully. Again, looking into the standard samples may help you build the FlexiLayout properly (e.g., there you can see how white gaps and separators are used to find text parts in an unstructured invoice).
  • lliu
    The main focus is getting the address. The customer name can be captured later. This is a workaround, and I guess I can stitch the "address" and the zip code together afterward. But please note that the positioning can differ, so I can't just specify a fixed search area.

    But let's say I don't want the customer name, just the address. Then I think it's a bug that it doesn't capture the 2 closest lines unless the zip code is omitted.
  • Katja Ovcharova
    "But let's say if I don't want customer name, just the address. Then I think it's a bug for why it doesn't capture the 2 closest lines, unless the zipcode is omitted."

    Again, it isn't. The line that contains an element on its right is the nearest line to that element, isn't it? If you would like to exclude the ZIP code from the search, you could either do what I suggested before, or additionally specify that you are looking for the address to the left of the ZIP code (i.e. add a Left of relation).
  • lliu
    No, I do want to capture the zip code. I simply meant that the screenshot from your sample excludes the zip code. Is there a way to include it?
  • Katja Ovcharova
    lliu,
    from your previous explanations I understood that you search for the ZIP separately; sorry if I got it wrong. Of course you can capture the ZIP code within the address: just remove the corresponding search constraint. In my test project I specified kwZip in the "Exclude regions of elements" field on the Search Constraints tab, so if I remove this restriction, the ZIP is included in the CustomerAddress element.
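
The two behaviours discussed in this thread can be modelled outside FlexiLayout Studio with a small Python sketch. It is not FlexiLayout code and does not reproduce the product's search logic; it only illustrates why the line containing the ZIP counts as the nearest line, and what excluding the ZIP region changes. The function name `nearest_lines`, the `exclude_zip` flag, and the sample address are hypothetical, and the regular expression assumes that `N` in `N{5}[\-]N{4}` stands for a single digit.

```python
import re

# Assumption: N in the thread's pattern N{5}[\-]N{4} means one digit.
ZIP_PATTERN = re.compile(r"\b\d{5}-\d{4}\b")


def nearest_lines(lines, count=2, exclude_zip=False):
    """Return the `count` text lines closest to the ZIP code, in reading order.

    The line that contains the ZIP has distance 0, so it is always the
    nearest line. With exclude_zip=True the ZIP text is cut out of that
    line, mimicking an excluded region (a simplification).
    """
    zip_index = next(
        (i for i, text in enumerate(lines) if ZIP_PATTERN.search(text)), None
    )
    if zip_index is None:
        return []

    # Rank line indices by vertical distance from the ZIP line.
    order = sorted(range(len(lines)), key=lambda i: (abs(i - zip_index), i))
    chosen = sorted(order[:count])

    result = []
    for i in chosen:
        text = lines[i]
        if exclude_zip and i == zip_index:
            text = ZIP_PATTERN.sub("", text).rstrip()
        result.append(text)
    return result


# Made-up sample: a two-line customer name followed by a two-line address.
sample = [
    "ACME HOLDINGS",
    "ATTN JOHN DOE",
    "123 MAIN ST",
    "SPRINGFIELD IL 62704-1234",
]
print(nearest_lines(sample))
print(nearest_lines(sample, exclude_zip=True))
```

In this model both calls return the street line and the city line; the only difference is whether the ZIP text remains in the second line. No constraint relative to the customer name is needed to find the address.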
  • lliu
    Sorry about the confusion. Here is something very interesting in my sample: when I had the Below Customer Name relation, it captured the wrong lines, but if I remove it as you did, the result is correct. Anyway, that works. Thank you. I'm still not sure why this happens when the logic makes sense but the result does not.