WD26 - OCR

Posted by ArieM 
WD26 - OCR
July 02, 2021 11:31AM
Hi all,

I wonder if the new OCR functions in V26 are usable for scanning and recognizing incoming invoices, as you see nowadays in accounting systems. Basically, to get the date, amount, vendor, invoice number and so on.

Arie
Argus
Re: WD26 - OCR
July 02, 2021 12:25PM
Depends if you want any kind of reliability...

I've worked with a colleague on such a system, and the only way to get a result you can trust is to use MULTIPLE OCR engines one after the other and compare the results... In his case, he finally settled on 3 passes with 3 different engines, and when at least 2 of them said the same thing, he would accept the result as reliable.
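
The "2 out of 3" rule described above can be sketched in a few lines of Python. This is only an illustration of the voting step, not the colleague's actual code: the function names `normalize` and `vote` are made up here, and the three readings are assumed to come from whatever engines you run yourself.

```python
from collections import Counter
from typing import Optional, Sequence


def normalize(value: str) -> str:
    """Collapse whitespace and case so trivial differences do not block agreement."""
    return " ".join(value.split()).upper()


def vote(readings: Sequence[Optional[str]], min_agree: int = 2) -> Optional[str]:
    """Return the value that at least `min_agree` engines agree on, else None.

    `readings` holds one result per OCR engine for the same field;
    None or an empty string means that engine found nothing.
    """
    normalized = [normalize(r) for r in readings if r]
    counts = Counter(n for n in normalized if n)
    if not counts:
        return None
    value, hits = counts.most_common(1)[0]
    return value if hits >= min_agree else None


if __name__ == "__main__":
    # Hypothetical readings of an invoice number from three engines.
    print(vote(["INV-1001", "inv-1001 ", "INV-1OO1"]))
    # No two engines agree, so the field is flagged for manual review.
    print(vote(["INV-1001", "INV-1007", "INV-1OO1"]))
```

When `vote` returns None, the field is not trusted and would have to be checked by a person.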
Re: WD26 - OCR
July 02, 2021 02:39PM
Hi Arie,
WD uses the Tesseract engine for OCR:

[github.com]

There you will find an FAQ about quality.

In my experience, any computer-generated PDF is readable without problems.
Scanned images mostly work fine, but there can be problems because image, scanner and document quality vary a lot.

BR,
Alen
Re: WD26 - OCR
July 02, 2021 02:54PM
Hi all,

I was wondering if this is possible with a .pdf file. Most of the time invoices are sent as a .pdf, so it would be nice to "read" the invoice directly, with no scanning needed. Has anybody tried this before?

Best regards,

Aad
Re: WD26 - OCR
July 03, 2021 08:54AM
Hi Aad,

pdftotext() is your friend.

regards Michael
Re: WD26 - OCR
July 03, 2021 10:34AM
Hi Michael,

Thank you for the tip.

Best regards,

Aad



Edited 1 time(s). Last edit at 07/03/2021 10:34AM by AadG.
Re: WD26 - OCR
July 05, 2021 09:57AM
Hi Alen,

I also noticed WD is using Tesseract. Of course we can integrate such libraries (there are others, free or paid) ourselves. I was hoping WD had done some of the work for us, focusing on common use cases like the one I mentioned: reading invoices.

I did a small test, and OCRExtractText() gives you all the text from the PDF or image. But there are no options yet to define named areas for a certain invoice to get, let's say, the invoice number.

There are, however, dozens of articles on using Tesseract from C#, Java or Python, like this one:
[www.pyimagesearch.com]

That doesn't look too hard and gives us all the options of course.

Arie
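
When all you have is the full text of the invoice, as described above, one common workaround is to pull the fields out with regular expressions. A minimal Python sketch, assuming an English invoice with labels such as "Invoice No", "Date" and "Total"; the patterns and the name `extract_fields` are hypothetical and would need tuning per vendor:

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for one invoice layout; real invoices need per-vendor tuning.
PATTERNS = {
    "invoice_number": re.compile(
        r"\bInvoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9][A-Z0-9\-/]*)", re.I
    ),
    "date": re.compile(
        r"\bDate\s*:?\s*(\d{1,2}[./-]\d{1,2}[./-]\d{2,4})", re.I
    ),
    # \b keeps "Subtotal" from matching; the amount is returned as raw text.
    "total": re.compile(
        r"\bTotal\s*(?:amount)?\s*:?\s*(?:EUR|USD|€|\$)?\s*(\d[\d.,]*\d|\d)", re.I
    ),
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first match per field, or None when a label is not found."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


if __name__ == "__main__":
    # Made-up sample text standing in for the output of an OCR or PDF text call.
    sample = (
        "ACME BV\n"
        "Invoice No: 2021-0457\n"
        "Date: 02/07/2021\n"
        "Subtotal: EUR 1.020,30\n"
        "Total: EUR 1.234,56\n"
    )
    print(extract_fields(sample))
```

The values come back as plain strings; turning "1.234,56" into a number depends on the locale of the invoice and is left out here.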
Re: WD26 - OCR
July 05, 2021 10:11AM
Hi Arie,

Are you sure?


[doc.windev.com]

You can define a polygon.

regards Michael
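
For anyone rolling their own solution instead, the "named area" idea can be sketched in Python on top of word boxes. This does not show the WinDev function; it assumes your OCR engine gives you each word with its position (Tesseract's TSV output has left, top, width and height columns, for example), and it uses plain rectangles rather than polygons to keep it short. The zone coordinates and the names `Word`, `ZONES`, `words_in_zone` and `read_zones` are made up for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class Word(NamedTuple):
    """One recognized word and its bounding box, in pixels."""
    text: str
    left: int
    top: int
    width: int
    height: int


Zone = Tuple[int, int, int, int]  # (left, top, right, bottom)

# Hypothetical zones for one vendor's invoice layout.
ZONES: Dict[str, Zone] = {
    "invoice_number": (400, 40, 600, 70),
    "total": (400, 700, 600, 740),
}


def words_in_zone(words: List[Word], zone: Zone) -> str:
    """Join the words whose box center falls inside the zone, in input order."""
    left, top, right, bottom = zone
    picked = []
    for w in words:
        cx = w.left + w.width / 2
        cy = w.top + w.height / 2
        if left <= cx <= right and top <= cy <= bottom:
            picked.append(w.text)
    return " ".join(picked)


def read_zones(words: List[Word], zones: Dict[str, Zone] = ZONES) -> Dict[str, str]:
    """Return the text found in each named zone (empty string if none)."""
    return {name: words_in_zone(words, zone) for name, zone in zones.items()}


if __name__ == "__main__":
    # Made-up word boxes standing in for real OCR output.
    page = [
        Word("Invoice", 300, 45, 80, 20),
        Word("2021-0457", 420, 45, 120, 20),
        Word("Total", 300, 705, 60, 20),
        Word("1.234,56", 430, 705, 100, 20),
    ]
    print(read_zones(page))
```

Testing the box center rather than the whole box tolerates words that stick out of the zone a little, which helps when scans are slightly shifted.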
Re: WD26 - OCR
July 05, 2021 10:21AM
Mmh, missed that. Looks like it's possible indeed.

Arie