Ticket #80 (new defect)

Opened 2 years ago

Last modified 1 year ago

ocr fuzzy-matching misclassifying "committment" as "million"

Reported by: jason.haar@trimble.co.nz
Assigned to: decoder
Priority: minor
Milestone:
Component: Image Analysis
Version: 3.5.1
Keywords:
Cc:

Description

We got a false positive today: a mail containing a JPG of a scanned page was flagged as spam by FuzzyOCR.

There were no drug- or stock-related words in it, but FuzzyOCR found "million", "thousands" and "alert" in it anyway. Presumably that was via fuzzy text matching?

I realise this is one of the risks, but maybe there's some way of improving that?
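To get a feel for how loose the matching would have to be, here is a minimal sketch of approximate substring matching by edit distance. It is only an illustration: I have not checked that FuzzyOCR scores words this way, and the function names and the threshold values are hypothetical, not taken from FuzzyOCR or its config.

```python
def approx_substring_distance(keyword: str, text: str) -> int:
    """Smallest edit distance between keyword and any substring of text.

    Assumption: plain Levenshtein costs (insert/delete/substitute = 1),
    case-insensitive. Not necessarily what FuzzyOCR does internally.
    """
    keyword, text = keyword.lower(), text.lower()
    # Row 0 is all zeros: a match may start at any position in the text.
    prev = [0] * (len(text) + 1)
    for i, kc in enumerate(keyword, start=1):
        # Column 0: i keyword characters matched against an empty substring.
        curr = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            curr[j] = min(
                prev[j - 1] + (kc != tc),  # substitute (or exact match)
                prev[j] + 1,               # keyword character missing in text
                curr[j - 1] + 1,           # extra character in text
            )
        prev = curr
    # The match may also end at any position in the text.
    return min(prev)


def fuzzy_hit(keyword: str, text: str, threshold: float) -> bool:
    """True if the best match is within `threshold` edits per keyword character.

    `threshold` is a hypothetical tuning knob for this sketch.
    """
    if not keyword:
        raise ValueError("keyword must not be empty")
    return approx_substring_distance(keyword, text) / len(keyword) <= threshold


print(approx_substring_distance("million", "committment"))  # 4
print(fuzzy_hit("million", "committment", 0.25))            # False
```

Under these assumptions "million" is 4 edits away from the closest substring of "committment" (4/7, about 0.57), so the cleanly read word would only match with a very loose threshold. That would suggest looking at what the OCR step actually produced for that word before blaming the matcher alone.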

BTW, "gocr -l 180 -d 2 -i " pulled almost the entire paragraph from the page word-perfect (pretty good tool, that :-)

Thanks

Jason


