Ticket #201 (new defect)

Opened 5 years ago

Last modified 7 months ago

Problem with digits in the words

Reported by: tempread@… Owned by: decoder
Priority: major Milestone: Development Release Version 3.4
Component: Image Analysis Version: 3.4
Keywords: Cc:

Description

Using the FuzzyOcr plugin, version 3.4.

I am trying to detect phone numbers with FuzzyOcr and added some numbers to the wordlist, but I get lines like these in the log:

2007-02-28 17:35:32 [67664] Found word "" in line

"" with fuzz of 0.0000 scanned with scanset $gocr -i $pfile

2007-02-28 17:35:32 [67664] Found word "" in line

"" with fuzz of 0.0000 scanned with scanset $gocr -i $pfile

2007-02-28 17:35:32 [67664] Found word "" in line

"" with fuzz of 0.0000 scanned with scanset $gocr -i $pfile

2007-02-28 17:35:32 [67664] Found word "" in line

"" with fuzz of 0.0000 scanned with scanset $gocr -i $pfile

2007-02-28 17:35:32 [67664] Found word "" in line

"" with fuzz of 0.0000 scanned with scanset $gocr -i $pfile

The problem is in the following lines:

line 1137: $w =~ s/[^a-z]//g;

and line 1144: s/[^a-zA-Z]//g;

Why is the wordlist restricted to lowercase alphabetic characters only (at line 1137)? Any wordlist entry made up of digits is reduced to an empty string, which is why the log shows Found word "".

To get phone number detection working, I commented out line 1137: $w =~ s/[^a-z]//g;

and changed line 1144 from s/[^a-zA-Z]//g; to s/[^a-zA-Z0-9]//g; (but I think this line could be commented out too).
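The effect described above can be reproduced outside the plugin. This is only a rough sketch in Python, not the plugin's code: it assumes the character classes on lines 1137 and 1144 were originally negated ([^a-z] and [^a-zA-Z]), which is what would produce the empty words in the log, and the function names are made up for illustration.

```python
import re

# Assumption: line 1137 is s/[^a-z]//g applied to each wordlist entry.
def strip_word_current(word: str) -> str:
    """Drop every character that is not a lowercase letter."""
    return re.sub(r"[^a-z]", "", word)

# Assumption: line 1144 is s/[^a-zA-Z]//g applied to each OCR'd line.
def strip_line_current(line: str) -> str:
    """Drop every character that is not an ASCII letter."""
    return re.sub(r"[^a-zA-Z]", "", line)

# The change proposed in this ticket: also keep the digits 0-9.
def strip_line_proposed(line: str) -> str:
    """Drop every character that is not an ASCII letter or digit."""
    return re.sub(r"[^a-zA-Z0-9]", "", line)

if __name__ == "__main__":
    entry = "5551234"                 # a phone number in the wordlist
    line = "call 555-1234 now"        # a line as OCR might return it
    print(repr(strip_word_current(entry)))   # the word is emptied
    print(repr(strip_line_current(line)))    # the digits are gone from the line
    print(repr(strip_line_proposed(line)))   # the digits survive
```

With the current filters both the wordlist entry and the digits in the scanned line disappear, so an empty word is compared against a line with no digits in it. With the proposed filter and line 1137 disabled, the entry 5551234 can be found inside call5551234now.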

Change History

Changed 5 years ago by tempread@…

(I mean FuzzyOcr.pm)

Changed 3 years ago by anonymous

I've got the same problem with version 3.5.1. Recently we received lots of PNG spam. I was able to scan for "viagra" etc., but now they have changed the font to counter OCR. I would like to scan for the prices of these pills, like $1.89. If I add this to the wordlist, I get the same problem.
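For prices, keeping digits alone may not be enough, because the currency sign and the decimal point would still be stripped. A small Python sketch of the difference follows; the wider character class is purely hypothetical and not something FuzzyOcr is known to support, and the function names are invented for this example.

```python
import re

def keep_letters_and_digits(text: str) -> str:
    """Filter from the ticket's proposed fix: keep only ASCII letters and digits."""
    return re.sub(r"[^a-zA-Z0-9]", "", text)

def keep_price_characters(text: str) -> str:
    """Hypothetical wider filter: additionally keep '$' and '.'."""
    return re.sub(r"[^a-zA-Z0-9$.]", "", text)

if __name__ == "__main__":
    line = "only $1.89 per pill"
    print(repr(keep_letters_and_digits(line)))   # price collapses to bare digits
    print(repr(keep_price_characters(line)))     # price stays recognizable
```

With the letters-and-digits filter, a wordlist entry of $1.89 would have to be matched as 189, which is more likely to hit unrelated numbers. Keeping the two extra characters avoids that, at the cost of being more sensitive to OCR noise around punctuation.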
