netpbm:
    http://sourceforge.net/project/showfiles.php?group_id=5128

gifsicle:
    http://www.lcdf.org/gifsicle/gifsicle-1.44.tar.gz (latest)

gocr: (v0.40) suggested ... needs to be patched!
    http://sourceforge.net/project/showfiles.php?group_id=7147

ocrad:
    Please use your closest GNU mirror:
    http://www.gnu.org/prep/ftp.html

mysql:
    http://www.mysql.com (should work with 3.23+)

Perl Packages:
~~~~~~~~~~~~~~
    String::Approx

    MLDBM      - used with type 2 hashing
    Storable   - used with type 2 hashing
    DB_File    - used with type 2 hashing

    DBI        - used with type 3 hashing
    DBD::mysql - used with type 3 hashing

Make sure all of the above requirements are met before you continue.
I personally think it is better to compile everything from source, but
binary packages are available if you decide to go that way. The one
package that should be compiled from source in any case is gocr, since
it requires some patching to make it work better.

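A quick way to see which requirements are still missing is a small
shell check like the one below. This is only a sketch: the program
names it looks for (giftopnm from netpbm, gifsicle, gocr, ocrad) are
assumptions about what each package installs on your system, and the
helper functions are made up for this example.

```shell
#!/bin/sh
# Report which FuzzyOcr prerequisites appear to be missing.
# The command names below are assumptions for illustration; adjust as needed.

# Succeeds if the named program is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if perl is installed and can load the named module.
have_perl_module() {
    have_cmd perl && perl -M"$1" -e 1 >/dev/null 2>&1
}

# Prints "ok" or "MISSING" for one prerequisite.
# $1 = label, remaining arguments = the check to run
report() {
    label=$1
    shift
    if "$@"; then
        echo "ok       $label"
    else
        echo "MISSING  $label"
    fi
}

for cmd in giftopnm gifsicle gocr ocrad; do
    report "$cmd" have_cmd "$cmd"
done

for mod in String::Approx MLDBM Storable DB_File DBI DBD::mysql; do
    report "$mod" have_perl_module "$mod"
done
```

The modules for a hashing type you do not plan to use can be reported
as MISSING without harm.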
Place a copy of the following files in your SpamAssassin local
configuration directory (/etc/mail/spamassassin by default):

    FuzzyOcr.pm
    FuzzyOcr.cf (change to taste)
    FuzzyOcr.words

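The copy step could be scripted roughly as follows. The function name
install_fuzzyocr is hypothetical, and leaving an existing FuzzyOcr.cf
untouched is a choice made for this sketch (so local changes survive a
reinstall), not something the plugin requires.

```shell
#!/bin/sh
# Copy the three FuzzyOcr files into a SpamAssassin configuration directory.

# $1 = directory holding the unpacked FuzzyOcr files
# $2 = SpamAssassin local configuration directory
install_fuzzyocr() {
    src=$1
    dest=$2
    for f in FuzzyOcr.pm FuzzyOcr.cf FuzzyOcr.words; do
        if [ ! -f "$src/$f" ]; then
            echo "missing source file: $src/$f" >&2
            return 1
        fi
        # Assumption of this sketch: never clobber an existing FuzzyOcr.cf.
        if [ "$f" = "FuzzyOcr.cf" ] && [ -f "$dest/$f" ]; then
            echo "keeping existing $dest/$f"
            continue
        fi
        cp "$src/$f" "$dest/$f" || return 1
    done
}

# Example call (typically run as root):
# install_fuzzyocr . /etc/mail/spamassassin
```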
Skipping Scans
~~~~~~~~~~~~~~
Because false positives are possible, you can disable scanning of a
particular image type with the following configuration option:

    focr_skip_<img_type> 1

Optionally, you can skip scanning of images that are 'too big' by
specifying the following configuration option:

    focr_max_size_<img_type> <max-size>

where <max-size> is expressed in bytes (it is compared against the pnm
file size), and <img_type> is one of the following:

    - gif
    - jpeg
    - png
    - bmp
    - tiff

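Since the limit is given in bytes, it can help to generate the lines
from a more readable unit. The sketch below prints one
focr_max_size_<img_type> line per image type for a limit given in KiB.
The function name and the 256 KiB figure are only illustrative. Keep
in mind that pnm is an uncompressed format, so the pnm file is usually
larger than the original attachment.

```shell
#!/bin/sh
# Print focr_max_size_<img_type> lines for a limit given in KiB.
# The option takes bytes, so the limit is multiplied by 1024.

focr_max_size_lines() {
    limit_kib=$1
    limit_bytes=$((limit_kib * 1024))
    for img_type in gif jpeg png bmp tiff; do
        echo "focr_max_size_$img_type $limit_bytes"
    done
}

# Example: a 256 KiB limit (262144 bytes) for every image type.
focr_max_size_lines 256
```

The printed lines can then be pasted into FuzzyOcr.cf and adjusted per
image type.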
Timeouts
~~~~~~~~
There are two types of timeouts available for FuzzyOCR:

1.- Per Application Timeout (Default)
    Enabled with the following settings:

        focr_timeout <secs>
        focr_global_timeout 0    (Default)

    Each external helper application is given <secs> seconds
    to complete; if it has not finished by then, it is assumed
    to have failed and processing continues.

2.- Global Timeout
    Enabled with the following settings:

        focr_timeout <secs>
        focr_global_timeout 1

    If scanning takes longer than <secs> seconds, the scan is
    aborted and the images (if any) are not scored or checked.

Image Hashing
~~~~~~~~~~~~~
If you use the image hashing option (disabled by default), you need to
specify the following options in FuzzyOcr.cf:

    focr_enable_image_hashing 1
    focr_digest_db <full_path_to_file>

or

    focr_enable_image_hashing 2
    focr_db_hash <full_path_to_file>
    focr_db_safe <full_path_to_file>
    focr_db_max_days ##    (default: 35)

In either case, make sure the effective user running SpamAssassin has
write permission on the specified files, changing the ownership or
permissions of the files if necessary.

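One way to verify the permissions is to run a check like the following
as the effective user that runs SpamAssassin. It is a sketch only: the
function name and the example paths are hypothetical, and treating a
missing file in a writable directory as acceptable assumes the file
can be created on first use.

```shell
#!/bin/sh
# Check that the hash database files are writable by the current user.
# Run this as the effective user that runs SpamAssassin.

# Prints one status line per file; returns non-zero if any file fails.
check_writable() {
    status=0
    for f in "$@"; do
        if [ -f "$f" ] && [ -w "$f" ]; then
            echo "writable      $f"
        elif [ ! -e "$f" ] && [ -w "$(dirname "$f")" ]; then
            # Assumption: a missing file is fine if it could be created here.
            echo "creatable     $f"
        else
            echo "NOT WRITABLE  $f"
            status=1
        fi
    done
    return $status
}

# Example call with hypothetical paths:
# check_writable /var/lib/fuzzyocr/hash.db /var/lib/fuzzyocr/safe.db
```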
If you decide to store the data in MySQL tables instead, use:

    focr_enable_image_hashing 3
    focr_db_max_days ##                   (default: 35)
    focr_mysql_db <database_name>         (default: FuzzyOcr)
    focr_mysql_hash <hash_table>          (default: Hash)
    focr_mysql_safe <safe_table>          (default: Safe)
    focr_mysql_user <username>            (default: fuzzyocr)
    focr_mysql_pass <password>            (default: fuzzyocr)

and either

    focr_mysql_socket <path_to_socket>    (default: undefined)

or

    focr_mysql_host <hostname>            (default: localhost)
    focr_mysql_port <mysql_port>          (default: 3306)

Test with:
~~~~~~~~~~
    spamassassin --debug FuzzyOcr < path_to_email > /dev/null

If you do not get errors, you are ready to go. Restart SPAMD, which is
the (*strongly*) recommended way to use this plugin.

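To keep the debug output for later inspection, the test command can be
wrapped as sketched below. Both function names are hypothetical, and
counting lines that mention "error" or "warn" is only a rough
heuristic for spotting problems, not a format the plugin guarantees.

```shell
#!/bin/sh
# Run the FuzzyOcr debug check on one message and keep the debug output.

# $1 = path to a test email, $2 = file to receive the debug log
focr_debug_run() {
    if ! command -v spamassassin >/dev/null 2>&1; then
        echo "spamassassin not found on PATH" >&2
        return 127
    fi
    # Debug messages go to stderr; the message on stdout is discarded.
    spamassassin --debug FuzzyOcr < "$1" > /dev/null 2> "$2"
}

# Heuristic: count log lines mentioning "error" or "warn" (case-insensitive).
focr_log_problems() {
    grep -c -i -E 'error|warn' "$1"
}

# Example calls with hypothetical file names:
# focr_debug_run sample.eml focr-debug.log
# focr_log_problems focr-debug.log
```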
http://fuzzyocr.own-hero.net/wiki/Installation-3.5.x