version 2.3j:
Fixed:
sh: $efile: ambiguous redirect
This message was being generated when using complex scansets, because
the 'value' was only translated once. In complex scansets, this value
may be specified multiple times.
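
The effect of translating the placeholder only once can be sketched as follows. This is an illustration in Python, not the plugin's Perl code; the scanset string and the error-file path are made up. In bash-like shells, a redirect to an unset variable such as a leftover `$efile` is what produces the "ambiguous redirect" message.

```python
# Hypothetical scanset with two filters, each redirecting stderr to $efile.
scanset = "pnmnorm $pfile 2>$efile | gocr -i - 2>>$efile"
efile = "/tmp/example-tempdir/image.err"  # made-up path, for illustration only

# Translating only the first occurrence leaves a literal "$efile" behind,
# which the shell then tries to expand on its own.
once = scanset.replace("$efile", efile, 1)

# Translating every occurrence leaves no placeholder for the shell to trip on.
every = scanset.replace("$efile", efile)

print(once)
print(every)
```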

FuzzyOcr.cf
Fixed outstanding errors. Variable mismatches are now fixed.

FuzzyOcr.pm
Traps ImageMagick errors better and logs them.

When processing animated GIF files, the algorithm can end up
discarding all frames, leaving an empty image. This special case
is now treated as a corrupt image, and triggers FUZZY_OCR_CORRUPT_IMG with
$Score{corrupt} points (2.5 by default).

Changed:
Option: focr_personal_wordlist
Now, if the option value begins with '/', the value is treated as a fixed
path rather than as relative to the effective user's HOME directory.
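
A minimal sketch of this path rule, written in Python for illustration (the plugin is Perl). The function name and the example paths are hypothetical.

```python
import posixpath


def resolve_personal_wordlist(value, home):
    """Return the wordlist path for a focr_personal_wordlist value.

    Values starting with '/' are used as-is; anything else is taken
    relative to the effective user's HOME directory.
    """
    if value.startswith("/"):
        return value
    return posixpath.join(home, value)


print(resolve_personal_wordlist("/etc/mail/words.txt", "/home/alice"))
print(resolve_personal_wordlist(".spamassassin/words.txt", "/home/alice"))
```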

version 2.3i:
Added:
Option: 'focr_score_ham' Default: 0.0
When set to 1, images that are below the 'focr_counts_required' threshold
are scored with the formula $Score{Add} * $cnt; this gives marginally bad
images some positive score instead of just allowing them without a score.
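
The formula can be sketched like this, in Python for illustration only. The function name is hypothetical, and the value used for $Score{Add} is an assumption, since the changelog does not state it.

```python
def below_threshold_score(cnt, score_add, score_ham):
    """Score for an image whose match count is below focr_counts_required.

    cnt       -- number of matched words ($cnt in the changelog)
    score_add -- stands in for $Score{Add}; its real value is not given here
    score_ham -- truthy when focr_score_ham is set to 1
    """
    if not score_ham:
        return 0.0  # option off: the image passes without a score
    return score_add * cnt


# With an assumed $Score{Add} of 1.0 and one matched word:
print(below_threshold_score(1, 1.0, True))
print(below_threshold_score(1, 1.0, False))
```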

Removed:
Util: gif2anim
This script is no longer used in the plugin, so it has been removed from the
distribution; if needed, it can be found in the previous version.

Fixed:
The plugin got stuck in an infinite loop when a message had more
than one attachment with the same name. The tie-breaking was not working.

When processing GIF files, extra care has to be taken so that ImageMagick
properly recognizes the files as GIF images. Otherwise, an error occurs
because ImageMagick cannot determine the image 'type' or the
image size, resulting in an invalid hash. Code is now in place
to prevent this, and if an invalid image size is encountered,
processing of that image is skipped.

Changed:
When the plugin determines that words from the lists are found in an image,
it now stores these words in 'focr_db_hash'. When the same
image hash is encountered in another message, the report lists the words found,
giving the end user more information than just the
FOCR_KNOWN_IMAGE_HASH rule firing with the previous score.

version 2.3h:
Require:
New Perl Module
Image::Magick
Added:
Option: 'focr_anim_delay' Default: 100
This option is used with animated GIF files, and keeps all images
that are displayed for at least 1 sec.

Option: 'focr_anim_max_frames' Default: 2
This option is used with animated GIF files, and keeps the top N
largest frames.
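
How the two options might combine can be sketched as follows, in Python for illustration. Several things here are assumptions rather than documented behavior: that delays are counted in hundredths of a second (which is how the GIF format stores them, and matches 100 meaning 1 sec), that 'largest' means pixel area, and that the delay filter runs before the size filter. The Frame type and function name are hypothetical.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    index: int
    width: int
    height: int
    delay: int  # hundredths of a second


def select_frames(frames, anim_delay=100, anim_max_frames=2):
    """Keep frames shown for at least anim_delay, then the N largest by area."""
    shown_long_enough = [f for f in frames if f.delay >= anim_delay]
    by_area = sorted(shown_long_enough,
                     key=lambda f: f.width * f.height,
                     reverse=True)
    return by_area[:anim_max_frames]


frames = [
    Frame(0, 10, 10, 5),      # flashes by too quickly, dropped
    Frame(1, 200, 100, 150),
    Frame(2, 50, 50, 100),
    Frame(3, 300, 300, 200),
]
print([f.index for f in select_frames(frames)])
```

An empty result is the special case that version 2.3j treats as a corrupt image.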

Fixed:
Option: 'focr_digest_hash'
Fixed internal parameter to reflect option from original plugin (Thanks Bill).

Option: 'focr_db_hash'
Updated FuzzyOcr.cf to reflect plugin option.

Option: 'focr_db_safe'
Updated FuzzyOcr.cf to reflect plugin option.

Option: 'focr_counts_required'
Fixed: default value of '2' was set to '5'; it now behaves as the original plugin.

Removed:
Option: 'focr_bin_identify'
Option: 'focr_bin_convert'
These options are no longer valid, since the external programs are no longer called
in favor of using the Perl module. Makes things 'simpler'.

Option: 'focr_bin_gifasm'
Option: 'focr_bin_tifftopnm'
External programs no longer used.

Changed:
The plugin now uses the Image::Magick module to access ImageMagick functions from Perl instead
of running external programs. This means fewer system calls to run external programs.
(Idea from Eric Yiu)

version 2.3g:
Added:
Option: focr_keep_bad_images
The default value for this option is zero (0).
When set to 1, the plugin will not remove a tempdir whenever it registers
an error or timeout from any of the 'helper' apps.
When set to 2, the plugin will always keep the tempdir. Beware that on heavily
loaded systems, this might fill your /tmp partition.
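
The three settings amount to a small decision table, sketched here in Python for illustration. The function name is hypothetical.

```python
def should_remove_tempdir(keep_bad_images, had_error_or_timeout):
    """Decide whether the tempdir is removed after processing a message."""
    if keep_bad_images == 2:
        return False                     # always keep the tempdir
    if keep_bad_images == 1:
        return not had_error_or_timeout  # keep only after an error or timeout
    return True                          # 0 (default): always remove


for setting in (0, 1, 2):
    print(setting,
          should_remove_tempdir(setting, False),
          should_remove_tempdir(setting, True))
```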

Util: fuzzy-cleantmp
This utility can be used to remove tempdirs left behind if the plugin was
configured to save them. It takes one parameter: hours to keep (12 by default).
This can safely be run from cron to prune /tmp.
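
What such a pruning pass does can be sketched as follows. This is not the fuzzy-cleantmp script itself; it is a Python illustration, and the directory-name prefix used to recognize the plugin's tempdirs is an assumption.

```python
import os
import shutil
import time


def prune_tempdirs(base, hours=12, prefix=".spamassassin", now=None):
    """Remove directories under base that match prefix and are older than hours.

    The prefix is a placeholder; the real naming of the tempdirs is not
    stated in this changelog.
    """
    now = time.time() if now is None else now
    cutoff = now - hours * 3600
    with os.scandir(base) as it:
        entries = list(it)  # snapshot first, since entries are deleted below
    removed = []
    for entry in entries:
        if not entry.is_dir(follow_symlinks=False):
            continue
        if not entry.name.startswith(prefix):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            shutil.rmtree(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Matching on a prefix keeps the pass from touching unrelated directories that happen to live in the same place.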

Util: gif2anim
This utility (from ImageMagick) extracts images from animated GIFs and
reports delays and image sizes. Requires identify and
convert to work (these are already required, so not a problem).

Fixed:
Bug: 'convert'
An invalid parameter was specified when using 'convert' to assemble animated GIFs,
resulting in an error message, and the image was not scanned.

Bug: 'safe_db'
When checking for images in the safe_db hash, because we score them as zero (0),
we did not 'short circuit' correctly. This has now been fixed.
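
The changelog does not show the code, so the following is only one plausible shape of this kind of bug, sketched in Python: a stored score of zero is a false value, so testing the score itself makes a hit look like a miss. The digest and the function names are hypothetical.

```python
safe_db = {"d41d8cd9": 0.0}  # hypothetical digest -> stored score


def lookup_buggy(digest):
    score = safe_db.get(digest)
    if score:  # 0.0 is falsy, so a safe image is reported as unknown
        return ("hit", score)
    return ("miss", None)


def lookup_fixed(digest):
    if digest in safe_db:  # test for presence, not for a true value
        return ("hit", safe_db[digest])
    return ("miss", None)


print(lookup_buggy("d41d8cd9"))
print(lookup_fixed("d41d8cd9"))
```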

Bug: wrong_ctype
The wrong index into the Score hash was used, so the 'focr_wrongctype_score'
parameter did not take effect. This has now been fixed.

Changed:
known_image_hash
This procedure was called with two parameters: $digest and $score.
$digest was not used, so it has been removed. Also, on the off chance
that $score is zero, it uses $Score{base} to score the image.

fuzzyocr_check
Added code to better determine the name of the attachment. Sometimes the name
is hidden in the 'content-id' header of the image/* MIME part, so when no name
is given and this header is available, we extract it from there. It also makes
sure that problematic characters are changed so as to not give Perl any more
grief.
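
The naming fallback can be sketched roughly as follows, in Python for illustration. Which characters count as problematic is not stated in the changelog, so the whitelist below is an assumption, as are the function name and the 'unnamed' default.

```python
import re


def attachment_name(filename, content_id):
    """Pick a usable name for an image attachment.

    Falls back to the Content-ID value when no filename is given, then
    replaces every character outside an assumed safe set with '_'.
    """
    name = filename
    if not name and content_id:
        name = content_id.strip().strip("<>")
    if not name:
        name = "unnamed"  # hypothetical last-resort name
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


print(attachment_name(None, "<image001.gif@01C7>"))
print(attachment_name("my pic (1).gif", None))
```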

A copy of the original message is now saved in the tempdir created, so that
when we instruct the plugin to keep the created tempdir, we have a copy of the
original message to further assist in troubleshooting problems.

A file is created in the tempdir containing all the expanded commands used to
process the images. This can help to troubleshoot invalid command errors.

Removed some debuglog lines to reduce the lines logged.

Uses gif2anim (if available) to extract images from animated GIFs.
TODO:
I will try to use the generated anim file to root out animated GIF spam where
the spam message is not in the largest frame, or is in the frame with the
largest delay, as well as other tricks...

version 2.3f:
Fixed:
Properly initialized $h and $w to zero so that when getting the height and width
from an image, if the size parameters cannot be parsed, they can still be tested properly.

Fixed:
Hashing now works. $digest was getting reset because it went out of scope. grrr.

Fixed:
$efile was only being replaced for the first occurrence in complex scansets.

Fixed:
Various bugs where 'Use of uninitialized value' warnings were reported.

version 2.3e:
Fixed:
Option: 'focr_db_safe'
This option was not included in the @pgm_options array.... oops (thanks UxBoD)

Score: wrongctype
This was not used correctly, thus it was not scoring... (thanks Eric)

Changed:
It now works with tempfiles only
This should reduce the need to read/write image data from memory after each
'filter', and will hopefully reduce I/O and memory usage for the plugin.

Scanset Syntax: $pfile
Because of the use of tempfiles, there is a need to specify the image file to be
used as input. '$pfile' must be used to specify the input filename. Please note
that in cases where scansets use pipes, $pfile should only be specified as the input to the
first 'filter' program.

Scanset Syntax: $efile
With every scanset, stderr is redirected to '$efile', which is different for each
image. When using multiple filters in a scanset, use '$efile' to redirect the stderr
of each filter to this file, making sure the plugin will correctly recognize an error when it
occurs.
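
The two placeholders can be illustrated with a three-filter pipeline. This is a Python sketch, not the plugin's own expansion code; the scanset is made up for illustration, and treating a non-empty error file as the sign of a failure is an assumption about how the plugin checks '$efile'.

```python
import os
import shlex
from string import Template

# Illustrative scanset: $pfile feeds only the first filter, while every
# filter sends its stderr to $efile.
SCANSET = Template(
    "pnmnorm $pfile 2>$efile | pnminvert 2>>$efile | gocr -i - 2>>$efile"
)


def expand_scanset(scanset, pfile, efile):
    """Substitute every occurrence of both placeholders with quoted paths."""
    return scanset.substitute(pfile=shlex.quote(pfile),
                              efile=shlex.quote(efile))


def had_error(efile):
    """Assumed check: any output in the error file counts as a failure."""
    return os.path.exists(efile) and os.path.getsize(efile) > 0


print(expand_scanset(SCANSET, "/tmp/x/img.pnm", "/tmp/x/img.err"))
```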

version 2.3d:
Require:
Plugin officially requires SA 3.1.4 or higher
New Perl Modules
DB_File
Storable
MLDBM
Previous
String::Approx

Removed:
Option: 'focr_pre314'
Not used, as the plugin now requires SA 3.1.4

Added:
Option: 'focr_path_bin'
Its value is treated as the path to search for @bin_utils, potentially
requiring fewer configuration options;
Directories in the path that do not exist are skipped;
Default value: /usr/local/netpbm/bin:/usr/local/bin:/usr/bin
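
The search this option describes can be sketched as follows, in Python for illustration. The function name is hypothetical, and requiring the file to be executable is an assumption.

```python
import os

DEFAULT_PATH_BIN = "/usr/local/netpbm/bin:/usr/local/bin:/usr/bin"


def find_bin(name, path_bin=DEFAULT_PATH_BIN):
    """Return the first match for name along path_bin, or None.

    Directories that do not exist are skipped, as the option describes.
    """
    for directory in path_bin.split(":"):
        if not os.path.isdir(directory):
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print(find_bin("gocr"))
```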

Option: 'focr_db_hash'
Its value holds the filename to use for storing the hash database; see below.
Default value: /etc/mail/spamassassin/FuzzyOcr.db

Option: 'focr_db_safe'
Its value holds the filename to use for storing the database of safe ('clean') image hashes; see below.
Default value: /etc/mail/spamassassin/FuzzyOcr.safe.db

Option: 'focr_db_max_days'
Its value holds the number of days a record may go unmatched before it expires from the hash database; see below.
Default value: 35
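
Expiry of records that have not been matched for more than 'focr_db_max_days' days can be sketched as follows, in Python for illustration. The record layout (a 'last_match' timestamp per digest) and the function name are assumptions; the changelog does not describe how the plugin stores this.

```python
import time

SECONDS_PER_DAY = 86400


def expire_old_records(db, max_days=35, now=None):
    """Delete records not matched within max_days; return the expired digests."""
    now = time.time() if now is None else now
    cutoff = now - max_days * SECONDS_PER_DAY
    expired = [digest for digest, rec in db.items()
               if rec["last_match"] < cutoff]
    for digest in expired:
        del db[digest]
    return expired


now = 1_000_000_000
db = {
    "aaaa": {"last_match": now - 40 * SECONDS_PER_DAY},  # stale
    "bbbb": {"last_match": now - 1 * SECONDS_PER_DAY},   # recently matched
}
print(expire_old_records(db, max_days=35, now=now))
```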

Option: 'focr_keep_bad_images'
If this is set to 1, the plugin will not remove the temporary
directory where the images are stored and processed if it
determines that the image was corrupt, or if an error occurred with any
of the auxiliary programs that process the images. Useful while
debugging.
Default value: 0

Changed:
Option: 'focr_logfile'
Defaults to 'stderr' so that logging goes there
Option: 'focr_enable_image_hashing' if set to 2:
Uses MLDBM to store hash info in a true DB file for faster access.
Stores hashes of images that exceed set thresholds in the file specified by option focr_db_hash.
Stores hashes of 'clean' images (without matching words) in the file specified by option focr_db_safe, to also cache good images.
Keeps statistics of hash hits and displays the number of times matched in the log.
Saves the name of the attachment and its content type as reference.
Automatically imports known hashes from focr_digest_db into focr_db_hash.
Automatically expires 'old' records if not matched in more than the number of days specified in option 'focr_db_max_days'.

Instead of having a 'global' timeout, 'focr_timeout' is now applied to each
external program used. This ensures that no timeouts are
recorded because of complex scansets, or because of temporary spikes
in load. Also, it now displays the name and return code information
for the binary that timed out, making it easier to debug problems.
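
The idea of a per-program timeout, as opposed to one budget shared by the whole run, can be sketched like this. The plugin uses Mail::SpamAssassin::Timeout in Perl; the Python below is only an illustration with a hypothetical helper name.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout):
    """Run one external program with its own timeout and report the outcome."""
    try:
        done = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Name the program that timed out so the log points at the culprit.
        return {"program": argv[0], "timed_out": True, "returncode": None}
    return {"program": argv[0], "timed_out": False,
            "returncode": done.returncode}


# Each step gets the full budget; a slow step does not eat into the next one.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
quick = [sys.executable, "-c", "raise SystemExit(3)"]
print(run_with_timeout(slow, 0.5))
print(run_with_timeout(quick, 10))
```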

Fixed:
A bug where option focr_counts_required was not recognized;
Logging to file when option 'focr_logfile' is set now works;
Individual word scores are now applied correctly
Storing only images with matched words to hash database (Thanks to Robert LeBlanc)
Explicitly use Mail::SpamAssassin::Timeout (Thanks Eric Yiu)
Ignores empty lines in wordlists (global and local)
Ignores comments starting with (#) to EOL
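
The last two wordlist rules can be sketched as follows, in Python for illustration. The function name and the sample words are hypothetical, and any further per-line syntax the wordlists may have is ignored here.

```python
def parse_wordlist(text):
    """Return wordlist entries, skipping empty lines and '#' comments."""
    words = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop everything from '#' to EOL
        if line:
            words.append(line)
    return words


sample = "stock\n\n# a full-line comment\npharmacy   # trailing comment\n   \n"
print(parse_wordlist(sample))
```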

version 2.3c:
Require:
Plugin officially requires SA 3.1.1 or higher

Added:
Support for BMP/TIFF Images

Changed:
Major internal restructuring
Use SpamAssassin Logging Facility instead of own logfile

Fixed:
A bug related to database hashing