FuzzyOcr.cf
Fixed outstanding errors, including variable mismatches.

FuzzyOcr.pm
ImageMagick errors are now trapped more reliably and logged.

When processing animated GIF files, the frame-selection algorithm can
discard all frames, leaving an empty image. This special case is now
treated as a corrupt image and triggers FUZZY_OCR_CORRUPT_IMG with
$Score{corrupt} points (2.5 by default).

Changed:
Option: focr_personal_wordlist
If the option value begins with '/', it is now treated as a fixed path
instead of a path relative to the effective user's HOME directory.
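A minimal Python sketch of this path rule. The plugin itself is written in Perl; the function names and example paths here are hypothetical.

```python
import os
import pwd  # POSIX only


def resolve_wordlist_path(value: str, home: str) -> str:
    """Resolve a focr_personal_wordlist value (hypothetical helper)."""
    # A leading '/' marks a fixed path that is used as-is.
    if value.startswith("/"):
        return value
    # Any other value is taken relative to the effective user's HOME directory.
    return os.path.join(home, value)


def effective_home() -> str:
    # HOME directory of the effective (not the real) user.
    return pwd.getpwuid(os.geteuid()).pw_dir
```

For example, '/etc/mail/spamassassin/words' is used unchanged, while 'lists/words.txt' becomes '<HOME>/lists/words.txt'.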

version 2.3i:
Added:
Option: 'focr_score_ham' Default: 0.0
When set to 1, images that are below the 'focr_counts_required' threshold
are scored with the formula $Score{Add} * $cnt; this gives marginally bad
images some positive score instead of letting them through with no score.
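The scoring rule can be sketched in Python as follows. The function name, the return convention and the example $Score{Add} value are assumptions; images that reach the threshold are left to the plugin's regular scoring path, which is not modelled here.

```python
from typing import Optional


def below_threshold_score(cnt: int, counts_required: int,
                          score_add: float, score_ham: int) -> Optional[float]:
    """Score an image by its word-match count $cnt (hypothetical helper).

    Returns None when cnt reaches focr_counts_required, since such images
    are handled by the regular scoring path (not shown).
    """
    if cnt >= counts_required:
        return None
    if score_ham == 1:
        # Marginally bad image: $Score{Add} * $cnt points.
        return score_add * cnt
    # focr_score_ham off: the image passes without any score.
    return 0.0
```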

Removed:
Util: gif2anim
This script is no longer used by the plugin, so it has been removed from the
distribution; if needed, it can be found in the previous version.

Fixed:
The plugin got stuck in an infinite loop when more than one attachment had
the same name, because the tie-breaking was not working.

When processing GIF files, extra care has to be taken so that ImageMagick
properly recognizes the files as GIF images. Otherwise an error occurs,
because ImageMagick cannot determine the image 'type' and therefore cannot
determine the image size, resulting in an invalid hash. Code is now in place
to prevent this, and when an invalid image size is encountered, processing
of that image is skipped.

Changed:
When the plugin finds words from the lists in an image, it now stores these
words in 'focr_db_hash', so that when the same image hash is encountered in
another message, the 'found' words are added to the report. This gives the
end user more information than just the FOCR_KNOWN_IMAGE_HASH rule firing
with the previous score.

version 2.3h:
Require:
New Perl Module
Image::Magick
Added:
Option: 'focr_anim_delay' Default: 100
This option is used with animated GIF files and keeps all frames
that are displayed for at least 1 sec (GIF frame delays are measured
in 1/100 sec, so 100 = 1 sec).

Option: 'focr_anim_max_frames' Default: 2
This option is used with animated GIF files and keeps the N
largest frames.
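A Python sketch of how these two options could combine. Applying the delay filter before the size ranking, and ranking frames by pixel area, are assumptions rather than documented plugin behaviour; the Frame type is hypothetical. The result can be empty, which is the case a later version treats as a corrupt image.

```python
from typing import List, NamedTuple


class Frame(NamedTuple):
    index: int
    delay: int   # GIF frame delay in 1/100 sec (100 = 1 sec)
    width: int
    height: int


def select_frames(frames: List[Frame], anim_delay: int = 100,
                  anim_max_frames: int = 2) -> List[Frame]:
    # focr_anim_delay: keep frames displayed for at least anim_delay.
    shown_long = [f for f in frames if f.delay >= anim_delay]
    # focr_anim_max_frames: keep the N largest remaining frames by area.
    largest = sorted(shown_long, key=lambda f: f.width * f.height, reverse=True)
    return largest[:anim_max_frames]
```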

Fixed:
Option: 'focr_digest_hash'
Fixed the internal parameter to match the option from the original plugin (Thanks Bill).

Option: 'focr_db_hash'
Updated FuzzyOcr.cf to match the plugin option.

Option: 'focr_db_safe'
Updated FuzzyOcr.cf to match the plugin option.

Option: 'focr_counts_required'
Fixed the default value of '2', which had been set to '5', making it behave like the original plugin.

Removed:
Option: 'focr_bin_identify'
Option: 'focr_bin_convert'
These options are no longer valid, since the external programs are no longer
called; the Image::Magick Perl module is used instead. Makes things 'simpler'.

Option: 'focr_bin_gifasm'
Option: 'focr_bin_tifftopnm'
These external programs are no longer used.

Changed:
The plugin now uses the Image::Magick module to access ImageMagick functions from
Perl instead of calling external programs. This reduces the number of system calls
needed to run external programs. (Idea from Eric Yiu)

version 2.3g:
Added:
Option: focr_keep_bad_images
The default value for this option is zero (0).
When set to 1, the plugin will not remove a tempdir whenever it registers
an error or timeout from any of the 'helper' apps.
When set to 2, the plugin will always keep the tempdir. Beware that on heavily
loaded systems, this might fill your /tmp partition.

Util: fuzzy-cleantmp
This utility can be used to remove tempdirs left behind if the plugin was
configured to keep them. It takes one parameter: the number of hours to keep
(12 by default). It can safely be run from cron to prune /tmp.
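What fuzzy-cleantmp does can be sketched in Python. The tempdir name prefix and the use of directory modification times are assumptions, not taken from the actual script.

```python
import os
import shutil
import time
from typing import List, Optional


def prune_tempdirs(root: str, prefix: str, hours: float = 12.0,
                   now: Optional[float] = None) -> List[str]:
    """Remove plugin tempdirs under root older than `hours` hours."""
    cutoff = (time.time() if now is None else now) - hours * 3600
    removed = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        # Only consider real directories carrying the (hypothetical) plugin
        # prefix; symlinks and unrelated entries are left alone.
        if not name.startswith(prefix) or os.path.islink(path) or not os.path.isdir(path):
            continue
        # Keep anything modified within the last `hours` hours.
        if os.path.getmtime(path) < cutoff:
            shutil.rmtree(path)
            removed.append(name)
    return removed
```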

Util: gif2anim
This utility (from ImageMagick) extracts frames from animated GIFs and also
reports frame delays and image sizes. It requires identify and convert
(both are already required, so this is not a problem).

Fixed:
Bug: 'convert'
An invalid parameter was specified when using 'convert' to assemble animated GIFs,
resulting in an error message, and the image was not scanned.

Bug: 'safe_db'
When checking for images in the safe_db hash, we did not 'short circuit'
correctly, because such images are scored as zero (0). This has now been fixed.

Bug: wrong_ctype
The wrong index into the Score hash was used, which prevented the
'focr_wrongctype_score' parameter from taking effect. This has now been fixed.

Changed:
known_image_hash
This procedure was called with two parameters: $digest and $score.
$digest was not used, so it has been removed. Also, on the off chance
that $score is zero, it uses $Score{base} to score the image.

fuzzyocr_check
Added code to better determine the name of the attachment. Sometimes the name
is hidden in the 'content-id' header of the image/* MIME part, so if no name
is given and this header is available, the name is extracted from there. It also
makes sure that problematic characters are replaced so as not to give Perl any
more grief.

A copy of the original message is now saved in the tempdir created, so that
when we instruct the plugin to keep the created tempdir, we have a copy of the
original message to further assist in troubleshooting problems.

A file is created in the tempdir containing all the expanded commands used to
process the images. This can help to troubleshoot invalid command errors.

Removed some debuglog lines to reduce the number of lines logged.

Uses gif2anim (if available) to extract frames from animated GIFs.
TODO:
I will try to use the generated anim file to root out animated GIF spam where
the spam message is not in the largest frame, or is in the frame with the
largest delay, as well as other tricks...

version 2.3f:
Fixed:
Properly initialized $h and $w to zero so that, if the size parameters cannot be
parsed when getting the height and width of an image, they can be tested properly.

Fixed:
Hashing now works. $digest was getting reset because it went out of scope. grrr.

Fixed:
$efile was only being replaced for the first occurrence in complex scansets.

Fixed:
Various bugs where 'Use of uninitialized value' warnings were reported.

version 2.3e:
Fixed:
Option: 'focr_db_safe'
This option was not included in the @pgm_options array... oops (thanks UxBoD)

Score: wrongctype
This was not used correctly, so it was not scoring... (thanks Eric)

Changed:
It now works with tempfiles only
This should reduce the need to read/write image data from memory after each
'filter', and will hopefully reduce I/O and memory usage for the plugin.

Scanset Syntax: $pfile
Because tempfiles are used, the image file to be used as input must be
specified: '$pfile' must be used to specify the input filename. Please note
that when scansets use pipes, $pfile should only be specified as the input to
the first 'filter' program.

Scanset Syntax: $efile
With every scanset, stderr is redirected to '$efile', which is different for each
image. When using multiple filters in a scanset, use '$efile' to redirect the stderr
of each filter to this file, making sure the plugin correctly recognizes an error
when it occurs.
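The placeholder expansion can be sketched in Python. The scanset string is a hypothetical example, not one shipped with the plugin. Replacing every occurrence (not only the first, the bug fixed in 2.3f) is what lets each filter in a pipe log to the same '$efile'.

```python
def expand_scanset(scanset: str, pfile: str, efile: str) -> str:
    """Expand the $pfile and $efile placeholders of a scanset command."""
    # str.replace substitutes every occurrence, so each filter in a
    # piped scanset can redirect its stderr to the same per-image file.
    return scanset.replace("$pfile", pfile).replace("$efile", efile)


# Hypothetical scanset: $pfile only feeds the first filter of the pipe.
example = "giftopnm $pfile 2>>$efile | pnmnorm 2>>$efile | gocr -i - 2>>$efile"
print(expand_scanset(example, "/tmp/focr.1/img.gif", "/tmp/focr.1/img.err"))
```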

version 2.3d:
Require:
Plugin officially requires SA 3.1.4 or higher
New Perl Modules
DB_File
Storable
MLDBM
Previous
String::Approx

Removed:
Option: 'focr_pre314'
Not used, as the plugin now requires SA 3.1.4

Added:
Option: 'focr_path_bin'
Its value is treated as a path for searching for @bin_utils, potentially
requiring fewer configuration options;
Directories in the path that don't exist are skipped;
Default value: /usr/local/netpbm/bin:/usr/local/bin:/usr/bin
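A Python sketch of the lookup. Letting the first matching directory win (as a shell PATH search does) and requiring the execute bit are assumptions; the program names in the usage line are only examples.

```python
import os
from typing import Dict, Iterable, Optional

DEFAULT_PATH_BIN = "/usr/local/netpbm/bin:/usr/local/bin:/usr/bin"


def find_bin_utils(names: Iterable[str],
                   path_bin: str = DEFAULT_PATH_BIN) -> Dict[str, Optional[str]]:
    """Map each helper program name to its full path, or None if not found."""
    # Directories in the path that don't exist are skipped.
    dirs = [d for d in path_bin.split(":") if os.path.isdir(d)]
    found: Dict[str, Optional[str]] = {}
    for name in names:
        found[name] = None
        for d in dirs:
            candidate = os.path.join(d, name)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                found[name] = candidate  # first directory in the path wins
                break
    return found


# Example: locate a few helper programs using the default search path.
print(find_bin_utils(["giftopnm", "gocr"]))
```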

Option: 'focr_db_hash'
Its value holds the filename to use for storing the hash database; see below.
Default value: /etc/mail/spamassassin/FuzzyOcr.db

Option: 'focr_db_safe'
Its value holds the filename to use for storing the database of safe (clean) image hashes; see below.
Default value: /etc/mail/spamassassin/FuzzyOcr.safe.db

Option: 'focr_db_max_days'
Its value holds the number of days after which records that have not been matched expire; see below.
Default value: 35

Option: 'focr_keep_bad_images'
If this is set to 1, the plugin will not remove the temporary directory
where the images are stored and processed if it determines that the image
was corrupt, or if an error occurred in any of the auxiliary programs that
process the images. Useful while debugging.
Default value: 0

Changed:
Option: 'focr_logfile'
Defaults to 'stderr' so that logging goes there
Option: 'focr_enable_image_hashing' if set to 2:
Uses MLDBM to store hash info in a true DB file for faster access.
Stores hashes of images that exceed set thresholds in the file
specified by option focr_db_hash
Stores hashes of 'clean' images (without matching words) in the file
specified by option focr_db_safe, to also cache good images.
Keeps statistics of hash hits and logs the number of times each hash matched.
Saves the name of the attachment and its content type as reference
Automatically imports known hashes from focr_digest_db into focr_db_hash
Automatically expires 'old' records if not matched in more than
the number of days specified in option 'focr_db_max_days'
Instead of having a 'global' timeout, 'focr_timeout' is now applied per
external program. This ensures that no timeouts are recorded because of
complex scansets or temporary spikes in load. Also, it now displays the name
and return code of the binary that timed out, making it easier to debug problems.
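A Python sketch of the hit statistics and the expiry rule, with a plain dict standing in for the MLDBM-backed database. The record layout ('hits', 'last_match') and the function names are hypothetical.

```python
import time
from typing import Dict, List, Optional

SECONDS_PER_DAY = 86400


def record_hit(db: Dict[str, dict], digest: str,
               now: Optional[float] = None) -> int:
    """Count a match for an image hash and remember when it happened."""
    rec = db.setdefault(digest, {"hits": 0, "last_match": 0.0})
    rec["hits"] += 1
    rec["last_match"] = time.time() if now is None else now
    return rec["hits"]


def expire_records(db: Dict[str, dict], max_days: int = 35,
                   now: Optional[float] = None) -> List[str]:
    """Drop records not matched in more than max_days days (focr_db_max_days)."""
    now = time.time() if now is None else now
    cutoff = now - max_days * SECONDS_PER_DAY
    expired = [h for h, rec in db.items() if rec["last_match"] < cutoff]
    for h in expired:
        del db[h]
    return expired
```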

Fixed:
A bug where option focr_counts_required was not recognized;
Logging to file when option 'focr_logfile' is set now works;
Individual word scores are now applied correctly
Storing only images with matched words in the hash database (Thanks to Robert LeBlanc)
Explicitly use Mail::SpamAssassin::Timeout (Thanks Eric Yiu)
Ignores empty lines in wordlists (global and local)
Ignores comments starting with (#) to EOL

version 2.3c:
Require:
Plugin officially requires SA 3.1.1 or higher

Added:
Support for BMP/TIFF Images

Changed:
Major internal restructuring
Use SpamAssassin Logging Facility instead of own logfile

Fixed:
A bug related to database hashing

http://fuzzyocr.own-hero.net/wiki/Changelog-3.x#version3.5.0