Changelog for 3.x branch

version 3.4.0

  • Initial development release from SVN (based on 2.3j by Jorge Valdes)
  • Substantially refactored (see http://www.joval.info/proj/FuzzyOcr-2.3j/CHANGES for the changelog from 2.3b to 2.3j)
  • Improved support for animated GIFs (requires new dependency gifsicle)
  • Removed ImageMagick dependency

version 3.4.1

  • Fixed logging facility
    • Now logs to file only if specified
    • SA output does not go to logfile anymore
    • Running SA in debug mode always outputs FuzzyOcr debug messages
  • Some documentation parts and configuration file updated

version 3.4.2

  • Fixed Configuration Facility to work properly together with other custom plugins (like RelayChecker)
    • Thanks to John Rudd for reporting this
  • fuzzy-find.pl utility fixed
    • now outputs usage description
    • uses FuzzyOcr.cf for file paths
    • removed ImageMagick dependency

version 3.5.0

  • First release that is completely modularized for easier maintenance
  • Completely rewrote handling of external applications
    • Process timeouts are enforced now
    • No zombie processes
    • Decide between global or per-application timeout
    • Flexible way to add additional helper applications in the config
  • Completely new scanset and preprocessor interface
    • Scansets and preprocessors are now in two separate files
    • New, easy syntax for both
    • Flexible enough to allow almost all applications without using scripts
    • Preprocessors can easily be chained into a single pipe for a scanset
    • Plugin emulates STDIN and STDOUT automatically if necessary
    • Plugin handles all input and output files as well as pipes automatically
    • Plugin automatically substitutes all $macros with the registered helper applications
    • Allows the use of Tesseract and practically every other command line OCR engine
    • Zombie-safe system, each external application is terminated properly
  • Hash database system extended
    • Experimental MySQL interface for the hashing system
    • MLDBM databases are now properly locked to prevent corruption and failure
      • Requires the new dependency MLDBM::Sync as listed in the installation manual
      • Requires Tie::Cache (MLDBM::Sync dependency)
  • More resource saving features
    • Option to skip files based on
      • File size (defined per format)
      • File type
      • Image dimensions
    • Negative auto disable value (Messages below a given score are skipped)
    • Minimal scanset option (The first scanset producing enough hits is taken, others are skipped)
    • Automatic scanset resorting option
      • When running in memory, plugin keeps track of the efficiency for each scanset on the last X messages
      • After each scan, the scanset order is changed so that the most efficient scanset comes first
    • Autodisable score is now rechecked between initial FuzzyOcr tests (content-type, etc.) and OCR tests
  • More features against false positives
    • Auto threshold adjusting for smaller words
    • OCR Results are analyzed twice
      • First pass without stripping spaces, hits scored here weigh more than second pass hits
      • Second pass with stripping spaces, only done if first pass does not hit enough
    • New option to allow each word to match only once per image
  • Improved tools
    • fuzzy-find significantly improved
      • Works together with configuration file now
      • Added switches to learn spam or ham from command line as hashes or image files
      • MySQL support
      • Some bugs fixed
  • Better extraction of images from the message
    • Content-Type Application/Octet-Stream is accepted now for all images
    • New rule which scores if file extension and file format don't match
    • Attachments with Application/Octet-Stream are now always checked for magic bytes
  • Better logging
    • Running spamassassin with -D will always output all debug messages to stderr
    • New logfile log levels (0-3)
    • Message ID, sender and recipient are now logged in debug mode
    • Debug mode logging shows execution time for external applications now to find bottlenecks
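
The two-pass analysis of OCR results listed above ("OCR Results are analyzed twice") can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the plugin's code. The function names, the weights, the min_hits parameter and the rule that second-pass hits are only counted for words the first pass missed are all assumptions, and a plain substring test stands in for the plugin's fuzzy matching.

```python
import re


def find_hits(text, words):
    """Return the set of wordlist entries found in text (case-insensitive).

    Assumption: exact substring search is used here as a stand-in for
    fuzzy matching.
    """
    lowered = text.lower()
    return {w for w in words if w.lower() in lowered}


def two_pass_score(ocr_text, words, min_hits=2,
                   first_weight=1.5, second_weight=1.0):
    """Score OCR output in two passes (hypothetical weights and names)."""
    # First pass: match against the text with its spaces intact.
    first = find_hits(ocr_text, words)
    score = len(first) * first_weight
    if len(first) >= min_hits:
        # Enough hits already, so the second pass is skipped.
        return score

    # Second pass: strip all whitespace, then match again.
    stripped = re.sub(r"\s+", "", ocr_text)
    # Assumption: only words not already found in the first pass count,
    # and they count with the lower second-pass weight.
    second = find_hits(stripped, words) - first
    return score + len(second) * second_weight
```

With the word list ["viagra", "buy"], the text "v i a g r a buy" gets only one first-pass hit, so the second pass runs and adds a lower-weighted hit for the spaced-out word.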

  • Bugfixes and minor changes
    • Configuration parser rewritten, accepts 0 now (instead of 0.0)
    • Zombie bugs fixed
    • Fixed bug in locking of logfile and plain hash database
    • Temporary directories now cleaned up on global timeout and if multiple images were in the message
    • Only create temporary directory if an image was found (speeds up processing)
    • Skip Ocrad scansets for images smaller than 16x16 (these produce an error with ocrad)
    • New option to allow matching of numbers in the OCR output
    • Personal wordlist option now uses userstate by default to allow automatic substitution with user's homedir by SA
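
The enforced timeouts and zombie-free handling of external applications mentioned for this release can be sketched as follows. This is a Python illustration of the general technique, not the plugin's implementation. The function name run_helper and its parameters are hypothetical.

```python
import subprocess
import sys


def run_helper(argv, timeout_seconds):
    """Run an external command with an enforced timeout.

    Returns (stdout_bytes, timed_out). The name and return shape are
    assumptions made for this sketch.
    """
    try:
        done = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
        return done.stdout, False
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before raising,
        # so no zombie process is left behind.
        return b"", True


if __name__ == "__main__":
    # A helper that finishes in time, then one that exceeds its limit.
    print(run_helper([sys.executable, "-c", "print('ok')"], 10))
    print(run_helper([sys.executable, "-c", "import time; time.sleep(30)"], 0.5))
```

Reaping the child after the kill is what keeps it from lingering as a zombie. A caller could pass either one global limit or a per-application value as timeout_seconds.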

version 3.5.1

  • Several bugfixes, see the Patchset list for 3.5.0 for a detailed list
  • Added maximum height/width for images to scan
  • Bugfix in kill_pid()