Changelog for 3.x branch
version 3.4.0
- Initial development release from SVN (based on 2.3j by Jorge Valdes)
- Heavily refactored (see http://www.joval.info/proj/FuzzyOcr-2.3j/CHANGES for the 2.3b -> 2.3j changelog)
- Improved support for animated GIFs (requires the new dependency gifsicle)
- Removed ImageMagick dependency
version 3.4.1
- Fixed logging facility
- Now logs to file only if specified
- SA output does not go to logfile anymore
- Running SA in debug mode always outputs FuzzyOcr debug messages
- Some documentation parts and configuration file updated
version 3.4.2
- Fixed Configuration Facility to work properly together with other custom plugins (like RelayChecker)
- Thanks to John Rudd for reporting this
- fuzzy-find.pl utility fixed
- now outputs usage description
- uses FuzzyOcr.cf for file paths
- removed ImageMagick dependency
version 3.5.0
- First release that is completely modularized for easier maintenance
- Completely rewrote handling of external applications
- Process timeouts are enforced now
- No zombie processes
- Choice between a global or per-application timeout
- Flexible way to add additional helper applications in the config
- Completely new scanset and preprocessor interface
- Scansets and preprocessors are now in two separate files
- New, easy syntax for both
- Flexible enough to allow almost all applications without using scripts
- Easily put preprocessors together to one pipe for a scanset
- Plugin emulates STDIN and STDOUT automatically if necessary
- Plugin handles all input and output files as well as pipes automatically
- Plugin automatically substitutes all $macros with the registered helper applications
- Allows the use of Tesseract and practically every other command-line OCR engine
- Zombie-safe system, each external application is terminated properly
- Hash database system extended
- Experimental MySQL interface for the hashing system
- MLDBM databases are now properly locked to prevent corruption and failure
- Requires the new dependency MLDBM::Sync as listed in the installation manual
- Requires Tie::Cache (MLDBM::Sync dependency)
- More resource saving features
- Option to skip files based on
- File size (defined per format)
- File type
- Image dimensions
- Negative auto disable value (Messages below a given score are skipped)
- Minimal scanset option (The first scanset producing enough hits is taken, others are skipped)
- Automatic scanset resorting option
- When running in memory, plugin keeps track of the efficiency for each scanset on the last X messages
- After each scan, the scanset order is changed to have the most efficient scanset as first scanset.
- Autodisable score is now rechecked between initial FuzzyOcr tests (content-type, etc.) and OCR tests
- More features against false positives
- Auto threshold adjusting for smaller words
- OCR Results are analyzed twice
- First pass without stripping spaces; hits scored here weigh more than second-pass hits
- Second pass with stripping spaces, only done if the first pass does not produce enough hits
- New option to allow each word to match only once per image
- Improved tools
- fuzzy-find greatly improved
- Works together with configuration file now
- Added switches to learn spam or ham from command line as hashes or image files
- MySQL support
- Some bugs fixed
- Better extraction of images from the message
- Content-Type Application/Octet-Stream is accepted now for all images
- New rule which scores if file extension and file format don't match
- Attachments with Application/Octet-Stream are now always checked for magic bytes
- Better logging
- Running spamassassin with -D will always output all debug messages to stderr
- New logfile log levels (0-3)
- Message ID, sender and recipient are now logged in debug mode
- Debug mode logging now shows the execution time of external applications, to help find bottlenecks
- Bugfixes and minor changes
- Configuration parser rewritten, accepts 0 now (instead of 0.0)
- Zombie bugs fixed
- Fixed bug in locking of logfile and plain hash database
- Temporary directories are now cleaned up on global timeout and when the message contains multiple images
- Only create temporary directory if an image was found (speeds up processing)
- Skip Ocrad scansets for images smaller than 16x16 (ocrad produces an error on these)
- New option to allow matching of numbers in the OCR output
- Personal wordlist option now uses userstate by default, so that SA can automatically substitute the user's home directory
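The automatic scanset resorting listed for 3.5.0 can be pictured roughly as follows. This is only a minimal Python sketch, not the plugin's actual Perl code: the class name `ScansetStats`, the window size, the scanset names, and the definition of "efficiency" as the average hit count over the last X recorded messages are all assumptions made for illustration.

```python
from collections import deque


class ScansetStats:
    """Hypothetical tracker: keeps hit counts for the last `window` messages."""

    def __init__(self, names, window=10):
        self.order = list(names)
        # One bounded history per scanset; old entries fall off automatically.
        self.history = {name: deque(maxlen=window) for name in names}

    def record(self, name, hits):
        """Store how many word hits a scanset produced on one message."""
        self.history[name].append(hits)

    def efficiency(self, name):
        """Assumed metric: average hits over the remembered messages."""
        runs = self.history[name]
        return sum(runs) / len(runs) if runs else 0.0

    def resort(self):
        """Put the most efficient scanset first; ties keep their old order."""
        self.order = sorted(self.order, key=self.efficiency, reverse=True)
        return self.order


stats = ScansetStats(["ocrad", "ocrad-invert", "tesseract"], window=2)
stats.record("ocrad", 1)
stats.record("ocrad-invert", 5)
stats.record("ocrad-invert", 3)
stats.record("tesseract", 0)
new_order = stats.resort()
```

Combined with the minimal scanset option, putting the best-performing scanset first means later scansets are skipped more often, which is where the resource saving would come from.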
version 3.5.1
- Several bugfixes, see the Patchset list for 3.5.0 for a detailed list
- Added maximum height/width for images to scan
- Bugfix in kill_pid()
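The size and dimension limits mentioned above (per-format file size in 3.5.0, maximum height/width in 3.5.1) amount to a cheap check before any OCR is run. The following Python sketch shows one way such a check could look; it is not the plugin's code, and the function name `should_skip`, the keys of the limits dictionary, and all numeric values are hypothetical.

```python
# Hypothetical limits; the real plugin reads its values from its configuration.
EXAMPLE_LIMITS = {
    "max_size": 100_000,   # bytes
    "min_width": 4,
    "min_height": 4,
    "max_width": 800,
    "max_height": 800,
}


def should_skip(size_bytes, width, height, limits):
    """Return a reason string if the image should be skipped, else None."""
    if size_bytes > limits["max_size"]:
        return "file too large"
    if width < limits["min_width"] or height < limits["min_height"]:
        return "image too small"
    if width > limits["max_width"] or height > limits["max_height"]:
        return "image too large"
    return None
```

Checking the file size first avoids decoding the image at all when the attachment is already over the limit.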
