Timestamp:
10.12.2006 16:30:01 (2 years ago)
Author:
decoder
Message:

Last tweaks; commented out some lines in FuzzyOcr.cf.
Added samples and updated the samples README.
Replaced the INSTALL and CHANGES files with files pointing to the online versions of these documents.
It is easier for us to maintain a single source for INSTALL/CHANGELOG; otherwise, the docs will always end up outdated.

Files:

  • trunk/devel/INSTALL (r58 → r104)

Requirements:
~~~~~~~~~~~~~
  libungif:
    http://sourceforge.net/project/showfiles.php?group_id=102202
[Added in r104] The installation manual for the 3.5.x branch is maintained online at:

  netpbm:
    http://sourceforge.net/project/showfiles.php?group_id=5128

  gifsicle:
    http://www.lcdf.org/gifsicle/gifsicle-1.44.tar.gz (latest)

  gocr (v0.40 suggested; needs to be patched!):
    http://sourceforge.net/project/showfiles.php?group_id=7147

  ocrad:
    Please use your closest GNU mirror:
      http://www.gnu.org/prep/ftp.html

  mysql:
    http://www.mysql.com (should work with 3.23+)

Perl Packages:
~~~~~~~~~~~~~~
  String::Approx

  MLDBM             - used with type 2 hashing
  Storable          - used with type 2 hashing
  DB_File           - used with type 2 hashing

  DBI               - used with type 3 hashing
  DBD::mysql        - used with type 3 hashing

Make sure all of the above requirements are met, or else!
I personally think it is better to compile everything from source, but
binary packages are available if you decide to go that way. The only
package that should be compiled from source is gocr, since it requires
some patching to make it work better ;)

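One way to confirm the requirements above is to look for the helper programs on the PATH and to try loading each Perl module with `perl -M<Module> -e 1`. The Python sketch below does both. The program names (netpbm's giftopnm/jpegtopnm/pngtopnm/tifftopnm, plus gifsicle, gocr and ocrad) are assumptions about which binaries matter; the helper paths FuzzyOcr actually uses are whatever FuzzyOcr.cf points to. `missing_helpers` and `missing_perl_modules` are hypothetical names.

```python
import shutil
import subprocess

# Assumed binary names; adjust to match the helper paths in FuzzyOcr.cf.
HELPERS = ["giftopnm", "jpegtopnm", "pngtopnm", "tifftopnm",
           "gifsicle", "gocr", "ocrad"]

# String::Approx is always needed; MLDBM/Storable/DB_File only for
# type 2 hashing, DBI/DBD::mysql only for type 3 hashing.
PERL_MODULES = ["String::Approx", "MLDBM", "Storable", "DB_File",
                "DBI", "DBD::mysql"]


def missing_helpers(names=HELPERS):
    """Return the helper programs that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_perl_modules(modules=PERL_MODULES, perl="perl"):
    """Return the Perl modules that `perl` fails to load."""
    if shutil.which(perl) is None:
        return list(modules)  # no perl at all: nothing can be loaded
    missing = []
    for module in modules:
        # "perl -MModule -e 1" exits non-zero if Module cannot be loaded.
        result = subprocess.run([perl, f"-M{module}", "-e", "1"],
                                capture_output=True)
        if result.returncode != 0:
            missing.append(module)
    return missing


if __name__ == "__main__":
    print("missing helpers:", missing_helpers() or "none")
    print("missing Perl modules:", missing_perl_modules() or "none")
```

Run it as the same user that runs SpamAssassin, since PATH and Perl's module search path can differ between users.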
Place a copy of the following files in your SpamAssassin local
configuration directory (/etc/mail/spamassassin by default):

  FuzzyOcr.pm
  FuzzyOcr.cf (change to taste)
  FuzzyOcr.words

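If you script the installation, the copy step can be sketched in Python as follows. `install_plugin_files` is a hypothetical helper. Skipping an existing FuzzyOcr.cf, so that local changes are not overwritten on reinstall, is a choice made for this sketch and not something the plugin requires.

```python
import os
import shutil

PLUGIN_FILES = ["FuzzyOcr.pm", "FuzzyOcr.cf", "FuzzyOcr.words"]


def install_plugin_files(src_dir, dest_dir="/etc/mail/spamassassin"):
    """Copy the plugin files into the SpamAssassin config directory.

    An existing FuzzyOcr.cf is left untouched so local changes survive.
    Returns the list of destination paths actually written.
    """
    copied = []
    for name in PLUGIN_FILES:
        target = os.path.join(dest_dir, name)
        if name == "FuzzyOcr.cf" and os.path.exists(target):
            continue  # keep the locally edited configuration
        shutil.copy2(os.path.join(src_dir, name), target)  # keeps mtime/mode
        copied.append(target)
    return copied
```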
Skipping Scans
~~~~~~~~~~~~~~
Because of possible false positives, you also have the option not to
scan a particular type of image, using the following configuration
option:

focr_skip_<img_type> 1

Optionally, you can skip scanning images that are 'too big' by
specifying the following configuration option:

focr_max_size_<img_type> <max-size>

where <max-size> is expressed in bytes (compared against the size of
the PNM file), and <img_type> is one of the following:

- gif
- jpeg
- png
- bmp
- tiff

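The two options combine into a simple per-image decision. The Python sketch below only illustrates that logic and is not FuzzyOcr's code: `parse_cf` and `should_skip` are hypothetical names, the parser assumes one `option value` pair per line with `#` comments, and "too big" is assumed to mean strictly larger than <max-size>.

```python
IMG_TYPES = ("gif", "jpeg", "png", "bmp", "tiff")


def parse_cf(text):
    """Parse 'option value' lines, ignoring blank lines and # comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        cfg[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return cfg


def should_skip(img_type, pnm_size, cfg):
    """Return True if an image of this type and PNM size is not scanned."""
    if img_type not in IMG_TYPES:
        raise ValueError(f"unknown image type: {img_type}")
    if cfg.get(f"focr_skip_{img_type}") == "1":
        return True  # this image type is skipped entirely
    max_size = cfg.get(f"focr_max_size_{img_type}")
    # Assumed: only images strictly larger than the limit are skipped.
    return max_size is not None and pnm_size > int(max_size)
```

For example, with `focr_skip_bmp 1` and `focr_max_size_gif 1000`, every BMP is skipped, a GIF whose PNM is 1001 bytes is skipped, and PNG images are always scanned.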
Timeouts
~~~~~~~~
There are two types of timeouts available for FuzzyOCR:

1.- Per-Application Timeout (Default)
    Enabled with the following settings:

    focr_timeout <secs>
    focr_global_timeout 0 (Default)

    Each external helper application is given <secs> seconds
    to complete, after which it is assumed to have failed
    and processing continues.

2.- Global Timeout
    Enabled with the following settings:

    focr_timeout <secs>
    focr_global_timeout 1

    If the whole scan takes longer than <secs> seconds, it is
    aborted and the images (if any) are not scored or checked.

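The difference between the two modes is easiest to see in code. This Python sketch runs a list of external commands under either policy. `run_helpers` is a hypothetical name, and the plugin itself is written in Perl, so treat this as an illustration of the semantics described above rather than its implementation.

```python
import subprocess
import sys
import time


def run_helpers(commands, secs, global_timeout=False):
    """Run helper commands under one of the two timeout policies.

    Per-application mode: each command gets `secs` seconds; a command that
    times out yields None and the remaining commands still run.
    Global mode: all commands share one budget of `secs` seconds; if it is
    exceeded, the whole scan is aborted and None is returned.
    """
    results = []
    deadline = time.monotonic() + secs
    for cmd in commands:
        if global_timeout:
            limit = deadline - time.monotonic()
            if limit <= 0:
                return None  # budget already spent: abort the scan
        else:
            limit = secs  # every helper gets the full allowance
        try:
            # subprocess.run kills the child if the timeout expires.
            proc = subprocess.run(cmd, capture_output=True, text=True,
                                  timeout=limit)
            results.append(proc.stdout)
        except subprocess.TimeoutExpired:
            if global_timeout:
                return None  # images are neither scored nor checked
            results.append(None)  # assume this helper failed, go on
    return results


if __name__ == "__main__":
    echo = [sys.executable, "-c", "print('ok')"]
    print(run_helpers([echo, echo], secs=5))
```

The per-application mode bounds each helper but not the total, so several slow helpers can still add up; the global mode bounds the total at the cost of discarding partial results.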
Image Hashing
~~~~~~~~~~~~~
If you use the image hashing option (disabled by default), you need to
specify the following options in FuzzyOcr.cf:

focr_enable_image_hashing 1
focr_digest_db <full_path_to_file>

or

focr_enable_image_hashing 2
focr_db_hash <full_path_to_file>
focr_db_safe <full_path_to_file>
focr_db_max_days ##                     (default: 35)

In either case, you need to make sure the effective user running
SpamAssassin has permission to write to the specified files, or
change the permissions on the files so that it does.

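A quick way to check the permissions is the Python sketch below, where `can_write_db` is a hypothetical helper. Note that `os.access` checks the *real* user ID, so run it as the SpamAssassin user (for example via `sudo -u <user>`) against the paths you set for focr_digest_db, focr_db_hash or focr_db_safe. If a file does not exist yet, the sketch checks that its directory allows creating it.

```python
import os


def can_write_db(path):
    """Return True if the current user can write `path`.

    If the file does not exist yet, check that its directory allows
    creating files (write and search permission).
    """
    if os.path.exists(path):
        return os.access(path, os.W_OK)
    parent = os.path.dirname(os.path.abspath(path))
    return os.access(parent, os.W_OK | os.X_OK)


if __name__ == "__main__":
    import sys
    for db_path in sys.argv[1:]:
        print(db_path, "writable" if can_write_db(db_path) else "NOT writable")
```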
If you decide to store the data in MySQL tables instead, use:

focr_enable_image_hashing 3
focr_db_max_days ##                     (default: 35)
focr_mysql_db <database_name>           (default: FuzzyOcr)
focr_mysql_hash <hash_table>            (default: Hash)
focr_mysql_safe <safe_table>            (default: Safe)
focr_mysql_user <username>              (default: fuzzyocr)
focr_mysql_pass <password>              (default: fuzzyocr)

together with either

 focr_mysql_socket <path_to_socket>     (default: undefined)
or
 focr_mysql_host <hostname>             (default: localhost)
 focr_mysql_port <mysql_port>           (default: 3306)

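The connection settings correspond to the parts of a Perl DBI data source name for DBD::mysql, which accepts `database`, `host`, `port` and `mysql_socket` in the DSN. The Python sketch below builds such a string from the defaults listed above, which can help when testing the connection by hand. `mysql_dsn` is a hypothetical helper, and the rule that a configured socket takes precedence over host/port is an assumption, not documented plugin behaviour.

```python
DEFAULTS = {
    "focr_mysql_db": "FuzzyOcr",
    "focr_mysql_host": "localhost",
    "focr_mysql_port": "3306",
}


def mysql_dsn(cfg):
    """Build a DBD::mysql DSN from FuzzyOcr-style settings.

    `cfg` maps option names to string values; missing options fall back
    to the defaults listed in this file.
    """
    settings = {**DEFAULTS, **cfg}
    db = settings["focr_mysql_db"]
    socket = settings.get("focr_mysql_socket")
    if socket:  # assumed: a socket, when set, overrides host/port
        return f"DBI:mysql:database={db};mysql_socket={socket}"
    return (f"DBI:mysql:database={db};host={settings['focr_mysql_host']};"
            f"port={settings['focr_mysql_port']}")
```

The user name and password are not part of the DSN; with DBI they are passed as separate arguments to `DBI->connect`.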
Test with:
~~~~~~~~~~
  spamassassin --debug FuzzyOcr < path_to_email > /dev/null

If you do not get any errors, you are ready to go. Restart spamd, which
is the (*strongly*) recommended way to use this plugin.

[Added in r104] http://fuzzyocr.own-hero.net/wiki/Installation-3.5.x