root/trunk/devel/samples/README

Revision 104, 3.5 kB (checked in by decoder, 2 years ago)

Last tweaks, commented out some lines in FuzzyOcr.cf
Added samples, updated samples README.
Replaced INSTALL and CHANGES files with files pointing to the online version of these files.
It is easier for us to maintain a single source of INSTALL/CHANGELOG; otherwise, we would always end up with outdated docs.

These .eml files are sample spam emails for testing your FuzzyOcr installation.

Use spamassassin -t < samplefile.eml to test :)
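To run every sample in one go, something like the following Python sketch could be used. It assumes spamassassin is on your PATH and that the report uses SpamAssassin's default layout (score, rule name, description); the helper names fuzzy_ocr_rules, scan_sample and main are made up for this example.

```python
import re
import shutil
import subprocess
import sys
from pathlib import Path
from typing import List

# FuzzyOcr rule names start with FUZZY_OCR (FUZZY_OCR, FUZZY_OCR_WRONG_CTYPE,
# FUZZY_OCR_CORRUPT_IMG, ...). In the default SA report a hit looks like:
#   " 1.5 FUZZY_OCR_WRONG_CTYPE  BODY: ..."
RULE_LINE = re.compile(r"^[ \t]*-?\d+(?:\.\d+)?[ \t]+(FUZZY_OCR\w*)[ \t]", re.MULTILINE)


def fuzzy_ocr_rules(report: str) -> List[str]:
    """Return the FuzzyOcr rule names listed in a spamassassin -t report."""
    return RULE_LINE.findall(report)


def scan_sample(path: Path) -> List[str]:
    """Feed one sample to `spamassassin -t` on stdin; return the FuzzyOcr rules that fired."""
    with path.open("rb") as fh:
        result = subprocess.run(["spamassassin", "-t"], stdin=fh,
                                stdout=subprocess.PIPE, check=False)
    return fuzzy_ocr_rules(result.stdout.decode("utf-8", errors="replace"))


def main(sample_dir: Path) -> None:
    """Scan every *.eml file in sample_dir and print which FuzzyOcr rules fired."""
    if shutil.which("spamassassin") is None:
        print("spamassassin not found on PATH")
        return
    for eml in sorted(sample_dir.glob("*.eml")):
        rules = scan_sample(eml)
        print(f"{eml.name}: {', '.join(rules) if rules else 'no FuzzyOcr rule fired'}")


if __name__ == "__main__":
    # Directory with the ocr-*.eml samples; defaults to the current directory.
    main(Path(sys.argv[1]) if len(sys.argv) > 1 else Path("."))
```

Pass the samples directory as the only argument, or run the script from inside it.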

ATTENTION: If FuzzyOcr does not trigger on one of the messages, make sure focr_autodisable_score is set high enough.
Otherwise, if a message already scores enough points from other SpamAssassin (SA) rules, FuzzyOcr will not scan it. Whether this happens depends on your other SA rules.
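If a sample does not trigger, this hypothetical helper (autodisable_suspected is a made-up name) can hint at whether the auto-disable score is the cause. It reads the total score from the X-Spam-Status header that spamassassin adds, and you pass in the focr_autodisable_score value from your own FuzzyOcr.cf by hand. It only approximates FuzzyOcr's real check, which may compare the scores differently.

```python
import re

# spamassassin adds a header such as
#   X-Spam-Status: Yes, score=12.3 required=5.0 tests=...
# This captures the score value from its first line.
SCORE_RE = re.compile(r"^X-Spam-Status:.*?\bscore=(-?\d+(?:\.\d+)?)", re.MULTILINE)


def autodisable_suspected(output: str, focr_autodisable_score: float) -> bool:
    """Return True if no FUZZY_OCR rule fired although the message already
    scored at least focr_autodisable_score.

    ASSUMPTION: FuzzyOcr's own check may use a different comparison.
    """
    match = SCORE_RE.search(output)
    if match is None:
        return False  # no X-Spam-Status header, so nothing to compare
    score = float(match.group(1))
    # If FuzzyOcr did not fire, the total score came from the other SA rules.
    return "FUZZY_OCR" not in output and score >= focr_autodisable_score
```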


ocr-gif.eml: Contains a corrupted GIF image. Additionally, I changed its content-type to image/jpeg, so the output should show:

 1.5 FUZZY_OCR_WRONG_CTYPE  BODY: Mail contains an image with wrong
                            content-type set
                            Image has format "GIF" but content-type is
                            "image/jpeg"
 2.5 FUZZY_OCR_CORRUPT_IMG  BODY: Mail contains a corrupted image
                            Corrupt image: GIF-LIB error: Image is
                            defective, decoding aborted.
 8.8 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            Words found:
                            "target" in 1 lines
                            "service" in 1 lines
                            "stock" in 2 lines
                            "price" in 2 lines
                            "company" in 1 lines
                            "recommendation" in 1 lines
                            (12 word occurrences found)
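The "Words found" part of a FUZZY_OCR hit can be parsed mechanically, for example to compare your output with the lists in this file. Below is a minimal Python sketch (parse_words_found is a hypothetical name). The printed total is not simply the sum of the line counts (in the ocr-gif.eml output above the counts add up to 8 while the total reads 12), so the sketch returns the total as printed instead of recomputing it.

```python
import re
from typing import Dict, Optional, Tuple

# A word line in the report, e.g.:  "stock" in 2 lines
WORD_LINE = re.compile(r'"([^"\n]+)" in (\d+) lines')
# The closing total, e.g.: (12 word occurrences found) or (4.5 word occurrences found)
TOTAL_LINE = re.compile(r"\((\d+(?:\.\d+)?) word occurrences found\)")


def parse_words_found(report: str) -> Tuple[Dict[str, int], Optional[float]]:
    """Return ({word: line count}, printed total or None if no total line is present)."""
    words = {word: int(count) for word, count in WORD_LINE.findall(report)}
    total = TOTAL_LINE.search(report)
    return words, (float(total.group(1)) if total else None)
```

Some outputs, such as the ocr-animated.eml one, have no total line; the sketch then returns None for the total.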

ocr-animated.eml: Contains an animated GIF. If all deanimation routines work properly on your system, the output should contain:

 6.5 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            Words found:
                            "price" in 1 lines
                            "company" in 1 lines
                            "alert" in 1 lines
                            "news" in 1 lines

ocr-obfuscated.eml: Contains an obfuscated GIF image for testing the ocrad-decolorize scanset. To test this scanset, either set the minimal_scanset option to 0 or temporarily move the decolorize scanset to the beginning of the scansets file. The output should be:

 5.9 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            Words found:
                            "target" in 1 lines
                            "profit" in 1 lines
                            "trade" in 1 lines
                            (4.5 word occurrences found)
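Why both workarounds help can be illustrated with a toy model. It assumes (this is not taken from the FuzzyOcr source) that scansets run in the order of the scansets file and that, with minimal_scanset enabled, scanning stops at the first scanset that recognises any words. Under that assumption, an earlier scanset that finds a few words on the obfuscated image keeps the decolorize scanset from ever running. run_scansets and the two stand-in scansets are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical model: a scanset is (name, ocr), where ocr(image) returns how
# many spam words that scanset recognised. Not FuzzyOcr's real data structures.
Scanset = Tuple[str, Callable[[bytes], int]]


def run_scansets(image: bytes, scansets: List[Scanset],
                 minimal_scanset: bool) -> List[Tuple[str, int]]:
    """Try scansets in file order and record (name, hits) for each one that ran.

    ASSUMPTION: with minimal_scanset enabled, scanning stops at the first
    scanset that recognises any words.
    """
    results: List[Tuple[str, int]] = []
    for name, ocr in scansets:
        hits = ocr(image)
        results.append((name, hits))
        if minimal_scanset and hits > 0:
            break  # later scansets are never tried
    return results


# Stand-in scansets: a plain pass that finds a little text on the obfuscated
# image, and a decolorize pass that finds more.
plain = ("plain", lambda image: 1)
decolorize = ("decolorize", lambda image: 3)

print(run_scansets(b"", [plain, decolorize], minimal_scanset=True))   # [('plain', 1)]
print(run_scansets(b"", [plain, decolorize], minimal_scanset=False))  # [('plain', 1), ('decolorize', 3)]
print(run_scansets(b"", [decolorize, plain], minimal_scanset=True))   # [('decolorize', 3)]
```

In this model, turning minimal_scanset off and moving decolorize to the front both let the decolorize pass run.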


ocr-jpg.eml: Contains a JPEG image. The output should show:

 5.9 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            Words found:
                            "levitra" in 1 lines
                            "viagra" in 2 lines
                            (4.5 word occurrences found)


ocr-png.eml: Contains a PNG image. The output should show:

  14 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            Words found:
                            "buy" in 1 lines
                            "target" in 2 lines
                            "service" in 1 lines
                            "stock" in 1 lines
                            "investor" in 1 lines
                            "price" in 3 lines
                            "company" in 2 lines
                            "trade" in 1 lines
                            "software" in 1 lines
                            "recommendation" in 1 lines
                            "news" in 3 lines
                            (25.5 word occurrences found)
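To check a run against the expectations above, the word lists can be kept in a table. This Python sketch copies them from this file; missing_words is a hypothetical helper that reports which expected "Words found" lines are absent from a spamassassin -t report. The match is exact, so treat a missing line as a reason to look at the report rather than as a definite failure.

```python
from typing import Dict, List

# Expected "Words found" counts, copied from the sample descriptions in this README.
EXPECTED_WORDS: Dict[str, Dict[str, int]] = {
    "ocr-gif.eml": {"target": 1, "service": 1, "stock": 2, "price": 2,
                    "company": 1, "recommendation": 1},
    "ocr-animated.eml": {"price": 1, "company": 1, "alert": 1, "news": 1},
    "ocr-obfuscated.eml": {"target": 1, "profit": 1, "trade": 1},
    "ocr-jpg.eml": {"levitra": 1, "viagra": 2},
    "ocr-png.eml": {"buy": 1, "target": 2, "service": 1, "stock": 1,
                    "investor": 1, "price": 3, "company": 2, "trade": 1,
                    "software": 1, "recommendation": 1, "news": 3},
}


def missing_words(sample: str, report: str) -> List[str]:
    """Return the expected 'Words found' lines for `sample` that do not appear in `report`."""
    expected_lines = [f'"{word}" in {count} lines'
                      for word, count in EXPECTED_WORDS[sample].items()]
    return [line for line in expected_lines if line not in report]
```

Pass the sample's file name and the captured spamassassin -t output; an empty list means every expected line was found.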