root/tags/FuzzyOcr-3.5.0-rc1/samples/README

Revision 105, 3.5 kB (checked in by decoder, 2 years ago)

Added FuzzyOcr 3.5.0-rc1 tag

These .eml files are sample spam messages for testing your FuzzyOcr installation.

To test a sample, run: spamassassin -t < samplefile.eml
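
To run every sample in one pass, a small shell loop can help. The following is a minimal sketch and not part of FuzzyOcr: the function name run_samples and the SA_BIN override variable are made up for illustration, and it assumes the .eml files sit in the directory given as the first argument (default: the current directory). It prints only the first line of each FuzzyOcr rule hit, not the detail lines that follow it in the report.

```shell
# Sketch: feed each sample to SpamAssassin in test mode (-t) and show
# only the report lines that name a FUZZY_OCR rule.
# SA_BIN is a hypothetical override for the spamassassin binary.
run_samples() {
    dir="${1:-.}"
    for f in "$dir"/*.eml; do
        [ -f "$f" ] || continue            # skip if the glob matched nothing
        echo "== $(basename "$f")"
        "${SA_BIN:-spamassassin}" -t < "$f" | grep 'FUZZY_OCR' \
            || echo "   (no FuzzyOcr rule fired)"
    done
}
```

Calling run_samples with the path to this samples directory prints one header per file followed by its FuzzyOcr rule lines, which can then be compared with the expected output listed for each sample.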

ATTENTION: If FuzzyOcr does not trigger on one of the messages, make sure focr_autodisable_score is set high enough.
If other SpamAssassin (SA) rules already give a message enough points, FuzzyOcr will not scan it, so the value you need depends on your other SA rules.
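
To see which value is currently configured, you can search your FuzzyOcr configuration file for the option. This is a rough sketch: the function name show_autodisable_score is invented here, and the default path /etc/mail/spamassassin/FuzzyOcr.cf is only an assumption about where the file lives, so pass the real location on your system.

```shell
# Sketch: print the active (uncommented) focr_autodisable_score lines.
# The default path is an assumption; pass your own config file as $1.
show_autodisable_score() {
    cf="${1:-/etc/mail/spamassassin/FuzzyOcr.cf}"
    grep -E '^[[:space:]]*focr_autodisable_score[[:space:]]' "$cf" \
        || echo "focr_autodisable_score not set in $cf"
}
```

If a sample is skipped, raising the value shown here for the duration of the test and running the sample again is one way to check whether the autodisable threshold was the cause.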


ocr-gif.eml: Contains a corrupted GIF image. I also changed the content-type to image/jpeg, so the output should show:

 1.5 FUZZY_OCR_WRONG_CTYPE  BODY: Mail contains an image with wrong
                            content-type set
                            Image has format "GIF" but content-type is
                            "image/jpeg"
 2.5 FUZZY_OCR_CORRUPT_IMG  BODY: Mail contains a corrupted image
                            Corrupt image: GIF-LIB error: Image is
                            defective, decoding aborted.
 8.8 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            Words found:
                            "target" in 1 lines
                            "service" in 1 lines
                            "stock" in 2 lines
                            "price" in 2 lines
                            "company" in 1 lines
                            "recommendation" in 1 lines
                            (12 word occurrences found)

ocr-animated.eml: Contains an animated GIF. If all deanimation routines work properly on your system, the output should contain:

 6.5 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            Words found:
                            "price" in 1 lines
                            "company" in 1 lines
                            "alert" in 1 lines
                            "news" in 1 lines

ocr-obfuscated.eml: Contains an obfuscated GIF image for testing the ocrad-decolorize scanset. To test this scanset, either set the minimal_scanset option to 0 or temporarily move the decolorize scanset to the beginning of the scansets file. The output should be:

 5.9 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            Words found:
                            "target" in 1 lines
                            "profit" in 1 lines
                            "trade" in 1 lines
                            (4.5 word occurrences found)


ocr-jpg.eml: Contains a JPEG image. The output should show:

 5.9 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            Words found:
                            "levitra" in 1 lines
                            "viagra" in 2 lines
                            (4.5 word occurrences found)


ocr-png.eml: Contains a PNG image. The output should show:

  14 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            Words found:
                            "buy" in 1 lines
                            "target" in 2 lines
                            "service" in 1 lines
                            "stock" in 1 lines
                            "investor" in 1 lines
                            "price" in 3 lines
                            "company" in 2 lines
                            "trade" in 1 lines
                            "software" in 1 lines
                            "recommendation" in 1 lines
                            "news" in 3 lines
                            (25.5 word occurrences found)
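
When comparing your results with the expected output for each sample, it can be easier to look at just the score and rule name of each FuzzyOcr hit. The filter below is a small sketch (the name focr_scores is invented for illustration). It assumes the report layout shown in this file, where the score is the first field and the rule name the second.

```shell
# Sketch: read a SpamAssassin report on stdin and print "<score> <rule>"
# for every line whose second field is a FUZZY_OCR rule name.
focr_scores() {
    awk '$2 ~ /^FUZZY_OCR/ { print $1, $2 }'
}
```

For example, spamassassin -t < ocr-gif.eml | focr_scores should list the FUZZY_OCR_WRONG_CTYPE, FUZZY_OCR_CORRUPT_IMG and FUZZY_OCR rules with their scores if your installation behaves as described for that sample.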