Ticket #7 (closed enhancement: fixed)

Opened 2 years ago

Last modified 1 year ago

matching/scoring enhancement

Reported by: dietmar.rieder@tugraz.at
Assigned to: decoder
Priority: minor
Milestone: Development Release Version 3.5
Component: Don't know
Version: SVN
Keywords:
Cc:

Description

Before searching for matching words, you remove everything but letters from the input line using "s/[^a-zA-Z]//g;". This means the spaces between words are lost too, which might cause false positives, though it might also help to detect obfuscated words. However, wouldn't it make sense to also match words before removing spaces, and to give those hits a higher score?
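To illustrate the trade-off described above, here is a small Python sketch (not the project's actual code). The function names and the sample word "cash" are made up for the example; only the idea of deleting every non-letter comes from the ticket.

```python
import re

def strip_non_letters(line: str) -> str:
    """Delete everything except ASCII letters, spaces included."""
    return re.sub(r"[^a-zA-Z]", "", line)

def matches_stripped(word: str, line: str) -> bool:
    """Substring search on the stripped line (spaces already lost)."""
    return word in strip_non_letters(line).lower()

def matches_with_spaces(word: str, line: str) -> bool:
    """Whole-word search on the line with spaces still in place."""
    return word in re.sub(r"[^a-zA-Z\s]", "", line).lower().split()

# Obfuscated input: only the stripped search finds the word.
obfuscated = "c a s h"
# Innocent input: stripping glues "music" and "ashtray" together,
# so the stripped search reports a false positive.
innocent = "music ashtray"
```

With these inputs, matches_stripped("cash", obfuscated) is true while matches_with_spaces is false, and matches_stripped("cash", innocent) is a false positive that the whole-word search avoids.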

Attachments

Change History

17.11.2006 11:54:20 changed by decoder

  • status changed from new to assigned.

That is definitely a good idea and isn't cost-intensive either... I think we'll implement this in the upcoming 3.5 branch.

Thanks :)

02.12.2006 13:35:26 changed by decoder

  • version changed from 3.4 to SVN.
  • milestone changed from Development Release Version 3.4 to Development Release Version 3.5.

03.12.2006 01:47:08 changed by decoder

  • status changed from assigned to closed.
  • resolution set to fixed.

Implemented in the current SVN revision. Two matching passes are now done: the first with spaces kept, the second with spaces stripped. If the first pass yields enough hits, the second is skipped. The number of hits from the first pass is multiplied by the focr_twopass_scoring_factor constant so that they weigh more than the hits one would get with spaces stripped. Hits from the first pass therefore produce higher scores.
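A rough Python sketch of the two-pass idea, for illustration only and not the code from the SVN revision. The threshold name ENOUGH_FIRST_PASS_HITS, both numeric values, and the way second-pass hits are added to the score are assumptions; the ticket only states that first-pass hits are multiplied by focr_twopass_scoring_factor and that the second pass is skipped when the first yields enough hits.

```python
import re

# Both values are hypothetical; the ticket does not state them.
FOCR_TWOPASS_SCORING_FACTOR = 1.5
ENOUGH_FIRST_PASS_HITS = 2  # hypothetical name for the skip threshold

def two_pass_score(text, words,
                   factor=FOCR_TWOPASS_SCORING_FACTOR,
                   enough=ENOUGH_FIRST_PASS_HITS):
    """Score text against a word list using two matching passes."""
    # Pass 1: keep whitespace, drop other non-letters, match whole words.
    tokens = re.sub(r"[^a-zA-Z\s]", "", text).lower().split()
    first = {w for w in words if w in tokens}
    score = len(first) * factor  # first-pass hits weigh more

    # Enough hits already: skip the second pass entirely.
    if len(first) >= enough:
        return score

    # Pass 2: strip everything but letters, match as substrings.
    stripped = re.sub(r"[^a-zA-Z]", "", text).lower()
    # Assumption: a word already hit in pass 1 is not counted again.
    second = {w for w in words if w not in first and w in stripped}
    return score + len(second)
```

For example, with the word list ["cash", "prize"], the text "win cash prize now" is scored by the first pass alone, while "c a s h pr1ze" gets no first-pass hits and only picks up "cash" in the second pass at the lower, unweighted score.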

Chris

03.08.2007 10:23:31 changed by anonymous

03.08.2007 12:23:34 changed by anonymous

15.08.2007 02:25:50 changed by anonymous

17.08.2007 18:41:41 changed by anonymous

