Ticket #34 (assigned enhancement)

Opened 3 years ago

Last modified 5 hours ago

PostgreSQL support

Reported by: juan@… Owned by: decoder
Priority: trivial Milestone: Development Release Version 3.5
Component: Hash Database Version: SVN
Keywords: Cc:

Description

I realize MySQL is still experimental. Is PostgreSQL support scheduled any time soon? I would be more than happy to test out any PostgreSQL implementations.

Attachments

fuzzyocr_pgsql.patch.tar.bz2 Download (21.6 KB) - added by Juan 3 years ago.
FuzzyOcr? w/ PostgreSQL support (patched files)
FuzzyOcr-3.5.0-rc1_PgSQL-patches.tar.gz Download (27.7 KB) - added by Juan 3 years ago.
newest PgSQL patchset

Change History

in reply to: ↑ description   Changed 3 years ago by juan@…

I currently have FuzzyOcr? using PostgreSQL 8.1.6. My hack is nothing more than changing a couple of lines in Config.pm:

Line 38 from: use constant HAS_DBD_MYSQL => eval { require DBD::mysql; };

TO

use constant HAS_DBD_MYSQL => eval { require DBD::pg };

Line 164 from: my $dsn = "dbi:mysql:database=".$conf->{focr_mysql_db};

TO

my $dsn = "dbi:pg:database=".$conf->{focr_mysql_db};

Last but not least, DBD-Pg is required from cpan for this to function. I am using DBD-Pg 1.43.

So, as you can see, it's a sure hack but it does work with PostgreSQL. The reason I didn't create a subroutine is because I was unsure of how to get something like this:

use constant HAS_DBD_PG => eval { require DBD::pg };

... and because I am not proficient in perl. =)

I would like to help out if I can so if you have any questions, please contact me at the address I posted with.

  Changed 3 years ago by juan@…

I had to make some changes to Hashing.pm and Config.pm...

I am pretty sure that even MySQL support doesn't work the way it is currently coded because it called the wrong db by calling $db.$table, $db.$dbfile and $dbfile.* which doesn't work.

To get it to work:

In Hashing.pm:

All instances of $db.$table must be simply $table All instances of $db.$dbfile must be simply $dbfile All instances of $dbfile.* must be simply * (ex: $dbfile.key='$key' should be key='$key')

In Config.pm:

All instances of $db.$tab must be simply $tab All instances of $tab.* must be simply * (ex: $tab.key='$key' should be key='$key')

Lastly, I need to apply a correction to a previous post: The corrected syntax for initiating PostgreSQL is: my $dsn = "dbi:Pg:database=".$conf->{focr_mysql_db};

AND NOT

my $dsn = "dbi:pg:database=".$conf->{focr_mysql_db};

After these changes, hashing is stored in PgSQL and incoming images are compared to previous hashes in PgSQL....

Great piece of work dude!!!!

  Changed 3 years ago by decoder

  • status changed from new to assigned

well it isnt scheduled yet (we are trying to bring out this version as stable first), but it shouldn't be a big deal. To my knowledge, Mysql and Postgresql aren't that different, and since we only use basic features of mysql, it should be easy to modify the code to use postgresql instead. I'll look at the code tomorrow and maybe I can implement this with a patch then :)

Best regards,

Chris

  Changed 3 years ago by juan@…

Howdy ho,

Well, I no longer wanted to use my production server to test Fuzzy so I set up a test box on my laptop to start playing with the code.

This is what I concluded as being the easiest but I could be wrong.

1. Change all focr_mysql_* settings in the cf file to focr_sql_* for generic purposes 2. Add focr_sql_type. Options mysql/pgsql

I currently modifying the code in 4 files but haven't gotten to the point to test yet.

I will continue with this patch unless you think I should halt. If anything, my perl skills are improving... =)

Thanks for your quick reply!

  Changed 3 years ago by juan@…

This post is just FYI's.....

Let me know if I can help out in any way... I know how it goes when you really want to get a project to stable and then some dork, like me, happens upon your project and makes a silly, yet probably easy, request. But seriously, no rush dude. My production server is working just fine with PgSQL with my hacked version of 3.5_rc1 + patches.

I am using ocrad and tesseract and no gocr.

I and some others are working on the Gentoo ebuild. So hopefully, when Fuzzy 3.5 goes stable, the ebuild won't be far behind.

I have a postgre schema that you can add with the source.

Juan

  Changed 3 years ago by juan@…

Hello again,

I was bored, so I added PostgreSQL support.

I have archived the 4 relevant files, the patch and the postgresql dump for your review. It works quite well as far as I can tell (with the execption of that one bug mentioned above)...

Here is the PgSQL patch file (6 files total):  http://www.xdracco.net/~dracco/fuzzyocr_pgsql.patch.tar.bz2

I'm a little weary of the patch file (i've never patched in directory format, only single file). So before you enter code|mode, check out what I've done to see if it can be used.

PS: I can't verify that it works with MySQL as I do not use it but it should be seemless as I didn't modify any query statements in and of themselves, with the exception of correcting the variables contained within the query (see my 2nd post)...


I do believe I have found an SQL bug as it occurs with a fresh install as well as with my modifcations. The bug is in the check_image_hash_db subroutine in Hashing.pm. In a nutshell, it doesn't update the hash table. The debug error is:

[6339] warn: DBD::Pg::db do failed: ERROR: syntax error at or near "check" at character 27 [6339] dbg: FuzzyOcr?: update hash set match = '1', check = '1167200229' where key = '255:255:255:255:173820::0:0:0:0:14680'

The line in question is 136 from my modified Hashing.pm file: $sql = qq(update $dbfile set match = '$next', check = '$now' where key = '$key');

Changed 3 years ago by Juan

FuzzyOcr? w/ PostgreSQL support (patched files)

  Changed 3 years ago by Juan

Hehe. I didn't notice the 'attachment' link at the top. Sooooo, I've uploaded the file.

Juan

  Changed 3 years ago by Juan

Nope. No bug on your part. Sorry for even bringing it up. It appears to pg related. i am trying to debug it. getting sleepy though... =)

  Changed 3 years ago by Juan

Hello,

I installed my patches on one production server last night and after testing it all last night and most of this morning on my laptop, I believe the patch works as expected with the following exceptions:

1. syslog (spamd) reports being unable to find DBD::mysql. Since I am using Postgre, I have no traces of MySQL installed. I am unable to get rid of the message. But regardless of the syslog message, all functionality seems normal (as expected).

Dec 27 11:58:32 zeke spamd[12464]: Can't locate DBD/mysql.pm in @INC (@INC contains: ../etc/mail/spamassassin ../lib /usr/lib/perl5/vendor_perl/5.8.8/i686-linux /usr/lib/perl5/vendor_perl/5.8.8 /etc/perl /usr/lib/perl5/vendor_perl /usr/lib/perl5/site_perl/5.8.8/i686-linux /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib/perl5/5.8.8/i686-linux /usr/lib/perl5/5.8.8 /usr/local/lib/site_perl .) at /etc/mail/spamassassin/FuzzyOcr/Config.pm line 169.

2. When an exact match is detected, the database is not updated (might be Pg related by not sure why it would be since it is simple SQL syntax). I have verified that the image to be updated already exists in the database.

[24226] info: FuzzyOcr?: Found Score <28.5> for Exact Image Hash [24226] info: FuzzyOcr?: Matched [1] time(s). Prev match: 12 hrs. 48 sec. ago [24226] warn: DBD::Pg::db do failed: ERROR: syntax error at or near "check" at character 30 [24226] dbg: FuzzyOcr?: update hash set match = '1', check = '1167251106' where key = '255:255:255:255:173820::0:0:0:0:14680'

Regards,

Juan

  Changed 3 years ago by Juan

More info.....

To test, I removed check = '$now' and the hash table successfully updates the match field. I can't figure out why it doesn't work when check = '$now' is the syntax. I've tried different formats including use of the sprintf function.

On the DBD-mysql error.. I've been reading up and it appears this is normal for the eval statement. I didn't realize it allowed the script to continue with non-fatal errors when requirements aren't found. Unless I misunderstood....

Juan

Changed 3 years ago by Juan

newest PgSQL patchset

  Changed 3 years ago by Juan

It appears 'check' is a reserved word for PostgreSQL. I changed 'check' to 'last_seen' and now the DB updates as expected. I have uploaded the newest files, patch and sql schema...

Juan

So, it looks like the only thing left is to attempt to get rid of the eval info message in syslog...

follow-up: ↓ 13   Changed 3 years ago by Juan

Documentation on 'check':  http://www.postgresql.org/docs/8.1/interactive/ddl-constraints.html

Same applies to 7.x versions so FuzzyOcr? should be compatable with the Pg 7.x/8.x series.

in reply to: ↑ 12   Changed 3 years ago by anonymous

Replying to Juan:

Documentation on 'check':  http://www.postgresql.org/docs/8.1/interactive/ddl-constraints.html Same applies to 7.x versions so FuzzyOcr? should be compatable with the Pg 7.x/8.x series.

Can anyone get these attachments to work? I cannot get either to open.

in reply to: ↑ 2008   Changed 38 hours ago by advertising

Add/Change #34 (PostgreSQL support)

Author


E-mail address and user name can be saved in the Preferences.


Action
as assigned
 
Note: See TracTickets for help on using tickets.