Posts Tagged tips
Extending PostgreSQL with Python
One of the features I enjoy the most about PostgreSQL is the ability to write stored procedures in C, Perl, Tcl, PL/pgSQL, and yes… obviously also in Python. I’ve been using this feature since 7.4, so any recent version of PostgreSQL is pretty much guaranteed to support it, but you’ll need to have the PL/Python procedural language contrib module installed. Once it’s installed, you can activate it for your current database using the following query:
CREATE PROCEDURAL LANGUAGE plpythonu;
Once the language bindings have been activated, you can start writing your stored procedures in Python. However, you should really read up on the following subjects first:
As an example, I’ve written a little SP in PL/Python to provide support for Perl-style regular expressions (PCRE), since the stock distribution of PostgreSQL only supports LIKE, SIMILAR TO, and POSIX-style regular expressions.
Let’s create our Python language binding, and create a standard text storage table:
-- Activate PL/Python
CREATE PROCEDURAL LANGUAGE plpythonu;

-- Create a plain text-storage table
CREATE TABLE text_storage
(
  id serial NOT NULL,
  payload character varying(128),
  CONSTRAINT text_storage_pkey PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
ALTER TABLE text_storage OWNER TO xavier;

-- Let's throw in an index on the payload field for good measure
CREATE INDEX txt_payload_idx ON text_storage USING btree (payload);
Let’s now populate our table with some junk data:
INSERT INTO text_storage (payload) VALUES ('hello, world');
INSERT INTO text_storage (payload) VALUES ('the quick brown fox, blah blah blah');
INSERT INTO text_storage (payload) VALUES ('PCREs in Postgres');
INSERT INTO text_storage (payload) VALUES ('All hail Python!');
INSERT INTO text_storage (payload) VALUES ('Hello, test data!');
INSERT INTO text_storage (payload) VALUES ('Python would like to say Hello!');
And now we can go ahead and create our Python SP itself:
CREATE OR REPLACE FUNCTION pcre(text, text) RETURNS INTEGER AS
$BODY$
import re

# Unnamed arguments are exposed to PL/Python through the args list
regex = args[0]
in_str = args[1]
compiled = re.compile(regex)
if compiled.search(in_str):
    return 1
else:
    return 0
$BODY$
LANGUAGE 'plpythonu' VOLATILE COST 100;
As you can see, our PCRE matching system is extremely simple, yet pretty powerful. We import Python’s built-in re module, compile the specified regex argument, then attempt to match it against the other argument. Here’s a usage example on our test table:
SELECT id, payload FROM text_storage WHERE pcre('[H|h]ello', payload) = 1;
 id |             payload
----+---------------------------------
  1 | hello, world
  5 | Hello, test data!
  6 | Python would like to say Hello!
(3 rows)
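Since the body of the function is plain Python, the matching logic can be tried outside the database first. The following is a minimal sketch, assuming a stand-alone function that mirrors the stored procedure. The name pcre and the list of payloads are simply copied from the example above for illustration; nothing here talks to PostgreSQL.

```python
import re


def pcre(regex, in_str):
    """Mirror of the stored procedure: 1 if regex matches anywhere in in_str, else 0."""
    compiled = re.compile(regex)
    if compiled.search(in_str):
        return 1
    return 0


# Same rows as the test table, in insertion order (ids start at 1)
payloads = [
    'hello, world',
    'the quick brown fox, blah blah blah',
    'PCREs in Postgres',
    'All hail Python!',
    'Hello, test data!',
    'Python would like to say Hello!',
]

# Collect the ids of the rows the WHERE clause would keep
matching_ids = [
    row_id
    for row_id, payload in enumerate(payloads, start=1)
    if pcre('[H|h]ello', payload) == 1
]
print(matching_ids)
```

One thing this makes easy to spot: inside a character class the | is a literal character, so [H|h]ello also matches the string '|ello'. If that is not what you want, [Hh]ello is the tighter pattern.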
As always, feel free to suggest any improvements.
Fixing custom sequences in PostgreSQL
PostgreSQL provides a mechanism called sequences (sequence generators were standardized in SQL:2003, not in SQL-92), which are basically stateful counters that come with some helper functions. The primary use of sequences in PostgreSQL and Oracle is to implement auto-incrementing counters as a table field.
PostgreSQL will automatically create a sequence when you use the “SERIAL” datatype for your field, and will take care of assigning the default value of your field as the result of the nextval() call on the sequence, so most of the time you don’t need to interact directly with sequences.
Where things can get hairy, however, is when you back up your data using pg_dump or any other mechanism, and restore that data into a table that is already populated. A common scenario, for example, is loading older data from an archive into a table that already holds more recent data. The same problem shows up in the other direction, when you archive data from a table into a backup database.
Whenever you manually have to provide a value for the field assigned to a sequence, you are pretty much guaranteed to break the sequence unless you take the time to nextval() your sequence until it is in sync. This is a real problem, as pg_dump does not include sequence synchronization in its output.
The quickest way to synchronize a sequence, based on my observations, is to run the following query, taking care to replace the name of the sequence [SEQ] and the name of the associated table [TABLE] and field [FIELD]:
SELECT SETVAL([SEQ], COALESCE((SELECT [FIELD] FROM [TABLE] ORDER BY [FIELD] DESC LIMIT 1), 0)+1)
This will fetch the highest value of [FIELD] in [TABLE], increment it by 1, and synchronize the sequence [SEQ] to the new value.
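To make the failure mode and the fix a bit more concrete, here is a toy model in plain Python. It is only a sketch: ToySequence and ToyTable are made-up names that imitate the nextval()/setval() behaviour described above. They are not how PostgreSQL implements sequences.

```python
class ToySequence:
    """Hypothetical stand-in for a sequence: a stateful counter."""

    def __init__(self):
        self.last_value = 0

    def nextval(self):
        self.last_value += 1
        return self.last_value

    def setval(self, value):
        # Like setval(seq, value): the next nextval() returns value + 1
        self.last_value = value
        return value


class ToyTable:
    """Hypothetical table with a unique id column fed by a sequence."""

    def __init__(self, seq):
        self.seq = seq
        self.rows = {}

    def insert(self, payload, row_id=None):
        # With no explicit id, fall back to the sequence (the SERIAL default)
        if row_id is None:
            row_id = self.seq.nextval()
        if row_id in self.rows:
            raise KeyError("duplicate key: id %d already exists" % row_id)
        self.rows[row_id] = payload
        return row_id


seq = ToySequence()
table = ToyTable(seq)

# Restoring archived rows with explicit ids never touches the sequence
for archived_id in (1, 2, 3):
    table.insert("archived row", row_id=archived_id)

# The next ordinary insert asks the sequence, which still hands out 1
try:
    table.insert("new row")
    collided = False
except KeyError:
    collided = True

# The fix from the query above: setval(seq, COALESCE(max(id), 0) + 1)
highest = max(table.rows) if table.rows else 0
seq.setval(highest + 1)
new_id = table.insert("new row")
print(collided, new_id)
```

In this toy run the first plain insert collides with the restored row, and after the repair the new row gets id 5 rather than 4, because setval() marks the value it is given as already used. Skipping one id is harmless for a surrogate key.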
I’ve also written the following little Python script that will look for any non-system sequence in the specified database, and use this method to repair it. Let me know if it works out for you, or if you’d like to suggest some improvements.
Requirements: Python 2.6+ (also runs on Python 3), psycopg2 Python module (python-psycopg2)
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Standard imports
import sys
import os
from optparse import OptionParser

# psycopg2 import
try:
    import psycopg2
except ImportError:
    print('You must install the python module named "psycopg2" in order to use this module.')
    sys.exit(os.EX_SOFTWARE)


class PgRepairman:

    def __init__(self, options, parser):
        self.options = options
        try:
            dsn = "dbname=%s host=%s user=%s" % (options.db, options.host, options.user)
            # Only append the password when one was supplied (the option is stored as "passwd")
            if options.passwd:
                dsn += " password=%s" % options.passwd
            self.conn = psycopg2.connect(dsn)
            self.curs = self.conn.cursor()
        except Exception as e:
            print("ERROR - %s" % e)
            sys.exit(1)

    # Returns a dict for a psycopg2 row object
    def _to_dict(self, desc, res):
        return dict(zip([x[0] for x in desc], res))

    # Attempt to locate all custom sequences
    def findSequences(self):
        seq_query = """
        SELECT pc1.relname AS seq, pc2.relname AS table, c.attname AS field
        FROM pg_depend, pg_class pc1, pg_class pc2, pg_attribute c
        WHERE pc1.oid = pg_depend.objid
          AND pc2.oid = pg_depend.refobjid
          AND c.attnum = pg_depend.refobjsubid
          AND c.attrelid = pc2.oid
          AND pc1.relkind = 'S'
          AND pc1.relname NOT LIKE 'pg_toast%%'
        """
        try:
            self.print_verbose(seq_query)
            self.curs.execute(seq_query)
        except Exception as e:
            print("[ERROR] - %s" % e)
            sys.exit(1)

        desc = self.curs.description
        for row in self.curs.fetchall():
            yield self._to_dict(desc, row)

    # Set each sequence to the highest value of its field + 1
    def fixSequences(self):
        for seq in self.findSequences():
            print("Fixing sequence %s in table %s" % (seq['seq'], seq['table']))
            fix_query = "SELECT setval('%s', COALESCE((SELECT %s FROM %s ORDER BY %s DESC LIMIT 1), 0)+1)" % (seq['seq'], seq['field'], seq['table'], seq['field'])
            try:
                self.print_verbose(fix_query)
                self.curs.execute(fix_query)
            except Exception as e:
                print("[WARNING] - %s" % e)
                # Leave the aborted transaction so the remaining sequences can still be fixed
                self.conn.rollback()

    def print_verbose(self, msg):
        if self.options.verbose:
            print("[DEBUG] - %s" % msg)


if __name__ == '__main__':
    usage = ("Usage: %prog <options>\n"
             "[-v --verbose] [-u --username | -p --password "
             "-o --host | -d --database]")
    version = "%prog v1.0\nDistributed under the LGPL2 License"
    description = "Increments all sequences in a PostgreSQL database"

    parser = OptionParser(usage=usage, version=version, description=description)
    parser.add_option("-v", "--verbose", action="store_true", dest="verbose", default=False, help="Enable extra output")
    parser.add_option("-o", "--host", action="store", dest="host", default="127.0.0.1", help="Database hostname/IP")
    parser.add_option("-u", "--username", action="store", dest="user", default="postgres", help="Database Username")
    parser.add_option("-p", "--password", action="store", dest="passwd", default="", help="Database Password")
    parser.add_option("-d", "--database", action="store", dest="db", default="template1", help="Database Name")

    try:
        (options, args) = parser.parse_args()
        obj = PgRepairman(options, parser)
        obj.fixSequences()
    except KeyboardInterrupt:
        sys.exit(1)
You can also download the script here.
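One thing the script does not do is quote the names it interpolates into the repair query, so a sequence or table with mixed case or odd characters in its name would trip it up. Below is a rough sketch of how the query could be assembled with quoting. quote_ident, quote_literal and build_fix_query are hypothetical helpers written for this example (they are not psycopg2 functions), and the literal quoting assumes standard_conforming_strings is on.

```python
def quote_ident(name):
    """Quote an SQL identifier: wrap in double quotes, double any embedded ones."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote a string literal (assumes standard_conforming_strings = on)."""
    return "'" + value.replace("'", "''") + "'"


def build_fix_query(seq, table, field):
    """Build the same repair query as the script, with every name quoted."""
    # setval() takes the sequence name as a string, so the quoted
    # identifier is itself wrapped in a string literal
    return (
        "SELECT setval(%s, COALESCE((SELECT %s FROM %s ORDER BY %s DESC LIMIT 1), 0)+1)"
        % (
            quote_literal(quote_ident(seq)),
            quote_ident(field),
            quote_ident(table),
            quote_ident(field),
        )
    )


print(build_fix_query('text_storage_id_seq', 'text_storage', 'id'))
```

The three names passed in at the end are the ones PostgreSQL would create for the text_storage example table from the first post; any real names would come from the sequence lookup query instead.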
Apache2 shortcuts
I’ve been maintaining and managing Apache servers for over 10 years now, but for some reason, I never bothered to RTFM when it came to enabling/disabling modules and site configs in apache2. As it turns out, you don’t have to manually create symlinks from mods-available to mods-enabled and sites-available to sites-enabled, as the Debian/Ubuntu Apache2 packages include a handful of shortcut scripts to do this work for you… Doh!
To enable a module in your apache2 config, instead of doing the old
# (modules that also ship a .conf file need the same symlink for it)
ln -sf /etc/apache2/mods-available/rewrite.load /etc/apache2/mods-enabled/rewrite.load
Next time, just do:
a2enmod rewrite && /etc/init.d/apache2 restart
To disable this module, try
a2dismod rewrite && /etc/init.d/apache2 restart
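For the curious, the core of what a2enmod and a2dismod do can be approximated in a few lines of Python. This is only a sketch under assumptions: it takes the Debian-style mods-available/mods-enabled layout for granted, enable_module and disable_module are made-up names, and the real scripts do more (dependency handling, for one), so stick to a2enmod on a real server.

```python
import os


def enable_module(apache_root, name):
    """Symlink name.load / name.conf from mods-available into mods-enabled."""
    available = os.path.join(apache_root, "mods-available")
    enabled = os.path.join(apache_root, "mods-enabled")
    linked = []
    for ext in (".load", ".conf"):
        filename = name + ext
        # Not every module ships a .conf file, so link only what exists
        if not os.path.isfile(os.path.join(available, filename)):
            continue
        link_path = os.path.join(enabled, filename)
        if not os.path.lexists(link_path):
            # Relative target, so the link survives moving the config tree
            os.symlink(os.path.join("..", "mods-available", filename), link_path)
        linked.append(filename)
    return linked


def disable_module(apache_root, name):
    """Remove the symlinks that enable_module created."""
    enabled = os.path.join(apache_root, "mods-enabled")
    removed = []
    for ext in (".load", ".conf"):
        link_path = os.path.join(enabled, name + ext)
        # Only touch symlinks, never regular files someone put there by hand
        if os.path.islink(link_path):
            os.remove(link_path)
            removed.append(name + ext)
    return removed
```

Calling enable_module("/etc/apache2", "rewrite") would be the rough equivalent of the ln -sf dance, minus the restart.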
Similarly, to enable a site config, try
a2ensite myVhost.com && /etc/init.d/apache2 restart
and to disable it, obviously, try
a2dissite myVhost.com && /etc/init.d/apache2 restart
Finally, if you’d like to lint your Apache2 config files before issuing a restart (a restart with a broken config means downtime), try
apache2ctl configtest
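If you restart Apache from a deploy script, the same check can be wired in so that a restart only happens when the config parses. A minimal sketch, assuming apache2ctl configtest exits with a non-zero status on a broken config and that the init script path from the examples above is the one you use. restart_if_valid is a hypothetical helper name.

```python
import subprocess

# Commands taken from the examples above; adjust for your system
CHECK_CMD = ["apache2ctl", "configtest"]
RESTART_CMD = ["/etc/init.d/apache2", "restart"]


def restart_if_valid(check_cmd=CHECK_CMD, restart_cmd=RESTART_CMD):
    """Run the config check and restart only if it succeeded.

    Returns True if the restart command ran and exited cleanly.
    """
    # call() returns the exit status; 0 means the check passed
    if subprocess.call(check_cmd) != 0:
        return False
    return subprocess.call(restart_cmd) == 0
```

Both commands are parameters mostly so the function can be exercised without a live Apache install, but it also makes it easy to swap in a reload instead of a full restart.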