Penguicon 7.0

Our little gang from the Windsor Unix Users Group just got back from Penguicon 7.0 on Sunday night. As always, it was so much fun we hardly managed to get any sleep, and we all came home with a lot of really fun memories.

I didn’t get around to sitting down and writing up the experience, as I’ve been working on other projects every evening this week, but I finally have a minute to do so now.

As far as the convention itself went, there was definitely a lot of good and a few notable logistical oversights worth mentioning, but I’m not interested in addressing either here; I’d rather share my impressions of the technical panels I attended.

I have to say, I was very excited about every single one of the panels I was able to attend this year. They were all highly informative, and the speakers were very motivated and passionate about their material. Here’s a short summary of the events I was especially excited about:

  • Neural Networks
    This panel was held by Dr. Stanley C. Mortel. It explained the basic concepts behind the idea of building and training neural computation networks. It was a very abstract fly-by course, which I feel is a very appropriate way to introduce this type of material. There was a second part of this panel available the next day but we weren’t fortunate enough to attend it.
  • Reading by Wil Wheaton
    Just kidding ;)
  • Beginning PyGame Programming
    This tutorial by Craig Maloney was my first real introduction to PyGame. Craig had a nice little demo environment all set up and ready for the presentation. He flew pretty quickly through many of the concepts behind PyGame while writing a Pong demo. Although he went through the material pretty quickly, I’m very interested in learning more about the platform as a result, so it’s safe to say the tutorial was a success as far as I’m concerned.
  • Open Hardware with Arduino
    The speaker for this talk was W Craig Trader, and I have to say Craig was not only extremely knowledgeable about the Arduino platform (and obviously several others), but also pretty excited about it. Since microcontrollers aren’t something I’ve ever bothered to learn anything about, I really didn’t expect to get so excited about the talk. The Arduino platform appears to be very accessible, both technically and financially, and also pretty powerful. Craig did an amazing job showing us the strengths and weaknesses of the platform and getting our whole group pretty excited to play with it!
  • High Performance PHP
    This talk, given by Rasmus Lerdorf, the creator of PHP and an infrastructure architect at Yahoo! Inc., took us through several performance optimization techniques for PHP apps, although many of the concepts featured in the talk could easily be applied to any Apache-based app. It was very refreshing to finally see someone as experienced and well-rounded as him go through the tribulations of identifying and addressing performance bottlenecks in PHP apps. I was very interested in both the individual techniques highlighted during the talk and the problem-solving process of a highly experienced software engineer, so that talk was the highlight of the con for me.

Unfortunately, due to some logistical shortcomings, our group was unable to attend a lot of the panels we were looking forward to checking out. Despite this, the con was a resounding success in every respect. I met some really cool and interesting people, learned a lot of very exciting stuff that will be guiding some of my personal research for months to come, learned some technical concepts that will directly impact my work performance, and had way too much fun.

The next stop on our list will probably be PyOhio. We had a chance to chat with Catherine Devlin, a fellow Pythonista & Oracle geek from IntelliTech Systems, who told us about it. Aaron and I are looking into putting together a talk proposal for the con, if we can come up with one before the deadline. We already have some pretty interesting ideas, so I’m looking forward to it. I’d also really love to attend PyCon 2010 in Atlanta!

If you’re interested in attending a convention where Linux and FOSS enthusiasts get to hang out with the Sci-Fi crowd for a weekend of fun, Penguicon is definitely for you!!

Extending PostgreSQL with Python

One of the features I enjoy the most about PostgreSQL is the ability to write stored procedures in C, Perl, Tcl, PL/pgSQL, and yes… obviously also in Python. I’ve been using this feature since 7.4, so any recent version of PostgreSQL is pretty much guaranteed to support it, but you’ll need to have the PL/Python procedural language contrib module installed. Once it’s installed, you can activate it for your current database using the following query:

CREATE PROCEDURAL LANGUAGE plpythonu;

Once the language bindings have been activated, you can start writing your stored procedures in python, however you should really read up on the following subjects first:

As an example, I’ve written a little SP in PL/Python to provide support for PCRE, since the stock distribution of PostgreSQL only supports LIKE, SIMILAR TO, and POSIX-style regular expressions.

Let’s create our Python language binding, and create a standard text storage table:

-- Activate PL/Python
CREATE PROCEDURAL LANGUAGE plpythonu;
 
-- Create a plain text-storage table
CREATE TABLE text_storage
(
  id serial NOT NULL,
  payload CHARACTER VARYING(128),
  CONSTRAINT text_storage_pkey PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
ALTER TABLE text_storage OWNER TO xavier;
 
-- Let's throw in an index on the payload field for good measure
CREATE INDEX txt_payload_idx
  ON text_storage
  USING btree
  (payload);

Let’s now populate our table with some junk data:

INSERT INTO text_storage (payload) VALUES ('hello, world');
INSERT INTO text_storage (payload) VALUES ('the quick brown fox, blah blah blah');
INSERT INTO text_storage (payload) VALUES ('PCREs in Postgres');
INSERT INTO text_storage (payload) VALUES ('All hail Python!');
INSERT INTO text_storage (payload) VALUES ('Hello, test data!');
INSERT INTO text_storage (payload) VALUES ('Python would like to say Hello!');

And now we can go ahead and create our Python SP itself:

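CREATE OR REPLACE FUNCTION pcre(text, text)
  RETURNS INTEGER AS
$BODY$import re

# Unnamed arguments are exposed to PL/Python through the args list
regex  = args[0]
in_str = args[1]

compiled = re.compile(regex)

# 1 when the pattern is found anywhere in the string, 0 otherwise
if compiled.search(in_str):
    return 1
else:
    return 0$BODY$
  LANGUAGE 'plpythonu' VOLATILE
  COST 100;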

As you can see, our PCRE matching system is extremely simple, yet pretty powerful. We import Python’s built-in re module, compile the specified regex argument, then attempt to match it against the other argument. Here’s a usage example on our test table:

SELECT id, payload FROM text_storage WHERE pcre('[Hh]ello', payload) = 1;
 id |             payload             
----+---------------------------------
  1 | hello, world
  5 | Hello, test data!
  6 | Python would like to say Hello!
(3 rows)
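
The matching logic can also be tried outside the database. What follows is only a minimal plain-Python sketch, not the stored procedure itself: `pcre_match` is a hypothetical stand-in name, and the rows simply mirror the test data inserted above.

```python
import re


def pcre_match(regex, in_str):
    """Return 1 when regex is found anywhere in in_str, 0 otherwise."""
    return 1 if re.search(regex, in_str) else 0


# Same payloads as the text_storage test table
rows = [
    (1, "hello, world"),
    (2, "the quick brown fox, blah blah blah"),
    (3, "PCREs in Postgres"),
    (4, "All hail Python!"),
    (5, "Hello, test data!"),
    (6, "Python would like to say Hello!"),
]

# Equivalent of: WHERE pcre('[Hh]ello', payload) = 1
matching_ids = [row_id for row_id, payload in rows
                if pcre_match("[Hh]ello", payload) == 1]
print(matching_ids)
```

Because the function uses a search rather than an anchored match, the pattern can appear anywhere in the payload.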

As always, feel free to suggest any improvements.

Fixing custom sequences in PostgreSQL

PostgreSQL provides a mechanism called sequences, which are basically stateful counters that come with some helper functions. I believe they come from the SQL standard (SQL:2003 rather than SQL-92, as far as I can tell), though I’m too lazy to check. The primary use of sequences in PostgreSQL and Oracle is to implement auto-incrementing counters as a table field.

PostgreSQL will automatically create a sequence when you use the “SERIAL” datatype for your field, and will take care of assigning the default value of your field as the result of the nextval() call on the sequence, so most of the time you don’t need to interact directly with sequences.

Where things can get hairy, however, is when you back up your data using pg_dump or any other mechanism, and restore that data into a table that is already populated. A common scenario, for example, is loading older data you’ve been keeping in an archive into a table that already holds some recent data. The reverse scenario, archiving data from a live table into a backup database, has the same problem.

Whenever you manually have to provide a value for the field assigned to a sequence, you are pretty much guaranteed to break the sequence unless you take the time to nextval() your sequence until it is in sync. This is a real problem, as pg_dump does not include sequence synchronization in its output.
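
To make the failure mode concrete, here is a toy in-memory model of a sequence feeding a primary key. It is only a sketch under simplifying assumptions: `ToySequence` and `ToyTable` are hypothetical names, not PostgreSQL APIs, and real sequences have more options (increments, caching, cycling) than shown here.

```python
class ToySequence:
    """Stateful counter loosely modelled on nextval()/setval()."""

    def __init__(self):
        self.last_value = 0

    def nextval(self):
        self.last_value += 1
        return self.last_value

    def setval(self, value):
        self.last_value = value
        return value


class ToyTable:
    """Table whose id defaults to the sequence's next value."""

    def __init__(self, sequence):
        self.sequence = sequence
        self.ids = set()

    def insert(self, row_id=None):
        if row_id is None:
            row_id = self.sequence.nextval()
        if row_id in self.ids:
            raise ValueError("duplicate key value: %d" % row_id)
        self.ids.add(row_id)
        return row_id


seq = ToySequence()
table = ToyTable(seq)

for _ in range(3):
    table.insert()              # ids 1, 2, 3 come from the sequence
for archived_id in (4, 5, 6):
    table.insert(archived_id)   # restored rows bypass the sequence

try:
    table.insert()              # the sequence hands out 4, which is taken
    collided = False
except ValueError:
    collided = True

seq.setval(max(table.ids) + 1)  # move the counter past the highest key
new_id = table.insert()
print(collided, new_id)
```

In this model the first insert after the restore collides, and the insert after the counter has been moved succeeds.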

The quickest way to synchronize a sequence, based on my observations, is to run the following query, taking care to replace the name of the sequence [SEQ] and the name of the associated table [TABLE] and field [FIELD]:

SELECT SETVAL('[SEQ]', COALESCE((SELECT [FIELD] FROM [TABLE] ORDER BY [FIELD] DESC LIMIT 1), 0)+1);

This will fetch the highest value of [FIELD] in [TABLE], add 1 to it, and set the sequence [SEQ] to that value (or to 1 if the table is empty).
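
Filling in the placeholders by hand gets tedious, so here is a small sketch of doing it from Python. `build_sync_query` is a hypothetical helper, and the sequence name `text_storage_id_seq` is an assumption based on the default name PostgreSQL gives to the sequence behind a serial column. The helper does naive string interpolation, so it should only be fed identifiers you trust.

```python
def build_sync_query(seq, table, field):
    """Fill the [SEQ], [TABLE] and [FIELD] placeholders of the sync query."""
    template = (
        "SELECT setval('{seq}', COALESCE("
        "(SELECT {field} FROM {table} ORDER BY {field} DESC LIMIT 1), 0)+1)"
    )
    return template.format(seq=seq, table=table, field=field)


# Assumed names: a table text_storage with a serial column id
query = build_sync_query("text_storage_id_seq", "text_storage", "id")
print(query)
```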

I’ve also written the following little Python script that will look for any non-system sequence in the specified database, and use this method to repair it. Let me know if it works out for you, or if you’d like to suggest some improvements.

Requirements: Python 2.5+, psycopg2 Python module (python-psycopg2)

#!/usr/bin/env python
# -*- coding: utf-8 -*-
 
 
# Standard imports
import sys
import os
import time
from optparse import OptionParser
 
# psycopg2 import
try:
    import psycopg2
except ImportError, e:
    print 'You must install the python module named "psycopg2" in order to use this module.'
    sys.exit(os.EX_SOFTWARE)
 
 
 
class PgRepairman:
    def __init__(self, options, parser):
        self.options = options
        try:
            dsn = "dbname=%s host=%s user=%s" % (options.db, options.host, options.user)
            # Only append the password when one was supplied (note the leading space)
            dsn += ("" != options.passwd) and (" password=%s" % options.passwd) or ""
            self.conn = psycopg2.connect(dsn)
            self.curs = self.conn.cursor()
        except Exception, e:
            print "ERROR - %s" % e
            sys.exit(1)
 
 
    # Returns a dict for a psycopg2 row object
    def _to_dict(self, desc, res):
        return dict(zip([x[0] for x in desc], res))
 
 
    # Attempt to locate all custom sequences
    def findSequences(self):
        seq_query = """
        SELECT pc1.relname AS seq, pc2.relname AS table, c.attname AS field 
        FROM pg_depend, pg_class pc1, pg_class pc2, pg_attribute c 
        WHERE pc1.oid = pg_depend.objid 
            AND pc2.oid = pg_depend.refobjid 
            AND c.attnum = pg_depend.refobjsubid 
            AND c.attrelid = pc2.oid 
            AND pc1.relkind = 'S' 
            AND pc1.relname NOT LIKE 'pg_toast%%'
        """
 
        try:
            self.print_verbose(seq_query)
            self.curs.execute(seq_query)
        except Exception, e:
            print "[ERROR] - %s" % e
            sys.exit(1)
 
        desc = self.curs.description
        for row in self.curs.fetchall():
            yield self._to_dict(desc, row)
 
 
    # Set each sequence to the highest value of its key field + 1
    def fixSequences(self):
        for seq in self.findSequences():
            print "Fixing sequence %s in table %s" % (seq['seq'], seq['table'])
            fix_query = "SELECT setval('%s', COALESCE((SELECT %s FROM %s ORDER BY %s DESC LIMIT 1), 0)+1)" % (seq['seq'], seq['field'], seq['table'], seq['field'])
            try:
                self.print_verbose(fix_query)
                self.curs.execute(fix_query)
            except Exception, e:
                print "[WARNING] - %s" % e
                # Reset the aborted transaction so the remaining sequences can still be fixed
                self.conn.rollback()
 
    def print_verbose(self, msg):
        if (True == self.options.verbose):
            print "[DEBUG] - %s" % msg
 
if __name__=='__main__':
    usage       = "Usage: %prog <options> [-v --verbose] [-u --username | -p --password | -o --host | -d --database]"
    version     = "%prog v1.0\nDistributed under the LGPL2 License"
    description = "Increments all sequences in a PostgreSQL database"
    parser = OptionParser(usage=usage, version=version, description=description)
    parser.add_option("-v", "--verbose", action="store_true", dest="verbose", default=False, help="Enable extra output")
    parser.add_option("-o", "--host", action="store", dest="host", default="127.0.0.1", help="Database hostname/IP")
    parser.add_option("-u", "--username", action="store", dest="user", default="postgres", help="Database Username")
    parser.add_option("-p", "--password", action="store", dest="passwd", default="", help="Database Password")
    parser.add_option("-d", "--database", action="store", dest="db", default="template1", help="Database Name")
 
 
    try:
        (options, args) = parser.parse_args()
        obj = PgRepairman(options, parser)
        obj.fixSequences()
    except KeyboardInterrupt, e:
        sys.exit(1)

You can also download the script here.

Apache2 shortcuts

I’ve been maintaining and managing Apache servers for over 10 years now, but for some reason, I never bothered to RTFM when it came to enabling/disabling modules and site configs in apache2. As it turns out, you don’t have to manually create symlinks from mods-available to mods-enabled and sites-available to sites-enabled, as the Debian/Ubuntu Apache2 packages include a handful of shortcut scripts to do this work for you… Doh!

To enable a module in your apache2 config, instead of doing the old

ln -sf /etc/apache2/mods-available/rewrite.load /etc/apache2/mods-enabled/rewrite.load
# (modules that also ship a .conf file need that file linked the same way)

Next time, just do:

a2enmod rewrite && /etc/init.d/apache2 restart

To disable this module, try

a2dismod rewrite && /etc/init.d/apache2 restart

Similarly, to enable a site config, try

a2ensite myVhost.com && /etc/init.d/apache2 restart

and to disable it, obviously, try

a2dissite myVhost.com && /etc/init.d/apache2 restart

Finally, if you’d like to lint through your Apache2 config files before issuing a restart which might be responsible for some downtime if it doesn’t work, try

apache2ctl configtest
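
For the curious, the symlink bookkeeping these scripts save you from can be sketched in a few lines of Python. This is a simplified illustration, not how a2enmod is actually implemented (the real scripts do more, such as handling module dependencies): `enable_module` and `disable_module` are hypothetical helpers, and the demo runs against a throwaway directory rather than /etc/apache2.

```python
import os
import tempfile


def enable_module(apache_root, name):
    """Symlink mods-available/<name>.load/.conf into mods-enabled."""
    linked = []
    for ext in (".load", ".conf"):
        source = os.path.join(apache_root, "mods-available", name + ext)
        target = os.path.join(apache_root, "mods-enabled", name + ext)
        if os.path.exists(source) and not os.path.lexists(target):
            os.symlink(source, target)
            linked.append(name + ext)
    return linked


def disable_module(apache_root, name):
    """Remove the module's symlinks from mods-enabled."""
    removed = []
    for ext in (".load", ".conf"):
        target = os.path.join(apache_root, "mods-enabled", name + ext)
        if os.path.islink(target):
            os.remove(target)
            removed.append(name + ext)
    return removed


# Demo in a temporary directory laid out like /etc/apache2
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "mods-available"))
os.mkdir(os.path.join(root, "mods-enabled"))
open(os.path.join(root, "mods-available", "rewrite.load"), "w").close()

linked = enable_module(root, "rewrite")
print(linked)
```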

Ahh those cool little CLI tools…

The list of Unix/Linux utilities available grows every day. Here’s a little list of cherry-picked utilities I’ve found myself using more and more lately…

Inotail:

Inotail uses the Linux kernel’s inotify API, introduced in v2.6.13, to monitor changes to files on the filesystem. This design is more efficient than our beloved tail, which relies on polling the monitored file for changes every second. Example: to monitor syslog entries in real time, try:

inotail -f /var/log/messages

The documentation for inotail, if you need it, can be found at http://distanz.ch/inotail/
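
To illustrate the polling approach that inotail avoids, here is a minimal Python sketch of a poll-based follower. `PollingTail` is a hypothetical helper written for this post, not part of tail or inotail, and the demo uses a temporary file instead of a real log.

```python
import os
import tempfile


class PollingTail:
    """Follow a file by re-reading it on demand, like a polling tail -f."""

    def __init__(self, path):
        self.handle = open(path)
        self.handle.seek(0, os.SEEK_END)  # skip what is already in the file

    def poll(self):
        """Return the lines appended since the previous poll (possibly none)."""
        return self.handle.readlines()

    def close(self):
        self.handle.close()


# Demo against a throwaway file
fd, log_path = tempfile.mkstemp()
os.close(fd)
with open(log_path, "w") as log:
    log.write("old entry\n")

tail = PollingTail(log_path)
with open(log_path, "a") as log:
    log.write("new entry\n")

first_poll = tail.poll()    # picks up the appended line
second_poll = tail.poll()   # nothing new, yet we still had to check
tail.close()
os.remove(log_path)
print(first_poll, second_poll)
```

A real follower would call poll() in a loop with a sleep in between, which is exactly the wasted wake-up that an inotify-based tool gets rid of.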

Incron:

Incron is an event-scheduler similar to cron, except that it is based on file-system events as opposed to our beloved time-based cron daemons. It is also based on the inotify subsystem, which means it is only available on Linux as far as I know. Let’s set up a quick example to demonstrate the stuff you can do with incron. We’re going to install incron, and configure it to automatically create a thumbnail of any picture dropped in a specified directory using ImageMagick’s convert utility, on a stock Ubuntu Linux system:

# Package installation
sudo aptitude install incron imagemagick
# Add your user account to the list of allowed incron users (replace xavier by your account)
sudo sh -c "echo xavier >> /etc/incron.allow"
# Create our directory structure
mkdir -p /home/xavier/images/original
mkdir /home/xavier/images/thumb
# Edit the actual incron file
incrontab --edit

The editor will now fire up. Enter the following lines in the editor, and exit:

# Convert /home/xavier/images/original/test.png to /home/xavier/images/thumb/test.png
/home/xavier/images/original/   IN_CLOSE_WRITE    convert -thumbnail 320x320 $@/$# $@/../thumb/$#
# When an original is deleted, automatically clean up the associated thumbnail
/home/xavier/images/original/   IN_DELETE    rm -rf $@/../thumb/$#
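
The $@ and $# wildcards in those rules stand for the watched path and the name of the file that triggered the event. The following Python sketch shows the substitution on the thumbnail rule. It is plain string replacement for illustration only, not incron's actual implementation, and `expand_incron_command` is a hypothetical name.

```python
def expand_incron_command(command, watched_path, filename):
    """Substitute $@ (watched path) and $# (file name) in an incron command."""
    return command.replace("$@", watched_path).replace("$#", filename)


rule = "convert -thumbnail 320x320 $@/$# $@/../thumb/$#"
# Assumed event: test.png was written to the watched directory
expanded = expand_incron_command(rule, "/home/xavier/images/original", "test.png")
print(expanded)
```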

We’re all done. Any new image dropped in /home/xavier/images/original/ will automatically be converted into a thumbnail of the same name in /home/xavier/images/thumb/. There are many things you can do with incron, so I suggest you check out the following links:

ccze:

ccze is simply a logfile syntax highlighter for various file formats commonly found on Unix systems, such as syslog, Apache logs, dmesg, etc. You can have it syntax-highlight a file in your terminal by using the following syntax:

ccze < /var/log/messages

Or you can pipe anything into ccze to have it stream syntax-coloured output on your terminal. For example:

inotail -f /var/log/messages | ccze -A

Additionally, ccze can also output syntax-coloured text in HTML. For example, the following command:

dmesg | grep -i cpu | ccze -m html

Would output the following document: ccze Output