Fuzzy Mathematics with FuzzPy (Part 2)

In the first part of this post, we explored the foundations of fuzzy sets and fuzzy graphs, and discussed how you can use FuzzPy, a general fuzzy mathematics library for python, to work with these types. Today, we will expand a bit on those foundations by learning about fuzzy numbers, and creating visualizations with FuzzPy.

Fuzzy Numbers

Think of a fuzzy number as a number whose actual value is uncertain. We do not have its value, but we have a set of estimations, each with an associated degree of likelihood, or grade. If you remember the definition of a fuzzy set, you will remember that the structure of a fuzzy set is very similar to what I just described, so it only makes sense to use a fuzzy set to represent what is known about a fuzzy number.

Effectively, a fuzzy number is a fuzzy subset of the real line \mathbb{R}, where the value of each member of the set is an estimated value of the fuzzy number, and the \mu-value of the member is its grade.

triangular_fuzzy_number

Triangular Fuzzy Number

One can represent the estimated values as well as their associated \mu-values in the form of a Cartesian coordinate system, with the Y-axis representing \mu-values, and the X-axis representing estimated values.
We can then see that certain patterns emerge in the shape of the visualization, and these patterns are used to distinguish between the different types of fuzzy numbers:

  • Polygonal sets are represented by an arbitrary set of segments on the plane.
  • Trapezoidal sets contain two kernels (estimations whose \mu-value is exactly 1.0), and two X-intercepts.
  • Triangular sets contain exactly one kernel, and two X-intercepts
  • Gaussian sets have \mu-values determined by a normal distribution, thus providing a smoother distribution.

There are a few additional types as well, but these are the four basic types, and are all supported out of the box by FuzzPy.

Because the points of interest in each type are different, instanciating a new FuzzyNumber in FuzzPy is done differently depending on the type. Let’s instantiate objects for each of these types using FuzzPy, and we’ll play a bit with these objects in a moment. Make sure you have FuzzPy installed first (easy_install -U fuzzpy) and save the following code in fuzzynums.py:

import fuzz
 
# Instantiating a PolynomialFuzzyNumber is done by providing
# a list of (value, mu) tuple to the constructor:
polygonal = [fuzz.PolygonalFuzzyNumber([(0.0, 0.0), \
    (0.25, 1.0), (1.0, 0.5), (1.3, 1.0), (1.8, 0.0)])]
 
polygonal += [fuzz.PolygonalFuzzyNumber([(0.0, 0.0), \
    (1.0, 0.5), (3.0, 1.0), (4.0, 1.0), (6.0, 0.0)])]
 
# A trapezoidal needs to be given a tuple containing the values
# of the kernels (mu = 1.0) and a tuple containing the values
# of the support (mu = 0.0)
trapezoidal = [fuzz.TrapezoidalFuzzyNumber((1.0, 2.0), \
    (0.3, 3.0))]
 
trapezoidal += [fuzz.TrapezoidalFuzzyNumber((1.0, 6.0), \
    (0.5, 10.0))]
 
# A triangular set needs to be provided the value of the
# kernel, and a tuple containing the values of the support.
triangular = [fuzz.TriangularFuzzyNumber(1.0, (0.0, 3.0))]
triangular += [fuzz.TriangularFuzzyNumber(4.0, (0.0, 4.5))]
 
# Gaussian fuzzy sets simply need the mean, and standard
# derivation for the estimated value.
gaussian = [fuzz.GaussianFuzzyNumber(1.0, 0.2)]
gaussian += [fuzz.GaussianFuzzyNumber(12.0, 1.0)]

We can now fire up the python interpreter and work with our objects:

>>> # Polygonal Union:
... print polygonal[0] | polygonal[1]
PolygonalFuzzyNumber: kernel [(0.25, 0.25), (1.3, 1.3), \
(3.0, 4.0)], support [(0.0, 6.0)]
 
>>> # Polygonal Intersection:
... print polygonal[0] & polygonal[1]
PolygonalFuzzyNumber: kernel [], support [(0.0, 6.0)]
 
>>> # Trapezoidal Addition:
... print trapezoidal[0] + trapezoidal[1]
TrapezoidalFuzzyNumber: kernel (2.0, 8.0), support (0.8000000\
0000000004, 13.0)
 
>>> # Gaussian Addition
... print gaussian[0] + gaussian[1]
GaussianFuzzyNumber: kernel (13.0, 13.0), support (4.85663149\
07018668, 21.143368509298135)

The class inheritance chain for FuzzPy’s FuzzyNumber derived classes looks like this:

fuzzy_number_class_inheritance.png

Fuzzy Number Class Inheritance

Visualizations

FuzzPy ships with a small visualization framework that is able to create visualizations for almost all FuzzPy types, and encode them in a variety of formats. In most cases, creating and storing a visualization is a trivial operation:

from fuzzynums import *
from fuzz.visualization import VisManager
 
gaussian = fuzz.GaussianFuzzyNumber(1.0, 0.2)
plugin = VisManager.create_backend(gaussian)
(vis_format, vis_data) = plugin.visualize()
 
with (open("visualization.%s" % vis_format, "wb")) as fp:
    fp.write(vis_data)

In this example, we passed our FuzzyNumber instance to the VisManager.create_backend() method, which is a plugin factory method, and it returned the first plugin it could find that can be used to create a visualization of our FuzzyNumber.

The plugin’s visualize() method was called, and returned a tuple containing the file format of the visualization, and its payload. We then created a file using the returned format as extension, and stored the payload in that file.

You can also force the VisManager object to use a certain plugin if you do not want to rely on the default plugin picked for the type of object to visualize:

plugin = VisManager.create_backend(gaussian, 'num_gnuplot')

Similarly, a plugin’s visualize() method can accept arguments to customize the final output. For example, the following code would tell the plugin to use the ‘gif’ image format:

(vis_format, vis_data) = plugin.visualize(format="gif")

You will need to have Graphviz installed for graph visualizations, and Gnuplot for fuzzy number visualizations.

Now that you know how to use all the FuzzPy types and how to create visualizations for your data, you can start using FuzzPy in your own project, or get involved in the development process. Check out the project’s GitHub page for more information.

fuzzy_digraph

Fuzzy Digraph Visualization

polygonal_fuzzy_number

Polygonal Fuzzy Number

Fuzzy Mathematics with FuzzPy (Part 1)

FuzzPy is a python library that exposes specialized datatypes to deal with a wide range of fuzzy number types, fuzzy sets, and fuzzy graphs. Binary operations against each type, such as set unions and intersections, can be performed using some of python’s native binary operators (|, &, ==), or specialized methods if you wish to deviate from the default functions for these computations.

FuzzPy also provides several helper methods such as kernel and neighbors isolation, alpha-cuts, cardinality testing, shortest-path computation, and minimum spanning trees. A powerful visualization framework also allows you to quickly create and save visualizations for your data.

In this post, we will examine fuzzy sets and fuzzy graphs, and see how one can use FuzzPy to work with these types, providing examples for each. In the next post of this series, we will examine the different types of fuzzy numbers, generate visualizations, and explore advanced concepts such as automatic fuzzification, and importing graph data from fuzzy adjacency matrices.

Fuzzy Sets

A fuzzy set is simply a set in which each element has an associated degree of membership into the set. This membership value is called \mu-value of the element. A \mu-value of zero indicates that the element does not belong to the set, while any value between zero (exclusive) and one (inclusive), indicates that the element is, to a certain degree, a set member.

Creating a fuzzy set with FuzzPy is just a matter of creating a new FuzzySet object, then either calling the add(FuzzyElement(value, mu)) method for each element to import, or using the update(elements) method, providing it with a list of FuzzyElement(value, \mu-value) tuples for bulk import.

from fuzz import FuzzySet, FuzzyElement
 
# Create two lists of FuzzyElements
elements_a = [(1, 0.3), (2, 0.5), (3, 0.7), (4, 0.9)]
elements_b = [(3, 0.8), (6, 0.6), (1, 0.4), (8, 0.2)]
 
elements_a = [FuzzyElement(x[0], x[1]) for x in elements_a]
elements_b = [FuzzyElement(x[0], x[1]) for x in elements_b]
 
# Create two fuzzy sets
set_a = FuzzySet()
set_b = FuzzySet()
 
set_a.update(elements_a)
set_b.update(elements_b)

We’ve just created a couple fuzzy sets with four elements each. Notice that two of the elements are common to both set_a and set_b (albeit with different \mu-values). We can now perform several operations against these sample sets. Save the code above in a file called example.py and fire up the python interpreter to test fuzzy set functionality against our sets:

>>> from example import *
>>> # Set Difference
...
>>> set_a - set_b
FuzzySet([FuzzyElement(2, 0.500000), FuzzyElement(4, 0.900000)])
>>> set_b - set_a
FuzzySet([FuzzyElement(8, 0.200000), FuzzyElement(6, 0.600000)])
# Set Union
...
>>> set_a | set_b
FuzzySet([FuzzyElement(1, 0.400000), FuzzyElement(2, 0.500000), FuzzyElement(3, 0.800000), FuzzyElement(4, 0.900000), FuzzyElement(6, 0.600000), FuzzyElement(8, 0.200000)])
# Set Intersection
...
>>> set_a & set_b
FuzzySet([FuzzyElement(1, 0.300000), FuzzyElement(3, 0.700000)])

Fuzzy graphs

In a fuzzy graph or digraph, each vertice and each edge can be associated with a \mu-value. The \mu-value of a vertice indicates the vertice’s degree of membership into the graph, while an edge or arc’s \mu-value indicates the degree of connectivity between the head and tail vertices.

The creation of fuzzy graphs is also pretty straightforward: create a set or fuzzy set (see above), then feed that set to the fuzz.Graph constructor, along with the directed boolean value to specify whether you’d like to create a graph or a digraph. You can then use the fuzz.Graph.connect(head, tail, mu) method to add edges to the graph:

import fuzz
 
# Create two sets of vertices
set_a = set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
set_b = set([11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
 
# Create our graph objects
graph_a = fuzz.FuzzyGraph(viter=set_a, directed=False)
graph_b = fuzz.FuzzyGraph(viter=set_b, directed=True)
 
# Create some arcs in graph_a
graph_a.connect(1, 3, 0.5)
graph_a.connect(3, 6, 0.06)
graph_a.connect(2, 8, 0.2)
graph_a.connect(4, 1, 0.9)
 
# And some in graph_b
graph_b.connect(11, 19, 0.9)
graph_b.connect(14, 12, 0.3)
graph_b.connect(15, 14, 0.5)

Save the code above in a file named graphs.py and fire up your python interpreter so we can experiment with graph operations:

>>> from graphs import *
>>> # Retrieve the mu value of the edge between 1 and 3
...
>>> graph_a.mu(3, 1)
0.5
>>> # Retrieve an alpha-cut of graph_b against mu-value of 0.5
...
>>> graph_b.alpha(0.5)
Graph(viter = set([11, 12, 13, 14, 15, 16, 17, 18, 19, 20]), eiter = set([(15, 14), (11, 19)]), directed = True)
>>> # Is graph_a a subgraph of graph_b? (no)
... graph_a.issubgraph(graph_b)
False
>>> # Detect adjacent (connected) vertices
...
>>> graph_a.adjacent(1, 2)
False
>>> graph_a.adjacent(1, 3)
True
>>> # Find all the neighbours of the vertex with value 3
...
>>> graph_a.neighbors(3)
set([1, 6])
>>> # Find the shortest path to all vertices using 6 as origin
...
>>> graph_a.dijkstra(6)
{1: 3, 2: None, 3: 6, 4: 1, 5: None, 6: None, 7: None, 8: None, 9: None, 10: None}
>>> # Find the minimum spanning tree of graph_a
...
>>> graph_a.minimum_spanning_tree()
Graph(viter = set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), eiter = set([(2, 8), (1, 3), (4, 1), (3, 6)]), directed = False)

There are a few more operations available for both crisp and fuzzy graphs/digraphs. You should consult the API doc for a comprehensive list.

You should also consult the examples provided as part of the module on GitHub to see FuzzPy in action.

Algorithms in Python: Binary exponentiation

The typical approach to exponentiation of a base b by an exponent n is to repeatedly multiply the base by itself, as such: b^n = \prod_{i=1}^{n} b

This is easy to compute for small values but quickly chokes with very large values.

A faster approach involves converting the exponent into base 2, then multiplying the running total and squaring the base each time a 1-bit is encountered. The python implementation looks like this:

def binary_exponent(base, exponent):
    """\
    Binary Exponentiation
 
    Instead of computing the exponentiation in the traditional way,
    convert the exponent to its reverse binary representation.
 
    Each time a 1-bit is encountered, we multiply the running total by
    the base, then square the base.
    """
    # Convert n to base 2, then reverse it.
    exponent = bin(exponent)[2:][::-1]
 
    result = 1
    for i in range(0, len(exponent)):
        if exponent[i] == '1':
            result *= base
        base *= base
    return result

Similarly, the same approach can be taken to perform modular exponentiation against very large exponents:

def modular_exponent(base, exponent, mod):
    """\
    Modular exponentiation through binary decomposition.
 
    We use the same technique as for the binary exponentiation above in
    order to find the modulo of our very large exponent and an arbitrary
    integer mod.
    """
    exponent = bin(exponent)[2:][::-1]
 
    x = 1
    power = base % mod
    for i in range(0, len(exponent)):
        if exponent[i] == '1':
            x = (x * power) % mod
        power = (power ** 2) % mod
    return x

The optimization is especially visible for the modulo operation, since we are working on smaller numbers with each iteration, whereas for simple exponentiation, the CPython interpreter can already optimize the operation.
The speedup for modular exponentiation on my system was in the order of 72x the speed of a simple (a^n) \mod{m} whereas the the computation of the exponentiation took about the same time in both cases.

Algorithms in Python: Binary Operations

Today I’d like to demonstrate a simple implementation of Kenneth Rosen‘s binary addition and multiplication algorithms as outlined in "Discrete Mathematics and its Applications". Both algorithms are very simple and work the same way we’ve all learned to do decimal additions and multiplications by hand in grade-school.

As usual, I’ve also provided a unit-test suite for each algorithm.

Please note: This is only intended to demonstrate the algorithms. If you are actually trying to do some real work with binary numbers in Python, note that the language itself provides a highly optimized implementation of binary numbers and related operations. Here are a few examples:

# Decimal to binary conversion:
>>> bin(15)
'0b1111'
 
# Binary to decimal conversion:
>>> str(0b1111)
'15'
 
# Binary addition:
>>> bin(0b1111 + 0b10011)
'0b100010'
 
# Binary multiplication:
>>> bin(0b1111 * 0b10011)
'0b100011101'

Binary Addition

We will use strings in this example to represent sequences of bits. The algorithm operates right-to-left, maintaining a carry each time it adds two 1-valued bits.

Code:

"""Simple binary operation algorithm."""
 
import math
 
# a and b should be string representation of the binary components
# You could easily extend this to overload the + operator for binaries,
# but this is already built into python.
def binary_addition(bin_a, bin_b):
    """Performs a binary addition against two provided binary numbers"""
 
    # Pad the components so they are of equal length
    max_length = max(len(bin_a), len(bin_b))
    bin_a, bin_b = bin_a.zfill(max_length), bin_b.zfill(max_length)
 
 
    carry = 0
    result = ''
 
    # Note that we work from right to left
    for i in range(0, max_length)[::-1]:
        tmp = int(math.floor((int(bin_a[i]) + int(bin_b[i]) + carry)/2))
        res = str(int(bin_a[i]) + int(bin_b[i]) + carry - 2*tmp)
        result += res
        carry = tmp
 
    result = (result + str(carry))[::-1]
    try:
        return result[result.index('1'):]
    except ValueError, ex:
        return '0'

Unit Test:

"""Unit test for binary_addition.py"""
 
import unittest
from binary_addition import binary_addition
 
class TestBinaryAddition(unittest.TestCase):
 
    """Comparing python binary addition against our own"""
 
    def test_add_same_length(self):
 
        """Adding 2 binary numbers of the same length"""
 
        bin_a = '001001101'
        bin_b = '011011010'
 
        self._compare_additions(bin_a, bin_b)
 
    def test_add_b_larger(self):
 
        """ B has more digits than A """
 
        bin_a = '01000101'
        bin_b = '1110010110101011101101011101000101'
 
        self._compare_additions(bin_a, bin_b)
 
    def test_add_b_smaller(self):
 
        """B has less digits than A"""
 
 
        bin_a = '1110001001001001001101101011000101'
        bin_b = '0110010'
 
        self._compare_additions(bin_a, bin_b)
 
    def _compare_additions(self, bin_a, bin_b):
 
        """Compares the python implementation against ours"""
 
        bin_add = binary_addition(bin_a, bin_b)
        py_add = bin(eval("0b%s" % bin_a) + eval("0b%s" % bin_b))
 
        algo_res = bin_add[bin_add.index('1'):]
        py_res = py_add[py_add.index('1'):]
 
        self.assertEqual(py_res, algo_res)
 
 
 
if '__main__' == __name__:
    unittest.main()

Binary Multiplication

We are still using strings to represent bit sequences. We are still working from right to left, shifting to the left every number from the first number that needs to be multiplied by 1 in the second number. We then append the result to a list that we sum at the end, using the previous algorithm.

Code:

"""Simple Binary multiplication algorithm"""
 
 
import sys
import os.path
 
from binary_addition import binary_addition
 
 
def binary_multiplication(bin_a, bin_b):
    """Multiplies two binary numbers by using python lists to
    easily shift digits around"""
 
    temp_result = []
    result = "0"
 
    # Remove any unnecessary padding
    bin_a = bin_a[bin_a.index('1'):]
    bin_b = bin_b[bin_b.index('1'):]
 
    for i in range(0, len(bin_b))[::-1]:
        if bin_b[i] == '1':
            temp_result.append("%s%s" % (bin_a, "0" * (len(bin_b)-i-1)))
        else:
            temp_result.append("0")
 
 
    for val in temp_result:
        result = binary_addition(str(result), str(val))
 
    return result

Unit Test:

"""Unit test for binary_addition.py"""
 
import unittest
from binary_multiplication import binary_multiplication
 
class TestBinaryAddition(unittest.TestCase):
 
    """Comparing python binary multiplication against our own"""
 
    def test_add_same_length(self):
 
        """Multiplying 2 binary numbers of the same length"""
 
        bin_a = '001001101'
        bin_b = '011011010'
 
        self._compare_multiplications(bin_a, bin_b)
 
    def test_simple(self):
 
        """Multiplying 2 simple binary numbers"""
        bin_a = '110'
        bin_b = '101'
 
        self._compare_multiplications(bin_a, bin_b)
 
    def test_add_b_larger(self):
 
        """B has more digits than A"""
 
        bin_a = '010'
        bin_b = '111001011000101'
 
        self._compare_multiplications(bin_a, bin_b)
 
    def test_add_b_smaller(self):
 
        """ B has less digits than A """
 
        bin_a = '111001011000101'
        bin_b = '010'
 
        self._compare_multiplications(bin_a, bin_b)
 
    def _compare_multiplications(self, bin_a, bin_b):
 
        """Compares the python implementation against ours"""
 
        bin_mult = binary_multiplication(bin_a, bin_b)
        py_mult = bin(eval("0b%s" % bin_a) * eval("0b%s" % bin_b))
 
        algo_res = bin_mult[bin_mult.index('1'):]
        py_res = py_mult[py_mult.index('1'):]
 
        self.assertEqual(py_res, algo_res)
 
 
 
if '__main__' == __name__:
    unittest.main()

Algorithms in Python: Base Expansion

I felt it would be helpful to folks interested in Python and studying algorithms, to review some commonly studied algorithms in Computer Science by providing a small description and a Python implementation of each algorithm.

This week, we’ll cover an introductory algorithm for converting from one numeral base to another. Here is the python code:

"""Simple implementation of a base expansion algorithm"""
 
import math
 
def base_expand(base, val):
    """This simple function performs a base-expansion from decimal
    using moduli and a translation table. The translation table is
    a clear limitation here, in that it implies the maximum base
    is 36."""
 
    if (base < 2) or (base > 36):
        raise BaseOutOfBoundsError(base)
 
    trans_table = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    res = ''
 
    while val != 0:
        res += trans_table[(int(val % base))]
        val = math.floor(val / base)
    return res[::-1]
 
class BaseOutOfBoundsError(Exception):
    """Base must be between 2 and 36"""
    def __init__(self, val):
        self.val = val
    def __str__(self):
        print "\nInvalid base: %s. Base must be (x | x > 1; x < 37)" % \
            self.val

Here is a unit-test for base_expand.py:

"""Unit tests for base_expansion.py"""
 
import unittest
from base_expansion import base_expand, BaseOutOfBoundsError
 
# Unit tests for base_expansion.py
class TestVals(unittest.TestCase):
    """Test suite"""
 
    def test_known_values(self):
        """Testing against known values"""
 
        vals = [{'val': 8, 'base': 2, 'expect': '1000'},
            {'val': 915652, 'base': 16, 'expect': 'DF8C4'},
            {'val': 256, 'base': 10, 'expect': '256'},
            {'val': 88189, 'base': 8, 'expect': '254175'}]
 
        for val in vals:
            self.assertEqual(val['expect'], \
                base_expand(val['base'], val['val']))
 
    def test_invalid_base(self):
        """ Testing invalid bases """
 
        bases = [1, 37, 50, 100]
        for base in bases:
            self.assertRaises(BaseOutOfBoundsError, base_expand, \
                base, 1)
 
 
 
if '__main__' == __name__:
    unittest.main()

Django Context Processors Best Practice

In this post, i’ll show you a simple trick to ensure your Django context-processors don’t break when Django 1.2 (and possibly future releases as well) is released.

If you’re not familiar with the concept of context-processors, I’ll also explain what those are and how they work, so don’t panic. Finally, i’ll also spend a minute explaining why I consider this to be best-practice.

What are template context processors?

Django’s context processors are a facility that allows you to provide data and callbacks to your templates.

You can do so in one of two ways:

  1. On an individual request basis: by passing a custom Context value to your render_to_response() call
  2. Globally: by creating a context processor method that accepts a HttpRequest object as input, and returns a payload or callback, then registering the context processor in your settings.py, then providing your render_to_response() call with the built-in RequestContext attribute instead of your own (you can always extend RequestContext to add more data on an individual request basis of course).

If that approach for passing data to templates sounded absurd and obfuscated to you, you’re not alone. The complexity involved in such a simple operation is unwarranted and counter-productive, but every system has its shortcomings.

The Issue

If you follow the Django documentation or The Django Book‘s approach to configuring your own custom context-processors, you’ll notice that you are encouraged to add to your settings.py a hardcoded list of built-in context processors. If you follow that approach, your context-processors declaration will look like this:

TEMPLATE_CONTEXT_PROCESSORS = ("django.contrib.auth.context_processors.auth",
"django.core.context_processors.debug",
"django.core.context_processors.i18n",
"django.core.context_processors.media",
"django.contrib.messages.context_processors.messages")

But if you pay close attention to the development version’s documentation, you’ll notice a couple of interesting notes:

Changed in Django Development version: “django.contrib.messages.context_processors.messages” was added to the default. For more information, see the messages documentation.
Changed in Django Development version: The auth context processor was moved in this release from its old location django.core.context_processors.auth to django.contrib.auth.context_processors.auth.

This is a red flag! The Django dev team would like you to use hardcoded values that reference to classes that will no longer exist once the next version is released, and that omit new processors that you are likely to require.

The Solution

Obviously Django has access to its own default settings, so there must be a way to simply extend the defaults instead of overriding them with hardcoded values. You just need to dig around a bit in the Django source code to find exactly how. I’ll save you some digging:

Add this at the top of your settings.py file:

import django.conf.global_settings as DEFAULT_SETTINGS

Then extend the default context processors:

TEMPLATE_CONTEXT_PROCESSORS = DEFAULT_SETTINGS.TEMPLATE_CONTEXT_PROCESSORS + (
"myapp.context_processors.example",
"myapp.context_processors.other_example",
# etc...
)

Why is this a best-practice approach?

In software-engineering, you want to ensure maximum (reasonable) future interoperability of all your components, and there is really no component as important as your actual development framework, Django or otherwise.

Using hardcoded values that are already defined somewhere not only breaks DRY, but it also introduces possible breakage on framework upgrades. I know you are diligent and always read the release notes before upgrading critical components, and I know you use a staging environment to test those changes, but by actively looking out for this kind of traps, you have just saved yourself some debugging time and some head-scratching.

Happy coding!

Penguicon 7.0

Our little gang from the Windsor Unix Users Group just got back from Penguicon 7.0 on Sunday night. As always, it was so much fun we hardly managed to get any sleep, and we all came home with a lot of really fun memories.

I didn’t get around to sit down and come up with a write-up about the experience as i’ve been working on other projects every evening this week, but I finally have a minute to do so now.

As far as the convention event itself went, there is definitely a lot of good and a few notable logistical oversights that are worth mentioning, but i’m not really interested in addressing either on here, as I’m really more interested in sharing my impression of the technical panels I’ve attended.

I have to say, I’m very excited about every single one of the panels I’ve been able to attend this year. They were all highly informative, and the speakers very motivated and passionate about their material. Here’s a short summary of the events I was especially excited about:

  • Neural Networks
    This panel was held by Dr. Stanley C. Mortel. It explained the basic concepts behind the idea of building and training neural computation networks. It was a very abstract fly-by course, which I feel is a very appropriate way to introduce this type of material. There was a second part of this panel available the next day but we weren’t fortunate enough to attend it.
  • Reading by Will Wheaton
    Just kidding ;)
  • Beginning PyGame Programming
    This tutorial by Craig Maloney was my first real introduction to PyGame. Craig had a nice little demo environment all set up and ready for the presentation. He flew pretty quickly through many of the concepts behind PyGame while writing a Pong demo. Although he went through the material pretty quickly, I’m very interested in learning more about the platform as a result, so it’s safe to say the tutorial was a success as far as I’m concerned.
  • Open Hardware with Arduino
    The speaker for this talk was W Craig Trader, and I have to say Craig was not only extremely knowledgeable about the Arduino (and obviously several other) platform, but also pretty excited about it, and since microcontrolers isn’t something I’ve ever bothered to learn anything about, I really didn’t expect to get so excited about the talk. The Arduino platform appears to be very accessible technically and financially, and also pretty powerful. Craig did an amazing job showing us the strengths and weaknesses of the platform and getting our whole group pretty excited to play with it!
  • High Performance PHP
    This talk held by Rasmus Lerdorf, the creator of PHP and an infrastructure architect at Yahoo! Inc., took us through several performance optimization techniques for PHP apps, although many of the concepts featured in the talk could easily be applied to any apache-based app. It was very refreshing to finally see someone as experienced and well-rounded as him go through the tribulations of identifying and addressing performance bottlenecks in PHP apps. I was very interested in both the individual techniques highlighted during the talk, as well as the problem-solving process of a highly experienced software engineer, so that talk was the highlight of the con for me.

Unfortunately, due to very unfortunate logistical shortcomings, our group was unable to attend a lot of the panels we were looking forward to check out, but despite this, the con was a resounding success in every aspect you can think of. I’ve met some really cool and interesting people, learned a lot of very exciting stuff that will be guiding some of my personal research for months to come, learned some technical concepts that will directly impact my work performance, and had way too much fun.

The next stop on our list will probably be PyOhio. We had a chance to chat with Catherine Devlin, a fellow Pythonista & Oracle geek from IntelliTech Systems, who told us about it, and Aaron and I are looking into putting together a talk proposal for the con, if we can come up with it before the deadline, and we already have some pretty interesting ideas, so I’m looking forward to it. I’d also really love to attend PyCon 2010 in Atlanta!

If you’re interested in attending a convention where Linux and FOSS enthusiasts get a chance to have fun with the Sci-Fi crowd for a weekend of fun, PenguiCon is definitely for you!!

Extending PostgreSQL with Python

One of the features I enjoy the most about PostgreSQL is the ability to write stored procedures in C, Perl, TCL, PgSQL, and yes… obviously also in Python. I’ve been using this feature since 7.4, so any recent version of PostgreSQL is pretty much guaranteed to support it, but you’ll need to have the pl/python procedural language contrib module installed. Once it’s installed, you can activate it for your current database using the following query:

CREATE PROCEDURAL LANGUAGE plpythonu;

Once the language bindings have been activated, you can start writing your stored procedures in python, however you should really read up on the following subjects first:

As an example, I’ve written a little SP in PL/Python to provide support for PCRE since the stock distribution of PostgreSQL only supports LIKE/SIMILAR, and POSIX Style Regular Expressions.

Let’s create our Python language binding, and create a standard text storage table:

-- Activate PL/Python
CREATE PROCEDURAL LANGUAGE plpythonu;
 
-- Create a plain text-storage table
CREATE TABLE text_storage
(
  id serial NOT NULL,
  payload CHARACTER VARYING(128),
  CONSTRAINT text_storage_pkey PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
ALTER TABLE text_storage OWNER TO xavier;
 
-- Let's throw in an index on the payload field for good measure
CREATE INDEX txt_payload_idx
  ON text_storage
  USING btree
  (payload);

Let’s now populate our table with some junk data:

INSERT INTO text_storage (payload) VALUES ('hello, world');
INSERT INTO text_storage (payload) VALUES ('the quick brown fox, blah blah blah');
INSERT INTO text_storage (payload) VALUES ('PCREs in Postgres');
INSERT INTO text_storage (payload) VALUES ('All hail Python!');
INSERT INTO text_storage (payload) VALUES ('Hello, test data!');
INSERT INTO text_storage (payload) VALUES ('Python would like to say Hello!');

And now we can go ahead and create our Python SP itself:

CREATE OR REPLACE FUNCTION pcre(text, text)
  RETURNS INTEGER AS
$BODY$import re
 
regex  = args[0]
in_str = args[1]
 
compiled = re.compile(regex)
 
IF compiled.search(in_str):
	RETURN 1
ELSE:
	RETURN 0$BODY$
  LANGUAGE 'plpythonu' VOLATILE
  COST 100;

As you can see, our PCRE matching system is extremely simple, yet pretty powerful. We import Python’s built-in re module, compile the specified regex argument, then attempt to match it against the other argument. Here’s a usage example on our test table:

SELECT id, payload FROM text_storage WHERE pcre('[H|h]ello', payload) = 1;
 id |             payload
----+---------------------------------
  1 | hello, world
  5 | Hello, test DATA!
  6 | Python would LIKE TO say Hello!
(3 ROWS)

As always, feel free to suggest any improvements.

Fixing custom sequences in PostgreSQL

PostgreSQL provides a mechanism called sequences, which I believe is extracted from ANSI-SQL92, though I’m too lazy to check, which are basically stateful counters that provide some helper functions. The primary use of sequences in PostgreSQL and Oracle, is to implement auto-incrementing counters as a table field.

PostgreSQL will automatically create a sequence when you use the “SERIAL” datatype for your field, and will take care of assigning the default value of your field as the result of the nextval() call on the sequence, so most of the time you don’t need to interact directly with sequences.

Where things can get hairy however, is when you back up your data using pg_dump or any other mechanism, and restore that data in a table that is already populated. A common scenario for example, is populating a table that already has some recent data, with some older data you’ve been storing in archival. The opposite is true if you are trying to archive data from a table to a backup database for example.

Whenever you manually have to provide a value for the field assigned to a sequence, you are pretty much guaranteed to break the sequence unless you take the time to nextval() your sequence until it is in sync. This is a real problem, as pg_dump does not include sequence synchronization in its output.

The quickest way to synchronize a sequence, based on my observations, is to run the following query, taking care to replace the name of the sequence [SEQ] and the name of the associated table [TABLE] and field [FIELD]:

SELECT SETVAL([SEQ], COALESCE((SELECT [FIELD] FROM [TABLE] ORDER BY [FIELD] DESC LIMIT 1), 0)+1)

This will fetch the highest value of [FIELD] in [TABLE], increment it by 1, and synchronize the sequence [SEQ] to the new value.

I’ve also written the following little Python script that will look for any non-system sequence in the specified database, and use this method to repair it. Let me know if it works out for you, or if you’d like to suggest some improvements.

Requirements: Python 2.5+, PsycoPG2 Python module (python-psycopg2)

#!/usr/bin/env python
# -*- coding: utf-8 -*-
 
 
# Standard imports
import sys
import os
import time
from optparse import OptionParser
 
# psycopg2 import
try:
    import psycopg2
except ImportError, e:
    print 'You must install the python module named "psycopg2" in order to use this module.'
    sys.exit(os.EX_SOFTWARE)
 
 
 
class PgRepairman:
    def __init__(self, options, parser):
        self.options = options
        try:
            dsn = "dbname=%s host=%s user=%s" % (options.db, options.host, options.user)
            dsn += ("" != options.passwd) and ("password=%s" % options.password) or ""
            self.conn = psycopg2.connect(dsn)
            self.curs = self.conn.cursor()
        except Exception, e:
            print "ERROR - %s" % e
            sys.exit(1)
 
 
    # Returns a dict for a psycopg2 row object
    def _to_dict(self, desc, res):
        return dict(zip([x[0] for x in desc], res))
 
 
    # Attempt to locate all custom sequences
    def findSequences(self):
        seq_query = """
        SELECT pc1.relname AS seq, pc2.relname AS table, c.attname AS field
        FROM pg_depend, pg_class pc1, pg_class pc2, pg_attribute c
        WHERE pc1.oid = pg_depend.objid
            AND pc2.oid = pg_depend.refobjid
            AND c.attnum = pg_depend.refobjsubid
            AND c.attrelid = pc2.oid
            AND pc1.relkind = 'S'
            AND pc1.relname NOT LIKE 'pg_toast%%'
        """
 
        try:
            self.print_verbose(seq_query)
            self.curs.execute(seq_query)
        except Exception, e:
            print "[ERROR] - %s" % e
            sys.exit(1)
 
        desc = self.curs.description
        for row in self.curs.fetchall():
            yield self._to_dict(desc, row)
 
 
    # Increment the key value to the value of the sequence + 1
    def fixSequences(self):
        for seq in self.findSequences():
            print "Fixing sequence %s in table %s" % (seq['seq'], seq['table'])
            fix_query = "SELECT setval('%s', COALESCE((SELECT %s FROM %s ORDER BY %s DESC LIMIT 1), 0)+1)" % (seq['seq'], seq['field'], seq['table'], seq['field'])
            try:
                self.print_verbose(fix_query)
                self.curs.execute(fix_query)
            except Exception, e:
                print "[WARNING] - %s" % e
                pass
 
    def print_verbose(self, msg):
        if (True == self.options.verbose):
            print "[DEBUG] - %s" % msg
 
if __name__=='__main__':
    usage       = "Usage: %prog <options> [-v --verbose] [-u --username | -p --password \ -o --host | -d --database]"
    version     = "%prog v1.0\nDistributed under the LGPL2 License"
    description = "Increments all sequences in a PostgreSQL database"
    parser = OptionParser(usage=usage, version=version, description=description)
    parser.add_option("-v", "--verbose", action="store_true", dest="verbose", default=False, help="Enable extra output")
    parser.add_option("-o", "--host", action="store", dest="host", default="127.0.0.1", help="Database hostname/IP")
    parser.add_option("-u", "--username", action="store", dest="user", default="postgres", help="Database Username")
    parser.add_option("-p", "--password", action="store", dest="passwd", default="", help="Database Password")
    parser.add_option("-d", "--database", action="store", dest="db", default="template1", help="Database Name")
 
 
    try:
        (options, args) = parser.parse_args()
        obj = PgRepairman(options, parser)
        obj.fixSequences()
    except KeyboardInterrupt, e:
        sys.exit(1)

You can also download the script here.

Apache2 shortcuts

I’ve been maintaining and managing Apache servers for over 10 years now, but for some reason, I never bothered to RTFM when it came to enabling/disabling modules and site configs in apache2.. As it turns out, you don’t have to manually create symlinks from mods_available to mods_enabled and sites_available to sites_enabled, as Apache2 includes a handful of shortcut scripts to do this work for you… Doh!

To enable a module in your apache2 config, instead of doing the old

ln -sf /etc/apache2/mods_available/mod_rewrite.conf /etc/apache2/mods_enabled/mod_rewrite.conf
ln -sf /etc/apache2/mods_available/mod_rewrite.load /etc/apache2/mods_enabled/mod_rewrite.load

Next time, just do:

a2enmod rewrite && /etc/init.d/apache2 restart

To disable this module, try

a2dismod rewrite && /etc/init.d/apache2 restart

Similarly, to enable a site config, try

a2ensite myVhost.com && /etc/init.d/apache2 restart

and to disable it, obviously, try

a2dissite myVhost.com && /etc/init.d/apache2 restart

Finally, if you’d like to lint through your Apache2 config files before issuing a restart which might be responsible for some downtime if it doesn’t work, try

apache2ctl configtest