Sunday, November 27, 2005

Decision tree building and graphing in Python

I spent part of yesterday throwing together an implementation of the ID3 algorithm for building decision trees. It uses pydot to create the graphs, so a tree can be output in dot format (plain ASCII text, for display in Graphviz) or as an image.
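For anyone who hasn't looked at dot before, here is a rough sketch of what "a tree in dot format" amounts to. It is not the code from the id3 module and it doesn't use pydot; it just builds the text by hand. The function name tree_to_dot and the nested-dict tree layout are assumptions made up for this illustration.

```python
def tree_to_dot(tree, name="G"):
    """Render a decision tree as dot text.

    Assumed layout (hypothetical, for this sketch only): a tree is either
    a leaf label (a string) or a dict {attribute: {value: subtree}}.
    """
    lines = ["digraph %s {" % name]
    counter = [0]  # running count used to hand out unique node ids

    def add_node(label):
        node_id = "n%d" % counter[0]
        counter[0] += 1
        lines.append('%s [label="%s"];' % (node_id, label))
        return node_id

    def walk(node):
        if not isinstance(node, dict):
            return add_node(node)  # leaf: just a labelled node
        (attribute, branches), = node.items()  # one attribute per level
        parent = add_node(attribute)
        for value in sorted(branches):
            child = walk(branches[value])
            # the edge label carries the attribute value for this branch
            lines.append('%s -> %s [label="%s"];' % (parent, child, value))
        return parent

    walk(tree)
    lines.append("}")
    return "\n".join(lines) + "\n"


dot_text = tree_to_dot({"size": {"large": "no", "small": "yes"}})
print(dot_text)
```

Nodes get generated ids (n0, n1, ...) with the readable name kept in the label, so two leaves that share a label still end up as separate nodes when Graphviz draws the graph.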

The id3 module can be downloaded here. The implementation isn't bulletproof, but it should work fine with good data sets.


Example usage and output:

>>> from id3 import *
>>> dtree = DecisionTree('recycling_bin',
... [{'dept':'EE', 'size':'large', 'recycling_bin':'no'},
... {'dept':'CS', 'size':'medium', 'recycling_bin':'yes'},
... {'dept':'EE', 'size':'small', 'recycling_bin':'yes'},
... {'dept':'CS', 'size':'large', 'recycling_bin':'no'},
... {'dept':'EE', 'size':'small', 'recycling_bin':'yes'},
... {'dept':'CS', 'size':'medium', 'recycling_bin':'yes'}])
>>>
>>> dtree.graph.to_string()
'digraph G {\n"no";\n"yes";\n"yes";\n"size";\n"size" -> "no" [label=large];\n"size" -> "yes" [label=small];\n"size" -> "yes" [label=medium];\n}\n'
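The graph above splits only on size and never mentions dept. A minimal sketch of the textbook ID3 procedure shows why: on these six records, size separates the yes/no answers perfectly, while dept tells you nothing. This is not the code from the id3 module; the function names (entropy, gain, id3) and the nested-dict return value are assumptions made for the illustration.

```python
from collections import Counter
from math import log2


def entropy(records, target):
    """Shannon entropy (in bits) of the target attribute over the records."""
    counts = Counter(r[target] for r in records)
    total = len(records)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def gain(records, attribute, target):
    """Information gain from splitting the records on one attribute."""
    total = len(records)
    remainder = 0.0
    for value in set(r[attribute] for r in records):
        subset = [r for r in records if r[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(records, target) - remainder


def id3(records, attributes, target):
    """Return a leaf label, or {attribute: {value: subtree}}."""
    labels = [r[target] for r in records]
    if len(set(labels)) == 1:
        return labels[0]  # every record agrees: leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # fall back to majority
    best = max(attributes, key=lambda a: gain(records, a, target))
    rest = [a for a in attributes if a != best]
    return {best: {value: id3([r for r in records if r[best] == value],
                              rest, target)
                   for value in set(r[best] for r in records)}}


# The same six records as in the session above.
records = [
    {'dept': 'EE', 'size': 'large', 'recycling_bin': 'no'},
    {'dept': 'CS', 'size': 'medium', 'recycling_bin': 'yes'},
    {'dept': 'EE', 'size': 'small', 'recycling_bin': 'yes'},
    {'dept': 'CS', 'size': 'large', 'recycling_bin': 'no'},
    {'dept': 'EE', 'size': 'small', 'recycling_bin': 'yes'},
    {'dept': 'CS', 'size': 'medium', 'recycling_bin': 'yes'},
]

tree = id3(records, ['dept', 'size'], 'recycling_bin')
size_gain = gain(records, 'size', 'recycling_bin')
dept_gain = gain(records, 'dept', 'recycling_bin')
```

Here size has an information gain of about 0.918 bits (the full entropy of the 4 yes / 2 no split) and dept has a gain of zero, so the sketch picks size as the root and every branch is already pure, which matches the three edges in the dot output.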