Simplifying networks

Orange provides an implementation of a procedure called the Pathfinder for simplifying (large) networks.

Assuming that the weight of an edge represents a distance (interpreted as a dissimilarity measure), the pruning idea of the Pathfinder algorithm is based on the triangle inequality, which states that the direct distance between two points must be less than or equal to the distance between those two points going through an intermediate point. The triangle inequality extends naturally to all paths: the direct distance between two nodes must be less than or equal to the length (sum of all weights) of every path between these two nodes, and therefore also less than or equal to the length of the geodesic path (i.e. the shortest path). The algorithm eliminates the links that violate the extended triangle inequality, thus simplifying the network and clarifying it for subsequent analysis.
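The pruning idea described above can be illustrated with a minimal pure-Python sketch. It is not the implementation shipped with Orange: the function name prune_by_triangle_inequality and the edge-dictionary representation are assumptions made for this illustration, path length is the plain sum of weights, and shortest paths are computed with the Floyd-Warshall algorithm.

```python
def prune_by_triangle_inequality(n, edges):
    """Keep only the edges that satisfy the extended triangle inequality.

    n     -- number of nodes, labelled 0 .. n-1 (illustrative convention)
    edges -- dict mapping (u, v) to a non-negative weight, undirected
    """
    INF = float('inf')
    # dist[i][j] will hold the length of the geodesic path between i and j.
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for (u, v), w in edges.items():
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)

    # Floyd-Warshall: allow node k as an intermediate point, one k at a time.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]

    # An edge violates the inequality when it is longer than the geodesic.
    return {(u, v): w for (u, v), w in edges.items() if w <= dist[u][v]}


edges = {(0, 1): 1, (1, 2): 2, (0, 2): 4, (2, 3): 1}
kept = prune_by_triangle_inequality(4, edges)
print(sorted(kept))
```

In this small example the edge (0, 2) has weight 4, while the path 0-1-2 has length 3, so (0, 2) is the only edge removed.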

For further information regarding the implemented procedure (including some experiments), consult [Vavpetic 2010].

Pathfinder

The Pathfinder class offers a way to simplify a given network with the specified parameters.

Methods

Pathfinder()
Constructs a Pathfinder object.
simplify(r, q, graph)
Simplifies the given graph by removing the edges which violate the extended triangle inequality. See the parameter meanings below. The speed of the procedure depends heavily on the parameter q and on the graph's properties (it works best with sparse graphs). The most commonly used values are r = sys.maxint and q = n-1, where n is the number of nodes in the graph.
r
This parameter affects the way in which the cost of a path is calculated - it is actually the parameter to the Minkowski formula (consult the paper mentioned above for further information).
For example, if we have two edges with weights a and b, then for r = 1 the calculated cost c would be c = a + b, for r = 2, c = sqrt(a**2 + b**2) and for r = sys.maxint (which is used to represent infinity) it converges to c = max(a, b).
q
This parameter represents the maximum length (i.e. the number of edges) of all alternative paths checked between two nodes when calculating the lowest cost between them.
setProgressCallback(fun)
Sets a progress callback function, which is called each time the processing of a node is complete. The function is expected to accept one argument: a double value between 0 and 1 is passed to it.
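To make the roles of r and q more concrete, here is a small self-contained sketch. It does not use Orange and is not the library's implementation: the helper names minkowski_cost, min_costs and prune are hypothetical, and float('inf') stands in for the sys.maxint value that the real simplify method uses to represent infinity.

```python
import math


def minkowski_cost(weights, r):
    """Cost of a path whose edges have the given weights (Minkowski formula)."""
    if math.isinf(r):
        return max(weights)
    return sum(w ** r for w in weights) ** (1.0 / r)


def min_costs(n, edges, r, q):
    """Lowest cost between every pair of nodes over paths of at most q edges.

    edges maps (u, v) to a weight; the graph is undirected, nodes are 0 .. n-1.
    """
    INF = float('inf')
    direct = [[INF] * n for _ in range(n)]
    for (u, v), w in edges.items():
        direct[u][v] = direct[v][u] = w

    best = [row[:] for row in direct]      # paths with at most one edge
    for _ in range(q - 1):                 # allow one more edge per round
        longer = [row[:] for row in best]
        for u in range(n):
            for v in range(n):
                if u == v:
                    continue
                for k in range(n):
                    if best[u][k] < INF and direct[k][v] < INF:
                        # Extend the best u..k path by the edge (k, v).
                        cost = minkowski_cost([best[u][k], direct[k][v]], r)
                        longer[u][v] = min(longer[u][v], cost)
        best = longer
    return best


def prune(n, edges, r, q):
    """Drop every edge that costs more than the cheapest alternative path."""
    best = min_costs(n, edges, r, q)
    return {e: w for e, w in edges.items() if w <= best[e[0]][e[1]]}


# A chain 0-1-2-3 of unit edges plus a heavy shortcut (0, 3).
edges = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 3): 4}
print(minkowski_cost([3, 4], 1), minkowski_cost([3, 4], 2))
print(sorted(prune(4, edges, 1, 2)))   # q = 2: the three-edge detour is not checked
print(sorted(prune(4, edges, 1, 3)))   # q = 3: the detour costs 3 < 4
```

With q = 2 the shortcut (0, 3) survives, because the only cheaper alternative uses three edges; with q = 3 it is removed. This also hints at why q = n-1 is a common choice: no simple path can have more edges than that.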

Examples

Simplifying a small network

This example shows how to simplify a weighted network using the Pathfinder procedure. As the procedure interprets the weights of a given network as dissimilarities, it only makes sense to apply the procedure to undirected graphs.

Part of pathfinder.py (uses demo.net)

    import orngNetwork
    from orangeom import Pathfinder
    from pylab import *
    ...

    # Read a demo network from a file
    net = orngNetwork.Network.read('demo.net')

    # Compute a layout for plotting
    netOp = orngNetwork.NetworkOptimization(net)
    netOp.fruchtermanReingold(100, 1000)

    # Plot the original
    myPlot(net, 'Original network')

    # Choose some parameters
    r, q = 1, 6

    # Create a pathfinder instance
    pf = Pathfinder()

    # Simplify the network
    pf.simplify(r, q, net)

    # Plot the simplified network
    myPlot(net, 'Simplified network')
    show()

Executing the script above pops up two pylab windows, one showing the original network and one showing the simplified network.

Progress callback functionality

This example shows how to use a progress callback function.

Part of pf_progress.py (uses demo.net)

    import orngNetwork
    from orangeom import Pathfinder

    def myCb(complete):
        """
        The callback function.
        """
        print 'The procedure is %d%% complete.' % int(complete * 100)

    # Read a demo network from a file
    net = orngNetwork.Network.read('demo.net')

    # Choose some parameters
    r, q = 1, 6

    # Create a pathfinder instance
    pf = Pathfinder()

    # Pass the reference to the desired function
    pf.setProgressCallback(myCb)

    # Simplify the network
    pf.simplify(r, q, net)

Executing the script above should print something like this:

    The procedure is 14% complete.
    The procedure is 28% complete.
    The procedure is 42% complete.
    The procedure is 57% complete.
    The procedure is 71% complete.
    The procedure is 85% complete.
    The procedure is 100% complete.
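The percentages in this output are what a callback would print if it received the fractions 1/7, 2/7, ..., 7/7, i.e. one call per node on a network with seven nodes. That demo.net has seven nodes is an assumption made here, not something stated above. The following sketch reproduces the pattern without Orange; run_with_progress is a hypothetical stand-in for the per-node loop and is not part of the Pathfinder API.

```python
def run_with_progress(n_nodes, callback):
    """Hypothetical stand-in for a procedure that reports once per node."""
    for done in range(1, n_nodes + 1):
        # The work for one node would happen here.
        callback(done / float(n_nodes))


messages = []


def my_cb(complete):
    """Collect and print one progress message per call."""
    message = 'The procedure is %d%% complete.' % int(complete * 100)
    messages.append(message)
    print(message)


# Assumption: seven nodes, chosen to match the percentages shown above.
run_with_progress(7, my_cb)
```

Because int() truncates, 3/7 is reported as 42% rather than 43%; a callback that needs rounded values could use round() instead.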