Simplifying networks
Orange provides an implementation of the Pathfinder procedure for simplifying (large) networks.
Assuming that the weight of an edge represents a distance (interpreted as a dissimilarity measure), the pruning idea of
the Pathfinder algorithm is based on the triangle inequality, which states that the direct distance between two points must
be less than or equal to the distance between those two points going through an intermediate point. The triangle inequality can easily be
extended to all paths: the direct distance between two nodes must be less than or equal to the length (the sum of all weights)
of every path between these two nodes, and therefore also less than or equal to the length of the geodesic path (i.e. the shortest path).
The algorithm eliminates the links that violate the extended triangle inequality, thus simplifying the network and clarifying it for subsequent analysis.
For further information about the implemented procedure (including some experiments), consult [Vavpetic 2010].
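The pruning rule described above can be illustrated with a short, self-contained sketch in plain Python. It is not Orange's implementation: the function name pathfinder_r1 and the edge-dictionary input format are assumptions made for illustration. It covers only the case described in the text, where the length of a path is the sum of its weights and all paths are considered: all-pairs geodesic distances are computed with the Floyd-Warshall algorithm, and an edge is kept only if its weight does not exceed the geodesic distance between its endpoints.

```python
# Hypothetical sketch (not Orange's implementation) of the pruning rule when
# path length = sum of edge weights and every path is considered.
INF = float('inf')


def pathfinder_r1(n, edges):
    """Prune an undirected graph with nodes 0..n-1.

    edges maps (u, v) pairs (each undirected edge listed once) to weights;
    the returned dict holds only the edges that satisfy the extended
    triangle inequality.
    """
    # dist[i][j] starts as the direct edge weight (INF when there is no edge)
    dist = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for (u, v), w in edges.items():
        dist[u][v] = dist[v][u] = min(dist[u][v], w)
    # Floyd-Warshall: allow each node k in turn as an intermediate point
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    # keep an edge only if its weight does not exceed the geodesic distance
    return {(u, v): w for (u, v), w in edges.items() if w <= dist[u][v]}


# A triangle: the direct edge 0-1 (weight 5) is longer than the path 0-2-1 (2 + 2)
triangle = {(0, 1): 5.0, (1, 2): 2.0, (0, 2): 2.0}
print(sorted(pathfinder_r1(3, triangle)))  # [(0, 2), (1, 2)]
```

Here the edge (0, 1) is removed because the path through node 2 has length 4 < 5. An edge whose weight equals the geodesic distance is kept, since the inequality allows equality.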
Pathfinder
The Pathfinder class offers a way to simplify a given network with the specified parameters.
Methods
- Pathfinder()
- Constructs a Pathfinder object.
- simplify(r, q, graph)
- Simplifies the given graph in place by removing the edges that violate the extended triangle inequality (the parameters are described below). The speed of the procedure depends heavily on the parameter q and on the properties of the graph (it works best with sparse graphs). The most commonly used values are r = sys.maxint and q = n-1, where n is the number of nodes in the graph.
  - r
  - Determines how the cost of a path is calculated: it is the parameter of the Minkowski formula (consult the paper mentioned above for further information). For two edges with weights a and b, the cost is c = (a**r + b**r)**(1.0/r). Thus for r = 1 the calculated cost is c = a + b, for r = 2 it is c = sqrt(a**2 + b**2), and for r = sys.maxint (which is used to represent infinity) it converges to c = max(a, b).
  - q
  - The maximum length (i.e. the number of edges) of the alternative paths checked between two nodes when calculating the lowest cost between them.
- setProgressCallback(fun)
- Sets a progress callback function, which is called each time the processing of a node is complete. The function must accept one argument: a double value between 0 and 1 that indicates the progress of the procedure.
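How r and q enter the cost computation can be sketched as follows, again in plain Python and under stated assumptions: minkowski_cost and pathfinder are hypothetical names, not part of the orangeom API, and float('inf') stands in for sys.maxint (raising a float to the power sys.maxint would overflow). The cheapest cost over paths of at most k edges is built up by extending paths one edge at a time, for k = 2, ..., q, and an edge survives only if no such path is cheaper than the edge itself.

```python
# Hypothetical sketch (not the orangeom API) of how r and q enter the
# Pathfinder cost computation. float('inf') plays the role of sys.maxint.
INF = float('inf')


def minkowski_cost(weights, r):
    """Minkowski r-cost of a path: (sum of w**r) ** (1/r); max(w) for r = INF."""
    if r == INF:
        return max(weights)
    return sum(w ** r for w in weights) ** (1.0 / r)


def pathfinder(n, edges, r, q):
    """Keep an undirected edge (u, v) only if no path of at most q edges
    between u and v has a lower r-cost than the edge itself (q >= 1)."""
    # w[i][j]: direct edge weight, INF when there is no edge
    w = [[INF] * n for _ in range(n)]
    for (u, v), x in edges.items():
        w[u][v] = w[v][u] = x

    def extend(cost, x):
        # r-cost of a path with cost `cost` extended by one edge of weight x
        if cost == INF or x == INF:
            return INF
        return minkowski_cost([cost, x], r)

    d = [row[:] for row in w]  # cheapest costs using at most 1 edge
    for _ in range(q - 1):     # then at most 2, 3, ..., q edges
        d = [[min([d[i][j]] + [extend(d[i][m], w[m][j]) for m in range(n)])
              for j in range(n)] for i in range(n)]
    return {(u, v): x for (u, v), x in edges.items() if x <= d[u][v]}


print([minkowski_cost([3.0, 4.0], r) for r in (1, 2, INF)])  # [7.0, 5.0, 4.0]

triangle = {(0, 1): 3.0, (1, 2): 2.0, (0, 2): 2.0}
print(sorted(pathfinder(3, triangle, 1, 2)))    # [(0, 1), (0, 2), (1, 2)]
print(sorted(pathfinder(3, triangle, INF, 2)))  # [(0, 2), (1, 2)]

square = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 5.0}
print(sorted(pathfinder(4, square, 1, 2)))  # [(0, 1), (0, 3), (1, 2), (2, 3)]
print(sorted(pathfinder(4, square, 1, 3)))  # [(0, 1), (1, 2), (2, 3)]
```

With r = 1 the triangle keeps all three edges (the detour costs 2 + 2 = 4 > 3), while with r = infinity the detour costs max(2, 2) = 2 and the edge (0, 1) is pruned. On the square, the detour 0-1-2-3 uses three edges, so the direct edge (0, 3) is only pruned once q >= 3. This naive sketch takes O(q * n^3) time, which illustrates why q matters for speed, although the actual implementation may work differently.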
Examples
Simplifying a small network
This example shows how to simplify a weighted network using the Pathfinder procedure. Because the procedure interprets the edge weights of a network as dissimilarities, which are symmetric, it only makes sense to apply it to undirected graphs.
import orngNetwork
from orangeom import Pathfinder
from pylab import *
...
# Read a demo network from a file
net = orngNetwork.Network.read('demo.net')
# Compute a layout for plotting
netOp = orngNetwork.NetworkOptimization(net)
netOp.fruchtermanReingold(100, 1000)
# Plot the original
myPlot(net, 'Original network')
# Choose some parameters
r, q = 1, 6
# Create a pathfinder instance
pf = Pathfinder()
# Simplify the network
pf.simplify(r, q, net)
# Plot the simplified network
myPlot(net, 'Simplified network')
show()
Executing the script above pops up two pylab windows, one showing the original network and one showing the simplified network.
Progress callback functionality
This example shows how to use a progress callback function.
import orngNetwork
from orangeom import Pathfinder
def myCb(complete):
    """
    The callback function.
    """
    print('The procedure is %d%% complete.' % int(complete * 100))
# Read a demo network from a file
net = orngNetwork.Network.read('demo.net')
# Choose some parameters
r, q = 1, 6
# Create a pathfinder instance
pf = Pathfinder()
# Pass the reference to the desired function
pf.setProgressCallback(myCb)
# Simplify the network
pf.simplify(r, q, net)
Executing the script above should print something like this:
The procedure is 14% complete.
The procedure is 28% complete.
The procedure is 42% complete.
The procedure is 57% complete.
The procedure is 71% complete.
The procedure is 85% complete.
The procedure is 100% complete.
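The percentages in this output can be reproduced without Orange. The sketch below uses a hypothetical stand-in, fake_simplify, which is not part of the orangeom API and does no pruning. It assumes that the value passed to the callback is the fraction of nodes processed so far, k/n, and that demo.net has seven nodes, which would match the seven lines above. Because the callback truncates with int(), 6/7 is reported as 85% rather than 86%.

```python
# Hypothetical stand-in for Pathfinder.simplify (not part of orangeom): it does
# no pruning and only reports progress, assuming the callback receives k / n
# after the k-th of n nodes has been processed.
def fake_simplify(n_nodes, callback):
    for k in range(1, n_nodes + 1):
        # the real procedure would process node k here
        callback(k / float(n_nodes))


messages = []


def myCb(complete):
    # Same formatting as the documented callback; int() truncates, so
    # 6/7 = 0.857... is reported as 85%, not 86%.
    messages.append('The procedure is %d%% complete.' % int(complete * 100))


fake_simplify(7, myCb)  # seven nodes, assumed to match demo.net
for line in messages:
    print(line)  # prints the same seven lines as the output above
```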