See also: FindCenters, TFeedbackProc, ClusterMethod, ClustResult, ClustDist
AgglomClustering
The function AgglomClustering performs an agglomerative hierarchical cluster analysis on the data contained in the matrix InMat. Each data object is represented by one row of the matrix; the columns represent the variables. The parameter DistanceMeasure specifies the type of distance measure. AgglomClustering may be aborted at any time by setting the global variable AbortMathProc to TRUE. In this case the function returns -1; otherwise it returns zero.

The parameter ClusterMethod specifies the clustering method to be used. If cmFlexLink is used as clustering method, the parameter alpha must be specified as well. Alpha may take any value between 0.5 and 1.0. A value of 0.5 results in an average linkage clustering (cmAvgLink). Higher values increase the divisive effects of the clustering process. Usually a value between 0.6 and 0.7 is preferred.

The result of the clustering process is returned in the parameters ClustResult, ClustDist, and DendroCoords. The integer matrix ClustResult contains the clustering information, describing which clusters (or objects) are joined to form a new cluster. It consists of InMat.NrOfRows-1 rows and three columns. The rows are ordered by increasing cluster distance, which is stored in the parameter ClustDist.

The parameter Sender contains the object which called AgglomClustering; it is used by the callback routine specified by the parameter Feedback. For simple applications (and small data sets) these two parameters may be set to NIL. The parameter OnDistCalc can be used to pass an event routine to the subroutine Matrix.CalcDist, which is called internally to calculate the object distances. The OnDistCalc event is triggered only if the parameter DistanceMeasure is set to dmUserDef.

An example should clarify how the results are organized.
The results of the cluster analysis shown below have been obtained from a set of 20 observations (objects) with four variables by applying Ward's algorithm (ClusterMethod = cmWard) to it.
------------- ClustResult --------------    ClustDist
number of     number of     number of       cluster
cluster 1     cluster 2     new cluster     distance
-----------------------------------------------------
    2            19            21             5.0945
    1            16            22             5.3573
    3             6            23             7.2815
    9            10            24            10.2774
    8            14            25            10.6847
   12            18            26            13.0239
    4            25            27            13.5628
   24            15            28            16.0441
    5            13            29            16.5704
    7            17            30            19.2583
   23            27            31            24.1079
   11            29            32            24.2236
   26            20            33            24.6635
   22            21            34            26.9456
   31            34            35            39.2175
   32            28            36            52.7880
   36            30            37            90.4147
   35            33            38           109.4378
   37            38            39           315.1660
-----------------------------------------------------

The table above is to be interpreted as follows: clusters (objects) 2 and 19 are joined to form the new cluster 21; the distance between the two original clusters is 5.09. Next, clusters 1 and 16 are joined to form cluster 22 at a distance of 5.36, and so on. Note that any cluster number less than or equal to InMat.NrOfRows designates an original object, whereas higher numbers designate clusters built up of other objects and/or clusters. The results of a cluster analysis are normally displayed as a dendrogram.
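To make the numbering scheme of the merge table concrete, the following sketch resolves a cluster number back to the original objects it contains, and cuts the hierarchy into a given number of groups. It is written in Python rather than Delphi and does not call the library; the names MERGES, members, and cut are hypothetical helpers introduced only for illustration, and the data are the rows of the example table.

```python
# Rows of the example table: (cluster 1, cluster 2, new cluster, distance).
N_OBJECTS = 20
MERGES = [
    (2, 19, 21, 5.0945), (1, 16, 22, 5.3573), (3, 6, 23, 7.2815),
    (9, 10, 24, 10.2774), (8, 14, 25, 10.6847), (12, 18, 26, 13.0239),
    (4, 25, 27, 13.5628), (24, 15, 28, 16.0441), (5, 13, 29, 16.5704),
    (7, 17, 30, 19.2583), (23, 27, 31, 24.1079), (11, 29, 32, 24.2236),
    (26, 20, 33, 24.6635), (22, 21, 34, 26.9456), (31, 34, 35, 39.2175),
    (32, 28, 36, 52.7880), (36, 30, 37, 90.4147), (35, 33, 38, 109.4378),
    (37, 38, 39, 315.1660),
]


def members(cluster, merges, n_objects):
    """Return the sorted original objects contained in a cluster number."""
    children = {new: (a, b) for a, b, new, _ in merges}
    stack, found = [cluster], []
    while stack:
        c = stack.pop()
        if c <= n_objects:
            found.append(c)            # numbers <= n_objects are original objects
        else:
            stack.extend(children[c])  # higher numbers are merged clusters
    return sorted(found)


def cut(merges, n_objects, k):
    """Apply the first n_objects - k merges and return the k resulting groups."""
    alive = {i: [i] for i in range(1, n_objects + 1)}
    for a, b, new, _ in merges[: n_objects - k]:
        alive[new] = sorted(alive.pop(a) + alive.pop(b))
    return sorted(alive.values())


print(members(35, MERGES, N_OBJECTS))  # [1, 2, 3, 4, 6, 8, 14, 16, 19]
print(cut(MERGES, N_OBJECTS, 2))
```

Because the rows are ordered by increasing cluster distance, stopping after the first InMat.NrOfRows-k rows leaves exactly k groups; with k = 2 the sketch yields the members of clusters 37 and 38, the two clusters joined in the last row.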
In order to facilitate the drawing of a dendrogram, the parameter DendroCoords (a vector of 2*InMat.NrOfRows-1 elements) contains the coordinates of the lines of the corresponding dendrogram. The first InMat.NrOfRows coordinates are those of the objects; the rest refer to the clusters as numbered in the matrix ClustResult (see the example program CLUSTER for details on how to use the array DendroCoords).
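As an illustration of what such a coordinate vector can look like, the sketch below derives one position per object and per cluster from a merge table. It is a Python sketch under stated assumptions, not the routine the library uses: it assumes that objects are placed at consecutive positions 1, 2, 3, ... in the order in which they appear when walking down the tree, and that each cluster sits midway between the two clusters it joins. The function name dendro_coords and the four-object merge table are hypothetical.

```python
def dendro_coords(merges, n_objects):
    """Return 2*n_objects-1 positions: objects first, then clusters.

    Assumption (not taken from the library): objects get consecutive
    positions in tree order, clusters the midpoint of their two children.
    """
    children = {new: (a, b) for a, b, new in merges}
    root = merges[-1][2]               # the last row joins everything
    coords = [0.0] * (2 * n_objects - 1)
    next_pos = 1
    stack = [(root, False)]            # iterative post-order traversal
    while stack:
        node, expanded = stack.pop()
        if node <= n_objects:          # original object: next free slot
            coords[node - 1] = float(next_pos)
            next_pos += 1
        elif expanded:                 # both children are already placed
            a, b = children[node]
            coords[node - 1] = (coords[a - 1] + coords[b - 1]) / 2
        else:
            a, b = children[node]
            stack.append((node, True))
            stack.append((b, False))
            stack.append((a, False))   # pushed last, so visited first
    return coords


# Hypothetical toy table with 4 objects: (cluster 1, cluster 2, new cluster).
TOY_MERGES = [(2, 3, 5), (1, 5, 6), (6, 4, 7)]
print(dendro_coords(TOY_MERGES, 4))    # [1.0, 2.0, 3.0, 4.0, 2.5, 1.75, 2.875]
```

Element c-1 of the result belongs to object or cluster number c, which mirrors the layout described above: the first entries are the objects, the remaining ones the clusters in the order of the merge table. A drawing routine could then connect the positions of the two joined clusters at the height given by the corresponding cluster distance.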