The cluster command performs hierarchical agglomerative clustering of custom data using a custom distance function. For example, the taxicab distance between two points can be defined as follows:
taxicab:=(p1,p2)->l1norm(p1-p2)
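The definition above computes the l1 norm of the difference of two points, that is, the sum of the absolute coordinate differences. As an illustration outside Xcas, the same distance could be sketched in Python as follows (the function name mirrors the Xcas definition; the sample points are made up for the example):

```python
def taxicab(p1, p2):
    """Taxicab (L1) distance between two points of equal dimension."""
    if len(p1) != len(p2):
        raise ValueError("points must have the same dimension")
    # Sum of absolute coordinate differences, i.e. the l1 norm of p1 - p2.
    return sum(abs(a - b) for a, b in zip(p1, p2))


print(taxicab([1, 2], [4, -2]))  # |1-4| + |2-(-2)| = 7
```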
We apply the cluster command to an “aggregation” shape dataset¹. The dataset is loaded from a file into a table cell in Xcas and assigned to the variable data. We use the average linkage method and the silhouette index (which is used by default if index=true). Setting the output parameter to plot produces a visualization of the colored clusters, as shown in Figure 20.1.
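To give an idea of what the silhouette index measures, the following Python sketch computes the mean silhouette coefficient of a given partition using its textbook definition. It is only an illustration: the function name, the sample points and the handling of singleton clusters are assumptions, and the computation inside Xcas may differ in details.

```python
def silhouette_index(points, labels, dist):
    """Mean silhouette coefficient of a partition (needs at least 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        # a: mean distance from p to the other members of its own cluster
        # (dist(p, p) = 0, so p itself adds nothing to the sum)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of another cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical sample data: two well-separated pairs of points.
l1 = lambda p, q: sum(abs(a - b) for a, b in zip(p, q))
pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = silhouette_index(pts, [0, 0, 1, 1], l1)  # natural grouping
bad = silhouette_index(pts, [0, 1, 0, 1], l1)   # mixed grouping
print(good > bad)  # True
```

Values close to 1 indicate compact, well-separated clusters, so comparing the index across different numbers of clusters is one way to choose a partition.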
For string data, Levenshtein distance is used by default (see Section 4.2.14).
cluster(["cat","mouse","rat","spouse","house","cut"],output=part)
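The Levenshtein distance between two strings is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. A minimal Python sketch of the standard dynamic-programming computation is given below as an illustration; it is not the Xcas implementation.

```python
def levenshtein(s, t):
    """Minimum number of single-character edits turning s into t."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(levenshtein("cat", "cut"))       # 1
print(levenshtein("mouse", "house"))   # 1
print(levenshtein("house", "spouse"))  # 2
```

Words at small distance from each other, such as "mouse" and "house", are the ones a clustering based on this distance tends to group together.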
In the following example, we group genomic sequences into five clusters using average linkage and the Hamming distance function.
data:=["GTCTT","AAGCT","GGTAA","AGGCT","GTCAT","CGGCC","GGGAG","GTTAT","GTCAT","AGGCT","GTCAG","AGGAT"]:;
cluster(data,type="average",count=5,distance=hamdist,output=part)
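The procedure used in this example can be sketched in Python as follows. This is a naive illustration of agglomerative clustering with average linkage and Hamming distance, not the algorithm implemented in Xcas: the function names are hypothetical, and because ties between equally close clusters are broken arbitrarily, the resulting partition need not coincide with the one returned by cluster.

```python
from itertools import combinations


def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(s, t))


def average_linkage(items, dist, count):
    """Repeatedly merge the two closest clusters until `count` remain."""
    clusters = [[x] for x in items]  # start with one cluster per item

    def linkage(c1, c2):
        # Average of all pairwise distances between the two clusters.
        return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > count:
        # Pair of cluster indices (i < j) with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


data = ["GTCTT", "AAGCT", "GGTAA", "AGGCT", "GTCAT", "CGGCC",
        "GGGAG", "GTTAT", "GTCAT", "AGGCT", "GTCAG", "AGGAT"]
for group in average_linkage(data, hamming, 5):
    print(group)
```

Identical sequences are at distance 0 and are therefore merged first, so duplicates such as the two copies of "GTCAT" always end up in the same cluster.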