
20.1.1  Hierarchical clustering

The cluster command performs hierarchical agglomerative clustering of user-supplied data, optionally with a custom distance function.
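This section does not describe how cluster works internally. As a rough illustration of the general technique only, the following Python sketch implements naive average-linkage agglomerative clustering with a pluggable distance function. The names `agglomerate`, `dist` and `count` are hypothetical and chosen for this example; they are not the Xcas implementation.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def agglomerate(items: Sequence[T], dist: Callable[[T, T], float],
                count: int) -> List[List[T]]:
    """Naive average-linkage agglomerative clustering (illustrative sketch).

    Every item starts in its own cluster; the two clusters with the smallest
    average pairwise distance are merged repeatedly until `count` remain.
    """
    if not 1 <= count <= len(items):
        raise ValueError("count must be between 1 and len(items)")
    n = len(items)
    # Precompute all pairwise distances between the original items.
    d = [[dist(items[i], items[j]) for j in range(n)] for i in range(n)]
    # Clusters are stored as lists of item indices.
    clusters: List[List[int]] = [[i] for i in range(n)]

    def linkage(a: List[int], b: List[int]) -> float:
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > count:
        # Find the closest pair of clusters (ties go to the first pair found).
        best = (float("inf"), 0, 1)
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                score = linkage(clusters[p], clusters[q])
                if score < best[0]:
                    best = (score, p, q)
        _, p, q = best
        # Merge cluster q into cluster p (q > p, so deleting q is safe).
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [[items[i] for i in c] for c in clusters]
```

For example, `agglomerate([1.0, 1.2, 5.0, 5.1, 9.0], lambda a, b: abs(a - b), 3)` groups the two values near 1, the two near 5, and leaves 9.0 on its own. This brute-force version recomputes linkages at every step, which is fine for small inputs but far slower than production algorithms.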

Examples

We apply the cluster command to an “aggregation” shape dataset¹. The dataset is loaded from a file into a table cell in Xcas and assigned to the variable data. We use the average linkage method and the silhouette index (which is the default when index=true). Setting the output parameter to plot produces a visualization of the colored clusters, as shown in Figure 20.1.


Figure 20.1: Clustering in Xcas
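The silhouette index used above is a standard score for how well a partition separates its clusters. This section does not state exactly how cluster applies it, so the following Python sketch only shows the usual textbook definition (mean silhouette width, with singleton clusters scored 0); the function name `silhouette` is hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def silhouette(clusters: Sequence[Sequence[T]],
               dist: Callable[[T, T], float]) -> float:
    """Mean silhouette width of a partition into non-empty clusters.

    For each point x: a = mean distance from x to the rest of its cluster,
    b = smallest mean distance from x to any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters get s = 0.
    """
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores: List[float] = []
    for k, own in enumerate(clusters):
        for idx, x in enumerate(own):
            if len(own) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            a = sum(dist(x, y) for j, y in enumerate(own) if j != idx)
            a /= len(own) - 1
            b = min(sum(dist(x, y) for y in other) / len(other)
                    for m, other in enumerate(clusters) if m != k)
            top = max(a, b)
            scores.append((b - a) / top if top > 0 else 0.0)
    return sum(scores) / len(scores)
```

Values close to 1 indicate compact, well-separated clusters, while negative values indicate that many points sit closer to another cluster than to their own.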

For string data, the Levenshtein distance is used by default (see Section 4.2.14).

cluster(["cat","mouse","rat","spouse","house","cut"],output=part)

    [["cat","rat","cut"],["mouse","spouse","house"]]

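The default string distance in the example above is the Levenshtein (edit) distance. As an illustration only, here is a Python sketch of the standard two-row dynamic-programming computation; `levenshtein` is a hypothetical name and not the Xcas function described in Section 4.2.14.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn s into t (two-row dynamic programming)."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute or match
        prev = curr
    return prev[-1]
```

With this definition, "cat" is one edit away from both "rat" and "cut", and "mouse", "spouse" and "house" are one or two edits from each other, which is consistent with the two groups returned above.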

In the following example, we cluster genomic sequences into five clusters using average linkage and the Hamming distance function hamdist.

data:=["GTCTT","AAGCT","GGTAA","AGGCT","GTCAT","CGGCC", "GGGAG","GTTAT","GTCAT","AGGCT","GTCAG","AGGAT"]:; cluster(data,type="average",count=5,distance=hamdist,output=part)

    [["GTCTT","GTCAT","GTTAT","GTCAT","GTCAG"],["AAGCT","AGGCT","AGGCT","AGGAT"],["GGTAA"],["CGGCC"],["GGGAG"]]
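The Hamming distance passed as distance=hamdist above counts the positions at which two sequences differ. The following Python sketch shows that definition for equal-length strings; `hamming` is a hypothetical name, and how hamdist itself treats inputs of different lengths is not covered here.

```python
def hamming(s: str, t: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs strings of equal length")
    # Each mismatching position contributes True (== 1) to the sum.
    return sum(a != b for a, b in zip(s, t))
```

For instance, "GTCTT" and "GTCAT" differ only in the fourth position, so their distance is 1, whereas "GGTAA" and "AAGCT" differ in all five positions.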
