Cluster Analysis

Multivariate Data Analysis

Description for Cluster Analysis

Yeongeun Jeon
12-01-2022

1. Hierarchical Cluster Analysis


1-1. Protein Consumption Data

# Load the data
protein   <- read.csv("C:/Users/User/Desktop/protein.csv")
head(protein)
         country   x1   x2  x3   x4  x5   x6  x7  x8  x9
1        Albania 10.1  1.4 0.5  8.9 0.2 42.3 0.6 5.5 1.7
2        Austria  8.9 14.0 4.3 19.9 2.1 28.0 3.6 1.3 4.3
3        Belgium 13.5  9.3 4.1 17.5 4.5 26.6 5.7 2.1 4.0
4       Bulgaria  7.8  6.0 1.6  8.3 1.2 56.7 1.1 3.7 4.2
5 Czechoslovakia  9.7 11.4 2.8 12.5 2.0 34.3 5.0 1.1 4.0
6        Denmark 10.6 10.8 3.7 25.0 9.9 21.9 4.8 0.7 2.4
# Standardize the data
protein.Z <- scale(protein[, -1],                            # Numeric columns only
                   center = TRUE, scale = TRUE)
head(protein.Z)
              x1         x2         x3         x4          x5
[1,]  0.08126490 -1.7584889 -2.1796385 -1.0101406 -1.20028213
[2,] -0.27725673  1.6523731  1.2204544  0.4889887 -0.64187467
[3,]  1.09707621  0.3800675  1.0415022  0.1619060  0.06348211
[4,] -0.60590157 -0.5132535 -1.1954011 -1.0919112 -0.90638347
[5,] -0.03824231  0.9485445 -0.1216875 -0.5195164 -0.67126454
[6,]  0.23064892  0.7861225  0.6835976  1.1840396  1.65053488
             x6         x7         x8          x9
[1,]  0.9159176 -2.2495772  1.2227536 -1.35040507
[2,] -0.3870690 -0.4136872 -0.8923886  0.09091397
[3,] -0.5146342  0.8714358 -0.4895043 -0.07539207
[4,]  2.2280161 -1.9435955  0.3162641  0.03547862
[5,]  0.1869740  0.4430614 -0.9931096 -0.07539207
[6,] -0.9428885  0.3206688 -1.1945517 -0.96235764
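
As a rough check on what scale() computes, the sketch below standardizes a small matrix by hand (subtract the column mean, then divide by the column standard deviation) and compares the result with scale(). The matrix toy is hypothetical and is not part of the protein data.

```r
# Hypothetical 4 x 2 matrix used only for illustration
toy <- matrix(c(1, 2, 3, 4,
                10, 20, 30, 60),
              ncol = 2, dimnames = list(NULL, c("a", "b")))

# Manual standardization: (x - column mean) / column standard deviation
toy.manual <- sweep(toy, 2, colMeans(toy), "-")
toy.manual <- sweep(toy.manual, 2, apply(toy, 2, sd), "/")

# scale() with center = TRUE and scale = TRUE performs the same computation
toy.Z <- scale(toy, center = TRUE, scale = TRUE)

all.equal(toy.manual, toy.Z, check.attributes = FALSE)       # Compare values only
round(colMeans(toy.Z), 10)                                   # Column means after scaling
apply(toy.Z, 2, sd)                                          # Column standard deviations after scaling
```

After standardization every column has mean 0 and standard deviation 1, so no single variable dominates the distances computed next.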

1-2. Computing the Distance Matrix

# Euclidean distance
protein.Z.eucl <- dist(protein.Z,                            # Data matrix
                       method = "euclidean")                 # Distance method ("euclidean" / "maximum" / "manhattan" / "canberra" / "binary" / "minkowski")
protein.Z.eucl
          1        2        3        4        5        6        7
2  6.123875                                                      
3  5.941088 2.449873                                             
4  2.764456 4.883310 5.227110                                    
5  5.139593 2.114980 2.213299 3.947607                           
6  6.610018 3.013919 2.525413 6.008027 3.340491                  
7  6.391783 2.563414 2.102111 5.408239 1.879623 2.721124         
8  5.814582 4.042713 3.457791 5.748819 3.913776 2.615699 3.994256
9  6.296012 3.588910 2.193291 5.546748 3.360107 3.657722 3.781843
10 4.244946 5.163303 4.695152 3.748492 4.866844 5.590841 5.614962
11 4.673363 3.266150 3.985273 3.345019 2.749570 5.010346 3.675953
12 6.730999 2.732974 1.630907 6.182109 3.122919 2.829417 2.989315
13 4.022035 3.711167 3.716300 2.859180 3.345898 4.762895 4.319464
14 5.986448 1.116565 2.239399 5.141308 2.160153 2.535976 2.494697
15 5.441777 3.873663 2.953678 5.250463 3.506576 1.992772 3.244186
16 5.871454 2.795918 2.935223 4.417656 2.090846 3.839478 2.693562
17 6.610517 6.507879 5.633919 6.003440 5.512520 5.827369 5.248118
18 2.688487 4.640216 4.755043 1.886874 3.561895 5.512335 4.784174
19 5.568338 4.871951 3.985465 4.841928 4.146923 5.079414 4.086358
20 5.229438 3.529912 2.949649 4.903076 2.965132 3.092112 2.542194
21 5.096917 2.198411 2.333797 4.449617 2.593398 3.187969 3.543233
22 5.926153 3.747707 1.942979 5.779938 3.820355 3.471573 3.913917
23 4.336891 4.160985 3.160461 3.819771 2.712791 4.151407 3.411440
24 6.345178 1.643941 1.417223 5.598788 2.172661 2.382298 1.872404
25 2.942274 5.433204 5.596744 1.992518 4.339328 6.338870 5.524645
          8        9       10       11       12       13       14
2                                                                
3                                                                
4                                                                
5                                                                
6                                                                
7                                                                
8                                                                
9  4.567955                                                      
10 5.474533 4.544557                                             
11 5.328549 4.962783 4.100647                                    
12 3.224120 3.143931 5.697292 4.784389                           
13 4.864586 3.796687 2.145756 3.150281 4.825175                  
14 3.365080 3.405382 5.152024 3.457789 2.342517 3.905212         
15 2.030075 3.918233 4.623274 4.884784 3.608666 3.985673 3.363363
16 4.097647 3.598806 4.413737 3.023794 3.730426 3.111997 2.769121
17 6.428604 5.632730 4.762834 5.695401 7.025402 4.651763 6.336455
18 5.004229 5.518264 3.612688 2.470722 5.580929 3.108077 4.622070
19 5.409697 4.433606 3.082462 3.880044 5.248204 2.868401 4.838410
20 4.275745 4.254842 5.190977 4.215675 4.049561 4.026921 3.497706
21 3.520186 2.420056 4.101138 3.821847 2.815059 2.915464 1.901042
22 3.855547 2.570958 4.620733 5.104963 2.246291 4.178481 3.515752
23 3.417115 4.235938 4.114133 3.421339 3.884740 3.558105 3.874407
24 3.615881 2.935469 5.363661 3.889344 1.790710 4.133378 1.262664
25 5.732427 6.296222 3.920355 3.030617 6.436280 3.577969 5.481283
         15       16       17       18       19       20       21
2                                                                
3                                                                
4                                                                
5                                                                
6                                                                
7                                                                
8                                                                
9                                                                
10                                                               
11                                                               
12                                                               
13                                                               
14                                                               
15                                                               
16 3.704275                                                      
17 4.752560 4.788692                                             
18 4.663587 3.943882 5.625717                                    
19 4.129411 3.377436 2.929892 4.241617                           
20 2.940822 4.259157 5.165496 4.550891 4.274958                  
21 3.337777 3.069445 6.086275 4.336272 4.548833 3.740528         
22 3.548623 4.499316 6.514669 5.413308 4.695146 3.765195 2.839136
23 3.251411 2.915581 5.058469 2.749723 3.616797 3.942925 3.786530
24 3.295288 2.996946 6.122876 5.083416 4.589144 3.016613 2.278316
25 5.386559 4.477836 5.823758 0.984629 4.566993 5.325980 5.185324
         22       23       24
2                            
3                            
4                            
5                            
6                            
7                            
8                            
9                            
10                           
11                           
12                           
13                           
14                           
15                           
16                           
17                           
18                           
19                           
20                           
21                           
22                           
23 4.003052                  
24 2.894140 3.894353         
25 6.254361 3.345415 5.954889
# Manhattan distance
protein.Z.manh <- dist(protein.Z,                            # Data matrix
                       method = "manhattan")                 # Distance method ("euclidean" / "maximum" / "manhattan" / "canberra" / "binary" / "minkowski")
protein.Z.manh
           1         2         3         4         5         6
2  15.922351                                                  
3  16.350155  5.839909                                        
4   7.202762 12.164617 13.531205                              
5  12.816592  5.020699  5.916742  9.758628                    
6  17.836634  7.544084  6.810607 16.791615  7.602617          
7  16.604002  6.229443  5.189203 13.212725  4.612490  6.092691
8  14.765137  9.873982  8.618508 14.733239  8.196667  6.501158
9  18.313829  8.352994  5.072344 15.273139  8.102521  8.490269
10 10.234851 12.736623 12.053828 10.631360 11.412357 14.464065
11 10.415047  8.148005 10.703383  8.241299  5.972804 13.151683
12 17.836108  6.650779  4.057464 16.236736  7.739682  6.247824
13  9.867503  9.479881 10.133450  7.261115  7.689824 13.393860
14 15.896055  2.970283  5.816381 13.051196  5.096264  5.256358
15 14.550311  8.901140  8.015220 13.458234  7.974591  4.037210
16 15.834284  7.177214  6.634021 11.419261  5.080847  9.917353
17 15.461775 17.265159 14.579423 14.030417 14.463409 15.767261
18  6.826927 12.039094 13.183941  4.541820  9.411363 15.113902
19 13.368531 12.779743 10.276243 12.092373 10.296167 13.168693
20 13.400057  8.512587  7.976709 12.515473  7.743967  6.138239
21 14.428725  5.569403  5.251812 11.388034  6.556989  7.673516
22 15.996251  8.508467  5.022115 14.115818  9.964549  8.690987
23  9.799826 10.315444  8.122013  8.243488  6.911514 11.264379
24 17.057128  4.077456  3.442081 14.459920  5.104177  5.536873
25  7.143340 14.147774 15.292620  4.781488 11.520043 17.666065
           7         8         9        10        11        12
2                                                             
3                                                             
4                                                             
5                                                             
6                                                             
7                                                             
8   8.566655                                                  
9   8.691877 10.274383                                        
10 14.026336 12.466149 10.513122                              
11  9.139667 14.049963 12.120472 10.551519                    
12  6.944160  8.635796  8.020812 16.084036 13.029929          
13 10.882170 11.206123  9.325358  4.663894  7.941069 13.544337
14  5.655149  8.118024  8.022447 13.140432  8.545013  5.602520
15  8.198139  4.277074  9.303916 11.083836 12.774958  8.294788
16  7.227209  9.765707  6.414157  9.859335  7.924294  8.594674
17 13.487139 15.156129 14.367967 13.016800 13.733944 18.392102
18 12.421977 13.217948 15.147615 10.158255  5.812218 14.669893
19  9.428744 11.568758  9.864327  7.478483  9.220485 14.333707
20  6.512512  8.082638 10.168448 12.902075 11.248535  9.837177
21  8.870830  9.335571  5.621022 10.400459  9.595799  5.737717
22  9.403363  9.281473  5.815233 12.566322 13.074638  5.609090
23  8.540213  8.243278 11.187222  9.271846  9.271751  9.466240
24  4.940684  8.958721  7.223278 13.601863  9.953737  4.920484
25 14.530657 15.066282 17.256295 11.037830  6.665679 17.111185
          13        14        15        16        17        18
2                                                             
3                                                             
4                                                             
5                                                             
6                                                             
7                                                             
8                                                             
9                                                             
10                                                            
11                                                            
12                                                            
13                                                            
14 10.182459                                                  
15  9.931118  6.328651                                        
16  6.670675  7.205189  9.102665                              
17 11.284597 17.110898 12.092396 11.564056                    
18  7.917614 12.371319 11.942943 10.755954 12.688288          
19  6.938403 12.807718 10.092078  7.940059  6.075174 10.780881
20 10.589590  7.511142  6.244290 11.285165 13.321417 11.600953
21  7.860761  4.699414  8.129039  7.054014 16.502698 11.629689
22 10.555641  7.900291  7.824142 11.047952 16.343100 13.154880
23  7.264408  9.439180  7.471955  6.883328 12.195762  6.839067
24 11.062165  3.368959  7.854776  7.232372 16.766678 13.890914
25  9.254213 14.479999 13.791277 12.864634 13.120172  2.552163
          19        20        21        22        23        24
2                                                             
3                                                             
4                                                             
5                                                             
6                                                             
7                                                             
8                                                             
9                                                             
10                                                            
11                                                            
12                                                            
13                                                            
14                                                            
15                                                            
16                                                            
17                                                            
18                                                            
19                                                            
20 10.320641                                                  
21 12.199518  9.285703                                        
22 12.039920  9.767549  7.603077                              
23  9.525971 10.596879 10.161429  8.992036                    
24 12.463498  7.432058  6.226637  6.739449  9.440950          
25 11.346738 13.717005 13.615976 14.938716  8.576531 15.999594
# Canberra distance
protein.Z.canb <- dist(protein.Z,                            # Data matrix
                       method = "canberra")                  # Distance method ("euclidean" / "maximum" / "manhattan" / "canberra" / "binary" / "minkowski")
protein.Z.canb
          1        2        3        4        5        6        7
2  7.992465                                                      
3  8.756315 5.640642                                             
4  4.097492 6.630577 9.000000                                    
5  7.052835 6.103984 6.093431 7.045875                           
6  7.646712 5.615350 4.922763 9.000000 6.200671                  
7  7.813571 5.148328 4.699116 7.385347 5.193110 4.107882         
8  7.250089 6.380377 6.182104 7.646164 5.702034 4.692191 5.081166
9  8.935576 5.475654 4.069506 8.947282 6.924267 4.411923 5.229742
10 4.820310 7.850672 7.620604 6.621562 7.643255 7.644528 8.183016
11 5.044541 5.961247 8.524712 4.480982 5.776739 8.215983 6.736353
12 7.525821 4.502426 3.576339 8.193498 5.695532 3.367774 4.389830
13 5.453198 6.886389 9.000000 4.769752 6.320890 9.000000 7.598268
14 8.047823 3.644800 5.520583 7.941872 5.540688 4.421877 4.419827
15 7.422105 6.153173 6.535210 7.606514 6.162217 3.747267 5.707535
16 7.790878 5.442702 5.788293 6.970611 4.655775 6.492961 6.092152
17 5.765297 7.614430 7.059690 5.402925 7.676428 7.115959 5.947790
18 3.262863 7.063774 8.815217 2.837638 6.697269 8.130208 6.862477
19 6.277890 7.340544 6.904660 6.168673 7.574673 7.001559 5.219693
20 6.567728 6.023964 6.713872 7.780027 7.468314 4.632041 5.016757
21 7.619598 4.343510 5.022295 7.427893 6.789720 5.252149 6.513873
22 7.676327 6.520133 5.410403 7.387121 7.981489 5.130516 6.284954
23 5.486912 7.385634 7.608800 5.178479 6.597335 7.708295 6.878156
24 8.106687 4.409031 3.573845 8.554435 5.235543 3.622233 4.421962
25 3.017328 7.270993 8.746269 3.022156 6.912193 8.299401 6.988237
          8        9       10       11       12       13       14
2                                                                
3                                                                
4                                                                
5                                                                
6                                                                
7                                                                
8                                                                
9  5.635275                                                      
10 6.403565 6.403371                                             
11 8.600204 8.331424 6.726583                                    
12 5.403195 4.789879 8.593534 7.676314                           
13 7.202350 8.040584 4.856052 5.062828 8.404313                  
14 5.632047 5.867276 8.621823 6.932061 3.776233 7.702362         
15 2.287436 5.948602 6.708027 8.562574 4.995486 7.120521 4.777874
16 6.150103 4.864196 6.108024 6.076695 5.045689 5.582084 5.878306
17 5.897328 6.607723 6.374984 6.481707 7.306843 5.747319 8.138892
18 7.200569 9.000000 6.266177 4.294859 7.262427 4.932853 7.516646
19 5.858377 5.437241 5.676930 6.380918 7.439558 5.744754 7.916500
20 5.890865 6.522268 7.533951 7.626476 6.025559 8.147739 5.773479
21 6.963751 4.000337 7.182729 7.208337 3.584962 7.116084 4.531666
22 5.733905 5.240434 7.675720 8.753012 4.528467 7.698598 6.120411
23 5.627060 8.438522 6.511577 7.156365 6.228017 5.411218 6.796995
24 6.118255 4.928829 7.935081 7.647791 3.212485 8.000000 3.594684
25 7.153849 9.000000 6.064470 4.310174 7.415514 5.071859 7.485514
         15       16       17       18       19       20       21
2                                                                
3                                                                
4                                                                
5                                                                
6                                                                
7                                                                
8                                                                
9                                                                
10                                                               
11                                                               
12                                                               
13                                                               
14                                                               
15                                                               
16 5.970481                                                      
17 5.933230 6.087708                                             
18 6.876623 6.898976 5.099701                                    
19 6.363319 6.209516 2.642192 5.929742                           
20 5.615244 8.218659 6.890916 6.839814 6.407828                  
21 6.571237 5.421934 7.780231 7.359895 6.972123 6.686104         
22 4.892389 7.764607 6.764836 7.131779 6.940741 7.072300 6.587213
23 5.410688 5.705017 6.038315 4.786588 6.819116 8.210300 8.140124
24 5.801445 5.072589 7.719264 8.173902 7.600842 6.052709 5.087199
25 6.874512 7.262822 4.897441 1.226072 5.646118 6.964953 7.372604
         22       23       24
2                            
3                            
4                            
5                            
6                            
7                            
8                            
9                            
10                           
11                           
12                           
13                           
14                           
15                           
16                           
17                           
18                           
19                           
20                           
21                           
22                           
23 5.934657                  
24 5.188208 6.943229         
25 6.971982 4.987902 8.084658
# Minkowski distance (with the default p = 2 this equals the Euclidean distance)
protein.Z.mink <- dist(protein.Z,                            # Data matrix
                       method = "minkowski")                 # Distance method ("euclidean" / "maximum" / "manhattan" / "canberra" / "binary" / "minkowski")
protein.Z.mink
          1        2        3        4        5        6        7
2  6.123875                                                      
3  5.941088 2.449873                                             
4  2.764456 4.883310 5.227110                                    
5  5.139593 2.114980 2.213299 3.947607                           
6  6.610018 3.013919 2.525413 6.008027 3.340491                  
7  6.391783 2.563414 2.102111 5.408239 1.879623 2.721124         
8  5.814582 4.042713 3.457791 5.748819 3.913776 2.615699 3.994256
9  6.296012 3.588910 2.193291 5.546748 3.360107 3.657722 3.781843
10 4.244946 5.163303 4.695152 3.748492 4.866844 5.590841 5.614962
11 4.673363 3.266150 3.985273 3.345019 2.749570 5.010346 3.675953
12 6.730999 2.732974 1.630907 6.182109 3.122919 2.829417 2.989315
13 4.022035 3.711167 3.716300 2.859180 3.345898 4.762895 4.319464
14 5.986448 1.116565 2.239399 5.141308 2.160153 2.535976 2.494697
15 5.441777 3.873663 2.953678 5.250463 3.506576 1.992772 3.244186
16 5.871454 2.795918 2.935223 4.417656 2.090846 3.839478 2.693562
17 6.610517 6.507879 5.633919 6.003440 5.512520 5.827369 5.248118
18 2.688487 4.640216 4.755043 1.886874 3.561895 5.512335 4.784174
19 5.568338 4.871951 3.985465 4.841928 4.146923 5.079414 4.086358
20 5.229438 3.529912 2.949649 4.903076 2.965132 3.092112 2.542194
21 5.096917 2.198411 2.333797 4.449617 2.593398 3.187969 3.543233
22 5.926153 3.747707 1.942979 5.779938 3.820355 3.471573 3.913917
23 4.336891 4.160985 3.160461 3.819771 2.712791 4.151407 3.411440
24 6.345178 1.643941 1.417223 5.598788 2.172661 2.382298 1.872404
25 2.942274 5.433204 5.596744 1.992518 4.339328 6.338870 5.524645
          8        9       10       11       12       13       14
2                                                                
3                                                                
4                                                                
5                                                                
6                                                                
7                                                                
8                                                                
9  4.567955                                                      
10 5.474533 4.544557                                             
11 5.328549 4.962783 4.100647                                    
12 3.224120 3.143931 5.697292 4.784389                           
13 4.864586 3.796687 2.145756 3.150281 4.825175                  
14 3.365080 3.405382 5.152024 3.457789 2.342517 3.905212         
15 2.030075 3.918233 4.623274 4.884784 3.608666 3.985673 3.363363
16 4.097647 3.598806 4.413737 3.023794 3.730426 3.111997 2.769121
17 6.428604 5.632730 4.762834 5.695401 7.025402 4.651763 6.336455
18 5.004229 5.518264 3.612688 2.470722 5.580929 3.108077 4.622070
19 5.409697 4.433606 3.082462 3.880044 5.248204 2.868401 4.838410
20 4.275745 4.254842 5.190977 4.215675 4.049561 4.026921 3.497706
21 3.520186 2.420056 4.101138 3.821847 2.815059 2.915464 1.901042
22 3.855547 2.570958 4.620733 5.104963 2.246291 4.178481 3.515752
23 3.417115 4.235938 4.114133 3.421339 3.884740 3.558105 3.874407
24 3.615881 2.935469 5.363661 3.889344 1.790710 4.133378 1.262664
25 5.732427 6.296222 3.920355 3.030617 6.436280 3.577969 5.481283
         15       16       17       18       19       20       21
2                                                                
3                                                                
4                                                                
5                                                                
6                                                                
7                                                                
8                                                                
9                                                                
10                                                               
11                                                               
12                                                               
13                                                               
14                                                               
15                                                               
16 3.704275                                                      
17 4.752560 4.788692                                             
18 4.663587 3.943882 5.625717                                    
19 4.129411 3.377436 2.929892 4.241617                           
20 2.940822 4.259157 5.165496 4.550891 4.274958                  
21 3.337777 3.069445 6.086275 4.336272 4.548833 3.740528         
22 3.548623 4.499316 6.514669 5.413308 4.695146 3.765195 2.839136
23 3.251411 2.915581 5.058469 2.749723 3.616797 3.942925 3.786530
24 3.295288 2.996946 6.122876 5.083416 4.589144 3.016613 2.278316
25 5.386559 4.477836 5.823758 0.984629 4.566993 5.325980 5.185324
         22       23       24
2                            
3                            
4                            
5                            
6                            
7                            
8                            
9                            
10                           
11                           
12                           
13                           
14                           
15                           
16                           
17                           
18                           
19                           
20                           
21                           
22                           
23 4.003052                  
24 2.894140 3.894353         
25 6.254361 3.345415 5.954889
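
To make the distance measures concrete, the sketch below computes four of them by hand for two points and compares each with dist(). The points u and v are hypothetical and chosen to be positive so that the Canberra formula is unambiguous. The comparison also suggests why the Minkowski output above coincides with the Euclidean output: dist() uses p = 2 unless told otherwise.

```r
# Two hypothetical points in three dimensions (illustration only)
u <- c(1, 2, 3)
v <- c(4, 6, 8)
uv <- rbind(u, v)

# Distances written out from their definitions
d.hand <- c(euclidean = sqrt(sum((u - v)^2)),
            manhattan = sum(abs(u - v)),
            canberra  = sum(abs(u - v) / (abs(u) + abs(v))),
            minkowski = sum(abs(u - v)^2)^(1 / 2))           # Minkowski with p = 2

# The same distances from dist(), one method at a time
d.dist <- sapply(names(d.hand),
                 function(m) as.numeric(dist(uv, method = m)))

rbind(d.hand, d.dist)                                        # Rows should agree
```

A different Minkowski order can be requested through the p argument of dist(), for example dist(uv, method = "minkowski", p = 3).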

1-3. Hierarchical Clustering

# Ward's method
protein.Z.ward <- hclust(protein.Z.eucl,                     # Distance matrix
                         method = "ward.D")                  # Clustering method ("ward.D" / "ward.D2" / "single" / "complete" / "average" / "mcquitty" / "median" / "centroid")

# Single linkage
protein.Z.sing <- hclust(protein.Z.eucl,                     # Distance matrix
                         method = "single")                  # Clustering method ("ward.D" / "ward.D2" / "single" / "complete" / "average" / "mcquitty" / "median" / "centroid")

# Complete linkage
protein.Z.comp <- hclust(protein.Z.eucl,                     # Distance matrix
                         method = "complete")                # Clustering method ("ward.D" / "ward.D2" / "single" / "complete" / "average" / "mcquitty" / "median" / "centroid")

# Average linkage
protein.Z.aver <- hclust(protein.Z.eucl,                     # Distance matrix
                         method = "average")                 # Clustering method ("ward.D" / "ward.D2" / "single" / "complete" / "average" / "mcquitty" / "median" / "centroid")

1-4. Dendrogram

# Dendrogram (tree diagram)
plot(protein.Z.ward,                                         # Object storing the clustering result
     labels = protein$country,                               # Labels
     main = "Ward")
plot(protein.Z.comp,                                         # Object storing the clustering result
     labels = protein$country,                               # Labels
     main = "Complete Linkage")
plot(protein.Z.sing,                                         # Object storing the clustering result
     labels = protein$country,                               # Labels
     main = "Single Linkage")
plot(protein.Z.aver,                                         # Object storing the clustering result
     labels = protein$country,                               # Labels
     main = "Average Linkage")

Result! The clusters that are formed differ depending on the clustering method.
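
One way to see such a difference numerically is to cut two trees into the same number of clusters and cross-tabulate the memberships. The sketch below does this for a small one-dimensional data set, chain, which is hypothetical and was chosen so that single and complete linkage disagree. It is not the protein data.

```r
# Hypothetical one-dimensional data with slowly widening gaps
chain <- c(0, 1, 2.1, 3.3, 4.6, 6)
chain.dist <- dist(chain)

chain.sing <- hclust(chain.dist, method = "single")          # Single linkage
chain.comp <- hclust(chain.dist, method = "complete")        # Complete linkage

# Cut both trees into two clusters and compare the memberships
grp.sing <- cutree(chain.sing, k = 2)
grp.comp <- cutree(chain.comp, k = 2)
table(single = grp.sing, complete = grp.comp)
```

Single linkage chains the first five points together and isolates only the last one, whereas complete linkage splits the data into groups of four and two. Off-diagonal counts in the table mark the observations on which the two methods disagree.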


Caution! When a hierarchical clustering is drawn as a dendrogram, converting it to a “phylo” object makes many more plotting options available. For the detailed options of plot() for “phylo” objects, see here.

# plot() for a phylo object
pacman::p_load("ape")                                        # For as.phylo()

# Cluster number for each observation
hcluster <- cutree(protein.Z.ward,                           # Object storing the clustering result
                   k = 5)                                    # Number of clusters
col <- c("red", "blue", "green", "black", "cyan")

plot(as.phylo(protein.Z.ward),                               # phylo object
     type = "fan",                                           # Plot type ("phylogram" / "fan" / "cladogram" / "unrooted" / "radial")
     tip.color = col[hcluster],                              # Label colors
     label.offset = 0.4,                                     # Label offset
     cex = 1)                                                # Label size


Caution! The function rect.hclust() draws the cluster boundaries on a dendrogram. The option k specifies the number of clusters to mark.

# Mark the clusters
plot(protein.Z.ward,                                         # Object storing the clustering result
     labels = protein$country,                               # Labels
     main = "Ward")

rect.hclust(protein.Z.ward,                                  # Object storing the clustering result
            k = 5,                                           # Number of clusters
            border = "red")                                  # Color of the cluster boxes


1-5. Combining the Raw Data with Cluster Numbers

Caution! The function cutree() returns a vector of cluster numbers. The number of clusters is specified with the option k.

# Cluster number for each observation
hcluster <- cutree(protein.Z.ward,                           # Object storing the clustering result
                   k = 5)                                    # Number of clusters

# Combine the raw data with the cluster numbers
protein.X.hclust <- data.frame(protein, hcluster)
protein.X.hclust
          country   x1   x2  x3   x4   x5   x6  x7  x8  x9 hcluster
1         Albania 10.1  1.4 0.5  8.9  0.2 42.3 0.6 5.5 1.7        1
2         Austria  8.9 14.0 4.3 19.9  2.1 28.0 3.6 1.3 4.3        2
3         Belgium 13.5  9.3 4.1 17.5  4.5 26.6 5.7 2.1 4.0        2
4        Bulgaria  7.8  6.0 1.6  8.3  1.2 56.7 1.1 3.7 4.2        1
5  Czechoslovakia  9.7 11.4 2.8 12.5  2.0 34.3 5.0 1.1 4.0        3
6         Denmark 10.6 10.8 3.7 25.0  9.9 21.9 4.8 0.7 2.4        4
7       E Germany  8.4 11.6 3.7 11.1  5.4 24.6 6.5 0.8 3.6        3
8         Finland  9.5  4.9 2.7 33.7  5.8 26.3 5.1 1.0 1.4        4
9          France 18.0  9.9 3.3 19.5  5.7 28.1 4.8 2.4 6.5        2
10         Greece 10.2  3.0 2.8 17.6  5.9 41.7 2.2 7.8 6.5        5
11        Hungary  5.3 12.4 2.9  9.7  0.3 40.1 4.0 5.4 4.2        3
12        Ireland 13.9 10.0 4.7 25.8  2.2 24.0 6.2 1.6 2.9        2
13          Italy  9.0  5.1 2.9 13.7  3.4 36.8 2.1 4.3 6.7        5
14    Netherlands  9.5 13.6 3.6 23.4  2.5 22.4 4.2 1.8 3.7        2
15         Norway  9.4  4.7 2.7 23.3  9.7 23.0 4.6 1.6 2.7        4
16         Poland  6.9 10.2 2.7 19.3  3.0 36.1 5.9 2.0 6.6        3
17       Portugal  6.2  3.7 1.1  4.9 14.2 27.0 5.9 4.7 7.9        5
18        Romania  6.2  6.3 1.5 11.1  1.0 49.6 3.1 5.3 2.8        1
19          Spain  7.1  3.4 3.1  8.6  7.0 29.2 5.7 5.9 7.2        5
20         Sweden  9.9  7.8 3.5  4.7  7.5 19.5 3.7 1.4 2.0        4
21    Switzerland 13.1 10.1 3.1 23.8  2.3 25.6 2.8 2.4 4.9        2
22             UK 17.4  5.7 4.7 20.6  4.3 24.3 4.7 3.4 3.3        2
23           USSR  9.3  4.6 2.1 16.6  3.0 43.6 6.4 3.4 2.9        3
24      W Germany 11.4 12.5 4.1 18.8  3.4 18.6 5.2 1.5 3.8        2
25     Yugoslavia  4.4  5.0 1.2  9.5  0.6 55.9 3.0 5.7 3.2        1

Result! With Ward's method, the observations in cluster 1 are “Albania”, “Bulgaria”, “Romania”, and “Yugoslavia”.


# Number of observations per cluster
table(protein.X.hclust$hcluster)

1 2 3 4 5 
4 8 5 4 4 

Result! With Ward's method, cluster 2 is the largest, with 8 observations.


# Sample means by cluster
aggregate(protein.Z,                                          # Data to summarize
          by = list(hcluster),                                # Grouping variable
          FUN = mean)                                         # Function applied to each group
  Group.1           x1         x2          x3         x4         x5
1       1 -0.807569986 -0.8719354 -1.55330561 -0.9351841 -1.0386379
2       2  1.011180399  0.7421332  0.94084150  0.6610479 -0.2671539
3       3 -0.570049402  0.5803879 -0.08589708 -0.3368952 -0.4537795
4       4  0.006572897 -0.2290150  0.19147892  0.7308937  1.1582546
5       5 -0.508801956 -1.1088009 -0.41248496 -0.6966863  0.9819154
          x6         x7         x8          x9
1  1.7200335 -1.4234267  0.9961313 -0.64360439
2 -0.6877583  0.2288743 -0.5083895  0.02161979
3  0.3181839  0.7857609 -0.2679180  0.06873983
4 -0.8722721  0.1676780 -0.9553392 -1.11480485
5  0.1300253 -0.1842010  1.3108846  1.62924487

Result! With the “Ward” method, cluster 1 has very high means for \(x_6\) (cereals) and \(x_8\) (nuts), while the means of all the other variables are comparatively low.
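
The interpretation above works because the variables were standardized, so each cluster mean is a deviation from the overall mean in standard-deviation units. The following is a minimal sketch of the same workflow under stated assumptions: the built-in USArrests data stands in for the protein data, "ward.D2" is an assumed choice of Ward variant, and k = 4 is arbitrary. None of these come from the original analysis.

```r
# Stand-in data (assumption): USArrests ships with base R
Z <- scale(USArrests, center = TRUE, scale = TRUE)

# Ward clustering on Euclidean distances; "ward.D2" is an assumed variant
hc <- hclust(dist(Z, method = "euclidean"), method = "ward.D2")
cl <- cutree(hc, k = 4)                       # k = 4 is arbitrary here

# Mean of each standardized variable within each cluster
cl.means <- aggregate(Z, by = list(cluster = cl), FUN = mean)
print(cl.means)

# Because every column of Z has mean 0, the size-weighted cluster means
# sum to 0 for every variable: positive entries are above the overall mean
sizes    <- as.vector(table(cl))
weighted <- colSums(as.matrix(cl.means[, -1]) * sizes)
print(round(weighted, 10))
```

A cluster mean far from 0, such as the 1.72 for \(x_6\) in cluster 1 above, therefore marks a variable on which that cluster differs strongly from the full sample.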


1-6. Using a Distance Matrix

# Distance matrix
exam71 <- c(0, 1, 7, 9, 1, 0, 3, 6, 7, 3, 0, 5, 9, 6, 5, 0)   
exam71.matrix <- matrix(exam71, nrow = 4)
exam71.matrix
     [,1] [,2] [,3] [,4]
[1,]    0    1    7    9
[2,]    1    0    3    6
[3,]    7    3    0    5
[4,]    9    6    5    0
# Convert to a dist object
exam71.dist <- as.dist(exam71.matrix)    
exam71.dist
  1 2 3
2 1    
3 7 3  
4 9 6 5
# Single linkage
exam71.sing <- hclust(exam71.dist,                           # Distance matrix
                      method = "single")                     # Single linkage

plot(exam71.sing, xlab = "")                                 # Dendrogram
# Complete linkage
exam71.comp <- hclust(exam71.dist,                           # Distance matrix
                      method = "complete")                   # Complete linkage
plot(exam71.comp, xlab = "")                                 # Dendrogram
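
The two dendrograms differ because the linkage rule changes how the distance between merged groups is updated. Both methods first join observations 1 and 2 at distance 1. Single linkage then takes the minimum, so d({1,2},3) = min(7, 3) = 3 and observation 3 joins next. Complete linkage takes the maximum, so d({1,2},3) = max(7, 3) = 7, which exceeds d(3,4) = 5, and observations 3 and 4 join instead. A small sketch that checks this by inspecting the merge heights (the object names sing and comp are only for illustration):

```r
# Distance matrix from the example above
exam71.matrix <- matrix(c(0, 1, 7, 9,
                          1, 0, 3, 6,
                          7, 3, 0, 5,
                          9, 6, 5, 0), nrow = 4)
exam71.dist <- as.dist(exam71.matrix)

sing <- hclust(exam71.dist, method = "single")
comp <- hclust(exam71.dist, method = "complete")

# Heights at which successive merges happen
sing$height                                   # 1 3 5
comp$height                                   # 1 5 9

# Two-cluster solutions implied by each tree
cutree(sing, k = 2)                           # 1 1 1 2
cutree(comp, k = 2)                           # 1 1 2 2
```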


2. Non-hierarchical Cluster Analysis

# Bivariate data
bivariate <- read.csv("C:/Users/User/Desktop/bivariate.csv")
bivariate
   x1  x2
1 1.0 1.0
2 1.5 2.0
3 3.0 4.0
4 5.0 7.0
5 3.5 5.0
6 4.5 5.0
7 3.5 4.5
# K-means clustering
bivariate.kmeans <- kmeans(bivariate,                         # Data matrix
                           centers = 2)                       # Number of clusters
bivariate.kmeans
K-means clustering with 2 clusters of sizes 2, 5

Cluster means:
    x1  x2
1 1.25 1.5
2 3.90 5.1

Clustering vector:
[1] 1 1 2 2 2 2 2

Within cluster sum of squares by cluster:
[1] 0.625 7.900
 (between_SS / total_SS =  77.0 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"    
[5] "tot.withinss" "betweenss"    "size"         "iter"        
[9] "ifault"      
bivariate.kmeans$iter                                         # Number of iterations
[1] 1
bivariate.kmeans$size                                         # Size of each cluster (number of observations)
[1] 2 5
bivariate.kmeans$centers                                      # Mean vector of each cluster
    x1  x2
1 1.25 1.5
2 3.90 5.1
# Combine the original data with the cluster labels
bivariate.kclust <- data.frame(bivariate,
                               bivariate.kmeans$cluster)      # Cluster label of each observation
bivariate.kclust
   x1  x2 bivariate.kmeans.cluster
1 1.0 1.0                        1
2 1.5 2.0                        1
3 3.0 4.0                        2
4 5.0 7.0                        2
5 3.5 5.0                        2
6 4.5 5.0                        2
7 3.5 4.5                        2
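
The "Within cluster sum of squares" reported by kmeans() is the sum of squared deviations of each observation from its own cluster mean. A minimal sketch that recomputes it from the definition, with the seven observations typed in from the printed data above; set.seed(1) and nstart = 25 are assumptions added only to make the run repeatable, and manual.wss and total.ss are illustrative names:

```r
# The seven observations printed above, entered by hand
bivariate <- data.frame(x1 = c(1.0, 1.5, 3.0, 5.0, 3.5, 4.5, 3.5),
                        x2 = c(1.0, 2.0, 4.0, 7.0, 5.0, 5.0, 4.5))

set.seed(1)                                   # assumed seed, for repeatability
fit <- kmeans(bivariate, centers = 2, nstart = 25)

# Within-cluster sum of squares recomputed from the definition
manual.wss <- sapply(1:2, function(k) {
  members <- as.matrix(bivariate[fit$cluster == k, , drop = FALSE])
  center  <- colMeans(members)                # cluster mean vector
  sum(sweep(members, 2, center)^2)            # squared deviations from it
})
manual.wss
fit$withinss

# Share of the total sum of squares explained by the partition
total.ss <- sum(scale(bivariate, scale = FALSE)^2)
1 - sum(manual.wss) / total.ss
```

If the run finds the same partition as the output above, the two within-cluster values should match the reported 0.625 and 7.900, possibly in the opposite order because cluster labels are arbitrary.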

# Protein data
protein.Z <- scale(protein[,-1], center = TRUE, scale = TRUE)
protein.Z.kmeans <- kmeans(protein.Z,                         # Data matrix
                           centers = 5,                       # Number of clusters
                           nstart = 30)                       # Number of random sets of initial centers to try
protein.Z.kmeans
K-means clustering with 5 clusters of sizes 4, 4, 8, 4, 5

Cluster means:
            x1         x2          x3         x4         x5
1 -0.807569986 -0.8719354 -1.55330561 -0.9351841 -1.0386379
2 -0.508801956 -1.1088009 -0.41248496 -0.6966863  0.9819154
3  1.011180399  0.7421332  0.94084150  0.6610479 -0.2671539
4  0.006572897 -0.2290150  0.19147892  0.7308937  1.1582546
5 -0.570049402  0.5803879 -0.08589708 -0.3368952 -0.4537795
          x6         x7         x8          x9
1  1.7200335 -1.4234267  0.9961313 -0.64360439
2  0.1300253 -0.1842010  1.3108846  1.62924487
3 -0.6877583  0.2288743 -0.5083895  0.02161979
4 -0.8722721  0.1676780 -0.9553392 -1.11480485
5  0.3181839  0.7857609 -0.2679180  0.06873983

Clustering vector:
 [1] 1 3 3 1 5 4 5 4 3 2 5 3 2 3 4 5 2 1 2 4 3 3 5 3 1

Within cluster sum of squares by cluster:
[1]  8.006767 18.810330 22.039942 12.856453 16.915865
 (between_SS / total_SS =  63.6 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"    
[5] "tot.withinss" "betweenss"    "size"         "iter"        
[9] "ifault"      
protein.Z.kmeans$iter                                         # Number of iterations
[1] 3
protein.Z.kmeans$size                                         # Size of each cluster (number of observations)
[1] 4 4 8 4 5
protein.Z.kmeans$centers                                      # Mean vector of each cluster
            x1         x2          x3         x4         x5
1 -0.807569986 -0.8719354 -1.55330561 -0.9351841 -1.0386379
2 -0.508801956 -1.1088009 -0.41248496 -0.6966863  0.9819154
3  1.011180399  0.7421332  0.94084150  0.6610479 -0.2671539
4  0.006572897 -0.2290150  0.19147892  0.7308937  1.1582546
5 -0.570049402  0.5803879 -0.08589708 -0.3368952 -0.4537795
          x6         x7         x8          x9
1  1.7200335 -1.4234267  0.9961313 -0.64360439
2  0.1300253 -0.1842010  1.3108846  1.62924487
3 -0.6877583  0.2288743 -0.5083895  0.02161979
4 -0.8722721  0.1676780 -0.9553392 -1.11480485
5  0.3181839  0.7857609 -0.2679180  0.06873983

# Combine the original data with the cluster labels
protein.X.kclust <- data.frame(protein, 
                               protein.Z.kmeans$cluster)      # Cluster label of each observation
protein.X.kclust
          country   x1   x2  x3   x4   x5   x6  x7  x8  x9
1         Albania 10.1  1.4 0.5  8.9  0.2 42.3 0.6 5.5 1.7
2         Austria  8.9 14.0 4.3 19.9  2.1 28.0 3.6 1.3 4.3
3         Belgium 13.5  9.3 4.1 17.5  4.5 26.6 5.7 2.1 4.0
4        Bulgaria  7.8  6.0 1.6  8.3  1.2 56.7 1.1 3.7 4.2
5  Czechoslovakia  9.7 11.4 2.8 12.5  2.0 34.3 5.0 1.1 4.0
6         Denmark 10.6 10.8 3.7 25.0  9.9 21.9 4.8 0.7 2.4
7       E Germany  8.4 11.6 3.7 11.1  5.4 24.6 6.5 0.8 3.6
8         Finland  9.5  4.9 2.7 33.7  5.8 26.3 5.1 1.0 1.4
9          France 18.0  9.9 3.3 19.5  5.7 28.1 4.8 2.4 6.5
10         Greece 10.2  3.0 2.8 17.6  5.9 41.7 2.2 7.8 6.5
11        Hungary  5.3 12.4 2.9  9.7  0.3 40.1 4.0 5.4 4.2
12        Ireland 13.9 10.0 4.7 25.8  2.2 24.0 6.2 1.6 2.9
13          Italy  9.0  5.1 2.9 13.7  3.4 36.8 2.1 4.3 6.7
14    Netherlands  9.5 13.6 3.6 23.4  2.5 22.4 4.2 1.8 3.7
15         Norway  9.4  4.7 2.7 23.3  9.7 23.0 4.6 1.6 2.7
16         Poland  6.9 10.2 2.7 19.3  3.0 36.1 5.9 2.0 6.6
17       Portugal  6.2  3.7 1.1  4.9 14.2 27.0 5.9 4.7 7.9
18        Romania  6.2  6.3 1.5 11.1  1.0 49.6 3.1 5.3 2.8
19          Spain  7.1  3.4 3.1  8.6  7.0 29.2 5.7 5.9 7.2
20         Sweden  9.9  7.8 3.5  4.7  7.5 19.5 3.7 1.4 2.0
21    Switzerland 13.1 10.1 3.1 23.8  2.3 25.6 2.8 2.4 4.9
22             UK 17.4  5.7 4.7 20.6  4.3 24.3 4.7 3.4 3.3
23           USSR  9.3  4.6 2.1 16.6  3.0 43.6 6.4 3.4 2.9
24      W Germany 11.4 12.5 4.1 18.8  3.4 18.6 5.2 1.5 3.8
25     Yugoslavia  4.4  5.0 1.2  9.5  0.6 55.9 3.0 5.7 3.2
   protein.Z.kmeans.cluster
1                         1
2                         3
3                         3
4                         1
5                         5
6                         4
7                         5
8                         4
9                         3
10                        2
11                        5
12                        3
13                        2
14                        3
15                        4
16                        5
17                        2
18                        1
19                        2
20                        4
21                        3
22                        3
23                        5
24                        3
25                        1

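Caution! The function clusplot() from the package cluster displays a clustering result graphically. It uses the principal component scores of the analysis variables to place each observation and its cluster in a two-dimensional plot. Plotting principal component scores in this way is also a useful tool for exploring the number of clusters. For the full list of options, see the documentation of clusplot().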

# Visualization
pacman::p_load("cluster")

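clusplot(protein.Z,                                           # Data matrix
         protein.Z.kmeans$cluster,                            # Cluster label of each observation
         labels = 4,                                          # 0-5: whether and how to label clusters and observations
         lines = 1,                                           # 0-2: whether to draw lines showing distances between clusters
         color = TRUE,                                        # Whether to color the clusters
         shade = TRUE,                                        # Whether to shade the clusters
         cex = 1.5)                                           # Text size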
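
The idea behind this plot can be sketched by hand: compute the first two principal component scores and color the points by cluster. This is not clusplot()'s internal code, only an illustration of the same principle. It assumes the built-in USArrests data as a stand-in for the protein data, three clusters chosen arbitrarily, and prcomp() as the principal component routine.

```r
# Stand-in data (assumption): USArrests ships with base R
Z <- scale(USArrests)

set.seed(1)                                   # assumed seed, for repeatability
km <- kmeans(Z, centers = 3, nstart = 30)     # 3 clusters is arbitrary here

# Principal component scores of the standardized variables
pc     <- prcomp(Z)
scores <- pc$x[, 1:2]

# Share of total variance captured by the two plotted components
explained <- sum(pc$sdev[1:2]^2) / sum(pc$sdev^2)

plot(scores, col = km$cluster, pch = 19,
     xlab = "PC1", ylab = "PC2",
     main = sprintf("First two components explain %.1f%% of the variance",
                    100 * explained))
text(scores, labels = rownames(Z), pos = 3, cex = 0.6)
```

When the first two components explain only a modest share of the variance, clusters that overlap in the plot may still be well separated in the full variable space.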


3. Number of Clusters


3-1. The fviz_nbclust() Function

pacman::p_load("factoextra")

# Method = "wss"
fviz_nbclust(protein.Z,                                     # Data matrix
             kmeans,                                        # Clustering method
             method = "wss",                                # Statistic to examine ("wss": within-cluster sum of squares, "silhouette": silhouette coefficient, "gap_stat": gap statistic)
             k.max = 10)                                    # Maximum number of clusters to consider
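
The "wss" curve can also be computed directly, which makes clear what is being plotted: the total within-cluster sum of squares for each candidate k. A minimal sketch, assuming the built-in USArrests data as a stand-in for the protein data and nstart = 30 as in the k-means call earlier; it is not the code fviz_nbclust() runs internally.

```r
# Stand-in data (assumption): USArrests ships with base R
Z <- scale(USArrests)

set.seed(1)                                   # assumed seed, for repeatability
k.values <- 1:10

# Total within-cluster sum of squares for each number of clusters
wss <- sapply(k.values, function(k) {
  kmeans(Z, centers = k, nstart = 30)$tot.withinss
})

plot(k.values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The value at k = 1 equals the total sum of squares of the data, and the curve falls as k grows. The "elbow" where the decrease flattens is the usual visual cue for choosing k.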


# Method = "silhouette"
fviz_nbclust(protein.Z,                                     # Data matrix
             kmeans,                                        # Clustering method
             method = "silhouette",                         # Statistic to examine ("wss": within-cluster sum of squares, "silhouette": silhouette coefficient, "gap_stat": gap statistic)
             k.max = 10)                                    # Maximum number of clusters to consider
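
The silhouette criterion averages, over all observations, a width between -1 and 1 that compares how close an observation is to its own cluster versus the nearest other cluster. A minimal sketch using silhouette() from the package cluster, assuming the built-in USArrests data as a stand-in for the protein data; it is meant as an illustration rather than a reproduction of fviz_nbclust().

```r
library(cluster)

# Stand-in data (assumption): USArrests ships with base R
Z <- scale(USArrests)
d <- dist(Z)                                  # Euclidean distances

set.seed(1)                                   # assumed seed, for repeatability
k.values <- 2:10                              # silhouette needs at least 2 clusters

# Average silhouette width for each number of clusters
avg.sil <- sapply(k.values, function(k) {
  cl <- kmeans(Z, centers = k, nstart = 30)$cluster
  mean(silhouette(cl, d)[, "sil_width"])
})

best.k <- k.values[which.max(avg.sil)]        # k with the largest average width
plot(k.values, avg.sil, type = "b",
     xlab = "Number of clusters k",
     ylab = "Average silhouette width")
abline(v = best.k, lty = 2)
```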


# Method = "gap_stat"

fviz_nbclust(protein.Z,                                     # Data matrix
             kmeans,                                        # Clustering method
             method = "gap_stat",                           # Statistic to examine ("wss": within-cluster sum of squares, "silhouette": silhouette coefficient, "gap_stat": gap statistic)
             nboot = 500)                                   # Number of bootstrap replications
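
The gap statistic compares the log within-cluster dispersion of the data with its expectation under a reference distribution estimated by simulation. A minimal sketch with clusGap() and maxSE() from the package cluster, assuming the built-in USArrests data as a stand-in for the protein data, only B = 50 reference samples to keep the run short, and "firstSEmax" as one of several available selection rules:

```r
library(cluster)

# Stand-in data (assumption): USArrests ships with base R
Z <- scale(USArrests)

set.seed(1)                                   # assumed seed, for repeatability
gap <- clusGap(Z, FUNcluster = kmeans, nstart = 30,
               K.max = 10, B = 50, verbose = FALSE)

# One row per k: logW, E.logW, gap, and the simulation standard error
tab <- gap$Tab
print(round(tab, 4))

# Pick k from the gap values and their standard errors
k.hat <- maxSE(tab[, "gap"], tab[, "SE.sim"], method = "firstSEmax")
k.hat
```

Because the reference samples are random, the chosen k can change with the seed and with B; a larger B, such as the 500 used above, gives more stable standard errors.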


3-2. The NbClust() Function

pacman::p_load("NbClust")

nc <- NbClust(data = protein.Z,                             # Data matrix
              distance = "euclidean",                       # Distance measure
              min.nc = 2,                                   # Minimum number of clusters to consider
              max.nc = 15,                                  # Maximum number of clusters to consider
              method = "kmeans")                            # Clustering method

*** : The Hubert index is a graphical method of determining the number of clusters.
                In the plot of Hubert index, we seek a significant knee that corresponds to a 
                significant increase of the value of the measure i.e the significant peak in Hubert
                index second differences plot. 
 

*** : The D index is a graphical method of determining the number of clusters. 
                In the plot of D index, we seek a significant knee (the significant peak in Dindex
                second differences plot) that corresponds to a significant increase of the value of
                the measure. 
 
******************************************************************* 
* Among all indices:                                                
* 7 proposed 2 as the best number of clusters 
* 8 proposed 3 as the best number of clusters 
* 1 proposed 12 as the best number of clusters 
* 3 proposed 13 as the best number of clusters 
* 4 proposed 15 as the best number of clusters 

                   ***** Conclusion *****                            
 
* According to the majority rule, the best number of clusters is  3 
 
 
******************************************************************* 
nc
$All.index
       KL      CH Hartigan     CCC    Scott      Marriot   TrCovW
2  2.7022 13.1029   5.8597 -0.0363  57.5560 5.479829e+09 371.9784
3  1.8645 10.6656   3.4545 -0.3530 113.7661 1.301607e+09 238.2300
4  1.0091  8.9522   3.1070 -0.9185 141.0133 7.780796e+08 186.4731
5  0.5691  8.0802   4.8647 -1.3944 169.4849 3.892618e+08 134.4208
6  1.5388  8.5590   3.4964 -0.8121 215.1825 9.010649e+07  88.4805
7  2.2213  8.5526   1.8289 -0.7517 240.6801 4.422940e+07  59.8435
8  0.5052  7.8737   3.2297 -1.3656 270.3849 1.760636e+07  51.6444
9  2.0289  8.0961   1.7740 -1.1075 315.1312 3.720938e+06  38.1517
10 1.1953  7.6796   1.4965 -1.6437 345.3094 1.373784e+06  32.2035
11 0.4254  7.2341   3.4134 -2.3123 395.6683 2.217576e+05  29.1912
12 1.5636  7.8837   2.4001 -1.8041 476.7473 1.030313e+04  20.1290
13 2.0728  8.0871  -0.6939 -1.8944 481.6411 9.942145e+03  11.9889
14 0.6266  6.3983   1.6477 -4.2741 462.0155 2.528026e+04  14.0784
15 1.0939  6.3172   1.5401 -4.8571 561.1597 5.500434e+02  11.7617
     TraceW  Friedman  Rubin Cindex     DB Silhouette   Duda Pseudot2
2  137.6066   14.6729 1.5697 0.3996 1.3200     0.3113 0.6053   5.2170
3  109.6667   26.0708 1.9696 0.5364 1.1832     0.3209 0.8391   1.1505
4   94.7834   29.4278 2.2789 0.5138 1.2967     0.2099 3.0892  -8.7918
5   82.5674   36.8610 2.6160 0.5306 1.1257     0.2248 1.7441  -2.1331
6   66.4133   46.7657 3.2524 0.4689 1.1311     0.2429 1.2886  -1.3438
7   56.0914   50.3262 3.8509 0.5531 1.1520     0.2305 2.8744  -1.3042
8   50.9179   65.3608 4.2421 0.5398 1.0512     0.2513 2.2956  -2.2575
9   42.7888   81.7524 5.0480 0.5104 0.9396     0.2623 3.0903  -0.6764
10  38.5181   97.9467 5.6077 0.5487 0.8251     0.3233 6.4088  -1.6879
11  35.0240  168.1959 6.1672 0.5353 0.7933     0.3333 2.1953  -1.0890
12  28.1585  311.6303 7.6709 0.4795 0.6781     0.4291 2.1522  -1.0707
13  23.7700  264.1365 9.0871 0.5451 0.6836     0.4197 6.2336  -0.8396
14  25.2288  218.6919 8.5616 0.5403 0.6627     0.4309 1.3704  -0.2703
15  21.9420 1028.8406 9.8441 0.5125 0.6093     0.4697 3.0394   0.0000
     Beale Ratkowsky    Ball Ptbiserial    Frey McClain   Dunn Hubert
2   3.4809    0.3960 68.8033     0.6461  0.2920  0.6642 0.4104 0.0097
3   0.9212    0.3983 36.5556     0.6991  2.6183  0.8658 0.5696 0.0105
4  -2.7074    0.3713 23.6959     0.6098  0.3566  1.3576 0.3946 0.0109
5  -1.7079    0.3493 16.5135     0.6121  0.5468  1.4292 0.4115 0.0112
6  -1.0087    0.3390 11.0689     0.5511  0.5272  2.7592 0.4379 0.0122
7  -2.6105    0.3249  8.0131     0.5066  0.5037  3.7251 0.4015 0.0127
8  -1.6946    0.3088  6.3647     0.4939  0.3876  4.0402 0.4015 0.0133
9   0.0000    0.2983  4.7543     0.4530 -0.3578  5.3257 0.4142 0.0137
10 -2.5340    0.2865  3.8518     0.4864  1.7337  4.7681 0.4508 0.0134
11     Inf    0.2758  3.1840     0.4247 -0.0791  6.4376 0.4508 0.0140
12 -1.6074    0.2691  2.3465     0.4515  0.3127  6.0618 0.4508 0.0142
13 -2.5208    0.2616  1.8285     0.4135 -0.1781  7.6341 0.5262 0.0145
14     Inf    0.2511  1.8021     0.3641  0.2286  9.3390 0.3361 0.0144
15  0.0000    0.2447  1.4628     0.3468  0.1666 10.5971 0.3361 0.0144
   SDindex Dindex   SDbw
2   1.8251 2.2514 0.7620
3   1.5339 2.0047 0.6361
4   1.5574 1.8726 0.4970
5   1.3346 1.7401 0.4047
6   1.3741 1.5670 0.4351
7   1.3308 1.4576 0.3494
8   1.2549 1.3529 0.2899
9   1.2246 1.2376 0.2611
10  1.1211 1.1380 0.2163
11  1.2088 1.0665 0.1983
12  1.0583 0.8969 0.1168
13  1.0830 0.8299 0.1125
14  1.8762 0.8334 0.1280
15  1.8573 0.7551 0.1128

$All.CriticalValues
   CritValue_Duda CritValue_PseudoT2 Fvalue_Beale
2          0.4742             8.8696       0.0013
3          0.3418            11.5536       0.5184
4          0.2098            48.9687       1.0000
5          0.2098            18.8341       1.0000
6          0.2857            14.9980       1.0000
7          0.2098             7.5336       1.0000
8          0.0985            36.6176       1.0000
9         -0.0882           -12.3333          NaN
10         0.0985            18.3088       1.0000
11        -0.5097            -5.9239          NaN
12         0.0985            18.3088       1.0000
13         0.0985             9.1544       1.0000
14        -0.5097            -2.9619          NaN
15        -0.0882             0.0000          NaN

$Best.nc
                    KL      CH Hartigan     CCC   Scott    Marriot
Number_clusters 2.0000  2.0000   13.000  2.0000 15.0000          3
Value_Index     2.7022 13.1029    3.094 -0.0363 99.1442 3654694541
                  TrCovW  TraceW Friedman   Rubin Cindex      DB
Number_clusters   3.0000  3.0000  15.0000 13.0000 2.0000 15.0000
Value_Index     133.7483 13.0567 810.1487 -1.9417 0.3996  0.6093
                Silhouette   Duda PseudoT2  Beale Ratkowsky    Ball
Number_clusters    15.0000 2.0000    2.000 3.0000    3.0000  3.0000
Value_Index         0.4697 0.6053    5.217 0.9212    0.3983 32.2478
                PtBiserial Frey McClain   Dunn Hubert SDindex Dindex
Number_clusters     3.0000    1  2.0000 3.0000      0 12.0000      0
Value_Index         0.6991   NA  0.6642 0.5696      0  1.0583      0
                   SDbw
Number_clusters 13.0000
Value_Index      0.1125

$Best.partition
 [1] 1 3 3 1 3 3 3 3 3 2 1 3 2 3 3 3 2 1 2 3 3 3 1 3 1

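Result! Seven indices recommend 2 clusters and eight indices recommend 3. “$All.index” lists the value of each index for every candidate number of clusters, and “$Best.nc” gives the optimal number of clusters according to each index. “$Best.partition” gives the cluster label of each observation under the optimal number of clusters. In this example the largest number of indices recommend 3, so the observations are divided into 3 clusters.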
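
The majority rule can be retraced from the “$Best.nc” output alone. The sketch below types in the Number_clusters row printed above and tallies the votes. Dropping the entries below 2 (Frey, Hubert, and Dindex, which the printed summary does not count) is an assumption made here to reproduce that summary, and the names votes, tally, and best are illustrative.

```r
# Number_clusters row of $Best.nc, copied from the output above
votes <- c(KL = 2, CH = 2, Hartigan = 13, CCC = 2, Scott = 15, Marriot = 3,
           TrCovW = 3, TraceW = 3, Friedman = 15, Rubin = 13, Cindex = 2,
           DB = 15, Silhouette = 15, Duda = 2, PseudoT2 = 2, Beale = 3,
           Ratkowsky = 3, Ball = 3, PtBiserial = 3, Frey = 1, McClain = 2,
           Dunn = 3, Hubert = 0, SDindex = 12, Dindex = 0, SDbw = 13)

# Keep only proposals inside the searched range (assumed filter: at least 2)
valid <- votes[votes >= 2]

# Number of indices proposing each number of clusters
tally <- table(valid)
tally

# Majority rule: the number of clusters with the most votes
best <- as.integer(names(tally)[which.max(tally)])
best                                          # 3
```

The tally gives 7, 8, 1, 3, and 4 votes for 2, 3, 12, 13, and 15 clusters, matching the printed summary. The margin between 3 clusters and 2 clusters is a single index.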


# Bar plot
barplot(table(nc$Best.nc[1,]),                              # Frequency table of the recommended numbers of clusters
        xlab = "Number of Clusters",
        ylab = "Number of Criteria")


Caution! Setting the option index to the name of a specific index shows the detailed result for that index alone. The example below shows the result for the Cubic Clustering Criterion (CCC), one of the most widely used indices.

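# Optimal number of clusters for a specific index
NbClust(data = protein.Z,                                   # Data matrix
        distance = "euclidean",                             # Distance measure
        min.nc = 2,                                         # Minimum number of clusters to consider
        max.nc = 15,                                        # Maximum number of clusters to consider
        method = "kmeans",                                  # Clustering method
        index = "ccc")                                      # Index to examine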
$All.index
      2       3       4       5       6       7       8       9 
-0.0363 -0.3530 -0.9185 -1.3944 -0.8121 -0.7517 -1.3656 -1.1075 
     10      11      12      13      14      15 
-1.6437 -2.3123 -1.8041 -1.8944 -4.2741 -4.8571 

$Best.nc
Number_clusters     Value_Index 
         2.0000         -0.0363 

$Best.partition
 [1] 1 2 2 1 2 2 2 2 2 1 1 2 1 2 2 2 1 1 1 2 2 2 1 2 1

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".