Decision Tree

Machine Learning

R code using Various Packages for Decision Tree

Yeongeun Jeon , Jeongwook Lee , Jung In Seo
09-25-2020

The example data used to practice decision tree models with the "rpart" and "C5.0" methods is "Universal Bank_Main", a dataset on the customers of Universal Bank (source: Data Mining for Business Intelligence, Shmueli et al. 2010). The data contain 2,500 observations and 13 variables, and the target variable is Personal.Loan.



1. Loading the Data

pacman::p_load("data.table",
               "dplyr")


UB   <- fread(paste(getwd(),"Universal Bank_Main.csv", sep="/")) %>%   # Load the data
    data.frame() %>%                                                   # Convert to a data frame
    select(-1)                                                         # Remove the ID variable

# Columns to convert to factors
cols <- c("Family", "Education", "Personal.Loan", "Securities.Account", 
          "CD.Account", "Online", "CreditCard")
UB   <- UB %>% 
    mutate_at(cols, as.factor)                                         # Convert to factors
    
glimpse(UB)                                                            # Data structure
Rows: 2,500
Columns: 13
$ Age                <int> 25, 45, 39, 35, 35, 37, 53, 50, 35, 34, 6~
$ Experience         <int> 1, 19, 15, 9, 8, 13, 27, 24, 10, 9, 39, 5~
$ Income             <int> 49, 34, 11, 100, 45, 29, 72, 22, 81, 180,~
$ ZIP.Code           <int> 91107, 90089, 94720, 94112, 91330, 92121,~
$ Family             <fct> 4, 3, 1, 1, 4, 4, 2, 1, 3, 1, 4, 3, 2, 4,~
$ CCAvg              <dbl> 1.6, 1.5, 1.0, 2.7, 1.0, 0.4, 1.5, 0.3, 0~
$ Education          <fct> 1, 1, 1, 2, 2, 2, 2, 3, 2, 3, 3, 2, 3, 2,~
$ Mortgage           <int> 0, 0, 0, 0, 0, 155, 0, 0, 104, 0, 0, 0, 0~
$ Personal.Loan      <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,~
$ Securities.Account <fct> 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,~
$ CD.Account         <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ Online             <fct> 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1,~
$ CreditCard         <fct> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,~

2. Partitioning the Data

# Partition (Training Data : Test Data = 7:3)

pacman::p_load("caret")

y      <- UB$Personal.Loan                            # Target
 
set.seed(200)
ind    <- createDataPartition(y, p=0.7, list=T)       # Extract 70% of the data for training
UB.trd <- UB[ind$Resample1,]                          # Training Data

UB.ted <- UB[-ind$Resample1,]                         # Test Data

detach(package:caret)
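Because createDataPartition samples within each level of y, the class proportions of Personal.Loan should be nearly identical in the two sets. A quick sanity check with base R:

prop.table(table(UB.trd$Personal.Loan))   # Class proportions in the Training Data
prop.table(table(UB.ted$Personal.Loan))   # Class proportions in the Test Data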

3. R Package “rpart”

The first package used to build a decision tree model is "rpart". "rpart" implements a slightly modified version of CART, and the analysis centers on the CP (Complexity Parameter). In addition, "rpart" selects the optimal CP value using cross-validation, with 10-fold cross-validation as the default. It also works well with packages that produce highly readable tree plots.

rpart(formula, data, method, ...)
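Most of rpart's tuning happens through rpart.control(), which is passed to the control argument used in the next subsection. A minimal sketch of the most commonly adjusted arguments, shown with rpart's documented defaults:

rpart.control(minsplit = 20,   # Minimum number of observations needed to attempt a split
              cp       = 0.01, # A split must improve the fit by at least cp to be attempted
              maxdepth = 30,   # Maximum depth of any node of the final tree
              xval     = 10)   # Number of cross-validations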

3-1. Model Fitting

pacman::p_load("rpart",                   # for Decision tree
               "rattle", "rpart.plot")    # for fancyRpartPlot(가독성 좋은 그래프) ) 

set.seed(200)                             # seed 고정 for cross-validation   
rContol      <- rpart.control(xval=15)    # xval : Number of cross validation
UB.trd.rtree <- rpart(Personal.Loan~., data=UB.trd,                 
                      method="class", control = rContol)         

summary(UB.trd.rtree)
Call:
rpart(formula = Personal.Loan ~ ., data = UB.trd, method = "class", 
    control = rControl)
  n= 1751 

          CP nsplit rel error    xerror       xstd
1 0.32500000      0 1.0000000 1.0000000 0.07060066
2 0.15000000      2 0.3500000 0.4111111 0.04676994
3 0.01944444      3 0.2000000 0.1944444 0.03253697
4 0.01111111      5 0.1611111 0.2166667 0.03430588
5 0.01000000      7 0.1388889 0.2055556 0.03343418

Variable importance
    Income  Education     Family      CCAvg CD.Account   Mortgage 
        29         28         20         13          5          3 
  ZIP.Code        Age Experience 
         1          1          1 

Node number 1: 1751 observations,    complexity param=0.325
  predicted class=0  expected loss=0.1027984  P(node) =1
    class counts:  1571   180
   probabilities: 0.897 0.103 
  left son=2 (1366 obs) right son=3 (385 obs)
  Primary splits:
      Income     < 106.5   to the left,  improve=94.968130, (0 missing)
      CCAvg      < 2.95    to the left,  improve=69.046700, (0 missing)
      CD.Account splits as  LR,          improve=41.226300, (0 missing)
      Mortgage   < 293.5   to the left,  improve=14.467750, (0 missing)
      Education  splits as  LRR,         improve= 7.681315, (0 missing)
  Surrogate splits:
      CCAvg      < 3.25    to the left,  agree=0.868, adj=0.400, (0 split)
      Mortgage   < 339.5   to the left,  agree=0.802, adj=0.101, (0 split)
      CD.Account splits as  LR,          agree=0.786, adj=0.026, (0 split)

Node number 2: 1366 observations,    complexity param=0.01111111
  predicted class=0  expected loss=0.01537335  P(node) =0.7801256
    class counts:  1345    21
   probabilities: 0.985 0.015 
  left son=4 (1277 obs) right son=5 (89 obs)
  Primary splits:
      CCAvg      < 2.95    to the left,  improve=9.2644320, (0 missing)
      Income     < 98.5    to the left,  improve=3.5382100, (0 missing)
      CD.Account splits as  LR,          improve=1.1082890, (0 missing)
      Mortgage   < 220.5   to the left,  improve=0.6886484, (0 missing)
      Experience < 38.5    to the left,  improve=0.2000360, (0 missing)

Node number 3: 385 observations,    complexity param=0.325
  predicted class=0  expected loss=0.412987  P(node) =0.2198744
    class counts:   226   159
   probabilities: 0.587 0.413 
  left son=6 (242 obs) right son=7 (143 obs)
  Primary splits:
      Education  splits as  LRR,         improve=111.984200, (0 missing)
      Family     splits as  LLRR,        improve= 73.753990, (0 missing)
      CD.Account splits as  LR,          improve= 27.998710, (0 missing)
      Income     < 156.5   to the left,  improve= 11.193100, (0 missing)
      CCAvg      < 6.635   to the right, improve=  4.353303, (0 missing)
  Surrogate splits:
      Family     splits as  LLRR,        agree=0.743, adj=0.308, (0 split)
      CD.Account splits as  LR,          agree=0.683, adj=0.147, (0 split)
      Income     < 173.5   to the left,  agree=0.642, adj=0.035, (0 split)
      CCAvg      < 8.85    to the left,  agree=0.634, adj=0.014, (0 split)
      ZIP.Code   < 90021.5 to the right, agree=0.631, adj=0.007, (0 split)

Node number 4: 1277 observations
  predicted class=0  expected loss=0  P(node) =0.7292975
    class counts:  1277     0
   probabilities: 1.000 0.000 

Node number 5: 89 observations,    complexity param=0.01111111
  predicted class=0  expected loss=0.2359551  P(node) =0.0508281
    class counts:    68    21
   probabilities: 0.764 0.236 
  left son=10 (73 obs) right son=11 (16 obs)
  Primary splits:
      Income     < 98.5    to the left,  improve=5.904956, (0 missing)
      CD.Account splits as  LR,          improve=4.645443, (0 missing)
      CCAvg      < 3.15    to the right, improve=3.739411, (0 missing)
      Experience < 31.5    to the left,  improve=2.313744, (0 missing)
      Mortgage   < 86.5    to the left,  improve=2.146298, (0 missing)
  Surrogate splits:
      Age        < 62.5    to the left,  agree=0.843, adj=0.125, (0 split)
      Experience < 38.5    to the left,  agree=0.843, adj=0.125, (0 split)
      ZIP.Code   < 90061.5 to the right, agree=0.843, adj=0.125, (0 split)

Node number 6: 242 observations,    complexity param=0.15
  predicted class=0  expected loss=0.1198347  P(node) =0.1382067
    class counts:   213    29
   probabilities: 0.880 0.120 
  left son=12 (211 obs) right son=13 (31 obs)
  Primary splits:
      Family     splits as  LLRR,        improve=47.3076500, (0 missing)
      CD.Account splits as  LR,          improve= 8.0685060, (0 missing)
      Mortgage   < 279.5   to the left,  improve= 1.5428930, (0 missing)
      CCAvg      < 6.635   to the right, improve= 1.2562530, (0 missing)
      ZIP.Code   < 95057   to the left,  improve= 0.9027978, (0 missing)
  Surrogate splits:
      CD.Account splits as  LR,          agree=0.880, adj=0.065, (0 split)
      Mortgage   < 566     to the left,  agree=0.876, adj=0.032, (0 split)

Node number 7: 143 observations,    complexity param=0.01944444
  predicted class=1  expected loss=0.09090909  P(node) =0.08166762
    class counts:    13   130
   probabilities: 0.091 0.909 
  left son=14 (25 obs) right son=15 (118 obs)
  Primary splits:
      Income     < 116.5   to the left,  improve=11.1563600, (0 missing)
      CCAvg      < 2.4     to the left,  improve= 2.6908830, (0 missing)
      Experience < 2.5     to the left,  improve= 2.4008740, (0 missing)
      Age        < 29.5    to the left,  improve= 2.2161600, (0 missing)
      CD.Account splits as  LR,          improve= 0.9500891, (0 missing)

Node number 10: 73 observations
  predicted class=0  expected loss=0.1506849  P(node) =0.04169046
    class counts:    62    11
   probabilities: 0.849 0.151 

Node number 11: 16 observations
  predicted class=1  expected loss=0.375  P(node) =0.009137636
    class counts:     6    10
   probabilities: 0.375 0.625 

Node number 12: 211 observations
  predicted class=0  expected loss=0  P(node) =0.1205026
    class counts:   211     0
   probabilities: 1.000 0.000 

Node number 13: 31 observations
  predicted class=1  expected loss=0.06451613  P(node) =0.01770417
    class counts:     2    29
   probabilities: 0.065 0.935 

Node number 14: 25 observations,    complexity param=0.01944444
  predicted class=0  expected loss=0.48  P(node) =0.01427756
    class counts:    13    12
   probabilities: 0.520 0.480 
  left son=28 (13 obs) right son=29 (12 obs)
  Primary splits:
      CCAvg      < 2.4     to the left,  improve=3.3646150, (0 missing)
      ZIP.Code   < 93060   to the right, improve=2.4938890, (0 missing)
      Age        < 29.5    to the left,  improve=1.2447060, (0 missing)
      Experience < 7       to the left,  improve=1.0800000, (0 missing)
      Education  splits as  -RL,         improve=0.9605195, (0 missing)
  Surrogate splits:
      ZIP.Code   < 94014.5 to the right, agree=0.88, adj=0.750, (0 split)
      Age        < 38.5    to the left,  agree=0.72, adj=0.417, (0 split)
      Experience < 18.5    to the left,  agree=0.72, adj=0.417, (0 split)
      Education  splits as  -RL,         agree=0.72, adj=0.417, (0 split)
      Income     < 113.5   to the left,  agree=0.68, adj=0.333, (0 split)

Node number 15: 118 observations
  predicted class=1  expected loss=0  P(node) =0.06739006
    class counts:     0   118
   probabilities: 0.000 1.000 

Node number 28: 13 observations
  predicted class=0  expected loss=0.2307692  P(node) =0.007424329
    class counts:    10     3
   probabilities: 0.769 0.231 

Node number 29: 12 observations
  predicted class=1  expected loss=0.25  P(node) =0.006853227
    class counts:     3     9
   probabilities: 0.250 0.750 

First, here is an explanation of the terms in the table that appears at the top of the output.

Variable importance ranks the predictors, showing Income \(>\) Education \(>\) Family. Node number 1 describes the root node: it contains 1,751 observations, with predicted class=0, cp=0.325, and expected loss (impurity)=0.1027984. Of those 1,751 observations, 1,571 belong to class "0" and 180 to class "1", with proportions 0.897 and 0.103, respectively. left son=2 (1366 obs) means that the left child node is node number 2 and contains 1,366 observations.
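The importance scores printed above can also be extracted programmatically from the fitted object; variable.importance is a standard component of an rpart fit:

imp <- UB.trd.rtree$variable.importance   # Named numeric vector of importance scores
round(100 * imp / sum(imp))               # Rescaled to sum to 100, as printed by summary()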

3-2. Tree Plots


3-2-1. “Plot”

plot(UB.trd.rtree, branch=1, margin=0.2)
text(UB.trd.rtree, all=T, use.n=T)

3-2-2. “fancyRpartPlot”

fancyRpartPlot(UB.trd.rtree)              # More readable tree plot
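The rpart.plot package loaded above also exposes its own plotting function with fine-grained control over what each node displays; a minimal sketch (the type and extra values here are one reasonable choice, not from the original post):

rpart.plot(UB.trd.rtree,
           type  = 4,      # Label all nodes, with split labels below them
           extra = 104)    # Show class probabilities plus the percentage of observations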

3-2-3. “visTree”

pacman::p_load("visNetwork","sparkline")  # 네트워크 기반 그래프
visTree(UB.trd.rtree)

3-3. Pruning (Addressing Overfitting)

Pruning is performed to address overfitting. The key in rpart is to find the optimal cp value, which is simply the cp that minimizes xerror.

table              <- UB.trd.rtree$cptable             # CP table

low.error          <- which.min(table[ ,"xerror"])     # Position of the smallest value in the "xerror" column
cp.best            <- table[low.error, "CP"]           # cp in the "CP" column corresponding to low.error

UB.trd.prune.rtree <- prune(UB.trd.rtree, cp=cp.best)  # prune(tree model, optimal "CP")
UB.trd.prune.rtree$cptable                             # CP table of the final model
          CP nsplit rel error    xerror       xstd
1 0.32500000      0      1.00 1.0000000 0.07060066
2 0.15000000      2      0.35 0.4111111 0.04676994
3 0.01944444      3      0.20 0.1944444 0.03253697
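An alternative to the minimum-xerror rule used above is the one-standard-error (1-SE) rule: choose the largest cp whose xerror is within one xstd of the minimum, which favors a smaller tree. A minimal sketch against the same cptable:

thresh <- table[low.error, "xerror"] + table[low.error, "xstd"]  # 1-SE threshold
cp.1se <- max(table[table[ ,"xerror"] <= thresh, "CP"])          # Largest cp within the threshold
prune(UB.trd.rtree, cp=cp.1se)                                   # Usually yields a smaller tree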


Pruning completes the final model, and the tree plots for the final model are as follows.

plot(UB.trd.prune.rtree, branch=0.8, margin=0.2)
text(UB.trd.prune.rtree, all=T, use.n=T)

fancyRpartPlot(UB.trd.prune.rtree)        # More readable tree plot                             

visTree(UB.trd.prune.rtree)               # Network-based interactive plot

3-4. Model Evaluation

# Predict the Test Data with the fitted model
test.rtree <- predict(UB.trd.prune.rtree, newdata=UB.ted, type="class")  # predict(tree model, Test Data)       

3-4-1. ConfusionMatrix

pacman::p_load("caret")

CM   <- confusionMatrix(test.rtree, UB.ted$Personal.Loan, positive="1")  # confusionMatrix(predicted class, actual class, positive="class of interest")

CM
Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 656  12
         1  17  64
                                          
               Accuracy : 0.9613          
                 95% CI : (0.9449, 0.9739)
    No Information Rate : 0.8985          
    P-Value [Acc > NIR] : 1.228e-10       
                                          
                  Kappa : 0.7937          
                                          
 Mcnemar's Test P-Value : 0.4576          
                                          
            Sensitivity : 0.84211         
            Specificity : 0.97474         
         Pos Pred Value : 0.79012         
         Neg Pred Value : 0.98204         
             Prevalence : 0.10147         
         Detection Rate : 0.08545         
   Detection Prevalence : 0.10814         
      Balanced Accuracy : 0.90842         
                                          
       'Positive' Class : 1               
                                          
detach(package:caret)
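The headline statistics can be reproduced directly from the 2x2 table, which makes their definitions explicit; a minimal sketch using the counts stored in CM:

tab <- CM$table                      # 2x2 table (rows = Prediction, columns = Reference)
TP  <- tab["1","1"]; TN <- tab["0","0"]
FP  <- tab["1","0"]; FN <- tab["0","1"]

(TP + TN) / sum(tab)                 # Accuracy    : 0.9613
TP / (TP + FN)                       # Sensitivity : 0.84211
TN / (TN + FP)                       # Specificity : 0.97474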

3-4-2. ROC Curve


1) Package “pROC”

pacman::p_load("pROC")

test.rtree.prob <- predict(UB.trd.prune.rtree, newdata=UB.ted) # Predicted probability of each class on the Test Data, from the model fitted to the Training Data
test.rtree.prob <- test.rtree.prob[,2]                         # Predicted probability of class "1"


ac         <- as.numeric(as.character(UB.ted$Personal.Loan))   # To convert a factor to numeric, convert to character first and then to numeric
rpp        <- as.numeric(test.rtree.prob)                      # Predicted probability of class "1"

rtree.roc  <- roc(ac, rpp, plot=T, col="red")                  # roc(actual class, predicted probability)

auc        <- round(auc(rtree.roc),3)
legend("bottomright", legend=auc, bty="n")
detach(package:pROC)
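pROC can also report the cutoff that the Epi plot below marks as optimal; a minimal sketch using coords() with the Youden criterion (sensitivity + specificity - 1):

pacman::p_load("pROC")
coords(rtree.roc, "best", best.method="youden",
       ret=c("threshold", "sensitivity", "specificity"))       # Optimal cutoff and its Se/Sp
detach(package:pROC)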


2) Package “Epi”

pacman::p_load("Epi")       
# install_version("etm", version = "1.1", repos = "http://cran.us.r-project.org")

ROC(rpp, ac, plot="ROC")   # ROC(predicted probability, actual class) / can also suggest the optimal cutoff value
detach(package:Epi)


3) Package “ROCR”

pacman::p_load("ROCR")

rtree.pred <- prediction(test.rtree.prob, UB.ted$Personal.Loan)    # prediction(predicted probability, actual class)    

rtree.perf <- performance(rtree.pred, "tpr", "fpr")                # performance(prediction object, "sensitivity", "1-specificity")                        
plot(rtree.perf, col="blue")                                       # ROC Curve

perf.auc   <- performance(rtree.pred, "auc")                       # AUC
auc        <- attributes(perf.auc)$y.values
legend("bottomright", legend=auc, bty="n")

3-4-3. Lift Chart


1) Package “ROCR”

rtree.perf <- performance(rtree.pred, "lift","rpp")                # Lift Chart                      
plot(rtree.perf, main="lift curve", colorize=T, lwd=2) 
detach(package:ROCR)


2) Package “lift”

pacman::p_load("lift")

plotLift(test.rtree.prob, UB.ted$Personal.Loan, cumulative = T, n.buckets = 24) # plotLift(predicted probability, actual class)
TopDecileLift(test.rtree.prob, UB.ted$Personal.Loan)                            # Lift in the top 10%
[1] 7.753
detach(package:lift)
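Top-decile lift has a simple definition: the positive rate among the 10% of test cases with the highest predicted probability, divided by the overall positive rate. A minimal base-R sketch that should closely reproduce the value above (ties in the predicted probabilities may cause small differences):

ord    <- order(test.rtree.prob, decreasing=TRUE)   # Sort test cases by predicted probability
top    <- ord[1:floor(0.1*length(ord))]             # Top 10% of test cases
actual <- UB.ted$Personal.Loan == "1"               # Actual positives
mean(actual[top]) / mean(actual)                    # Top-decile lift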

4. R Package “C50”

The second package used to build a decision tree model is "C50". "C50" uses boosting, an ensemble technique in which trees are trained sequentially: each new tree places more weight on the observations that the previous trees misclassified, so the ensemble gradually focuses on the hard cases.

C5.0(x, y, trials = 1, ...)
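To make the sequential reweighting concrete, below is a small illustrative loop in the AdaBoost style built from rpart stumps; it is a simplification for intuition only, as C5.0's internal boosting scheme differs in its details:

n <- nrow(UB.trd)
w <- rep(1/n, n)                                          # Equal initial case weights
for(m in 1:3) {                                           # A few boosting rounds
  stump <- rpart(Personal.Loan~., data=UB.trd, weights=w,
                 method="class", control=rpart.control(maxdepth=1))
  miss  <- predict(stump, UB.trd, type="class") != UB.trd$Personal.Loan
  err   <- sum(w[miss]) / sum(w)                          # Weighted error rate
  a     <- log((1 - err)/err)                             # Vote weight of this stump
  w     <- w * exp(a * miss)                              # Up-weight the misclassified cases
  w     <- w / sum(w)                                     # Renormalize
}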

4-1. Model Fitting

pacman::p_load("C50")   

set.seed(200)             # The C5.0 function needs a fixed seed
tree <- C5.0(UB.trd[-9], UB.trd$Personal.Loan, trials=1)  
tree      

Call:
C5.0.default(x = UB.trd[-9], y = UB.trd$Personal.Loan, trials = 1)

Classification Tree
Number of samples: 1751 
Number of predictors: 12 

Tree size: 8 

Non-standard options: attempt to group attributes

4-2. Tree Plot

plot(tree)

4-3. Optimal Number of Boosting Iterations

"C5.0"에서 가장 중요한 것은 최적의 부스팅 횟수를 찾는 것이다. 아래의 코드는 부스팅 횟수를 1~100회로 늘리면서 각 부스팅 횟수에 대해 정확도를 구한 후 정확도가 가장 높은 부스팅 횟수를 찾는 방법이다.

pacman::p_load("progress",                                              # For progress_bar
               "caret")                                                 # For confusionMatrix

pb      <- progress_bar$new(total = 100)                                # for문의 진행상황 확인NAresults <- c()

ac      <- UB.ted$Personal.Loan                                         # 실제 클래스래스
 
for(i in 1:100){                                                        # 부스팅 횟수 1~100회
  
  pb$tick()
  
  set.seed(200)
  tree  <- C5.0(UB.trd[-9], UB.trd$Personal.Loan, trials=i)             # 각 부스팅 횟수에 대한 모형
  pp.cm <- predict(tree, UB.ted, type="class")                          # 예측 클래스래스
  
  
  CM <- confusionMatrix(as.factor(pp.cm), as.factor(ac), positive="1")  # confusionMatrix
  
  results[i] <- as.numeric(CM$overall[1])                               # confusionmatrix의 정확도 추출NA}

which.max(results);results[which.max(results)]                          # 정확도가 가장 높은 값과 위치 출력출력
[1] 10
[1] 0.9839786
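As an aside, caret can run an equivalent search over trials with built-in resampling instead of a single train/test evaluation; a minimal sketch, with a grid and 5-fold cross-validation chosen here for illustration (not from the original post):

pacman::p_load("caret")

set.seed(200)
c50.tune <- train(Personal.Loan~., data=UB.trd, method="C5.0",
                  trControl=trainControl(method="cv", number=5),
                  tuneGrid=expand.grid(trials=c(1, 10, 20, 50),
                                       model="tree", winnow=FALSE))
c50.tune$bestTune                     # Best (trials, model, winnow) found by cross-validation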


The final model is then refit with the C5.0 function using the number of boosting iterations that achieved the highest accuracy.

set.seed(200)
tree.10   <- C5.0(UB.trd[-9], UB.trd$Personal.Loan, trials=10)   # Final model
plot(tree.10)

4-4. Model Evaluation

# Predict the Test Data with the fitted model
test.10 <- predict(tree.10, newdata=UB.ted, type="class")         # predict(tree model, Test Data)         

4-4-1. ConfusionMatrix

pacman::p_load("caret")

CM      <- confusionMatrix(test.10, UB.ted$Personal.Loan, positive="1")  # confusionMatrix(predicted class, actual class, positive="class of interest")

CM
Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 669   8
         1   4  68
                                          
               Accuracy : 0.984           
                 95% CI : (0.9722, 0.9917)
    No Information Rate : 0.8985          
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.91            
                                          
 Mcnemar's Test P-Value : 0.3865          
                                          
            Sensitivity : 0.89474         
            Specificity : 0.99406         
         Pos Pred Value : 0.94444         
         Neg Pred Value : 0.98818         
             Prevalence : 0.10147         
         Detection Rate : 0.09079         
   Detection Prevalence : 0.09613         
      Balanced Accuracy : 0.94440         
                                          
       'Positive' Class : 1               
                                          
detach(package:caret)

4-4-2. ROC Curve


1) Package “pROC”

pacman::p_load("pROC")

test.10.prob <- predict(tree.10, newdata=UB.ted, type="prob")  # Predicted probability of each class on the Test Data, from the model fitted to the Training Data
test.10.prob <- test.10.prob[,2]                               # Predicted probability of class "1"


ac         <- as.numeric(as.character(UB.ted$Personal.Loan))   # To convert a factor to numeric, convert to character first and then to numeric
cpp        <- as.numeric(test.10.prob)                         # Predicted probability of class "1"

ctree.roc  <- roc(ac, cpp, plot=T, col="red")                  # roc(actual class, predicted probability)

auc        <- round(auc(ctree.roc),3)
legend("bottomright", legend=auc, bty="n")
detach(package:pROC)


2) Package “Epi”

pacman::p_load("Epi")       
# install_version("etm", version = "1.1", repos = "http://cran.us.r-project.org")

ROC(cpp, ac, plot="ROC")   # ROC(predicted probability, actual class) / can also suggest the optimal cutoff value
detach(package:Epi)


3) Package “ROCR”

pacman::p_load("ROCR")

ctree.pred <- prediction(test.10.prob, UB.ted$Personal.Loan)       # prediction(predicted probability, actual class)    

ctree.perf <- performance(ctree.pred, "tpr", "fpr")                # performance(prediction object, "sensitivity", "1-specificity")                        
plot(ctree.perf, col="blue")                                       # ROC Curve

perf.auc   <- performance(ctree.pred, "auc")                       # AUC
auc        <- attributes(perf.auc)$y.values
legend("bottomright", legend=auc, bty="n")

4-4-3. Lift Chart


1) Package “ROCR”

ctree.perf <- performance(ctree.pred, "lift","rpp")                # Lift Chart                      
plot(ctree.perf, main="lift curve", colorize=T, lwd=2) 


2) Package “lift”

pacman::p_load("lift")

plotLift(test.10.prob, UB.ted$Personal.Loan, cumulative = T, n.buckets = 24) # plotLift(predicted probability, actual class)
TopDecileLift(test.10.prob, UB.ted$Personal.Loan)                            # Lift in the top 10%
[1] 8.935
detach(package:lift)

5. Model Comparison

pacman::p_load("ROCR")

rtree.pred <- prediction(test.rtree.prob, UB.ted$Personal.Loan)    # prediction(predicted probability, actual class)    

rtree.perf <- performance(rtree.pred, "tpr", "fpr")                # performance(prediction object, "sensitivity", "1-specificity")                        
plot(rtree.perf, col="blue")                                       # ROC Curve
par(new=TRUE)
ctree.perf <- performance(ctree.pred, "tpr", "fpr")                # performance(prediction object, "sensitivity", "1-specificity")                        
plot(ctree.perf, col="red")                                        # ROC Curve
legend("bottomright", legend=c("rpart","C50"), col=c("blue", "red"), lty=c(1,1))
detach(package:ROCR)
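To put numbers next to the overlaid curves, the AUC of each model can be extracted from the ROCR prediction objects built above; a minimal sketch:

pacman::p_load("ROCR")

auc.rpart <- attributes(performance(rtree.pred, "auc"))$y.values[[1]]  # AUC of the rpart model
auc.c50   <- attributes(performance(ctree.pred, "auc"))$y.values[[1]]  # AUC of the C5.0 model
c(rpart=auc.rpart, C50=auc.c50)
detach(package:ROCR)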

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".