# Classification Trees: CART vs. CHAID

When it comes to classification trees, there are three major algorithms used in practice: CART ("Classification and Regression Trees"), C4.5, and CHAID.

All three algorithms create classification rules by constructing a tree-like structure from the data. However, they differ in a few important ways.

The main difference is in the tree construction process. To avoid over-fitting the data, all the methods try to limit the size of the resulting tree. CHAID (and variants of CHAID) achieves this by using a statistical stopping rule that discontinues tree growth. In contrast, both CART and C4.5 first grow the full tree and then prune it back. In CART, pruning is done by comparing the tree's performance on a holdout dataset to its performance on the training set; the tree is pruned until the performance on the two datasets is similar (thereby indicating that the training set is no longer being over-fit). This highlights another difference between the methods: CHAID and C4.5 use a single dataset to arrive at the final tree, whereas CART uses a training set to build the tree and a holdout set to prune it.
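To make the contrast concrete, here is a minimal sketch (not taken from any package) of a CHAID-style stopping decision for a binary split of a binary outcome: compute the chi-square statistic of the candidate split's 2×2 contingency table and keep growing only while the test is significant. The function names, the example counts, and the `alpha` threshold are all illustrative assumptions.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the
    2x2 table [[a, b], [c, d]]: rows = split branches, cols = classes."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def chaid_should_split(a, b, c, d, alpha=0.05):
    """CHAID-style stopping rule: grow the branch only if the candidate
    split is statistically significant.  For 1 degree of freedom the
    chi-square tail probability is P(X^2 > x) = erfc(sqrt(x / 2))."""
    stat = chi_square_2x2(a, b, c, d)
    p_value = math.erfc(math.sqrt(stat / 2))
    return p_value < alpha

# A strongly associated split keeps growing; a weak one stops the branch.
print(chaid_should_split(30, 10, 15, 45))   # → True  (strong association)
print(chaid_should_split(20, 20, 22, 18))   # → False (weak association)
```

A CART-style implementation would instead grow without such a test and prune afterwards against holdout performance.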

A difference between CART and the other two algorithms is that CART's splitting rule allows only binary splits (e.g., "if Income is below some threshold, go left; otherwise go right"), whereas CHAID and C4.5 allow a node to be split into more than two child nodes.
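The binary-split search that CART performs on a numeric predictor can be sketched in a few lines. This is an illustrative toy implementation under my own naming, not any package's actual code: try every midpoint between consecutive sorted values and keep the threshold with the lowest weighted Gini impurity.

```python
def gini(labels):
    """Gini impurity of a node with 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n          # proportion of class 1
    return 2 * p * (1 - p)

def best_binary_split(x, y):
    """CART-style exhaustive search for the best binary split on one
    numeric feature: return (threshold, weighted child Gini)."""
    pairs = sorted(zip(x, y))
    xs = [v for v, _ in pairs]
    ys = [label for _, label in pairs]
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue                          # no threshold separates ties
        t = (xs[i] + xs[i - 1]) / 2           # midpoint between neighbors
        left, right = ys[:i], ys[i:]
        w = (len(left) * gini(left) + len(right) * gini(right)) / n
        if w < best[1]:
            best = (t, w)
    return best

# Toy data: income perfectly separates the two classes at 50.
incomes = [20, 30, 40, 60, 70, 80]
classes = [0, 0, 0, 1, 1, 1]
print(best_binary_split(incomes, classes))    # → (50.0, 0.0)
```

A real CART implementation repeats this search over all features at every node and then recurses on the two child nodes.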

## 3 thoughts on “Classification Trees: CART vs. CHAID”

1. A comment on the difference between the CART and CHAID splitting criteria: at least in the case of binary splitting, a little algebra shows that the increase in homogeneity after splitting the data into two child nodes, as measured by the Gini index, is essentially proportional to the chi-square statistic for independence on that split. In this light, I think the methodological distinction between the former as more useful for the task of "prediction" and the latter for "explanation" appears somewhat superficial.
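The algebra this comment alludes to is easy to check numerically. Below is a sketch using hypothetical counts for a single binary split of a binary outcome; the identity is that chi-square = N × (Gini decrease) / (parent Gini), so within a fixed parent node (where N and the parent Gini are constants) the two criteria rank candidate splits identically.

```python
def gini(p):
    """Gini impurity of a binary node with class-1 proportion p."""
    return 2 * p * (1 - p)

# Hypothetical split: parent node of N = 100 cases, 45 of class 1;
# left child 40 cases (30 of class 1), right child 60 cases (15 of class 1).
nL, kL = 40, 30
nR, kR = 60, 15
N, k = nL + nR, kL + kR
pL, pR, p = kL / nL, kR / nR, k / N

# Gini decrease: parent impurity minus weighted child impurity.
delta = gini(p) - (nL / N) * gini(pL) - (nR / N) * gini(pR)

# Pearson chi-square statistic for the same 2x2 table (no continuity correction).
a, b, c, d = kL, nL - kL, kR, nR - kR
chi2 = N * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# The identity: chi2 == N * delta / gini(p).
assert abs(chi2 - N * delta / gini(p)) < 1e-9
print(round(delta, 4), round(chi2, 4))
```

Note that a chi-square test additionally converts the statistic into a p-value, which is what lets CHAID use it as a stopping rule rather than only as a split-ranking score.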

2. I need some help. CART is more useful for prediction while CHAID is more useful for analysis. Is there any possibility that the results of the two tools will meet halfway?
Is it possible for the two to produce results useful for both analysis and prediction? Please elaborate.

3. Fellah – there are two issues here. The general issue is whether a single solution can provide both the best predictive power and the strongest explanatory power. I believe the answer is no. You need to prioritize: either the main goal is predictive, with some explanatory power as a second-level goal, or the other way around. This is true for any predictive model vs. explanatory model.

With respect to CART and CHAID, I suppose that in some cases you will get similar trees (if the signal is very strong). You can use the commonalities to infer robustness (single trees are known to suffer from a lack of robustness). I am not aware of research on the differences between CART and CHAID along these lines (only along the lines of predictive power).