Statistics are not always the blame!

My current MBA student Brenda Martineau showed me a March 15, 2007 article in the Wall Street Journal entitled “Stupid Cancer Statistics.” The title almost makes you think that someone is once again abusing statistics. But wait! A closer look reveals that the real culprit is not the “mathematical models” but rather the variable that is being measured and analyzed. According to the article, the main fault is in measuring (and modeling) mortality rate in order to determine the usefulness of early screening for lung cancer. Women who get diagnosed early (before the cancer escapes the lung) do not necessarily live longer … Continue reading Statistics are not always the blame!

Classification Trees: CART vs. CHAID

When it comes to classification trees, there are three major algorithms used in practice: CART (“Classification and Regression Trees”), C4.5, and CHAID. All three algorithms create classification rules by constructing a tree-like structure of the data. However, they differ in a few important ways. The main difference is in the tree construction process. In order to avoid over-fitting the data, all methods try to limit the size of the resulting tree. CHAID (and variants of CHAID) achieve this by using a statistical stopping rule that discontinues tree growth. In contrast, both CART and C4.5 first grow the full tree … Continue reading Classification Trees: CART vs. CHAID
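To make the idea of a statistical stopping rule concrete, here is a minimal sketch in Python. It is not the actual CHAID procedure, which also merges categories and adjusts p-values. It only illustrates the general idea of refusing to split a node when a chi-square test finds no significant association between a candidate split and the class label. The function names, the restriction to a 2x2 table (binary split, binary class), and the 0.05 threshold are all assumptions made for this illustration.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts.

    Rows are the two branches of a candidate split, columns are the two
    classes. Assumes every row and column total is positive.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def should_split(table, alpha=0.05):
    """Hypothetical stopping rule: split only if the association is significant."""
    return p_value_df1(chi_square_2x2(table)) < alpha


# A split that separates the classes well is accepted ...
print(should_split([[30, 10], [10, 30]]))
# ... while a split that barely changes the class mix stops tree growth.
print(should_split([[22, 18], [18, 22]]))
```

With a rule of this kind the tree simply stops growing at a node once no candidate split passes the test, so its size is limited during construction.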

Another Treemap in NYT!

While we’re at it, this Saturday’s Business section of the New York Times featured the article “Sifting Data to Uncover Travel Deals.” One of the websites mentioned (PointMaven.com) actually uses a Treemap to display hotel points promotions. OK, full disclosure: this is my husband’s website and yes, I was involved… But hey, that’s the whole point of having an in-house statistician! Continue reading Another Treemap in NYT!

Visualizing hierarchical data

Today much data is gathered from the web. Data from websites often tend to be hierarchical in nature. For example, on Amazon we have categories (music, books, etc.), then within a category there are sub-categories (e.g., within Books: Business & Technology, Children’s Books, etc.), and sometimes there are even additional layers. Other examples are eBay, Epinions, and almost any e-tailer. Even travel sites usually include some level of hierarchy. The standard plots and graphs such as bar charts, histograms, and boxplots might be useful for visualizing a particular level of hierarchy, but not the “big picture”. The method of trellising is … Continue reading Visualizing hierarchical data