Last week Lori Rothenberg from SAS Higher Education visited our MBA class. She gave a one-hour tutorial on SAS Enterprise Miner, the data mining software package by SAS. This is a pretty powerful tool, especially when dealing with large datasets. One of the nicest features of SAS EM is the "workspace", which displays a diagram of the entire modeling process, from data specification through data manipulation, modeling, evaluation, and scoring. Aside from software training, Lori described several real data mining projects and how they added value to the businesses involved. This further supports the course effort

I've discussed the uselessness of p-values in very large samples, where even minuscule effects become statistically significant. This is known as the divergence between practical significance and statistical significance. An interesting article in the most recent issue of The American Statistician describes another dangerous pitfall in using p-values. In their article "The Difference Between 'Significant' and 'Not Significant' is not Itself Statistically Significant", Andrew Gelman (a serious blogger himself!) and Hal Stern warn that comparing p-values to one another in order to discern a difference between the corresponding effects (or parameters) is erroneous. Consider, for example, fitting the
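To make the pitfall concrete, here is a minimal sketch in Python. The two estimates and their standard errors are illustrative numbers chosen for this example, and the calculation assumes independent estimates and a normal approximation; the helper name `two_sided_p` is hypothetical.

```python
import math

def two_sided_p(estimate, std_err):
    """Two-sided p-value for H0: effect = 0, using a normal approximation."""
    z = estimate / std_err
    return math.erfc(abs(z) / math.sqrt(2))

# Illustrative (assumed) estimates and standard errors for two effects.
est_a, se_a = 25.0, 10.0
est_b, se_b = 10.0, 10.0

p_a = two_sided_p(est_a, se_a)  # effect A tested against zero
p_b = two_sided_p(est_b, se_b)  # effect B tested against zero

# The right comparison: test the difference between the two effects directly.
# Assuming independence, the variances of the two estimates add.
diff = est_a - est_b
se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
p_diff = two_sided_p(diff, se_diff)

print(f"p-value for A:     {p_a:.3f}")
print(f"p-value for B:     {p_b:.3f}")
print(f"p-value for A - B: {p_diff:.3f}")
```

With these numbers, A is significant at the 5% level and B is not, yet the test of the difference A - B is nowhere near significant. Eyeballing one p-value against the other would suggest a difference that the data do not support.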

OK, I admit it: I did peek over the shoulder of my fellow Metro rider last night (while returning from teaching Classification Trees), to better see the front page of her Wall Street Journal. The article that caught my eye was "Democrats, Playing Catch-Up, Tap Database to Woo Potential Voters". I only managed to catch the first few paragraphs before the newspaper's owner flipped to the next page. Luckily, my student Michael Melcer just emailed me the complete article. He put it very nicely: Hi Professor, Thought you might find this article interesting. Sounds like politicians are using regression with a binary

The last episode of Numb3rs (the CBS show), which was broadcast on Friday Oct 27, was called "Longshot". Here is the description: In the episode, Don brings Charlie a notebook that was found on the body. It contains horse racing data and equations. Charlie determines that the equations were designed to pick the SECOND place winner, not first place. Parts of these equations use the "logit" function, a specific probability function that uses logarithms and odds ratios. Because the logit function can get pretty complicated, this activity lays its foundations, namely the relationship between probability, odds, and odds ratios. This is a nice way to introduce
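For readers who want to see the relationship between probability, odds, odds ratios, and the logit in action, here is a minimal sketch in Python. It uses the standard definitions (odds = p / (1 - p), logit = log of the odds); the function names are my own and have nothing to do with the equations in the episode.

```python
import math

def odds(p):
    """Convert a probability (0 < p < 1) to odds in favor of the event."""
    return p / (1.0 - p)

def odds_ratio(p1, p2):
    """Ratio of the odds of two events with probabilities p1 and p2."""
    return odds(p1) / odds(p2)

def logit(p):
    """Log-odds: the natural logarithm of the odds."""
    return math.log(odds(p))

def inv_logit(x):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# A horse with an 80% chance of winning versus one with a 50% chance.
for p in (0.5, 0.8):
    print(f"p = {p}: odds = {odds(p):.2f}, logit = {logit(p):.3f}")
print(f"odds ratio (0.8 vs 0.5) = {odds_ratio(0.8, 0.5):.2f}")
```

A probability of 0.5 corresponds to even odds (1 to 1) and a logit of zero, while a probability of 0.8 corresponds to odds of 4 to 1. The logit stretches probabilities from the (0, 1) interval onto the whole real line, which is what makes it convenient for regression with a binary outcome.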