In this week’s issue of BusinessWeek (March 6, 2006), an article called “The secret to Google’s success” describes a study by three economists showing that Google’s mechanism for auctioning ad space (called AdWords), which is supposed to be a second-price auction, actually “differs in a key respect from the one economists had studied”. I tracked down a report on this study (“The high price of internet keyword auctions” by Edelman, Ostrovsky, and Schwarz) to find out more, and found something directly related to our work on eBay auctions… Starting from the basics, a second-price auction is …
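For reference, the textbook single-item, sealed-bid second-price auction can be sketched in a few lines: the highest bidder wins but pays the second-highest bid. This is a generic illustration with hypothetical bidder names and amounts. It is not a model of AdWords, which, according to the study, differs from this mechanism in a key respect.

```python
def second_price_auction(bids):
    """Single-item sealed-bid second-price auction.

    bids: dict mapping a (hypothetical) bidder name to a bid amount.
    Returns (winner, price): the highest bidder wins and pays the
    second-highest bid.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bids")
    # Rank bidders from highest to lowest bid; ties are broken by name
    # only so that the result is deterministic.
    ranked = sorted(bids.items(), key=lambda kv: (-kv[1], kv[0]))
    winner = ranked[0][0]
    price = ranked[1][1]  # the winner pays the runner-up's bid
    return winner, price


print(second_price_auction({"A": 10.0, "B": 7.5, "C": 4.0}))  # ('A', 7.5)
```

The price depends only on the other bids, not on the winner's own bid. This is the property that makes bidding one's true value a sensible strategy in the single-item case.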

Acronyms – in Hebrew???

There are many performance measures in statistics and data mining, and they tend to have acronyms such as MAPE and RMSE. It turns out that even after the acronyms are spelled out, it is not always obvious to users how the measures are computed. Inspired by Dan Brown’s The Da Vinci Code, I devised a deciphering method that allows simple computation of these measures. The trick is to read from right to left (like Hebrew or Arabic). Here are two examples:

RMSE = Root Mean Squared Error
1. Error: compute the errors (actual value – predicted value)
2. Squared: square each error
3. Mean: …
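The right-to-left reading translates almost literally into code. Below is a minimal Python sketch in which each line handles one word of the acronym, starting from the last word. It covers RMSE and also MAPE (Mean Absolute Percentage Error). For MAPE, this sketch assumes the common convention of taking the percentage relative to the actual value and reporting it in percent. The function names are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error, read right to left."""
    errors = [a - p for a, p in zip(actual, predicted)]  # Error
    squared = [e ** 2 for e in errors]                  # Squared
    mean = sum(squared) / len(squared)                  # Mean
    return math.sqrt(mean)                              # Root


def mape(actual, predicted):
    """Mean Absolute Percentage Error (in percent), read right to left.

    Assumes no actual value is zero.
    """
    errors = [a - p for a, p in zip(actual, predicted)]         # Error
    percentage = [100 * e / a for e, a in zip(errors, actual)]  # Percentage
    absolute = [abs(x) for x in percentage]                     # Absolute
    return sum(absolute) / len(absolute)                        # Mean


actual = [100, 200]
predicted = [90, 220]
print(rmse(actual, predicted))  # square root of 250
print(mape(actual, predicted))  # 10.0
```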

Comparing models with transformations

In the process of searching for a good model, a popular step is to try different transformations of the variables. This can become a bit tricky when we transform the response variable, Y. Consider, for instance, two very simple models for predicting home sale prices. Let’s assume that in both cases we use predictors such as the home’s attributes, geographical location, market conditions, time of year, etc. The only difference is that the first model is linear:
(1) SalesPrice = b0 + b1 X1 + …
whereas the second model is exponential:
(2) SalesPrice = exp{c0 + c1 X1 + …
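One common way to put the two models on an equal footing is to back-transform the predictions of model (2) to the price scale before computing an error measure. This is an assumption here, not necessarily the argument of the full post. The sketch below fits both models with one predictor by ordinary least squares, with model (2) fitted as a linear model for log(SalesPrice). It then computes RMSE in dollars for both. The toy data and the helper names are hypothetical.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical toy data: one predictor X1 and the sale price.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
price = [110.0, 135.0, 170.0, 220.0, 270.0]
log_price = [math.log(p) for p in price]

# Model (1): linear on the original scale.
b0, b1 = fit_simple_ols(x, price)
pred_linear = [b0 + b1 * xi for xi in x]

# Model (2): exponential, fitted as a linear model for log(price).
c0, c1 = fit_simple_ols(x, log_price)
pred_log_scale = [c0 + c1 * xi for xi in x]
pred_exp = [math.exp(v) for v in pred_log_scale]  # back to the price scale

rmse_linear = rmse(price, pred_linear)
rmse_exp = rmse(price, pred_exp)                # comparable with rmse_linear
rmse_log_scale = rmse(log_price, pred_log_scale)  # NOT comparable: log units

print(rmse_linear, rmse_exp, rmse_log_scale)
```

The last value is tiny only because it is measured in log units. Comparing it with the first value would be misleading, which is why both models are scored on the price scale.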

Data partitioning

A central initial step in data mining is to partition the data into two or three subsets. The first is called the training set and the second the validation set. The third, if there is one, is usually called the test set. The purpose of data partitioning is to enable evaluation of a model’s predictive performance. In contrast to an explanatory goal, where we want to fit the data as closely as possible, a predictive goal calls for models with high predictive accuracy on new records. Now, if we fit a model to data, then obviously the “tighter” the model, the better it …
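A random three-way partition can be sketched as follows: shuffle the records once with a fixed seed so the split is reproducible, then cut the shuffled list into training, validation, and test pieces. The 50/30/20 proportions and the function name are assumptions for illustration. Setting the fractions so that nothing is left over gives a two-way split.

```python
import random


def partition(records, train_frac=0.5, valid_frac=0.3, seed=1):
    """Randomly split records into training, validation, and test sets.

    Whatever remains after the training and validation fractions
    becomes the test set.
    """
    if train_frac + valid_frac > 1:
        raise ValueError("fractions must sum to at most 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # reproducible random order
    n = len(shuffled)
    n_train = round(train_frac * n)
    n_valid = round(valid_frac * n)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))  # 50 30 20
```

The model is fitted on `train`, compared and tuned on `valid`, and `test` is used only once at the end, so that its error estimate is not biased by model selection.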

Translate “odds”

“Odds” is a technical term often used in horse or car racing. It refers to the ratio p/(1-p), where p is the probability of success. For instance, odds of 1:3 of winning are equivalent to a probability of 0.25 of winning, since p/(1-p) = 1/3 gives p = 1/4. What I found odd is that the term “odds” in this meaning does not exist in most languages! Usually, the closest you can get is “probability” or “chance”. I first realized this when I tried to translate the term into Hebrew. Then students who speak other languages (Spanish, Russian, Chinese) told me the same is true in their languages …
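Converting between odds and probability only requires inverting odds = p/(1-p), which gives p = odds/(1 + odds). The sketch below uses `fractions.Fraction` so that the 1:3 example comes out exactly. The function names are illustrative.

```python
from fractions import Fraction


def odds_from_prob(p):
    """Odds in favor of success: p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)


def prob_from_odds(odds):
    """Invert odds = p / (1 - p), giving p = odds / (1 + odds)."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


# Odds of 1:3 correspond to odds = 1/3.
print(prob_from_odds(Fraction(1, 3)))  # 1/4
print(odds_from_prob(Fraction(1, 4)))  # 1/3
```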

The “G” word

I use “G Shmueli” in my slides and in my email signature. This is not about that “G”. Students are usually surprised when I say that most of the time in a data analysis should be spent on data exploration rather than modeling. Whether the goal is statistical testing, prediction of new records, or finding a model that helps us understand the data structure, the most useful tools are GRAPHS and summaries. Data visualization is so important that, in a sense, the models that follow will usually only confirm what we see. A few points:
1. Good visualization tools are those that …
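As a rough illustration of pairing a picture with numbers, here is a Python sketch that computes a few standard summaries and prints a crude text histogram for one variable. This is not the author's toolkit. The data values are made up, and the text histogram only stands in for a proper plot from a plotting library.

```python
import statistics


def summarize(values):
    """A few standard numerical summaries of one variable."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def text_histogram(values, bins=5, width=30):
    """Crude text histogram: one row of '#' per equal-width bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1  # avoid zero width if all values are equal
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / step), bins - 1)  # max value goes in last bin
        counts[i] += 1
    top = max(counts)
    for k, c in enumerate(counts):
        print(f"{lo + k * step:8.1f} | {'#' * round(width * c / top)}")
    return counts


data = [12, 15, 15, 18, 20, 22, 25, 40, 95]  # hypothetical values
print(summarize(data))
text_histogram(data)
```

With these values the histogram immediately shows one far-off value, which also explains why the mean (about 29) sits well above the median (20).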

What is Bzst?

Statistics in Business. That’s what it’s all about. And BusinessWeek just revealed our real secret: “Statistics is becoming core skills for businesspeople and consumers… Winners will know how to use statistics – and how to spot when others are dissembling” (“Why Math Will Rock Your World”, 1/23/2006). So I no longer need to hunch my shoulders and shrink when asked “What do you teach?” I’ve been teaching statistics for more than a decade now. Until 2002 I taught mainly engineering students, and back then it was called “statistics”. Then I moved to the Robert H. Smith School of Business, and …