Now that the emotional storm following the American Statistical Association’s statement on p-values is slowing down (is it? was there even a storm outside of statistics?), let’s think about a practical issue, one that greatly influences data analysis in most fields: statistical software. Statistical software influences which methods are used and how they are reported. Software companies thus affect entire disciplines and how they progress and communicate.
Star notation for p-value thresholds in statistical software
No matter whether your field uses SAS, SPSS (now IBM), STATA, or another statistical software package, you’re likely to have seen the star … Continue reading Statistical software should remove *** notation for statistical significance
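The excerpt breaks off before spelling out the star codes. As an illustration, here is a minimal Python sketch that assumes the cut points R prints as its legend under regression summaries (`0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1`). SAS, SPSS and STATA may use other thresholds or symbols. The function name `significance_stars` is hypothetical, and the strict `<` at each boundary is this sketch's own choice. The sketch shows how the notation collapses a continuous p-value into a handful of discrete labels.

```python
def significance_stars(p: float) -> str:
    """Map a p-value to a star code, assuming R-style cut points (an assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must be in [0, 1]")
    # Check thresholds from smallest to largest; the first match wins.
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.1:
        return "."
    return ""  # no symbol above 0.1


for p in [0.0004, 0.009, 0.049, 0.051, 0.2]:
    print(f"p = {p:<7} -> {significance_stars(p)!r}")
```

Note that p = 0.049 and p = 0.051 carry nearly the same evidence, yet the first is labeled `*` and the second only `.`.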
Recently I’ve had discussions with several instructors of data mining courses about a fact that many books leave out but that is quite important: different data mining methods treat dummy variables differently.
[Image from http://blog.excelmasterseries.com]
Statistics courses that cover linear or logistic regression teach us to be careful when including a categorical predictor variable in our model. Suppose that we have a categorical variable with m categories (e.g., m countries). First, we must factor it into m binary variables called dummy variables, D1, D2, …, Dm (e.g., D1=1 if Country=Japan and 0 otherwise; D2=1 if Country=USA and 0 otherwise, etc.) … Continue reading Categorical predictors: how many dummies to use in regression vs. k-nearest neighbors
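Here is a minimal pure-Python sketch of the factoring step. It uses a hypothetical helper `make_dummies` and made-up country values; libraries such as pandas offer `get_dummies` for the same job. The sketch creates one 0/1 column per category (D1 through Dm) and checks a property behind the usual regression caution: in every row the m dummies sum to 1. Together, then, they duplicate a regression's intercept column, which produces perfect collinearity. The optional `drop_first` flag keeps only m-1 columns and treats the dropped category as the reference level. Which count suits k-nearest neighbors is the question the full post addresses.

```python
from typing import Dict, List, Sequence


def make_dummies(values: Sequence[str], drop_first: bool = False) -> Dict[str, List[int]]:
    """Return one 0/1 dummy column per category, or m-1 columns if drop_first is True."""
    categories = sorted(set(values))  # fixed, reproducible column order
    if drop_first:
        # The first (alphabetical) category becomes the reference level:
        # it gets no column, and its rows are all zeros.
        categories = categories[1:]
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical example data: m = 3 countries.
countries = ["Japan", "USA", "Japan", "Germany", "USA"]

full = make_dummies(countries)  # m = 3 columns
# Each row has exactly one 1, so the m columns add up to a column of ones,
# the same as the intercept column in a regression.
row_sums = [sum(col[i] for col in full.values()) for i in range(len(countries))]
print(row_sums)  # [1, 1, 1, 1, 1]

reduced = make_dummies(countries, drop_first=True)  # m - 1 = 2 columns
print(list(reduced))  # ['Japan', 'USA']; 'Germany' is the reference level
```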
My first semester at NTHU has been a great learning experience. I introduced and taught two new courses in our new Business Analytics concentration (data mining and forecasting). Both courses met once a week for a 3-hour session for a full semester (18 weeks). Although I’ve taught these courses in different forms, in different countries, and to different audiences, this time I made a special discovery: the critical role of the learning space in the quality of teaching and learning, especially for a topic that combines technical, creative and communication skills.
“Case study” classroom
In my many years of experience … Continue reading Teaching spaces: “Analytics in a Studio”
“Big Data” is a big buzzword. I bet that sentiment analysis of news coverage, blog posts and other social media sources would show a strong positive sentiment associated with Big Data. What exactly counts as big data depends on who you ask. Some people talk about lots of measurements (what I call “fat data”), others about huge numbers of records (“long data”), and some about both. How much is big? Again, it depends on who you ask. As a statistician who’s (luckily) strayed into data mining, I initially had the traditional knee-jerk reaction of “just get a good sample and get it … Continue reading Big Data: The Big Bad Wolf?
Quantitative forecasting is an age-old discipline, highly useful across different functions of an organization: from forecasting sales and workforce demand to economic forecasting and inventory planning. Business schools have offered courses with titles such as “Time Series Forecasting”, “Forecasting Time Series Data”, “Business Forecasting”, and more specialized courses such as “Demand Planning and Sales Forecasting”, or even graduate programs titled “Business and Economic Forecasting”. Plain “Forecasting” is also popular. Such courses are offered at the undergraduate, graduate and even executive education levels. All of these might convey the importance and usefulness of forecasting, but they are far from conveying the coolness of forecasting. … Continue reading Forecasting + Analytics = ?
Some time ago, when I presented the “explain or predict” work, my colleague Avi Gal asked where simulation falls. Simulation is a key method in operations research, as well as in statistics. A related question arose in my mind when thinking of Scott Nestler’s distinction between descriptive/predictive/prescriptive analytics. Scott defines prescriptive analytics as “what should happen in the future? (optimization, simulation)”. So where does simulation fall? Does it fall in a completely different goal category, or can it be part of the explain/predict/describe framework? My opinion is that simulation, like other data analytics techniques, does not define a goal in … Continue reading Explain or predict: simulation
Business Intelligence and Data Mining have become hot buzzwords in the West. Using Google Insights for Search to “see what the world is searching for” (see image below), we can see that the popularity of these two terms seems to have stabilized (if you expand the search to 2007 or earlier, you will see the earlier peak and also that Data Mining was hotter for a while). Click on the image to get to the actual result, with which you can interact directly. There are two very interesting insights from this search result: Looking at the “Regional Interest” for these terms, we see that … Continue reading Analytics: You want to be in Asia