Predictive analytics in the long term

Ten years ago, micro-level prediction as we know it today was nearly absent in companies. MBAs learned about data analysis mostly in a required statistics course, which covered mainly statistical inference and descriptive modeling. At the time, I myself was learning my way into the predictive world, and I designed the first Data Mining course at the University of Maryland’s Smith School of Business (which is still running successfully today!). When I realized the gap, I started giving talks about the benefits of predictive analytics and its uses. And I’ve designed and taught a bunch of predictive analytics courses/programs around the …

Categorical predictors: how many dummies to use in regression vs. k-nearest neighbors

Recently I’ve had discussions with several instructors of data mining courses about a fact that is left out of many books but is quite important: the different treatment of dummy variables in different data mining methods. (Figure source: http://blog.excelmasterseries.com) Statistics courses that cover linear or logistic regression teach us to be careful when including a categorical predictor variable in our model. Suppose that we have a categorical variable with m categories (e.g., m countries). First, we must factor it into m binary variables called dummy variables, D1, D2, …, Dm (e.g., D1=1 if Country=Japan and 0 otherwise; D2=1 if Country=USA and 0 otherwise, etc.) …
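The excerpt above is cut off before it reaches the comparison named in the title, so what follows is only a minimal sketch of the commonly taught convention, not necessarily the post's own wording. Regression typically uses m-1 dummies (one category serves as the reference, because all m dummies together with an intercept are perfectly collinear), while distance-based methods such as k-nearest neighbors typically keep all m dummies so that every pair of categories is equally far apart. The country list and the helper names `make_dummies`, `squared_distance`, and `pairwise_distances` are assumptions made for illustration.

```python
from itertools import combinations


def make_dummies(value, categories, drop_first=False):
    """Return the 0/1 dummy vector for one categorical value.

    With drop_first=True the first category becomes the reference
    level and gets an all-zero vector (m-1 dummies instead of m).
    """
    cats = categories[1:] if drop_first else categories
    return [1 if value == c else 0 for c in cats]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical category list (m = 3); not taken from the post.
countries = ["Japan", "USA", "India"]


def pairwise_distances(drop_first):
    """Squared distance between the dummy codings of every pair of categories."""
    return {
        (a, b): squared_distance(
            make_dummies(a, countries, drop_first),
            make_dummies(b, countries, drop_first),
        )
        for a, b in combinations(countries, 2)
    }


full = pairwise_distances(drop_first=False)    # m dummies
reduced = pairwise_distances(drop_first=True)  # m-1 dummies

print("m dummies:  ", full)
print("m-1 dummies:", reduced)
```

With all m dummies, every pair of distinct countries is the same distance apart. With m-1 dummies, the reference category sits closer to each other category than those categories sit to one another, which distorts a distance-based method even though it is harmless in a regression.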

Psychology journal bans statistical inference; knocks down server

In its recent editorial, the journal Basic and Applied Social Psychology announced that it will no longer accept papers that use classical statistical inference. No more p-values, t-tests, or even… confidence intervals! “prior to publication, authors will have to remove all vestiges of the NHSTP (p-values, t-values, F-values, statements about ‘significant’ differences or lack thereof, and so on)… confidence intervals also are banned from BASP” Many statisticians would agree that it is high time to move on from p-values and statistical inference to practical significance, estimation, more elaborate non-parametric modeling, and resampling for avoiding assumption-heavy models. This is especially so now, …

Teaching spaces: “Analytics in a Studio”

My first semester at NTHU has been a great learning experience. I introduced and taught two new courses in our new Business Analytics concentration (data mining and forecasting). Both courses met once a week for a 3-hour session for a full semester (18 weeks). Although I’ve taught these courses in different forms, in different countries, and to different audiences, I had a special discovery this time. I discovered the critical role of the learning space in the quality of teaching and learning, especially for a topic that combines technical, creative, and communication skills. “Case study” classroom: In my many years of experience …

New curriculum design guidelines by American Statistical Association: Who will teach?

The American Statistical Association published new “Curriculum Guidelines for Undergraduate Programs in Statistical Science”. This is the first update to the guidelines since 2000. The executive summary lists the key points: increased importance of data science; real applications; more diverse models and approaches; ability to communicate. This set sounds right on target with what is expected of statisticians in industry (the authors of the report include prominent statisticians in industry). It highlights the current narrow focus of statistics programs as well as their lack of real-world usability. I found three notable mentions in the descriptions of the above points: Point …

What’s in a name? “Data” in Mandarin Chinese

The term “data”, now popularly used in many languages, is not as innocent as it seems. The biggest controversy that I’ve been aware of is whether the English term “data” is singular or plural. The tone of an entire article would be different based on the author’s decision. In Hebrew, the word is plural (Netunim, with the final “im” signifying plural), so no question arises. Today I discovered another “data” duality, this time in Mandarin Chinese. In Taiwan, the term used is 資料 (Zīliào), while in Mainland China it is 數據 (Shùjù). Which one to use? What is the …

Humane and Socially Responsible Analytics: A new concentration at National Tsing Hua University

This Fall, I’m introducing two new elective courses at NTHU’s Institute of Service Science: Business Analytics using Data Mining and Business Analytics using Forecasting (if you’re wondering about the difference, see an earlier post). The two new courses join three other elective courses to form the new concentration in Business Analytics. Courses in this concentration are aimed at getting students into the world of analytics by doing. The courses are designed as hands-on, project-oriented courses, with global contests, that allow students to experience different tools. Most importantly, our program is focused on humane and socially responsible analytics. We discuss and consider …

India redefines “reciprocity”; Israeli professionals pay the price

After a few years of employment at the Indian School of Business (in 2010 as a visitor and later as a tenured SRITNE Chaired Professor of Data Analytics), the time has come for me to get a new Employment Visa. As an Israeli-American, I decided to apply for the visa using my Israeli passport. I was almost on my way to the Indian embassy when I discovered, to my horror, that the fee is over US$1,000 for a one-year visa on an Israeli passport. The more interesting part is that Israelis are charged the highest fee compared to any …

Parallel coordinate plot in Tableau: a workaround

The parallel coordinate plot is useful for visualizing multivariate data in a disaggregated way, where we have multiple numerical measurements for each record. A scatter plot displays two measurements for each record by using the two axes. A parallel coordinate plot can display many measurements for each record by using many (parallel) axes, one for each measurement. While not as popular as other charts, it sometimes turns out to be useful, so it’s good to have it in the visualization toolkit. Software such as TIBCO Spotfire and XLMiner include the parallel coordinate plot. There’s even a free Excel add-on. But …
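To make the "one axis per measurement" idea concrete, here is a minimal sketch of the geometry behind such a plot. It is not the Tableau workaround the post goes on to describe. Each record becomes one polyline with a vertex on every parallel axis. The min-max scaling of each axis to the 0-1 range, the function name `polylines`, and the sample records are all assumptions made for illustration.

```python
def polylines(records, measurements):
    """Turn each record into a list of (axis_index, scaled_value) vertices.

    Each measurement gets its own vertical axis, placed at x = 0, 1, 2, ...
    Values are min-max scaled per measurement (an assumed, common choice)
    so that all axes share the same 0-1 vertical range.
    """
    lo = {m: min(r[m] for r in records) for m in measurements}
    hi = {m: max(r[m] for r in records) for m in measurements}
    lines = []
    for r in records:
        line = []
        for axis_index, m in enumerate(measurements):
            span = hi[m] - lo[m]
            # A constant measurement has no spread; place it mid-axis.
            y = (r[m] - lo[m]) / span if span else 0.5
            line.append((axis_index, y))
        lines.append(line)
    return lines


# Hypothetical records with three numerical measurements each.
records = [
    {"height": 150, "weight": 50, "age": 20},
    {"height": 170, "weight": 70, "age": 40},
    {"height": 190, "weight": 60, "age": 30},
]

lines = polylines(records, ["height", "weight", "age"])
for line in lines:
    print(line)
```

Connecting the vertices of each list with straight segments gives one line per record, which is what keeps the display disaggregated: no record is summarized away.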

Can women be professors or doctors? Not according to Jet Airways

I am already used to the comical scene at airports in Asia, where a sign-holder with “Professor Galit Shmueli” sees us walk in his/her direction and right away rushes to my husband. Whether or not the stereotype is based on actual gender statistics of professors in Asia is a good question. What I don’t find amusing is when a corporation like Jet Airways, under the guise of “celebrating International Women’s Day”, follows the same stereotype. When I tried to book a flight on Jetairways.com, it would not allow me to use the Women’s Day discount code if I chose title …