# Forecasting large collections of time series

With the recent launch of Amazon Forecast, I can no longer procrastinate writing about forecasting “at scale”! Quantitative forecasting of time series has been used (and taught) for decades, with applications in many areas of business such as demand forecasting, sales forecasting, and financial forecasting. The types of methods taught in forecasting courses tend to be discipline-specific: Statisticians love ARIMA (autoregressive integrated moving average) models, with multivariate versions such as Vector ARIMA, as well as state space models and non-parametric methods such as STL decompositions. Econometricians and finance academics go one step further into ARIMA variations such as ARFIMA (f=fractional), …

# Election polls: description vs. prediction

My papers To Explain or To Predict and Predictive Analytics in Information Systems Research contrast the process and uses of predictive modeling and causal-explanatory modeling. I briefly mentioned there a third type of modeling: descriptive. However, I haven’t expanded on how descriptive modeling differs from the other two types (causal explanation and prediction). Although descriptive and predictive modeling both rely on correlations (whereas explanatory modeling relies on causality), the two are in fact different. Descriptive modeling aims to give a parsimonious statistical representation of a distribution or relationship, whereas predictive modeling aims at generating values for new/future observations. …

# Key challenges in online experiments: where are the statisticians?

Randomized experiments (or randomized controlled trials, RCTs) are a powerful tool for testing causal relationships. Their main principle is random assignment, where subjects or items are assigned randomly to one of the experimental conditions. A classic example is a clinical trial with one or more treatment groups and a no-treatment (control) group, where individuals are assigned at random to one of these groups.

Story 1: (Internet) experiments in industry

Internet experiments have now become a major activity in giant companies such as Amazon, Google, and Microsoft, in smaller web-based companies, and among academic researchers in management and the social sciences. …

# Experimenting with quantified self: two months hooked up to a fitness band

It’s one thing to collect and analyze behavioral big data (BBD) and another to understand what it means to be the subject of that data. To really understand. Yes, we’re all aware that our social network accounts and IoT devices share our private information with large and small companies and other organizations. And although we complain about our privacy, we are forgiving about sharing it, most likely because we really appreciate the benefits. So, I decided to check out my data sharing in a way that I cannot ignore: I started wearing a fitness band. I bought one of the …

# A non-traditional definition of Big Data: Big is Relative

I’ve noticed that in almost every talk or discussion that involves the term Big Data, one of the first slides by the presenter or the first questions to be asked by the audience is “what is Big Data?” The typical answer has to do with some digits, many V’s, terms that end with “bytes”, or statements about software or hardware capacity. I beg to differ. “Big” is relative. It is relative to a certain field, and specifically to the practices in the field. We therefore must consider the benchmark of a specific field to determine if today’s data are “Big”. …

# What’s in a name? “Data” in Mandarin Chinese

The term “data”, now popularly used in many languages, is not as innocent as it seems. The biggest controversy that I’ve been aware of is whether the English term “data” is singular or plural. The tone of an entire article would be different based on the author’s decision. In Hebrew, the word is plural (Netunim, with the final “im” signifying plural), so no question arises. Today I discovered another “data” duality, this time in Mandarin Chinese. In Taiwan, the term used is 資料 (Zīliào), while in Mainland China it is 數據 (Shùjù). Which one to use? What is the …