India redefines “reciprocity”; Israeli professionals pay the price

After a few years of employment at the Indian School of Business (in 2010 as a visitor and later as a tenured SRITNE Chaired Professor of Data Analytics), the time has come for me to get a new Employment Visa. As an Israeli-American, I decided to apply for the visa using my Israeli passport. I was almost on my way to the Indian embassy when I discovered, to my horror, that the fee is over USD $1000 for a one-year visa on an Israeli passport. The more interesting part is that Israelis are charged the highest fee compared to any …

Parallel coordinate plot in Tableau: a workaround

The parallel coordinate plot is useful for visualizing multivariate data in a disaggregated way, where we have multiple numerical measurements for each record. A scatter plot displays two measurements for each record by using the two axes. A parallel coordinate plot can display many measurements for each record, by using many (parallel) axes – one for each measurement. While not as popular as other charts, it sometimes turns out to be useful, so it’s good to have it in the visualization toolkit. Software such as TIBCO Spotfire and XLMiner include the parallel coordinate plot. There’s even a free Excel add-on. But …
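To make the "one axis per measurement" idea concrete, here is a minimal Python sketch of the data preparation behind such a plot. It is only an illustration, not the Tableau workaround the post describes: the function name `to_polylines`, the sample records, and the choice of min-max rescaling (so that all axes share a common 0 to 1 height) are assumptions of this sketch.

```python
def to_polylines(records, measures):
    """Turn each record into a polyline of (axis_index, scaled_value) points.

    Assumption of this sketch: every measurement is min-max rescaled to
    [0, 1] so that the parallel axes are visually comparable.
    """
    lows = {m: min(r[m] for r in records) for m in measures}
    highs = {m: max(r[m] for r in records) for m in measures}

    lines = []
    for r in records:
        line = []
        for axis, m in enumerate(measures):
            span = highs[m] - lows[m]
            # A constant measurement has no spread; place it mid-axis.
            scaled = (r[m] - lows[m]) / span if span else 0.5
            line.append((axis, scaled))
        lines.append(line)
    return lines


# Hypothetical data: three records, two measurements each.
records = [
    {"price": 10, "weight": 2},
    {"price": 20, "weight": 4},
    {"price": 30, "weight": 3},
]
lines = to_polylines(records, ["price", "weight"])
for line in lines:
    print(line)
```

Each printed list is one record's line across the axes; drawing those points as connected segments, one line per record, gives the parallel coordinate plot.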

Analytics magazines: Please lead the way for effective data presentation

Professional “analytics” associations such as INFORMS, the American Statistical Association, and the Royal Statistical Society have been launching new magazines intended for broader, non-academic audiences that are involved or interested in data analytics. Several of these magazines are aesthetically beautiful with plenty of interesting articles about applications of data analysis and their impact on daily life, society, and more. Significance magazine and Analytics magazine are two examples. The next step is for these magazines to implement what we preach regarding data presentation: use effective visualizations. In particular, the online versions can include interactive dashboards! If the New York Times and Washington Post can …

The world is flat? Only for US students

Learning and teaching have become a global endeavor with lots of online resources and technologies. Contests are an effective way to engage a diverse community from around the world. In the past I have written several posts about contests and competitions in data mining, statistics and more. And now about a new one. Tableau is a US-based company that sells a cool data visualization tool (there’s a free version too). The company has recently seen huge growth with lots of new adopters in industry and academia. Their “Tableau for teaching” (TfT) program is intended to assist instructors and teachers by …

Data liberation via visualization

“Data democratization” movements try to make data, and especially government-held data, publicly available and accessible. A growing number of technological efforts are devoted to this goal, and especially to the accessibility part. One such effort is by data visualization companies. A recent trend is to offer a free version (or at least free for some period) that is based on sharing your visualization and/or data to the Web. The “and/or” here is important, because in some cases you cannot share your data, but you would like to share the visualizations with the world. This is what I call “data liberation via …

Analytics: You want to be in Asia

Business Intelligence and Data Mining have become hot buzzwords in the West. Using Google Insights for Search to “see what the world is searching for” (see image below), we can see that the popularity of these two terms seems to have stabilized (if you expand the search to 2007 or earlier, you will see the earlier peak and also that Data Mining was hotter for a while). Click on the image to get to the actual result, with which you can interact directly. There are two very interesting insights from this search result: Looking at the “Regional Interest” for these terms, we see that …

Scatter plots for large samples

While huge datasets have become ubiquitous in fields such as genomics, large datasets are now also beginning to infiltrate research in the social sciences. Data from eCommerce sites, online dating sites, etc. are now collected as part of research in information systems, marketing and related fields. We can now find social science research papers with hundreds of thousands of observations and more. A common type of research question in such studies is about the relationship between two variables. For example, how does the final price of an online auction relate to the seller’s feedback rating? A classic exploratory tool for examining such …

Nice April Fool’s Day prank

The recent issue of the Journal of Computational and Graphical Statistics published a short article by Columbia University Professor Andrew Gelman (I believe he is the most active statistician-blogger) called “Why tables are really much better than graphs” based on his April 1, 2009 blog post (note the difference in publishing speed using blogs and refereed journals!). The last parts made me laugh hysterically – so let me share them: About creating and reporting “good” tables: It’s also helpful in a table to have a minimum of four significant digits. A good choice is often to use the default provided …

Moving Average chart in Excel: what is plotted?

In my recent book Practical Time Series Forecasting: A Hands-On Guide, I included an example of using Microsoft Excel’s moving average plot to suppress monthly seasonality. This is done by creating a line plot of the series over time and then Add Trendline > Moving Average (see my post about suppressing seasonality). The purpose of adding the moving average trendline to a time plot is to better see a trend in the data, by suppressing seasonality. A moving average with window width w means averaging across each set of w consecutive values. For visualizing a time series, we typically use a centered moving average …
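As a small illustration of "averaging across each set of w consecutive values", the following Python sketch computes a moving average and attaches each average to a time index in two ways: at the end of its window (trailing) or at the middle of its window (centered). The function names and the toy series are made up for this example, and the sketch makes no claim about which alignment Excel's trendline uses.

```python
def trailing_moving_average(series, w):
    """Pair each window average with the index of the window's LAST value."""
    return [
        (i, sum(series[i - w + 1 : i + 1]) / w)
        for i in range(w - 1, len(series))
    ]


def centered_moving_average(series, w):
    """Pair each window average with the index of the window's MIDDLE value.

    This sketch handles odd w only, so that the window has a single middle.
    """
    if w % 2 == 0:
        raise ValueError("this sketch supports odd window widths only")
    half = w // 2
    return [
        (i, sum(series[i - half : i + half + 1]) / w)
        for i in range(half, len(series) - half)
    ]


# Toy series (hypothetical values), window width w = 3.
series = [2, 4, 6, 8, 10]
trailing = trailing_moving_average(series, 3)
centered = centered_moving_average(series, 3)
print(trailing)
print(centered)
```

The two functions produce the same averages; only the time index each average is plotted against differs, which shifts the smoothed line relative to the original series.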

Visualizing time series: suppressing one pattern to enhance another pattern

Visualizing a time series is an essential step in exploring its behavior. Statisticians think of a time series as a combination of four components: trend, seasonality, level and noise. All real-world series contain a level and noise, but not necessarily a trend and/or seasonality. It is important to determine whether trend and/or seasonality exist in a series in order to choose appropriate models and methods for descriptive or forecasting purposes. Hence, when looking at a time plot, typical questions include: Is there a trend? If so, what type of function can approximate it (linear, exponential, etc.)? Is the trend fixed throughout the period …