The parallel coordinate plot is useful for visualizing multivariate data in a disaggregated way, where we have multiple numerical measurements for each record. A scatter plot displays two measurements for each record by using the two axes. A parallel coordinate plot can display many measurements for each record by using many (parallel) axes, one for each measurement. While not as popular as other charts, it sometimes turns out to be useful, so it’s good to have it in the visualization toolkit. Software such as TIBCO Spotfire and XLMiner include the parallel coordinate plot. There’s even a free Excel add-on. But … Continue reading Parallel coordinate plot in Tableau: a workaround
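To make the "one axis per measurement" idea concrete, here is a minimal sketch in Python of how each record turns into a polyline across the parallel axes. The function name, the sample records, and the choice to rescale every measurement to the range 0 to 1 on its own axis are assumptions made for illustration; they are not taken from the post or from any particular tool.

```python
def parallel_coordinates_polylines(records, columns):
    """Return one polyline per record: a list of (axis_index, scaled_value) points.

    Assumption: each measurement is min-max rescaled to [0, 1] on its own axis,
    which is a common (but not the only) way to put different units side by side.
    """
    lo = {c: min(r[c] for r in records) for c in columns}
    hi = {c: max(r[c] for r in records) for c in columns}
    lines = []
    for r in records:
        line = []
        for axis, c in enumerate(columns):
            span = hi[c] - lo[c]
            # A constant column has no spread; place it mid-axis.
            y = (r[c] - lo[c]) / span if span else 0.5
            line.append((axis, y))
        lines.append(line)
    return lines


# Hypothetical records with three numerical measurements each.
records = [
    {"height": 150, "weight": 50, "age": 20},
    {"height": 170, "weight": 80, "age": 40},
    {"height": 160, "weight": 65, "age": 30},
]
lines = parallel_coordinates_polylines(records, ["height", "weight", "age"])
```

Drawing each polyline with any line-chart tool, one line per record, gives the parallel coordinate plot.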
The recent issue of the Journal of Computational and Graphical Statistics published a short article by Columbia University professor Andrew Gelman (I believe he is the most active statistician-blogger) called “Why tables are really much better than graphs”, based on his April 1, 2009 blog post (note the difference in publishing speed between blogs and refereed journals!). The last parts made me laugh hysterically, so let me share them: About creating and reporting “good” tables: It’s also helpful in a table to have a minimum of four significant digits. A good choice is often to use the default provided … Continue reading Nice April Fool’s Day prank
In business schools it is common to teach statistics courses using Microsoft Excel, due to its wide accessibility and the familiarity of business students with the software. There is a large debate regarding this practice, but at this point the reality is clear: the figure that I am familiar with is that about 50% of basic stat courses in b-schools use Excel and 50% use statistical software such as Minitab or JMP. Another trend is the move from offline software to “cloud computing”: software such as www.statcrunch.com offers basic stat functions in an online, collaborative, social-networky style. Following the popularity of … Continue reading Google Spreadsheets for teaching probability?
In my recent book Practical Time Series Forecasting: A Hands-On Guide, I included an example of using Microsoft Excel’s moving average plot to suppress monthly seasonality. This is done by creating a line plot of the series over time and then choosing Add Trendline > Moving Average (see my post about suppressing seasonality). The purpose of adding the moving average trendline to a time plot is to better see a trend in the data, by suppressing seasonality. A moving average with window width w means averaging across each set of w consecutive values. For visualizing a time series, we typically use a centered moving average … Continue reading Moving Average chart in Excel: what is plotted?
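The definition above, averaging each set of w consecutive values, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the sample series are made up, the centered version handles odd window widths only, and a trailing version is included purely to show how the same averages land on different time points depending on alignment.

```python
def centered_moving_average(values, w):
    """Pair each average of w consecutive values with the MIDDLE index of its window."""
    if w % 2 == 0:
        raise ValueError("this sketch handles odd window widths only")
    half = w // 2
    return [
        (i, sum(values[i - half:i + half + 1]) / w)
        for i in range(half, len(values) - half)
    ]


def trailing_moving_average(values, w):
    """Pair each average of w consecutive values with the LAST index of its window."""
    return [
        (i, sum(values[i - w + 1:i + 1]) / w)
        for i in range(w - 1, len(values))
    ]


series = [3, 5, 7, 6, 8, 10, 9]  # hypothetical data
centered = centered_moving_average(series, 3)
trailing = trailing_moving_average(series, 3)
```

With w = 3 both functions compute the same five averages, but the trailing version plots each one a full time step later than the centered version, which matters when reading a trend off a chart.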
Being in Bhutan this year, I have asked the American Statistical Association (ASA) and INFORMS to mail the magazines that come with my membership to Bhutan. Although I can access the magazines online, I greatly enjoy receiving the issues by mail (even if a month late) and leafing through them leisurely. Not to mention the ability to share them with local colleagues who are seeing these magazines for the first time! Now to the data-analytic reason for my post: The main article in the August 2010 issue of AMSTAT News (the ASA’s magazine) on Fellow Award: Revisited (Again) presented an “update to … Continue reading ASA’s magazine: Excel’s default charts
Scatterplots are extremely popular and useful graphical displays for examining the relationship between two numeric variables. They get even better when we add color/hue and shape to include information on a third, categorical variable (or we can use size to include information on an additional numerical variable, to produce a “bubble chart”). For example, say we want to examine the relationship between the happiness of a nation and the percentage of the population living in poverty, using 2004 survey data from the World Database of Happiness. We can create a scatterplot with “Happiness” on … Continue reading Creating color-coded scatterplots in Excel: a nightmare
Histograms are very useful charts for displaying the distribution of a numerical measurement. The idea is to bucket the numerical measurement into intervals, and then to display the frequency (or percentage) of records in each interval. Two ways to generate a histogram in Excel are: Create a pivot table, with the measurement of interest in the Column area, and Count of that measurement (or any measurement) in the Data area. Then, right-click the column area and “Group and Show Detail > Group” will create the intervals. Now simply click the chart wizard to create the matching chart. You will still … Continue reading Histograms in Excel
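The bucketing idea behind a histogram, independent of Excel, can be sketched in Python as follows. The function name, the equal-width bins, and the sample data are assumptions chosen for illustration; intervals that contain no records are simply omitted from the result.

```python
def histogram_counts(values, bin_width, start):
    """Count records per equal-width interval, keyed by each interval's lower bound."""
    counts = {}
    for v in values:
        idx = int((v - start) // bin_width)  # which interval the value falls in
        lower = start + idx * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


data = [12, 15, 21, 22, 29, 35, 38, 41]  # hypothetical measurements
counts = histogram_counts(data, bin_width=10, start=10)
# Convert frequencies to percentages of all records.
percentages = {lower: 100 * n / len(data) for lower, n in counts.items()}
```

Plotting `counts` (or `percentages`) as a bar chart with one bar per interval gives the histogram.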
One of the misleading features of Microsoft Office software is that it gives the user the illusion that they are in control of what’s visible and what’s hidden from readers of the files. One example is copy-pasting from an Excel sheet into a Word or PowerPoint file. If you now double-click on the embedded piece you’ll see… the Excel file! It is automatically embedded within the Word/PowerPoint file. A few years ago, after teaching this to MBAs, a student came the following week all excited, telling me how he just detected fraudulent reporting to his company by a … Continue reading Microsoft and the financial downfall