Principal Components Analysis vs. Factor Analysis

Here is an interesting example of how similar mechanics lead to two very different statistical tools. Principal Components Analysis (PCA) is a powerful method for data compression, in the sense of capturing the information contained in a large set of variables by a smaller set of linear combinations of those variables. As such, it is widely used in applications that require data compression, such as visualization of high-dimensional data and prediction. Factor Analysis (FA), technically considered a close cousin of PCA, is popular in the social sciences, and is used for the purpose of discovering a small number of ‘underlying …
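To make the "smaller set of linear combinations" idea concrete, here is a minimal sketch of PCA as compression using NumPy. The function name `pca_compress` and the synthetic data are assumptions made for illustration, not something taken from the post; the SVD route shown is one common way to compute principal components.

```python
import numpy as np

def pca_compress(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components.

    Hypothetical helper for illustration only.
    """
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA works on centered data
    # SVD of the centered data: the rows of Vt are the principal directions
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                  # k x n_features
    scores = Xc @ components.T           # the k linear combinations per row
    explained = (s ** 2) / np.sum(s ** 2)  # share of variance per component
    return scores, components, explained[:k], mean

# Synthetic data (an assumption): 5 observed variables driven mostly by 2 directions
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

scores, components, explained, mean = pca_compress(X, k=2)

# Approximate reconstruction of all 5 variables from only 2 scores per row
X_hat = scores @ components + mean
print("share of variance kept:", explained.sum())
print("mean squared reconstruction error:", np.mean((X - X_hat) ** 2))
```

In this sketch the two score columns stand in for the five original variables, which is the sense of "compression" used above. Note that nothing here attempts to interpret the components as underlying constructs; that is the Factor Analysis side of the comparison.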

Are experiments always better?

This continues my “To Explain or To Predict?” argument (in brief: statistical models aimed at causal explanation will not necessarily be good predictors). And now, I move to a very early stage in the study design: how should we collect data? A well-known notion is that experiments are preferable to observational studies. The main difference between experimental studies and observational studies is an issue of control. In experiments, the researcher can deliberately choose “treatments” and control the assignment of subjects to the “treatments”, and then can measure the outcome. Whereas in observational studies, the researcher can only observe the subjects …

What R-squared is (and is not)

R-squared (aka “coefficient of determination”, or for short, R²) is a popular measure used in linear regression to assess the strength of the linear relationship between the inputs and the output. In a model with a single input, R² is simply the squared correlation coefficient between the input and output. If you examine a few textbooks in statistics or econometrics, you will find several definitions of R². The most common definition is “the percent of variation in the output (Y) explained by the inputs (X’s)”. Another definition is “a measure of predictive power” (check out Wikipedia!). And finally, R² is …
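The single-input claim is easy to check numerically. The sketch below, on simulated data (an assumption, as are the names `r_squared`, `x`, and `y`), computes R-squared as one minus the ratio of residual to total sum of squares for a least-squares line, and compares it with the squared correlation between input and output.

```python
import numpy as np

def r_squared(y, y_hat):
    """Share of variation in y explained by the fitted values y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)        # unexplained (residual) variation
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Simulated data (an assumption): one input, linear signal plus noise
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(size=100)

# Ordinary least-squares line with an intercept
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

r2 = r_squared(y, y_hat)
corr = np.corrcoef(x, y)[0, 1]

print("R-squared:          ", r2)
print("squared correlation:", corr ** 2)
```

The two printed numbers agree up to floating-point error, but only because the line was fit by least squares with an intercept on the same data. Both are computed in-sample, so the check says nothing about how the model would do on new observations.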

Start the Revolution

Variability is a key concept in statistics. The Greek letter sigma is so important that it is probably associated more closely with statistics than with Greek. Yet, if you have a chance to examine the bookshelf of introductory statistics textbooks in a bookstore or the library, you will notice that the variability between the zillions of textbooks, whether in engineering, business, or the social sciences, is nearly zero. And I am not only referring to price. I can close my eyes and place a bet on the topics that will show up in the table of contents of any textbook …