Here is an interesting example of how similar mechanics lead to two very different statistical tools. Principal Components Analysis (PCA) is a powerful method for data compression, in the sense of capturing most of the information (variation) contained in a large set of variables with a smaller set of linear combinations of those variables. As such, it is widely used in applications that benefit from data compression, such as visualization of high-dimensional data and prediction.
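To make the "smaller set of linear combinations" concrete, here is a minimal sketch of PCA computed from the singular value decomposition of the centered data. The toy data (six measured variables driven by two hidden ones), the function name pca, and the choice of NumPy are my own assumptions for illustration, not something taken from a particular package or textbook.

```python
import numpy as np

def pca(X, k):
    """Return scores, component weights, and explained-variance ratios
    for the first k principal components of X (rows = observations)."""
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                          # each row is one set of weights
    scores = Xc @ components.T                   # the k new variables
    explained = s**2 / np.sum(s**2)              # share of total variance
    return scores, components, explained[:k]

# Hypothetical toy data: 6 observed variables generated from 2 hidden ones plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 6))

scores, components, explained = pca(X, k=2)
print("share of variance kept by 2 components:", explained.sum())
```

Here two linear combinations stand in for six variables, and the weight vectors are orthogonal, so the resulting scores are uncorrelated with each other.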
Factor Analysis (FA), often considered a close cousin of PCA, is popular in the social sciences, where it is used to discover a small number of ‘underlying factors’ from a larger set of observable variables. Although PCA and FA are both based on orthogonal linear combinations of the original variables, they are very different conceptually: FA tries to relate the measured variables to underlying theoretical concepts, while PCA operates only at the measurement level. FA is therefore useful for explaining, and PCA for data reduction (and therefore prediction).
Richard Darlington, a Professor Emeritus of Psychology at Cornell, has a nice webpage describing the two. He tries to address the confusion between PCA and FA by first introducing FA and only then PCA, which is the opposite of what you’ll find in textbooks. Darlington comments:
I have introduced principal component analysis (PCA) so late in this chapter primarily for pedagogical reasons. It solves a problem similar to the problem of common factor analysis, but different enough to lead to confusion. It is no accident that common factor analysis was invented by a scientist (differential psychologist Charles Spearman) while PCA was invented by a statistician. PCA states and then solves a well-defined statistical problem, and except for special cases always gives a unique solution with some very nice mathematical properties. One can even describe some very artificial practical problems for which PCA provides the exact solution. The difficulty comes in trying to relate PCA to real-life scientific problems; the match is simply not very good.
Machine learning practitioners are very familiar with PCA as well as other compression-type algorithms such as Singular Value Decomposition (the most heavily used compression technique in the Netflix Prize competition). Such compression methods are also used as alternatives to variable selection algorithms, such as forward selection and backward elimination. Rather than retaining or removing “complete” variables, these methods use combinations of them.
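As a rough illustration of SVD as compression, the sketch below keeps only the k largest singular values of a small, fully observed matrix and measures how much is lost. The matrix, the function name low_rank_approx, and the value of k are assumptions chosen for the example; this is not the procedure used in the Netflix Prize, where the ratings matrix is mostly missing.

```python
import numpy as np

def low_rank_approx(A, k):
    """Rank-k approximation of A built from its k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
    return A_k, s

# Hypothetical 30 x 20 matrix that is nearly rank 3, plus a little noise.
rng = np.random.default_rng(1)
A = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 20))
A = A + 0.01 * rng.normal(size=(30, 20))

k = 3
A_k, s = low_rank_approx(A, k)

# Relative reconstruction error in the Frobenius norm.
rel_error = np.linalg.norm(A - A_k) / np.linalg.norm(A)

# Numbers stored: the full matrix versus the k singular triplets.
stored_full = A.size
stored_k = k * (A.shape[0] + A.shape[1] + 1)
print(rel_error, stored_full, stored_k)
```

The compressed form stores combinations of all the original columns rather than a subset of them, which is the contrast with forward selection and backward elimination.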
I recently learned of Independent Components Analysis (ICA) from Scott Nestler, a former PhD student in our department, who used it in his dissertation on portfolio optimization. The idea is similar to PCA, except that the resulting components are meant to be not only uncorrelated but statistically independent (or as close to independent as the data allow), which is a stronger requirement.
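The gap between "uncorrelated" and "independent" is easy to see in a small example. The sketch below is not ICA itself; it is only a toy case, of my own construction, showing two variables with zero correlation where one is completely determined by the other.

```python
import numpy as np

# x is spread symmetrically around zero; y is an exact function of x.
x = np.linspace(-1.0, 1.0, 201)
y = x**2

# Linear correlation between x and y is zero (up to rounding error)...
corr_xy = np.corrcoef(x, y)[0, 1]

# ...yet y is fully determined by x, which a nonlinear transform of x reveals.
corr_x2_y = np.corrcoef(x**2, y)[0, 1]

print(corr_xy, corr_x2_y)
```

PCA would be satisfied with a pair like this, since it only removes correlation; ICA looks for components without this kind of leftover dependence.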