**No Need to Stew about Factor Analysis: Two Homecooked Demonstrations**

*E-xcellence in Teaching Blog*

Ken Cramer *(University of Windsor)*

Rebecca Pschibul *(Western University)*

Statistical concepts may be among the most challenging for students to grasp in the course of their undergraduate education. These include several widely employed concepts such as random sampling, random assignment to groups, correlation vs. causation, Type I vs. Type II errors, and the selection of median vs. mean statistics in the presence of extreme scores (e.g., average income). Researchers and educators have made considerable strides in rendering these everyday concepts accessible to students via memorable examples, vivid demonstrations, and simulations. For instance, students can see the folly in misreading the correlation between city parades and national sports championships as grounds for staging such a grand display one week before the big game. Students can similarly appreciate the greater risk associated with sending the innocent to jail, or worse (a Type I error), rather than letting the guilty go free (a Type II error).

More advanced statistical concepts will arguably prove more challenging to students, and demonstrations to make these units more digestible are underway. One such concept, although widespread in its application, has proven particularly thorny to pass along clearly to students – namely *factor analysis*, a complex statistical data-reduction technique. Factor analysis reduces a larger set of entities or measures into a smaller set of families, or factors, whose constituent members are intercorrelated (Tabachnick & Fidell, 2013). For example, one may take a series of physical measures of a given individual – including height, head size, hand span, foot size, distance from elbow to wrist, etc. – and find that all measures are moderately to highly intercorrelated because of a single overarching latent factor, which we might call *Body Size*. There is no direct way to measure a latent factor; it can only be derived or estimated from its constituent measures. The concept is nonetheless real and directly shapes the scores on the measures used to assess it (e.g., item scores on a personality or intelligence questionnaire, or numbers on a tape measure).
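
To make the Body Size example concrete, the following is a minimal sketch in Python (using NumPy) that simulates five physical measures driven by one latent factor and then inspects their intercorrelations. The sample size, the loadings, and the variable names are assumptions invented for illustration, and the loadings are estimated with a simple principal-component approximation rather than a full factor-analytic model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # number of simulated individuals (hypothetical)

measures = ["height", "head size", "hand span", "foot size", "elbow to wrist"]
# Hypothetical loadings of each measure on the latent factor "Body Size".
true_loadings = np.array([0.9, 0.6, 0.8, 0.8, 0.7])

body_size = rng.standard_normal(n)               # latent factor, never observed directly
noise = rng.standard_normal((n, len(measures)))  # variation unique to each measure
# Each observed (standardized) measure = loading * latent factor + unique noise.
X = body_size[:, None] * true_loadings + noise * np.sqrt(1 - true_loadings**2)

R = np.corrcoef(X, rowvar=False)                 # 5 x 5 correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)             # eigenvalues in ascending order
top_val, top_vec = eigvals[-1], eigvecs[:, -1]
if top_vec.sum() < 0:                            # eigenvector sign is arbitrary
    top_vec = -top_vec
est_loadings = top_vec * np.sqrt(top_val)        # principal-component loadings

share = top_val / len(measures)
print(f"Share of total variance on the first component: {share:.2f}")
for name, value in zip(measures, est_loadings):
    print(f"{name:15s} {value:.2f}")
```

Because every simulated measure depends on the same latent score, all pairwise correlations come out positive and a single component captures most of the shared variance, which is the pattern the tape-measure example describes.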

Factor analysis is used widely across a host of fields, including biology, education, and in particular social sciences like psychology. Whether to develop psychometrically sound instruments to measure complex constructs like intelligence or personality, or to determine the degree of overlap between various existing measures of self-esteem, factor analysis can rarely be avoided in a student’s undergraduate statistical training. Until the advent of faster and more high-powered computers, a typical factor analysis would take the researcher approximately one year of hand calculations (which included deriving the relevant correlation matrices and standard deviations, then combining them into covariance matrices). Today, in spite of its ease of execution, students may still fail to appreciate the interplay of the analysis involved, its subjectivity, and the nature of interpretation; thus, we offer the instructor a pair of hands-on, memorable demonstrations to help instill this material. Efforts to make the lesson engaging have all too often relied on a mathematical approach, implemented through graphs and animations (Connor, 2003; Segrist & Pawlow, 2007; Yu, Andrews, Winogard, Jannasch-Pennell, & DiGangi, 2002). Our intention is to move beyond the mathematical world and embrace a more substantive and practical one through hands-on activity.
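
For readers curious about the hand calculation mentioned above, a short sketch may help: a covariance matrix can be assembled from a correlation matrix and the standard deviations, since each covariance equals the correlation multiplied by the two standard deviations involved. The numbers below are hypothetical and serve only to illustrate the arithmetic.

```python
import numpy as np

# Hypothetical standard deviations (in cm) for height, hand span, and foot size.
sd = np.array([9.0, 1.5, 2.0])

# Hypothetical correlation matrix for the same three measures.
R = np.array([
    [1.0, 0.7, 0.6],
    [0.7, 1.0, 0.5],
    [0.6, 0.5, 1.0],
])

# cov[i, j] = R[i, j] * sd[i] * sd[j]; the diagonal holds the variances.
cov = R * np.outer(sd, sd)
print(cov)
```

With only three measures this is a handful of multiplications; with dozens of questionnaire items, the same step alone involves hundreds of products, which gives some sense of why the full analysis was so laborious by hand.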

**Demonstration with a Known Factor Structure (Stew Recipe)**

Students can readily understand how a grocery list may be rendered more efficient if sorted into general categories that match the layout of a store – fruits and vegetables, meats and dairy, spices and sauces, etc. This model illustrates, on the surface, the sorting of foodstuffs into general categories whose constituent members share a particular feature (e.g., all dairy items need to be refrigerated), and it offers a starting point for the concept of factor analysis.

The following demonstration uses this model of shopping for stew ingredients to show the different categories, and their members, drawn from a grocery list. Students are invited to the front of the classroom to draw one of 30 cards (see Appendix A) from a mixed deck, each representing one of the many ingredients of a stew. Knowing the final configuration (namely, the number of categories and the membership of each), students move about the room and self-sort to form their respective categories – Meats, Vegetables, Liquids, and Spices/Sauces (or Flavorings).

Once in their categories, members of each group are asked to designate both their strongest and weakest member; for example, Vegetables might elect ‘potato’ as their strongest, and ‘onion’ as their weakest. This represents a useful vehicle toward understanding factor loadings (or the relative contribution of any constituent entity or measure to a factor). For instance, in the field of intelligence, the subtest of Vocabulary is the strongest single predictor of mental abilities, and Object Assembly is the weakest (Wechsler, 2008).

**Questions for Probing Student Knowledge**

Students may be asked further questions to strengthen their understanding of factor analysis. We include several examples below:

- Following students’ identification of the weakest members of a category (e.g., onions among a stew’s Vegetables), could the contribution of an entity be so low that it fails to meet cut-off criteria and is excluded from further consideration? That is, in terms of factor analysis, the under-representative entity would showcase an especially low loading that one may argue does not contribute to the understanding or definition of the factor (and should not be included). In other words, could it be argued that a stew need not include onions; but potatoes are a must?
- Are there any broad category names (e.g., Spices) for which a more suitable alternative might be used (e.g., Flavorings)? That is, the naming of a factor is open to debate (personality research has wrestled with how to name components of the Big Five – is it Openness to Experience, Culture, or Intelligence? See McCrae & Costa, 1996). Does this invite subjectivity into this field of study?
- Could any of the broad categories (Flavorings) be further divided into correlated, but still distinct, sub-categories (Spices and Sauces), and would splitting them improve the understanding of the configuration or structure of a stew? Consider how loneliness, originally thought to have two factors (Social Loneliness and Emotional Loneliness), found the latter sub-divided into Family and Romantic Loneliness (Cramer, Ofosu, & Barry, 2000).
- What is to be done with entities that find no genuine home in any one category but are perhaps shared between two? Which category might best situate tomato sauce – it is arguably a Liquid, but it adds flavoring, so might it better be included among Spices?
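
The questions above about weak members, cut-off criteria, and shared membership can be sketched with a small table of loadings. Every number below is invented for illustration, and the cut-off of 0.40 is an assumed rule of thumb rather than a value taken from the demonstration; the function name `classify` is likewise hypothetical.

```python
# Hypothetical loadings of six stew ingredients on three factors.
loadings = {
    "potatoes":     {"Vegetables": 0.85, "Liquids": 0.05, "Flavorings": 0.10},
    "carrots":      {"Vegetables": 0.70, "Liquids": 0.05, "Flavorings": 0.15},
    "onions":       {"Vegetables": 0.25, "Liquids": 0.05, "Flavorings": 0.30},
    "water":        {"Vegetables": 0.00, "Liquids": 0.90, "Flavorings": 0.05},
    "tomato sauce": {"Vegetables": 0.10, "Liquids": 0.55, "Flavorings": 0.50},
    "oregano":      {"Vegetables": 0.05, "Liquids": 0.00, "Flavorings": 0.80},
}

CUTOFF = 0.40  # assumed threshold for a loading to count as meaningful


def classify(item_loadings, cutoff=CUTOFF):
    """Return the factors an item loads on at or above the cutoff, strongest first."""
    salient = [(name, value) for name, value in item_loadings.items()
               if abs(value) >= cutoff]
    salient.sort(key=lambda pair: -abs(pair[1]))
    return [name for name, _ in salient]


results = {item: classify(values) for item, values in loadings.items()}

for item, factors in results.items():
    if not factors:
        verdict = "dropped (no loading reaches the cutoff)"
    elif len(factors) > 1:
        verdict = "cross-loads on " + " and ".join(factors)
    else:
        verdict = "belongs to " + factors[0]
    print(f"{item:13s} {verdict}")
```

In this invented table, potatoes sit firmly with Vegetables, onions fall below the cut-off on every factor and would be excluded, and tomato sauce is shared between Liquids and Flavorings. Raising or lowering the cut-off changes these verdicts, which is one place where the subjectivity discussed above enters.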

**Demonstration with an Unknown Factor Structure (Parts of the Body)**

Students are then invited to participate in a similar activity with a second deck of cards (see Appendix B; *Parts of the Body*) whose factor structure is unknown to them. Students may be similarly probed using this deck of cards: (a) which entity in any family or factor might be its best representative, and which might be its worst? (b) where should one place the entity of ‘skull’ – does it belong to Bones or to Face; perhaps both, but which might be stronger?

Similar follow-up questions may be posed to further student understanding of a factor structure with no a priori hypothesis (about the number of factors or their constituency).

- Are there especially strong or especially weak members of any given category? Students working in the Bones category may struggle to find a high contributing entity, but skin may almost be dismissed from the category of Organs should students debate its belongingness. Is skin even an organ? It is, say biologists – the largest, in fact.
- Students may encounter disagreement concerning the naming of a category: Bones vs. Skeleton, Head vs. Cranium, Bodily Liquids vs. Bodily Fluids – preferring scientific nomenclature over more common everyday language.
- Students might discover cross-listed entities such as the brain belonging both to the categories of Organs and Head. So too, the entity of skull may belong both to Head and Bones. Here too, students may uncover a stronger belongingness or loading of skull to Head (after all, what is a head without the skull?). The entity of tears is arguably a Bodily Fluid, but tears originate from the Face/Head. As such, this conflict (of finding the right home for a given body part) may help students to see the differences in factor loadings when entities belong to multiple categories.
- What is to be done with entities that struggle to find a suitable home among any of the identified categories? Consider hair, which may belong at first glance to the category of Head, but this may also include bodily hair not found on the head.
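
When the structure is unknown, the number of categories has to be inferred from the data. As a hedged illustration, the sketch below simulates scores on 16 items that secretly belong to four groups and then applies one common heuristic, retaining components whose eigenvalue exceeds 1. This rule is not part of the card demonstration itself, and the sample size, loading, and group sizes are assumptions chosen for the simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_factors, items_per_factor = 1000, 4, 4
loading = 0.8  # hypothetical loading of every item on its own factor

latent = rng.standard_normal((n_people, n_factors))             # hidden factor scores
true_group = np.repeat(np.arange(n_factors), items_per_factor)  # unknown to the analyst
noise = rng.standard_normal((n_people, true_group.size))
# Each item reflects its own factor plus item-specific noise.
X = loading * latent[:, true_group] + np.sqrt(1 - loading**2) * noise

R = np.corrcoef(X, rowvar=False)              # 16 x 16 correlation matrix
eigvals = np.linalg.eigvalsh(R)[::-1]         # eigenvalues, largest first
n_retained = int((eigvals > 1.0).sum())       # eigenvalue-greater-than-one rule

print("Largest eigenvalues:", np.round(eigvals[:6], 2))
print("Number of factors suggested:", n_retained)
```

Under these simulated conditions the groups are cleanly separated, so the heuristic has an easy job. With real data, and with cross-listed entities like skull or hair, the choice of how many factors to keep is far less clear-cut.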

**Conclusion**

These two examples should help students vividly remember the mechanics and inner workings of factor analysis. The probing questions should offer a lasting framework that they may apply in later courses on theory, research, and statistical methods.

**References**

Connor, J. (2003). Making statistics come alive: Using space and students’ bodies to illustrate statistical concepts. *Teaching of Psychology, 30,* 141.

Cramer, K. M., Ofosu, H. B., & Barry, J. E. (2000). An abbreviated form of the Social and Emotional Loneliness Scale for Adults (SELSA). *Personality and Individual Differences, 28,* 1125-1131.

McCrae, R. R., & Costa, P. T., Jr. (1996). Toward a new generation of personality theories: Theoretical contexts for the five-factor model. In J. S. Wiggins (Ed.), *The five-factor model of personality: Theoretical perspectives* (pp. 51-87). New York: Guilford.

Segrist, D. J., & Pawlow, L. A. (2007). The mixer: Introducing the concept of factor analysis. *Teaching of Psychology, 34,* 121-123.

Tabachnick, B. G., & Fidell, L. S. (2013). *Using multivariate statistics* (6th ed.). Toronto: Pearson.

Wechsler, D. (2008). *Wechsler Adult Intelligence Scale – Fourth Edition.* San Antonio, TX: Pearson.

Yu, C. H., Andrews, S., Winogard, D., Jannasch-Pennell, A., & DiGangi, S. A. (2002). Teaching factor analysis in terms of variable space and subject space using multimedia visualization. *Journal of Statistics Education, 10.*

**Appendix A: Stew Ingredients**

Meats (beef, chicken, lamb, pork); Vegetables (potatoes, carrots, onions, celery, mushrooms); Liquids (water, tomato sauce, tomato paste, soy sauce); Spices (salt, pepper, garlic, oregano, sage, thyme)

**Appendix B: Parts of the Body**

Face (eyes, nose, mouth, ears, tongue, chin, cheek, hair); Organs (lungs, heart, kidney, liver, brain, spleen, skin); Bodily Fluids (blood, urine, pus, tears, bile, phlegm); Bones (femur, tibia, ulna, skull, ribs, radius)