Differences in the Moral Foundations of Large Language Models
Large language models are increasingly used in critical domains of politics, business, and education, yet the nature of their normative ethical judgment remains opaque. To date, alignment research has made insufficient use of perspectives and insights from moral psychology to inform the training and evaluation of frontier models. I perform a series of synthetic experiments on a wide range of models from most major providers, using Jonathan Haidt's influential moral foundations theory (MFT) to elicit diverse value judgments from LLMs. I then use principal component analysis (PCA) to validate the MFT construct in my LLM sample and to explain variation in elicited moral judgments across models. My results suggest that models display moral foundations that differ from one another and from a nationally representative human baseline. This work aims to spur further analysis of LLMs using MFT, including finetuning of open-source models, and greater deliberation by policymakers on the importance of moral foundations for LLM alignment.
Recommended citation: Kirgis, Peter. (2025). "Differences in the Moral Foundations of Large Language Models." Preprint.