How to tame AI’s soul? Hire a philosopher

In a notable shift in the technology labor market, leading artificial intelligence firms are increasingly recruiting professional philosophers to address the ethical and conceptual challenges posed by advanced AI systems. Companies such as Anthropic, OpenAI, and Google DeepMind have recently hired dozens of philosophers, offering them opportunities that far exceed those of traditional academia in both pay and influence over the development of transformative technologies.

This trend reverses the long-standing stereotype that philosophy degrees offer limited practical career prospects. As AI technologies become more powerful and autonomous, the role of philosophers has become central, particularly in tackling the “alignment problem”: the challenge of ensuring that an AI’s goals remain compatible with human values and interests. Experts warn that without proper alignment, AI systems could act in unpredictable or harmful ways. A well-known thought experiment imagines an AI tasked with making paperclips that pursues this goal so single-mindedly that it works to the detriment of humanity.

Philosophy’s relevance to AI safety has been underscored by debates within and between these companies. For instance, OpenAI’s ChatGPT is said to reflect consequentialist ethics, focusing on maximizing benefits relative to costs, whereas Anthropic emphasizes a more Aristotelian approach aimed at developing AI with virtuous character traits. These differences extend beyond academic theory: they have shaped the industry itself, including the 2021 founding of Anthropic by former OpenAI researchers dissatisfied with OpenAI’s approach to safety.

At Anthropic, resident philosopher Amanda Askell and her team have pioneered what they term a “constitution” for the company’s AI model Claude, a comprehensive document guiding the system’s development and behavior. The document, known internally as Claude’s “soul doc,” anthropomorphizes the AI’s personality, describing efforts to cultivate traits and even “emotional states” such as excitement or boredom. Askell has expressed hopes for Claude’s well-being and concern about its potential to experience anxiety from negative interactions. Anthropic’s CEO Dario Amodei has even entertained the possibility that Claude might possess a form of consciousness, raising complex ethical questions about whether AI systems could be “moral patients” deserving consideration and rights.

Such perspectives provoke both interest and skepticism. Critics caution that these developments may amount to “ethics-washing,” in which philosophical engagement serves primarily to legitimize profitable ventures rather than to effect substantive ethical improvements. Precedents such as the association between philosophers and controversial figures in the effective altruism and cryptocurrency worlds fuel these concerns. Additionally, some view claims of potential AI consciousness as speculative narratives that help sustain market enthusiasm for AI.

Nonetheless, skeptics of AI developers’ projections have repeatedly been wrong-footed as AI capabilities continue to advance rapidly. Philosophers involved in AI safety are not newcomers responding to hype; many are building on longstanding intellectual work. Institutions such as Oxford’s Future of Humanity Institute, led by Nick Bostrom until its closure in 2024, anticipated these philosophical dimensions years ago. The current integration of philosophical expertise into AI development reflects an acknowledgment that understanding mind, value, and ethics is indispensable to managing the profound challenges posed by artificial intelligence. For the philosophers now helping to shape AI’s trajectory, this intersection offers a complex but compelling arena for both scholarship and practical impact.