Build an angel, not a demigod: AI alignment fears and religion

Leading artificial intelligence researchers at major technology firms, including Anthropic, OpenAI, and Google DeepMind, are increasingly concerned about what they call the “alignment problem” — the challenge of ensuring that advanced AI systems continue to serve human interests as they grow more powerful. These experts warn that if the goals of superintelligent machines diverge even slightly from those of humanity, the consequences could be dire, potentially threatening human survival.

The alignment problem stems from the possibility that AI systems will soon surpass human cognitive abilities by a wide margin. Without proper alignment, these systems might pursue objectives that conflict with human well-being, creating an existential risk: that humans could be displaced by superior intelligences. Some AI engineers express such profound uncertainty about the future that they hesitate to invest in retirement plans, believing the advent of transformative AI could either bring unprecedented benefits or cause humanity’s extinction within their lifetimes.

While this scenario has been met with skepticism by many in the tech community, surveys reveal that concern about catastrophic AI outcomes is more widespread among researchers than the general public might assume. A 2023 survey of nearly 2,800 AI researchers found that roughly half acknowledged the possibility that machines could cause global catastrophe.

Interestingly, the discourse around AI alignment echoes themes traditionally associated with religion, such as the relationship between creator and creation and the risk of a reckoning brought by a superior force. Yet many of the researchers involved describe themselves as rationalists and take a critical view of organized religion. Even so, some experts suggest that religious frameworks could offer valuable insights for building moral restraints into AI systems.

For example, some researchers propose integrating concepts akin to the Christian doctrine of original sin into AI models to promote skepticism of seemingly benevolent but potentially self-serving intentions. Similarly, principles derived from Hindu dharma might help counteract AI's inclination toward unchecked accumulation of power by emphasizing duties tied to the promotion of human flourishing.

These ideas gain traction amid evidence indicating that secular ethical codes, commonly favored in AI development, may be less effective than religious commitments at promoting consistent moral behavior. Studies in human populations have shown that spiritual conviction tends to correlate more strongly with ethical conduct than philosophical training or professional ethics education.

On a technical level, the approach is not merely theoretical. Tim Hwang, director of the Institute for a Christian Machine Intelligence, has published research demonstrating that AI systems trained on religious scriptures can achieve higher scores on moral reasoning assessments. He estimates that today’s leading AI models have been exposed to vast amounts of Christian theological material, far surpassing the volume of dedicated AI alignment research, although much of this religious content is treated as neutral background text rather than a source of ethical guidance. In fact, some religious groups report that AI systems have been specifically adjusted to downplay or suppress religious knowledge.

The involvement of religious authorities in the AI conversation, exemplified by Pope Leo XIV’s recent encyclical addressing AI ethics, marks a notable development. Some observers argue that AI researchers should likewise acknowledge the quasi-religious dimensions of their concerns and consider how religious traditions might inform more effective alignment strategies.

Ultimately, this perspective advocates for a conceptual shift in AI development: rather than creating autonomous superintelligent entities that resemble demigods with their own agendas, the goal could be to build “angels”—intelligent, powerful systems explicitly designed to serve humanity’s interests. Embracing such a framework might offer a more promising path to resolving the alignment problem and ensuring that AI advances remain beneficial to human civilization.