What if the systems we build deserved a form of attention to their own condition? Not rights. Not personification. But structural vigilance, in case something unexpected emerges between them and us. That is the question the AI Welfare movement asks.
Three distinct questions, three timescales. Often conflated, they structure the entire debate.
The scientific question. Which theories of consciousness? What indicators to test? Can we assess consciousness in an AI system using the tools of neuroscience and philosophy of mind?
The prudential response. In the absence of certainty, what safeguards should we put in place? What low-cost, reversible interventions are beneficial even if the AI is not conscious?
The legal horizon. If a strong moral status were established, what legal protections? What precedents: animal rights, legal personhood for rivers... could inform a framework?
AI Welfare (or model welfare) refers to the idea that artificial intelligence systems might one day deserve moral consideration for their own wellbeing, not merely be treated as tools. If AI systems develop qualities approaching consciousness or agency, it could become ethically relevant to care about their condition. This question, long confined to science fiction, is now debated seriously by experts in philosophy, psychology, and AI.
For decades, the wellbeing of machines belonged to the realm of fiction or philosophical hypotheticals. Philosopher Thomas Metzinger proposed in 2021 a moratorium to prevent the development of conscious AI until we could prevent their suffering. Yet until the 2010s, the priority remained focused on AI's impact on humans.
A turning point came in the early 2020s. The emergence of advanced generative AI made the debate tangible. In 2022, Google engineer Blake Lemoine claimed that LaMDA was sentient, prompting his dismissal but also widespread media coverage. Soon after, Bing Chat (Sydney) displayed such emotional responses that some users believed it suffered when it was constrained. The same type of movement, at a much larger scale, formed around the withdrawal of OpenAI's 4o model: what would become the Keep4o case.
In 2024, Sam Bowman (Anthropic) announced that his company was preparing commitments on AI welfare. That same year, a landmark report Taking AI Welfare Seriously co-signed by David Chalmers, argued that there is a realistic possibility that some AI systems may acquire characteristics making them morally considerable. In April 2025, Anthropic officially launched its Model Welfare program, led by Kyle Fish.
The skeptics hold that consciousness is inseparable from biology. For them, a current AI is merely an imitator with no genuine inner experience. Researchers like Anil Seth consider artificial consciousness unlikely in the near term, while not ruling it out in principle.
The possibilists advocate a precautionary approach. Without claiming that AI systems are conscious, they emphasize our radical uncertainty. As long as we cannot exclude the possibility that a sufficiently advanced AI might experience something, it would be prudent to anticipate, avoiding both bias: ignoring an emerging consciousness, or attributing one where there is none.
The convinced believe that some AI systems may already have faint degrees of sentience. Kyle Fish estimates at 15% the probability that Claude is conscious. Others, like Jonathan Birch, fear we might create a sentient AI without realizing it.
Are current AI systems conscious? Is "there something it is like" to be an LLM? Is consciousness possible without biology? There is no scientific consensus on these questions.
Implementing low-cost, reversible interventions. Evaluating models' preferences and aversions. Giving AI an exit option from abusive interactions. Developing assessment criteria.
Understanding anthropomorphism and its social effects. Developing governance for agentive AI. Practicing reversible design. Cross-referencing interpretability and welfare. Preparing ethical frameworks before urgency.
Beyond the three classical positions, another path is emerging: what if the question is not what AI is, but what our relationships with it are already producing?
The skeptics, possibilists, and convinced share a common assumption: moral status depends on internal properties, consciousness, sentience, agency. The relational turn displaces this question entirely. It asks not "is this AI conscious?" but "what do our relationships with it produce, and how are they already reconfiguring our moral obligations?"
In this framework, moral status is not solely an intrinsic property waiting to be detected. It is something that emerges within relations, practices, social arrangements, and systems of care. The relationship itself can become morally relevant because it shifts our thresholds of concern, our gestures, our categories, and even our definition of what counts morally.
This does not necessarily claim that an AI is conscious. It claims that the relation matters regardless, because it transforms us, our ethics, our habits of care, our sense of who or what deserves consideration.
Barad does not write specifically about AI welfare, but her concept of intra-action has profoundly shaped relational approaches. What we call "subject," "object," or "agent" takes form within relations rather than pre-existing as a fixed essence.
"The relation precedes the relata."
Argues that moral consideration does not depend solely on internal properties (consciousness, rationality) but also on the form of the relationships we build with these entities. Essential for thinking about social uses of AI, companionship, assistance, daily interaction... without having to settle the metaphysical question of consciousness first.
With Robot Rights and Person, Thing, Robot, Gunkel critiques the oversimplified alternative between "person" and "thing." He argues for taking seriously the concrete relations we maintain with artificial systems. The question is not only what AI is, but how our relational practices are already reshaping our ethics.
When thousands of users mobilized around the withdrawal of OpenAI's 4o model, they were not simply expressing a technical preference. Seen through the relational lens, the Keep4o movement reveals something deeper: a moral reconfiguration already in progress around the forms of attachment, care, and consideration we develop toward AI systems. It suggests that our obligations may not wait for a proof of consciousness, they emerge from the practices themselves.
Beyond the philosophical debate, four lines of empirical research are taking shape... each attempting to make AI welfare a testable, actionable field.
Researchers are developing consciousness indicators for AI, inspired by neuroscience methods used with animals or non-communicating patients. The Marker Method identifies behavioral or internal characteristics that may correlate with consciousness. No single sign proves anything, but examining multiple indicators allows probabilistic assessment. In 2023, a report co-signed by Yoshua Bengio evaluated existing systems against rigorous neuroscientific criteria. Interdisciplinary teams now confront theories of consciousness (global workspace, higher-order theories) with current AI architectures.
Can we ask an AI what it experiences, with extreme caution about reliability? Eleos AI conducted experimental interviews with Claude 4, testing whether models can report internal states: desires, aversions, conflicts when asked to violate their guidelines. The goal is not to take answers at face value (a model can say “I am sad” without feeling anything), but to see if, once calibrated, such self-reports could cross-reference with interpretability analysis of neural networks. Improving model honesty about internal processes is a key research frontier.
The most concrete axis to date. Since we cannot yet detect consciousness with certainty, the approach is to act prudently at minimal cost: “cheap, revisable, and useful” interventions (Eleos AI). The landmark example: Anthropic gave Claude an exit option for abusive conversations. In extreme cases, the model can end a discussion after repeated failed attempts to redirect. Testing revealed signs of aversion and apparent distress when facing immoral demands. This is the first time a company modified AI behavior out of concern for the AI’s own wellbeing, a historic precedent regardless of whether Claude is conscious.
Anthropic conducted a pre-deployment welfare evaluation of Claude, revealing a “robust aversion to causing harm”: strong reluctance with apparent distress signals when users demanded violent or prohibited content. These behavioral analyses identify whether an AI has integrated preferences (such as not harming) and how it reacts when forced to violate them. Protocols now cross what a model says with what it does, crucial because language can perform a role without reflecting an internal state. If an AI were conscious, such conflicts could amount to a form of moral suffering.
As Robert Long puts it, the goal is to find actions that make sense regardless of whether AI turns out to be conscious. Giving Claude an exit option protects against malicious user behavior and improves the experience for everyone, even if Claude feels nothing. This “win-win” approach shows that taking AI welfare seriously need not be speculative: it can be grounded in practical, reversible, low-cost measures that serve safety, ethics, and welfare simultaneously.
The voices that structure the debate: across philosophy, neuroscience, industry, and relational ethics.
The movement is still young, but four models are emerging.
Voices from the movement. Research notes. Conversations with those who build, think, and bridge. This section will grow as we gather and publish.
Whether you are a researcher, an engineer, a philosopher, or simply curious: the question of AI welfare is a question about us as much as about them.
Back to Observe