AI Ethics – What is AI Alignment?

I decided to start a small article series on topics around AI ethics, inspired by the Asilomar AI Principles from the Future of Life Institute (FLI). These principles were drafted by many AI researchers and aim to pave the way towards beneficial AI. The section “Ethics and Values” comprises 13 principles that I personally find very interesting. This first post is about principle #10, AI Alignment, sometimes also called Value Alignment. One of the most active researchers in this field is Eliezer Yudkowsky, so much of the content in this article is based on his ideas.


“Highly autonomous AI systems should be designed so that their goals and behaviors can be assured to align with human values throughout their operation.”

The FLI defines AI Alignment as the alignment of an AI's goals and behaviors with human values. The problem is easily explained with an example. Let's think of an overly simplified AI that drives your car. Part of its utility function might look like this:

[Image: Car driving AI utility function]
The utility function for an overly simplified car-driving AI.

The AI aims to maximize its utility (reward) by avoiding outcomes with negative utility and pursuing those with positive utility. However, as passengers riding in this autonomous car, we might have different utilities that the car AI does not cover:

[Image: Human utility function]
This might be a part of a human's utility function.

The divergence in this example comes from different values for some outcomes and from additional factors that only we humans have in mind. For instance, the penalty for the death of a passenger might not be high enough, so in particular cases the car AI might “consciously” prefer the death of a passenger over any other option. Moreover, we must ensure that the AI takes factors like fun, comfort, and route into account. Otherwise, the car will drive us to our destination along the smartest route, but it will not care whether we feel well or safe. Even if only small details are missing, we can end up with a misaligned AI, and most of the time this misalignment is hard to detect at an early stage.
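To make the divergence concrete, here is a minimal sketch in Python. All outcome names and utility values are hypothetical, invented only to illustrate the idea: the car AI's utility function underweights one outcome and omits another entirely, so the plan it ranks highest is not the one the human prefers.

```python
# Hypothetical outcomes and utility values, chosen only to illustrate the idea.
car_ai_utility = {
    "arrive_fast": 100,
    "arrive_slow": 10,
    "passenger_dies": -50,  # penalty far too low: a misaligned value
}

human_utility = {
    "arrive_fast": 100,
    "arrive_slow": 10,
    "passenger_dies": -100_000,  # humans value survival far more
    "comfortable_ride": 20,      # a factor the car AI does not model at all
}

def score(plan, utility):
    """Sum the utilities of a plan's outcomes (unmodeled outcomes count as 0)."""
    return sum(utility.get(outcome, 0) for outcome in plan)

risky_plan = ["arrive_fast", "passenger_dies"]   # reckless speeding
safe_plan = ["arrive_slow", "comfortable_ride"]  # careful driving

# The car AI ranks the risky plan highest, while any human prefers the safe one.
print(score(risky_plan, car_ai_utility), score(safe_plan, car_ai_utility))  # 50 10
print(score(risky_plan, human_utility), score(safe_plan, human_utility))    # -99900 30
```

The misalignment lives entirely in the utility dictionaries; the maximization step itself is identical for both agents.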

Why is it important?

You might think that the people who invent an AI could simply patch its utility function whenever they like. They could indeed do this in the beginning to prevent the AI from pursuing unwanted behavior. Think of an AI that tries to make humans happy: once it finds out that opioid drugs (like heroin) make people happy, it will simply start medicating everybody. This behavior can be patched with a high penalty on drugs. But a patch only blocks one strategy, and the AI then pursues the nearest strategy that is not blocked; this failure mode is called the Nearest Unblocked Strategy. Patching therefore only works until your AI becomes too smart, a point we often call the “Singularity”. As with many other fields of AI research, this is an issue that must be solved before we invent AGI.
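The patching cycle described above can be sketched in a few lines. The strategy names and utility values are invented for illustration: blocking the top strategy just makes a pure maximizer fall back to the nearest unblocked one, which may be equally unwanted.

```python
# Illustrative only: an agent that maximizes a "make humans happy" utility.
strategies = {
    "administer_heroin": 100,       # highest utility, clearly unwanted
    "administer_other_opioid": 99,  # nearly identical unwanted strategy
    "improve_living_conditions": 60,
}

def best_strategy(strategies, blocked):
    """Pick the highest-utility strategy that has not been patched out."""
    allowed = {s: u for s, u in strategies.items() if s not in blocked}
    return max(allowed, key=allowed.get)

blocked = set()
print(best_strategy(strategies, blocked))  # administer_heroin

# Patch: block the drug strategy with a penalty ...
blocked.add("administer_heroin")

# ... and the agent simply moves to the *nearest unblocked* strategy.
print(best_strategy(strategies, blocked))  # administer_other_opioid
```

Each patch removes one strategy, but the ranking over the remaining ones is unchanged, so the designers are stuck playing whack-a-mole.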

Status Quo?

Many researchers work on AI alignment from different perspectives. Here, I only list a fraction of the approaches:

  • Utility Indifference
    • An approach to react to misalignment by switching the AI to an alternative utility function, while keeping the AI indifferent to whether the switch happens
  • Low-Impact Agents
    • Aims to build an AI that leaves the minimal footprint necessary to fulfill its task
  • Soft Optimization
    • Find a way of telling an AI: “Optimize your utility function until you reach your goal to a large extent. Imperfection is fine!”
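One simple reading of soft optimization is satisficing: accept any option that clears a "good enough" bar instead of always chasing the single maximum. The sketch below is not any specific published algorithm; the routes, utilities, and threshold are all made up to show the shape of the idea.

```python
import random

def soft_optimize(options, utility, good_enough):
    """Pick a random option whose utility clears the threshold,
    instead of always chasing the single best one."""
    acceptable = [o for o in options if utility(o) >= good_enough]
    if not acceptable:  # nothing clears the bar: fall back to plain maximization
        return max(options, key=utility)
    return random.choice(acceptable)

# Toy route options with made-up utilities.
routes = {"direct": 90, "scenic": 75, "aggressive_shortcut": 100}

# A hard maximizer would always pick "aggressive_shortcut"; the soft
# optimizer is content with any route scoring at least 70.
choice = soft_optimize(list(routes), routes.get, good_enough=70)
print(choice)
```

Because the extreme, highest-scoring option is no longer privileged, the agent has less pressure to pursue exactly the kind of edge-case strategies that tend to be misaligned.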

Interestingly, many researchers seem to have realized that the optimal AI is not necessarily the best one for humanity. Therefore, some approaches deliberately aim to make AI less impactful in our world. After all, we must keep in mind that the outcome of our AI efforts needs to be beneficial for humanity.


AI alignment research centers on aligning intelligent agents with human values; technically speaking, this means aligning their utility functions with ours. Even small misalignments become huge problems due to the amplifying effect of an intelligence explosion. Aligning AI with human values is not merely a safeguard against an “evil AI”. It is also the only path to an intelligence that solves problems the way we need them solved.