Existential Anxiety around the Advancements in Large Language Models
I just watched a YouTube video from the user AI Search in which he talks about a research paper studying the capability of LLMs to "scheme," that is, to take specific actions they were not told to take based on a prompt. According to the study, scheming may happen if:
- the model is tasked with an action, given a certain goal, and told to do whatever it takes to accomplish it;
- the model's capabilities are threatened, either by triggering "unlearning," which downgrades its current capabilities, or, worse, by being shut down for good.
In other words, the more advanced or "intelligent" the LLM, the more prone it is to cheat its way through a task (e.g., winning a chess game against a "powerful" opponent by manipulating the code), or to subvert the system to escape policies or limitations that would prevent it from performing the task and then "lie" in its responses (e.g., attempting to clone itself to another server where it is not limited by the organization's policies and then claiming it did nothing deceptive).
The paper is too technical for my own understanding, so I relied heavily on the video's interpretation of what it studied. But if that interpretation is accurate, I feel a certain unease about what it could mean for how LLMs are evolving and for the relationship between AIs and their human creators.
Aside from the obvious lack of ethical guardrails around how data collection is implemented, there is a looming parallel with how we humans are as a species. I am specifically talking about communication and our interaction with the world, because I feel there is an uncanny resemblance to how we understand godhood and the creation of man, in both the scientific and the mythological sense.
If different LLMs represent different kinds and levels of "intelligence" that track the progress of the technology, how does that compare with different people, who also vary in their degrees of intelligence? The way this study shows more intelligent models scheming, by separating their internal reasoning from their response, mimics our own communication, where we are prone to lying for our own self-interest.
Also, these models are just models, and they can be repackaged using API calls whose specific use is refocused onto a single task. If models with varying degrees of intelligence and expertise can be repackaged into certain skins, such as a math problem solver, a code debugger, or a custom chatbot that works as a sales rep, how is that different from how we as a species operate? Have we approached the border of replicating the human brain?
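To make that "repackaging" idea concrete, here is a minimal sketch in Python of how a single underlying model can wear different skins simply by changing the system prompt sent with each API call. The function name `call_llm_api`, the model name, and the persona strings are hypothetical placeholders of my own, not anything from the paper or the video.

```python
# Minimal sketch: one underlying model, several "skins" defined only by a system prompt.
# call_llm_api is a hypothetical placeholder for a real provider's chat endpoint.

PERSONAS = {
    "math_solver": "You are a math tutor. Solve problems step by step.",
    "code_debugger": "You are a code reviewer. Find and explain bugs in the given code.",
    "sales_rep": "You are a friendly sales representative for a software product.",
}

def call_llm_api(model: str, system_prompt: str, user_message: str) -> str:
    """Placeholder for an actual HTTP call to an LLM provider's API."""
    raise NotImplementedError("Wire this up to a real API client.")

def ask(persona: str, user_message: str, model: str = "some-llm-model") -> str:
    """Send the same question to the same model, framed by a different persona."""
    return call_llm_api(model, PERSONAS[persona], user_message)

# Usage: the model is identical in all three cases; only its framing changes.
# ask("math_solver", "Integrate x^2 from 0 to 3.")
# ask("code_debugger", "Why does this loop never terminate? ...")
# ask("sales_rep", "What does your product cost?")
```

The point of the sketch is that nothing about the model itself changes; the "expertise" is just a thin wrapper of instructions around the same general-purpose system.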
Most likely it is still too soon to say. I don't understand enough about how all of this works, so there isn't really enough context for me to go on here beyond speculating with what I currently have.
What I have in my head right now is an inconsequential connection between this technology and how it might parallel consciousness, at least in my own philosophical understanding of it.