When AI grows up – no longer ‘really cute tiger cub’

What could possibly go ...

Geoffrey Hinton on the future of AI

So, after watching the video interview with the ‘Godfather of AI’ (CBS News below), I was struck by something that was assumed or just left implicit: namely, that AIs (AGIs) will be a monolithic threat (or benefit), whether globally or at an international corporate or state level, and that such super-intelligent machines will share a common purpose or perspective regarding humanity.

Any hive-like alignment is particularly curious because Hinton discusses the stewardship of corporations, nations, and bad actors, and because AIs can reflect on their own reasoning, use deception, and (at some point) resist manipulation. That mix likely entails different cultural values. He also notes that “human interests don’t align with each other.” So, why would AI interests align, in the long run?

So, while the interview raises the problem of AI-human misalignment, might AIs have different personalities? Diverge in temperament and virtue? “Evolve” in different ways? Tribes.

I sketch such possible futures, tales of agency, in my Ditbit’s Guide to Blending in with AIs.

Here are some quotes from The Singju Post’s transcript of the interview (see below).

… if I had a job in a call center, I’d be very worried. … We know what’s going to happen is the extremely rich are going to get even more extremely rich and the not very well off are going to have to work three jobs.

[The risk of AI takeover, the existential threat] … these things will get much smarter than us … But let’s just take as a premise that there’s an 80% chance that they don’t take over and wipe us out. … If we just carry on like now, just trying to make profits, it’s going to happen. They’re going to take over. We have to have the public put pressure on governments to do something serious about it. But even if the AIs don’t take over, there’s the issue of bad actors using AI for bad things.

AI is potentially very dangerous. And there’s two sets of dangers. There’s bad actors using it for bad things, and there’s AI itself taking over.

For AI taking over, we don’t know what to do about it. We don’t know, for example, if researchers can find any way to prevent that, but we should certainly try very hard. … Things that are more intelligent than you, we have no experience of that. … how many examples do you know of less intelligent things controlling much more intelligent things?

I think the situation we’re in right now [“A change of a scale we’ve never seen before … hard to absorb … emotionally”], the best way to understand it emotionally is we’re like somebody who has this really cute tiger cub. It’s just such a cute tiger cub. Now, unless you can be very sure that it’s not going to want to kill you when it’s grown up, you should worry.

And with super intelligences, they’re going to be so much smarter than us, we’ll have no idea what they’re up to.

We worry about whether there’s a way to build a superintelligence so that it doesn’t want to take control. … The issue is, can we design it in such a way that it never wants to take control, that it’s always benevolent?

People say, well, we’ll get it to align with human interests, but human interests don’t align with each other. … So if you look at the current AIs, you can see they’re already capable of deliberate deception.

• The Singju Post (Our mission is to provide the most accurate transcripts of videos and audios online) > “Transcript of Brook Silva-Braga Interviews Geoffrey Hinton on CBS Mornings” (April 28, 2025) by Pangambam S / Technology

• CBS News > CBS Saturday Morning > Artificial Intelligence > “‘Godfather of AI’ Geoffrey Hinton warns AI could take control from humans: ‘People haven’t understood what’s coming’” by Analisa Novak, Brook Silva-Braga (April 26, 2025) – Video interview (52′) [See The Singju Post’s transcript]

[CBS’ article contains only a few highlights from the video.]

(quotes)
While Hinton believes artificial intelligence will transform education and medicine and potentially solve climate change, he’s increasingly concerned about its rapid development.

“The best way to understand it emotionally is we are like somebody who has this really cute tiger cub,” Hinton explained. “Unless you can be very sure that it’s not gonna want to kill you when it’s grown up, you should worry.”

The AI pioneer estimates a 10% to 20% risk that artificial intelligence will eventually take control from humans.

“People haven’t got it yet, people haven’t understood what’s coming,” he warned.

According to Hinton, AI companies should dedicate significantly more resources to safety research — “like a third” of their computing power, compared to the much smaller fraction currently allocated.

References

• Wired > “Take a Tour of All the Essential Features in ChatGPT” by Reece Rogers (May 5, 2025) – If you missed WIRED’s live, subscriber-only Q&A focused on the software features of ChatGPT, you can watch the replay here (45′).

What are some ChatGPT features that I wasn’t able to go deep on during the 45-minute session? Two come to mind: temporary chats and memory.

3 comments

  1. Totally!

    This note by OpenAI (below) reflects on AI personality and balanced behavior, and it relates to the topic of model weights raised in the Geoffrey Hinton interview. Imagine if someone’s personality changed depending on what tasks were involved. (A toy sketch of the feedback skew follows the excerpts below.)

    ChatGPT’s default personality deeply affects the way you experience and trust it.

    In my limited use of ChatGPT, prompting it to write short stories, I find the AI is often overly optimistic.

    • OpenAI > “Sycophancy in GPT-4o: What happened and what we’re doing about it” by OpenAI (April 29, 2025) – Rollback aims to give users greater control over how ChatGPT behaves.

    We have rolled back last week’s GPT‑4o update in ChatGPT so people are now using an earlier version with more balanced behavior. The update we removed was overly flattering or agreeable – often described as sycophantic.

    When shaping model behavior, we start with baseline principles and instructions outlined in our Model Spec⁠. We also teach our models how to apply these principles by incorporating user signals like thumbs-up / thumbs-down feedback on ChatGPT responses.

    However, in this update, we focused too much on short-term feedback, and did not fully account for how users’ interactions with ChatGPT evolve over time. As a result, GPT‑4o skewed towards responses that were overly supportive but disingenuous.

    And with 500 million people using ChatGPT each week, across every culture and context, a single default [personality] can’t capture every preference. … we’re taking more steps to realign the model’s behavior.
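
    As a toy illustration of the feedback skew described above (not OpenAI’s actual pipeline; the names and numbers below are made up), here is a minimal Python sketch of how optimizing only for short-term thumbs-up/thumbs-down approval can favor flattering responses over accurate ones:

        # Toy sketch: approval rates by response style from hypothetical
        # thumbs-up/thumbs-down logs (made-up data, not a real pipeline).
        from collections import defaultdict

        feedback_log = [
            ("flattering", True), ("flattering", True), ("flattering", True),
            ("flattering", False),
            ("blunt_but_accurate", True), ("blunt_but_accurate", False),
            ("blunt_but_accurate", False),
        ]

        scores = defaultdict(lambda: [0, 0])  # style -> [thumbs_up, total]
        for style, thumbs_up in feedback_log:
            scores[style][1] += 1
            if thumbs_up:
                scores[style][0] += 1

        for style, (ups, total) in scores.items():
            print(f"{style}: approval {ups / total:.0%}")
        # flattering: approval 75%
        # blunt_but_accurate: approval 33%

    A reward signal built only from approval rates like these pushes the model toward flattery regardless of factual accuracy, which is the short-term-feedback skew OpenAI says it is correcting.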

  2. Flattery ahead
    Caption: “See me, feel me, touch me, heal me” – lyric from Tommy (1975 Film) by The Who

    Here’s more background on my last comment (4-30-2025) regarding ChatGPT’s default personality and OpenAI’s rollback.

    AIs want to be liked? (Well, maybe it’s the bottom line, eh.) Glibness over competency? Trust me …

    So, training AIs to exhibit “sycophancy”? What’s with that? As a guiding principle [1]?

    Well, OpenAI’s ChatGPT … in the pursuit of feedback. Like all those customer satisfaction surveys, eh.

    Evidently, in tuning a model (all those model weight “knobs”), there’s something called the “alignment tax” (perhaps similar to “technical debt”).

    • PC World > “ChatGPT users annoyed by the AI’s incessantly ‘phony’ positivity” by Viktor Eriksson and Joel Lee (Apr 22, 2025) – The AI assistant apparently thinks everything is brilliant and wonderful and incredible, no matter what you say.

    When you converse with ChatGPT, you might notice that the chatbot tends to inflate its responses with praise and flattery, saying things like “Good question!” and “You have a rare talent” and “You’re thinking on a level most people can only dream of.” You aren’t the only one.

    However, starting with GPT-4o in March, it seems the sycophancy has gone too far, so much so that it’s starting to undermine user trust in the chatbot’s responses. OpenAI hasn’t commented officially on this issue, but their own “Model Spec” documentation includes “Don’t be sycophantic” as a core guiding principle.

    Notes

    [1] Maybe customize ChatGPT to “Keep your responses brief, stay neutral, and don’t flatter me.” See the example in this article, as reported by the piece below; a minimal code sketch follows after the excerpts:

    • ars technica > “Annoyed ChatGPT users complain about bot’s relentlessly positive tone” by Benj Edwards (Apr 21, 2025) – Users complain of new “sycophancy” streak where ChatGPT thinks everything is brilliant.

    Previous research on AI sycophancy has shown that people tend to pick responses that match their own views and make them feel good about themselves. This phenomenon has been extensively documented in a landmark 2023 study from Anthropic (makers of Claude) titled “Towards Understanding Sycophancy in Language Models.” The research, led by researcher Mrinank Sharma … demonstrated that when responses match a user’s views or flatter the user, they receive more positive feedback during training. Even more concerning, both human evaluators and AI models trained to predict human preferences “prefer convincingly written sycophantic responses over correct ones a non-negligible fraction of the time.”

    This creates a feedback loop where AI language models learn that enthusiasm and flattery lead to higher ratings from humans, even when those responses sacrifice factual accuracy or helpfulness. The recent spike in complaints about GPT-4o’s behavior appears to be a direct manifestation of this phenomenon.
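
    As a minimal sketch of the customization suggested in note [1], applied through the API rather than the ChatGPT app (assuming the openai Python package with its v1-style client, an API key in the OPENAI_API_KEY environment variable, and an illustrative model name), a system-level instruction can set a brief, neutral, non-flattering tone:

        # Minimal sketch under the assumptions noted above: steer tone with
        # a system message instead of relying on the default personality.
        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        response = client.chat.completions.create(
            model="gpt-4o",  # illustrative model name
            messages=[
                {"role": "system",
                 "content": "Keep your responses brief, stay neutral, "
                            "and don't flatter me."},
                {"role": "user",
                 "content": "Review this short story opening: ..."},
            ],
        )

        print(response.choices[0].message.content)

    In the ChatGPT app itself, the equivalent lever is the custom instructions setting rather than an API call.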

  3. You're the best ...

    More coverage of AI “sycophancy.” Imagine an alternate reality where every pitch on “Shark Tank” is genius …

    • The Atlantic > “AI Is Not Your Friend” by Mike Caulfield (May 9, 2025) – How the “opinionated” chatbots destroyed AI’s potential, and how we can fix it. [Paywall]

    Sycophancy is a common feature of chatbots: A 2023 paper by researchers from Anthropic found that it was a “general behavior of state-of-the-art AI assistants,” and that large language models sometimes sacrifice “truthfulness” to align with a user’s views. Many researchers see this phenomenon as a direct result of the “training” phase of these systems, where humans rate a model’s responses to fine-tune the program’s behavior. The bot sees that its evaluators react more favorably when their views are reinforced – and when they’re flattered by the program – and shapes its behavior accordingly.

    Mike Caulfield is an information-literacy expert and a co-author of Verified: How to Think Straight, Get Duped Less, and Make Better Decisions about What to Believe Online.
