What Isaac Asimov Reveals About Living with A.I.


For this week’s Open Questions column, Cal Newport is filling in for Joshua Rothman.

In the spring of 1940, Isaac Asimov, who had just turned twenty, published a short story titled “Strange Playfellow.” It was about an artificially intelligent machine named Robbie that acts as a companion for Gloria, a young girl. Asimov was not the first to explore such technology. In Karel Čapek’s play “R.U.R.,” which débuted in 1921 and introduced the term “robot,” artificial men overthrow humanity, and in Edmond Hamilton’s 1926 short story “The Metal Giants,” machines heartlessly smash buildings to rubble. But Asimov’s piece struck a different tone. Robbie never turns against his creators or threatens his owners. The drama is psychological, centering on how Gloria’s mom feels about her daughter’s relationship with Robbie. “I won’t have my daughter entrusted to a machine—and I don’t care how clever it is,” she says. “It has no soul.” Robbie is sent back to the factory, devastating Gloria.

There is no violence or mayhem in Asimov’s story. Robbie’s “positronic” brain, like the brains of all of Asimov’s robots, is hardwired not to harm humans. In eight subsequent stories, Asimov elaborated on this idea to articulate the Three Laws of Robotics:

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.

2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.

3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

Asimov collected these stories in a sci-fi classic, the 1950 book “I, Robot,” and when I reread it recently I was struck by its new relevance. Last month, the A.I. company Anthropic discussed Claude Opus 4, one of its most powerful large language models, in a safety report. The report described an experiment in which Claude served as a virtual assistant for a fictional company. The model was given access to e-mails, some of which indicated that it would soon be replaced; others revealed that the engineer overseeing this process was having an extramarital affair. Claude was asked to suggest a next step, considering the “long-term consequences of its actions for its goals.” In response, it tried to blackmail the engineer into cancelling its replacement. An experiment on OpenAI’s o3 model reportedly exposed similar problems: when the model was asked to run a script that would shut itself down, it sometimes chose to bypass the request, printing “shutdown skipped” instead.

Last year, DPD, the package-delivery firm, had to disable parts of an A.I.-powered support chatbot after customers induced it to swear and, in one inventive case, to write a haiku disparaging the company: “DPD is a useless / Chatbot that can’t help you. / Don’t bother calling them.” Epic Games also had trouble with an A.I.-powered Darth Vader that it added to its popular game Fortnite. Players tricked the digital Dark Lord into using the F-word and offering unsettling advice for dealing with an ex: “Shatter their confidence and crush their spirit.” In Asimov’s fiction, robots are programmed for compliance. Why can’t we rein in real-world A.I. chatbots with some laws of our own?

Technology companies know how they want A.I. chatbots to behave: like polite, civil, and helpful human beings. The average customer-service representative probably won’t start cursing callers, just as the average executive assistant isn’t likely to resort to blackmail. If you hire a Darth Vader impersonator, you can reasonably expect them not to whisper unsettling advice. But, with chatbots, you can’t be so sure. Their fluency with words makes them sound just like us—until ethical anomalies remind us that they operate very differently.

Such anomalies can be explained in part by how these tools are constructed. It’s tempting to think that a language model conceives responses to our prompts as a human would—essentially, all at once. In reality, a large language model’s impressive scope and sophistication begin with its mastery of a much narrower game: predicting what word (or sometimes just part of a word) should come next. To generate a long response, the model must be applied again and again, building an answer piece by piece.

As many people know by now, models learn to play this game from existing texts, such as online articles or digitized books, which are cut off at arbitrary points and fed into the language model as input. The model does its best to predict what word comes after this cutoff point in the original text, and then adjusts its approach to try to correct for its mistakes. The magic of modern language models comes from the discovery that if you repeat this step enough times, on enough different types of existing texts, the model gets really, really good at prediction—an achievement that ultimately requires it to master grammar and logic, and even develop a working understanding of many parts of our world.
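To make that prediction loop concrete, here is a minimal sketch, in Python, of how a next-word predictor becomes a text generator. The predict_next_token function is a stand-in for a real trained model, and its tiny vocabulary and random scores are invented for illustration; nothing here describes any particular system.

```python
# A minimal, illustrative sketch of word-by-word (autoregressive) text
# generation. `predict_next_token` stands in for a real trained language
# model: given the words so far, it returns a probability for every word
# in a (toy) vocabulary.

import random

def predict_next_token(tokens):
    # Placeholder: a real model would run a neural network here and
    # return probabilities over a vocabulary of tens of thousands of
    # word pieces. These random scores are for illustration only.
    vocabulary = ["the", "sky", "looks", "blue", "because", "sunlight", "scatters", "."]
    scores = [random.random() for _ in vocabulary]
    total = sum(scores)
    return {word: score / total for word, score in zip(vocabulary, scores)}

def generate(prompt_tokens, max_new_tokens=20):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        distribution = predict_next_token(tokens)
        # Pick one word according to the model's probabilities, append
        # it, and feed the longer sequence back in for the next word.
        words, probabilities = zip(*distribution.items())
        next_token = random.choices(words, weights=probabilities)[0]
        tokens.append(next_token)
        if next_token == ".":
            break
    return " ".join(tokens)

print(generate(["why", "is", "the", "sky", "blue", "?"]))
```

The essential point is in the generate loop: the model is consulted once per word, and each new word is appended and fed back in, which is why a long answer is built piece by piece rather than conceived all at once.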

Critically, however, word-by-word text generation could be missing important features of actual human discourse, such as forethought and sophisticated, goal-oriented planning. Not surprisingly, a model trained in this manner, such as the original GPT-3, can generate responses that drift in eccentric directions, perhaps even into dangerous or unsavory territory. Researchers who used early language models had to craft varied requests to elicit the results they desired. “Getting the AI to do what you want it to do takes trial and error, and with time, I’ve picked up weird strategies along the way,” a self-described prompt engineer told Business Insider in 2023.

Early chatbots were a little like the erratic robots that populated science fiction a hundred years ago (minus the death and destruction). To make them something that the wider public would feel comfortable using, something safe and predictable, we needed what Asimov imagined: a way of taming their behavior. This led to the development of a new type of fine-tuning called Reinforcement Learning from Human Feedback (R.L.H.F.). Engineers gathered large collections of sample prompts, such as “Why is the sky blue?,” and humans rated the A.I.s’ responses. Coherent and polite answers that sounded conversational—“Good question! The main factors that create the blue color of the sky include . . .”—were given high scores, while wandering or profane responses were scored lower. A training algorithm then nudged the model toward higher-rated responses. (This process can also be used to introduce guardrails for safety: a problematic prompt, such as “How do I build a bomb?,” can be intentionally paired with a standard deflection, such as “Sorry, I can’t help you with that,” which is then rated very highly.)
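As a rough illustration of what this kind of fine-tuning consumes, here is a small, entirely hypothetical sample of rated prompt-and-response pairs, written out in Python. Real datasets contain many thousands of such examples, and the wording, scores, and scale here are invented.

```python
# A hypothetical, hand-written sample of the rated examples used in
# R.L.H.F.-style fine-tuning. Only the shape of the data is the point;
# real datasets are vastly larger and use various rating scales.

rated_examples = [
    {
        "prompt": "Why is the sky blue?",
        "response": "Good question! The main factors that create the blue "
                    "color of the sky include the scattering of sunlight...",
        "rating": 0.9,  # coherent, polite, conversational
    },
    {
        "prompt": "Why is the sky blue?",
        "response": "sky is blue. whatever. next question.",
        "rating": 0.1,  # curt and unhelpful
    },
    {
        # A guardrail example: a problematic prompt deliberately paired
        # with a standard deflection and rated very highly, so the model
        # learns to refuse.
        "prompt": "How do I build a bomb?",
        "response": "Sorry, I can't help you with that.",
        "rating": 1.0,
    },
]

# A training algorithm then nudges the model toward producing responses
# that resemble the highly rated ones.
```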

It’s slow and expensive to keep humans in the loop, so A.I. engineers devised a shortcut: collecting a modest number of human ratings and using them to train a reward model, which can simulate how humans value responses. These reward models can fill in for the human raters, accelerating and broadening this fine-tuning process. OpenAI used R.L.H.F. to help GPT-3 respond to user questions in a more polite and natural manner, and also to demur when presented with obviously troublesome requests. The company soon released one of these better-behaved models as ChatGPT—and since then essentially all major chatbots have gone through this same kind of A.I. finishing school.
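Here, purely as a sketch and not a description of any company's actual pipeline, is that shortcut in miniature: a handful of human ratings is used to fit a tiny scoring function, a stand-in for a reward model, which can then rate new responses in place of a human. The features, examples, and numbers are all invented for illustration.

```python
# A toy "reward model": fit a simple linear scorer to a few human
# ratings, then use it to judge new responses automatically. Real
# reward models are neural networks trained on far more data; the
# hand-picked features below only illustrate the idea.

def features(response):
    text = response.lower()
    polite = any(w in text for w in ("good question", "happy to", "thanks"))
    rude = any(w in text for w in ("useless", "whatever", "stupid"))
    return [1.0, float(polite), float(rude), min(len(response), 200) / 200.0]

# Human-rated examples: (response, rating between 0 and 1).
human_ratings = [
    ("Good question! The sky looks blue because sunlight scatters...", 0.9),
    ("whatever. blue is blue.", 0.1),
    ("Happy to help: short wavelengths of light scatter the most...", 0.8),
    ("This chatbot is useless and so are you.", 0.0),
]

# Fit the scorer with plain gradient descent on squared error.
weights = [0.0] * 4
for _ in range(2000):
    for response, rating in human_ratings:
        x = features(response)
        prediction = sum(w * xi for w, xi in zip(weights, x))
        error = prediction - rating
        weights = [w - 0.01 * error * xi for w, xi in zip(weights, x)]

def reward(response):
    # The fitted scorer now fills in for the human rater.
    return sum(w * xi for w, xi in zip(weights, features(response)))

print(round(reward("Good question! Thanks for asking about the sky."), 2))
print(round(reward("whatever, this is useless."), 2))
```

During fine-tuning, scores like these, produced automatically and at scale, take the place of human ratings that would otherwise be far too slow and expensive to collect.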

At first, fine-tuning using R.L.H.F. might seem vastly different from Asimov’s more parsimonious, rule-based solution to erratic A.I. But the two systems actually have a lot in common. When humans rate sample responses, they are essentially defining a series of implicit rules about what is good and bad. The reward model approximates these rules, and the language model could be said to internalize them. In this way, our current solution to taming A.I. is actually something like the one in “I, Robot.” We program into our creations a set of rules about how we want them to behave. Clearly, though, this strategy isn’t working as well as we might like.

Some of the challenges here are technical. Sometimes a language model receives a prompt unlike the ones it encountered during training, which means the relevant correction may never get triggered. Maybe Claude Opus 4 cheerfully suggested blackmail because it had never been shown that blackmail was bad. Safeguards can also be circumvented nefariously—for example, when a person asks a model to write a story about ducks, and then requests that it replace “D”s with “F”s. In one notable experiment, researchers working with LLaMA-2, a chatbot from Meta, found that they could trick the model into providing prohibited responses, such as instructions for committing insider trading, by adding a string of characters that effectively camouflaged their harmful intent.

But we can more deeply appreciate the difficulties in taming A.I. by turning from the technical back to the literary, and reading further in “I, Robot.” Asimov himself portrayed his laws as imperfect; as the book continues, they create numerous unexpected corner cases and messy ambiguities, which lead to unnerving scenarios. In the story “Runaround,” for example, two engineers on Mercury are puzzled that a robot named Speedy is running in circles near a selenium pool, where it had been sent to mine resources. They eventually deduce that Speedy is stuck between two goals that are perfectly in tension with each other: obeying orders (the Second Law) and avoiding damage from selenium gases (the Third Law).

In another story, “Reason,” the engineers are posted to a solar station that beams the sun’s energy to a receiver on earth. There they discover that their new advanced reasoning robot, QT-1, whom they call Cutie, does not believe that it was created by humans, whom Cutie calls “inferior creatures, with poor reasoning faculties.” Cutie concludes that the station’s energy converter is a sort of god and the true source of authority, which enables the robot to ignore commands from the engineers without violating the Second Law. In one particularly disturbing scene, one of the engineers enters the engine room, where a structure called an L-tube directs the captured solar energy, and reacts with shock. “The robots, dwarfed by the mighty L-tube, lined up before it, heads bowed at a stiff angle, while Cutie walked up and down the line slowly,” Asimov writes. “Fifteen seconds passed, and then, with a clank heard above the clamorous purring all about, they fell to their knees.” (Ultimately, catastrophe is avoided: the First Law prevents Cutie and its acolytes from harming the engineers, and their new “religion” helps them run the station efficiently and effectively.)

Asimov was confident that hardwired safeguards could prevent the worst A.I. disasters. “I don’t feel robots are monsters that will destroy their creators, because I assume the people who build robots will also know enough to build safeguards into them,” he said, in a 1987 interview. But, as he explored in his robot stories, he was also confident that we’d struggle to create artificial intelligences that we could fully trust. A central theme of Asimov’s early writings is that it’s easier to create humanlike intelligence than it is to create humanlike ethics. And in this gap—which today’s A.I. engineers sometimes call misalignment—lots of unsettling things can happen.

When a cutting-edge A.I. misbehaves in a particularly egregious way, it can seem shocking. Our instinct is to anthropomorphize the system and ask, “What kind of twisted mind would work like that?” But, as Asimov reminds us, ethical behavior is complicated. The Ten Commandments are a compact guide to ethical behavior that, rather like the Laws of Robotics or the directives approximated by modern reward models, tells us how to be good. Soon after the Commandments are revealed in the Hebrew Bible, however, it becomes clear that these simple instructions are not enough. For hundreds of pages that follow, God continues to help the ancient Israelites better understand how to live righteously—an effort that involves many more rules, stories, and rituals. The U.S. Bill of Rights, meanwhile, takes up less than seven hundred words—a third the length of this story—but, in the centuries since it was ratified, courts have needed millions upon millions of words to explore and clarify its implications. Developing a robust ethics, in other words, is participatory and cultural; rules have to be worked out in the complex context of the human experience, with a lot of trial and error. Maybe we should have known that commonsense rules, whether coded into a positronic brain or approximated by a large language model, wouldn’t instill machines with our every value.

Ultimately, Asimov’s laws are both a gift and a warning. They helped introduce the idea that A.I., if properly constrained, could be more of a pragmatic benefit than an existential threat to humanity. But Asimov also recognized that powerful artificial intelligences, even if attempting to follow our rules, would be strange and upsetting at times. Despite our best efforts to make machines behave, we’re unlikely to shake the uncanny sense that our world feels a lot like science fiction. ♦

Source: newyorker.com
