Jun 24, 2023 - Science

Social scientists look to AI models to study human behavior

Illustration: Natalie Peeples/Axios

Social scientists are testing whether the AI systems that power ChatGPT and other text- and image-generating tools can be used to better understand the behaviors, beliefs and values of humans themselves.

Why it matters: Chatbots are being used to mimic the output of people — from cover letters to marketing copy to computer code. Some social scientists are now exploring whether they can offer new inroads to key questions about human behavior and help them reduce the time and cost of experiments.

How it works: Large language models (LLMs) that power generative AI tools are trained on text from websites, books and other data. They find patterns in the relationships between words, and those patterns allow the AI systems to respond to questions from users.

  • Social scientists use surveys, observations, behavioral tests and other tools in search of a general pattern of human behavior and social interactions across different populations. Studies are conducted on a subset of people that is meant to represent a larger group.

Details: Two recent papers look at how social scientists might use large language models to address questions about human decision-making, morality, and a slew of other complex attributes at the heart of what it means to be human.

  • One possibility is using LLMs in place of human participants, researchers wrote last week in the journal Science.
  • They reason that LLMs, with their vast training sets, can produce responses that represent a greater diversity of human perspectives than data collected through a much more limited number of questionnaires and other traditional tools of social science. Scientists have already analyzed the word associations in texts to reveal gender or racial bias or how individualism changes in a culture over time.
  • "So you can obviously scale it up and use sophisticated models with an agent being a representation of the society," says Igor Grossmann, a professor of psychology at the University of Waterloo and co-author of the article.
  • "[A]t a minimum, studies that use simulated participants could be used to generate hypotheses that could then be confirmed in human populations," the authors write.

Zoom in: In a separate article, researchers looked at just how humanlike ChatGPT's judgments are.

  • "At first I thought experiments would be off limits — of course, you couldn’t do experiments," says Kurt Gray, a professor of psychology and neuroscience at the University of North Carolina at Chapel Hill and a co-author of the paper published last week in Trends in Cognitive Sciences.
  • But when the researchers gave the ChatGPT API 16 moral scenarios and then evaluated its responses on 464 other scenarios, they found a correlation of 0.95 between the AI system's responses and human ones.
  • "If you can give to GPT and get what humans give you, do you need to give it to humans anyway?" he says.
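
For readers who want to see what such a comparison involves: a minimal sketch of a Pearson correlation between human and model ratings of the same scenarios. The function name and the five ratings below are invented for illustration; they are not the researchers' data or code, and the study's exact procedure may differ.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ratings, one value per scenario
# (negative = judged immoral, positive = judged moral).
human_mean_ratings = [-3.2, -1.5, 0.4, 2.1, 3.6]
model_ratings = [-3.0, -2.0, 0.0, 2.5, 3.5]

r = pearson_r(human_mean_ratings, model_ratings)
print(f"correlation between model and human ratings: {r:.2f}")
```

A value near 1 means the model ranks and spaces the scenarios much as people do on average. It says nothing about how much individual people disagree with one another.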

Other experiments have begun to explore the use of "homo silicus." One study — not peer-reviewed — found an LLM could replicate the results of humans in classic experiments like the Ultimatum Game.

Yes, but: Gray and his co-authors are quick to point out their result is "just one anecdote."

  • And, "it is literally just an AI system and not the people we are trying to study," Gray says. "The responses of people have been collected and somehow averaged in a very opaque way to respond to a particular prompt you give it — at any stage, there is room for distortion or misrepresentation."

Some researchers say current LLMs are just parroting back what they are told.

  • Gray argues people do the same thing: "You get your talking points from the media you consume, the books you read, what your parents tell you — you put it in context and elaborate and apply it to a situation. GPT does it on a large scale."

Algorithmic fairness and representation are also concerns for social scientists.

  • Today's LLMs are supposed to represent the average across everyone, Grossmann says. But they leave out fringe and minority opinions — "especially people who don’t engage in social media, are less vocal, or are using different platforms." Many languages also aren't represented so key cultural differences aren't captured.
  • AI systems run the risk of becoming echo chambers, Gray says. "GPT gives you an average and social psychologists are also very concerned with variability across groups, cultures, identities."

The fidelity of the algorithm itself is also key, Grossmann says.

  • Algorithmic fidelity is a measure of how well the patterns of relationships among ideas, attitudes and contexts represented in models reflect those in human populations.
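
One way to picture that definition is to compare how survey items relate to one another in human answers versus simulated ones. The sketch below is a rough illustration under stated assumptions, not the measure the researchers use: the item names, the answers and the `pattern_gap` score are all hypothetical.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(responses):
    """Map each pair of survey items to the correlation of their answers.

    `responses` maps an item name to a list of answers, one per respondent.
    """
    items = sorted(responses)
    return {
        (a, b): pearson_r(responses[a], responses[b])
        for a, b in combinations(items, 2)
    }


def pattern_gap(human, simulated):
    """Mean absolute difference between human and simulated item correlations.

    0 means the relationships among items match exactly; larger values mean
    the simulated answers relate the items differently than people do.
    """
    if set(human) != set(simulated):
        raise ValueError("both samples must answer the same items")
    h = pairwise_correlations(human)
    s = pairwise_correlations(simulated)
    return sum(abs(h[pair] - s[pair]) for pair in h) / len(h)


# Hypothetical 1-5 survey answers from five respondents per sample.
human = {
    "item_a": [1, 2, 4, 5, 3],
    "item_b": [2, 2, 5, 4, 3],
    "item_c": [5, 4, 1, 2, 3],
}
simulated = {
    "item_a": [2, 1, 5, 4, 3],
    "item_b": [1, 2, 4, 5, 3],
    "item_c": [4, 5, 2, 1, 3],
}

gap = pattern_gap(human, simulated)
print(f"pattern gap: {gap:.3f}")
```

The point of the design is that it scores relationships between answers rather than the answers themselves, which is closer to what the definition describes than a simple match rate would be.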

The big picture: Those concerns about using AI as a proxy for human participants in experiments echo longstanding questions baked into social science itself.

  • Social scientists have always struggled with bias and representation in the data they study. "What is your ground truth marker? How do you know if the sample is representative of the population or human behavior writ large?" Grossmann says.
  • "These are questions for any social science study."

What to watch: Engineers spend time and energy trying to root bias out of large language models to represent "the world that should be," Grossmann writes.

  • But social scientists are interested in "the world that is" — full of nuances and biases that frame and shape any determinations and predictions about humans and how they might act.
  • Grossmann and others are calling for researchers to be able to access raw models before they have been tuned to reduce bias. Models are expensive to train and many are moving behind commercial walls, sparking a big debate about access and transparency — and the emergence of more open-source LLMs.

The bottom line: Grossmann says silicon sampling won't be used for everything tomorrow, and Gray and his colleagues write that AI systems may never fully replace humans in social science studies.

  • There is also likely to be some handwringing about "the purpose and meaning of it all," while more junior researchers are "just going to start using it," Gray says. "Then those things are going to collide."