Axios Science

November 17, 2023
Thanks for reading Axios Science. This edition is 1,535 words, about a 6-minute read.
- Send your feedback and ideas to me at [email protected].
- Sign up here to receive this newsletter.
- We'll be off next week and back in your inbox on Nov. 30.
1 big thing: AI learns the language of biology
Illustration: Natalie Peeples/Axios
AI systems that have already made strides learning the language of humans are being trained to decipher the language of life encoded in DNA — and to use it to try to design new molecules.
Why it matters: AI that can make sense of biology's information could help scientists to develop new therapeutics and to engineer cells to produce biofuels, materials, medicines and other products.
Background: Scientists have for decades worked to reverse engineer cells in order to design new proteins and improve molecules found in nature, increasingly with the help of computational tools.
- Other researchers have scoured Earth for undiscovered compounds made by bacteria, fungi, plants and other organisms that could be useful for particular purposes. Both approaches have yielded new cancer therapeutics and other products.
- "But at some point, we run out of low-hanging fruit to pick," says Kyunghyun Cho, a professor of computer science and data science at New York University and senior director of Frontier Research at Prescient Design, which is part of Genentech.
Now, generative AI models — similar to the large language model (LLM) that powers ChatGPT — are being developed to understand the rules and relationships of DNA, RNA and proteins, and the many functions and properties they produce.
How it works: Humans arrange the 26 letters of the modern English alphabet into roughly 500,000 words, depending on how they're counted.
- LLMs are given text that they then split into characters, words or subwords, known as tokens.
- The AI model then determines the relationships among these tokens and uses that information to generate original text.
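The splitting step in the bullets above can be sketched in a few lines. This is a toy greedy longest-match tokenizer over an invented vocabulary; real LLM tokenizers (such as byte-pair encoding) learn their vocabularies from data rather than using a hand-written list.

```python
# Toy subword tokenizer: at each position, greedily take the longest
# string that appears in the vocabulary. The vocabulary here is invented
# purely for illustration.
VOCAB = {"un", "break", "able", "ing", "the"}

def tokenize(text, vocab):
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible match first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("unbreakable", VOCAB))  # ['un', 'break', 'able']
```

The model never sees "unbreakable" as one unit; it sees the three tokens and learns how they relate to each other and to tokens elsewhere in the text.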
The language of biology contains far fewer letters but produces many more "words" in the form of proteins.
- The genetic information carried in DNA is encoded in four molecules: A (adenine), C (cytosine), T (thymine) and G (guanine).
- Three-letter combinations of these four bases, called codons, specify 20 different amino acids, some or all of which are strung together in different orders to make up proteins.
- There are more than 200 million known proteins — and many orders of magnitude more that are theoretically possible.
- That leaves a vast space to explore for scientists who want to develop new therapeutics or engineer cells to perform different tasks.
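The encoding described in the bullets above can be sketched directly. The codon table below is a small excerpt of the standard 64-entry genetic code, and the 100-amino-acid protein length used to illustrate the size of the search space is an arbitrary but typical choice.

```python
# Excerpt of the standard genetic code: three-base codons map to amino
# acids (one-letter abbreviations) or to a stop signal.
CODON_TABLE = {
    "ATG": "M",  # methionine, the usual start codon
    "TGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "TAA": "*",  # stop
}

def translate(dna):
    """Read a DNA string three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "?")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGTGGAAATAA"))  # MWK

# The combinatorial space: even a modest 100-amino-acid protein has
# 20**100 (about 1.3e130) possible sequences, dwarfing the ~200 million
# proteins known so far.
print(f"{20**100:.2e}")
```

That gap between known and possible sequences is the "vast space" the models are being asked to map.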
Part II: Biology's hurdles for AI
AI models are being used to map that space to identify changes in DNA, RNA or proteins that underpin disease and other processes. But scientists doing that face several hurdles.
- They must figure out the best way to break biology's language down into tokens that the LLM can work with.
- They must ensure the AI is able to see the relationships between genes and elements of genes that affect one another from different places in a long stretch of DNA, says Joshua Dunn, a molecular and computational biologist at Ginkgo Bioworks, which uses AI to drive some of its gene designs. It's like having to pull sentences from different parts of a book to understand its meaning.
- Another consideration: reading DNA from different starting points can produce different proteins. Start mid-sentence and you get a different story than if you start at the sentence's beginning.
- And while most proteins are encoded in standard genetic code, others are transcribed by different "readers" in cells. "That means there are a whole lot of different languages being spoken at the same time," Dunn says.
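The starting-point problem in the bullets above, known in biology as reading frames, is easy to demonstrate: shifting the start position by a single base regroups every codon in the sequence.

```python
def frames(dna):
    """Return the codon list for each of the three reading frames."""
    return [
        [dna[i:i + 3] for i in range(f, len(dna) - 2, 3)]
        for f in range(3)
    ]

# The same nine bases parse into entirely different codons per frame.
for f, codons in enumerate(frames("ATGTGGAAA")):
    print(f"frame {f}: {codons}")
```

A model tokenizing DNA has no punctuation telling it where a "sentence" begins, which is part of what makes tokenization choices consequential.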
Dunn says he is "extremely optimistic that large language models are going to figure out some of this because they're actually very good at understanding different scales of meanings spoken in different languages."
Where it stands: It's early days for AI foundation models in biology, but companies (including Profluent Bio and Inceptive) and academic groups are developing models to decipher the language of DNA and design new proteins.
- HyenaDNA, a "genomic foundation model" developed by researchers at Stanford University, learns how DNA sequences are distributed, how genes are encoded, and how the regions between protein-coding stretches regulate a gene's expression.
Yes, but: As with LLMs, there is concern about training data biased by where samples are taken from, says Vaneet Aggarwal, a computer scientist and professor at Purdue University who has worked on AI models to understand the language of DNA.
What's next: Spewing out novel molecules from generative models is only a first step — and not necessarily the biggest hurdle, Cho says.
- Candidate molecules have to go through several more phases of development to filter out the most promising ones for experimental testing in the lab, he says.
The bottom line: LLMs that handle human language are "speeding up what we already know how to do," Cho says — but with biology, "we're trying to figure out something we've never figured out ourselves." That means "the burden of validation is ... enormous."
3. Trust in scientists drops among both Republicans and Democrats: poll
Illustration: Aïda Amer/Axios
The share of Americans who trust scientists and believe science has had a mostly positive effect on society has fallen significantly over the past four years, Axios' Sareen Habeshian writes from a new Pew Research Center survey.
The big picture: Trust in scientists declined during the pandemic, at a time when public health officials came under fire for business closures and vaccine and mask mandates.
- There's evidence that declining trust is a "function of institutional distrust in general," M. Anthony Mills writes in the New York Times.
By the numbers: Pew found 73% of Americans have at least "a fair amount" of trust in scientists to "act in the best interests of the public," down from 86% in 2019.
- The share expressing a "great deal" of trust fell from 35% to 23% in that time, while the percentage with "not too much" trust in scientists or "none at all" climbed from 13% to 27%.
- At the same time, the share of Americans who believe science has had a mostly positive effect on society fell from 73% in 2019 to 57% in 2023.
Details: Distrust rose among both Democrats and Republicans, per the survey of 8,842 U.S. adults conducted Sept. 25 to Oct. 1.
- It's particularly pronounced among Republicans, with 38% saying they have "not too much" or "no confidence at all" in scientists.
- That's up dramatically from the 14% of Republicans who held this view in April 2020, the early days of the pandemic.
Meanwhile, a large majority of Democrats (86%) continue to express at least a fair amount of confidence in scientists.
- Still, the share of Democrats and Democratic-leaning independents with a great deal of confidence in scientists — which initially rose in the pandemic's first year — now stands at 37%.
- That's down from a high of 55% in November 2020.
The bottom line: "The overall differences in partisan views remain much more pronounced today than they were prior to the coronavirus outbreak," the Pew researchers concluded.
4. Worthy of your time
U.K. approves world's first CRISPR-based medicine — and what to know (STAT News team)
In the Great Pacific Garbage Patch, new marine ecosystems are flourishing (Tim Brinkhof — Knowable)
DeepMind wants to define what counts as artificial general intelligence (Will Douglas Heaven — MIT Tech Review)
Fossils are shaped by people. Does that matter? (Asher Elbein — Undark)
5. Something wondrous
Photo: David Silverman/Getty Images
Sunflowers rely on the expression of different genes to track the sun across the sky each day — and to turn back to the east at night in preparation for the next sunrise, according to a recent study.
The big picture: This light-tracking behavior, called heliotropism, is well-known but not completely understood.
How it works: When sunflowers face the morning sun in the east, cells on the west, shaded side of the stem elongate in response to a hormone called auxin, causing that side to grow and bend the stem toward the light.
- In the afternoon, cells on the east side of the stem do the same, bending the flower in the opposite direction. At night, the west side grows again, readying the plant for the morning sun.
- The molecular details of how light causes cells to behave in this way haven't been described in detail.
What they found: Stacey Harmer, a plant biologist at the University of California, Davis, and her colleagues grew sunflowers in the lab and found that different genes were expressed when the stem bent toward the light than when it bent away. Those genes also weren't part of the known auxin pathway.
- When they moved the plants outside, the sunflowers expressed different genes as they tracked the sun, compared to when they grew toward light indoors, the researchers reported in PLOS Biology.
- They also found it took the plants just one day to learn to track the sun. "So somehow they pick up this behavior super quickly, and that really surprised me," Harmer said.
- The plant's behavior also didn't depend on whether it saw blue or red light, hinting there may be multiple pathways to control the movement. Another "total surprise," she said, adding she now wants to look at the proteins produced during the light-tracking behavior.
The impact: Previous work showed that if sunflower plants are allowed to track the sun, they grow better, Harmer says.
- "This is something breeders and farmers should be careful not to breed out of plants. Understanding the pathway could help to ensure that doesn't happen."
Big thanks to managing editor Scott Rosenberg, to Natalie Peeples on the Axios Visuals team, and to copy editor Carolyn DiPaolo.