
Illustration: Rebecca Zisser / Axios
Since Darwin's day, the principles of evolution have been used to try to explain how and why language changes. In a new study, researchers look at changes within the English language over short periods of time and find that random chance plays a larger role than previously thought in driving rapid changes to some aspects of language.
What's new: Techniques that biologists use to study genetic change have been refined, and linguists can now apply them to large collections of digitized text. Understanding what causes individual words, sounds and syntax to change within a single language could provide clues about how new languages arise. Advances in linguistic understanding have also led to better speech recognition, predictive text, artificial intelligence and related algorithms.
Bottom line: We are beginning to get a detailed picture of the history of language, much like we have for the history of species.
"This is a really interesting piece of work that nicely combines large-scale databases with quantitative modeling. These approaches are becoming more common in linguistics, helping to revitalize the field," says Simon Greenhill from the Max Planck Institute for the Science of Human History, who was not involved in the research but recently analyzed 81 Austronesian languages (from islands in Southeast Asia and the Pacific) and found their grammar changed faster than their lexicon.
How evolution works: Selection occurs when one form of a word is preferred over another (because it sounds better, is more popular or effective, or is easier to say or remember) and is therefore perpetuated. Drift, or neutral evolution, occurs when one version of a word spreads simply because it happens to be copied more often by chance. It just sort of happens.
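To make the distinction concrete, here is a minimal sketch of the Wright-Fisher model from population genetics, used purely as an illustration and not as the study's model. It assumes each "generation" of speakers copies a fixed number of word tokens from the previous generation. With a selection coefficient `s` of 0, which variant gets copied is pure chance (drift); a positive `s` biases copying toward one variant (selection). The function name, parameters and values are hypothetical.

```python
import random


def wright_fisher(p0, n_tokens, generations, s=0.0, seed=None):
    """Track the frequency of variant A (say, 'quit' versus 'quitted').

    p0: starting frequency of variant A, between 0 and 1
    n_tokens: word tokens copied per generation (the "population size")
    s: selection coefficient; 0.0 means pure drift (neutral evolution)
    Returns the list of frequencies, one per generation.
    """
    rng = random.Random(seed)
    p = p0
    history = [p]
    for _ in range(generations):
        # Selection: reweight A's expected share of copies by (1 + s).
        weighted = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Drift: each token independently copies A with probability `weighted`.
        count = sum(1 for _ in range(n_tokens) if rng.random() < weighted)
        p = count / n_tokens
        history.append(p)
        if p in (0.0, 1.0):
            break  # variant lost or fixed; nothing left to change
    return history


if __name__ == "__main__":
    for s in (0.0, 0.05):
        finals = [wright_fisher(0.5, 200, 100, s=s, seed=i)[-1] for i in range(50)]
        print(f"s={s}: mean final frequency of A = {sum(finals) / len(finals):.2f}")
```

Under drift alone, individual runs wander toward 0 or 1 but average out near the starting frequency; with even a small positive `s`, runs tend to climb. Smaller `n_tokens` makes the random wandering stronger, which is one way to think about why rare words are more exposed to drift.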
What they did: Physicist-turned-biologist Mitchell Newberry and linguist Christopher Ahern, along with their colleagues at the University of Pennsylvania, ran statistical models borrowed from population genetics on databases of digitized texts containing more than 400 million words and spanning the 12th to 21st centuries. They tested the models on three known grammatical changes in English:
- The use of -ed to form the past tense. They found that drift accounted for changes in rare verbs, making those verbs more prone to being replaced. The preferred form of just six of the 36 verbs they studied arose via selection. Some of those forms were irregular, which linguists have hypothesized may be because they rhymed pleasingly with other words in frequent use at the time. In the study, the rise of the irregular past tense quit (rather than quitted) coincided with greater use of the words hit, split and slit, supporting the hypothesis.
- The rise of do as an auxiliary verb. Random drift seems to have first brought do into questions, beginning in the 1500s. Natural selection then took over as do spread into other contexts, such as when Say not that! became Don't say that!, for "reasons of grammatical consistency or cognitive ease," they wrote.
- How negative sentences were formed. In Middle English, the phrase was I ne say, with the negative particle ne before the verb; it then changed to I ne say not; by Shakespeare's time it was I say not; and finally it became I don't say. This cycle of moving the negative word to wherever it gives the most emphasis is known to occur across many languages due to natural selection, so the test essentially served as a control for their model.
"One of the crucial things we've added is that at certain time scales — year by year counts as opposed to century by century — the picture is really different. If you look at the data in a fine grain way, you will discover new things," says Ahern.
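The article does not name the team's statistical test. One population-genetics method designed for this kind of time-series data is the frequency increment test: each change in a variant's frequency is rescaled so that, under pure drift, the rescaled increments average out to zero, and a mean reliably above or below zero points to selection. The sketch below is a simplified illustration under that assumption, not the study's code. It assumes frequencies have already been measured in time bins (year by year or century by century, echoing Ahern's point about time scale) and reports only a t-statistic; the function name and inputs are hypothetical.

```python
import math
from statistics import mean, stdev


def frequency_increment_t(freqs, times):
    """Return a one-sample t-statistic for rescaled frequency increments.

    freqs: variant frequency in each time bin (all but the last strictly in (0, 1))
    times: matching bin times, e.g. years, strictly increasing
    Under neutral drift the rescaled increments are roughly mean-zero,
    so a large |t| is evidence of selection.
    """
    if len(freqs) != len(times) or len(freqs) < 3:
        raise ValueError("need at least 3 matching frequency/time points")
    increments = []
    for v0, v1, t0, t1 in zip(freqs, freqs[1:], times, times[1:]):
        if not 0.0 < v0 < 1.0:
            raise ValueError("frequencies must be strictly between 0 and 1")
        if t1 <= t0:
            raise ValueError("times must be strictly increasing")
        # Rescale so drift-driven increments have roughly equal variance.
        increments.append((v1 - v0) / math.sqrt(2 * v0 * (1 - v0) * (t1 - t0)))
    m = mean(increments)
    sd = stdev(increments)
    if sd == 0:
        return math.copysign(math.inf, m) if m else 0.0
    # One-sample t-statistic against a mean of zero.
    return m / (sd / math.sqrt(len(increments)))


if __name__ == "__main__":
    rising = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    print(frequency_increment_t(rising, [1500, 1510, 1520, 1530, 1540, 1550]))
```

Binning matters: the same underlying counts can give a different picture when grouped year by year rather than century by century, and sparse bins (as Bowern notes below) make each measured frequency noisier.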
Yes, but: Some linguists contacted by Axios were skeptical that the study's findings were novel and noted that the historical data are incomplete. "The amount of material we have varies a lot from century to century, and that affects the conclusions that can be drawn from the dataset," says Claire Bowern, a linguist at Yale University who was not involved in the study. She also points out that linguists have studied the role of random drift in language change before.
The response: Ahern says that among linguists, "the level of detail at which these changes are 'well known' is a little bit overstated." He adds that the point is that he and his colleagues devised a way to test long-standing hypotheses for the first time. "We're building in language the intellectual infrastructure that we built in genetics in the 60s," Newberry says.
What's next: Determining whether the same patterns occur in other languages, and refining the models by incorporating other work in linguistics, Greenhill says.