AI comes for books

August 16, 2023

Happy Wednesday! Ina here. Today's AI+ is 1,163 words, a 4-minute read.

1 big thing: AI comes for books

The book market is beginning to show cracks under new pressures from generative AI as the technology cuts a swath across creative industries, Axios' Scott Rosenberg reports.

What's happening: AI-generated pseudo-books are spamming Amazon and other online bookstores, sometimes borrowing real authors' names to squat on their virtual real estate. Meanwhile, the use of books in the training of AI firms' large language models remains hotly contested, as authors seek to prevent unauthorized, uncompensated use of their work.

Driving the news: Searches on Amazon — estimated to control at least half of all U.S. book sales, and an even bigger share of the growing e-book market — are increasingly turning up mediocre AI-generated titles filled with unreliable information and soggy prose.

Travel guides have become a key niche for this flood-the-zone-with-crud tactic, the New York Times reported last week.
But AI-generated titles have also begun to infiltrate categories like "cooking, programming, gardening, business, crafts, medicine, religion and mathematics, as well as self-help books and novels," per the Times.
It's not always easy for buyers to distinguish these ersatz titles from human-written products (which of course contain their share of duds, too).
Reader reviews can help — but the review columns are also beginning to fill up with AI-generated posts intended to skew the ratings.

Publishing expert Jane Friedman made headlines last week with a blog post chronicling her discovery of fake books, likely generated by AI, that were listed for sale under her name.

Amazon responded to her initial protests, she said, by asking whether she had trademarked her own name.
The company later took the book listings down after Friedman's posts received attention. As this fraudulent practice spreads, though, Friedman wonders, "What will authors with smaller profiles do when this happens to them?"

What they're saying: Amazon spokesperson Lindsay Hamilton in a statement to Axios said, "We have clear content guidelines governing which books can be listed for sale and promptly investigate any book when a concern is raised."

Amazon doesn't prohibit AI-generated book content, but works created using AI sometimes run afoul of its guidelines, including by failing to comply with intellectual property rights or by misleading and "disappointing" customers.
"Amazon is constantly evaluating emerging technologies," the spokesperson's statement said, adding they have "zero tolerance for fake reviews."

The AI boom has given existing stockpiles of book-related data new value to AI developers — and made them a target of suspicion for creators.

Prosecraft, a project by Benji Smith, the developer of an authoring tool called Shaxpir, provided statistical analysis of published books (according to characteristics like "vividness" and uses of the passive voice) that also included snippets of books' texts.
The database has been around since 2017, but recently authors began to fear that its contents might end up as fodder for AI training.
Smith took the site down last week, apologizing to "the community of authors" and writing that he'd intended the project to put powerful "lexicographical tools" in the hands of everyday authors.
"The arrival of AI on the scene has been tainted by early use-cases that allow anyone to create zero-effort impersonations of artists, cutting those creators out of their own creative process," Smith wrote, adding that both Prosecraft and Shaxpir were "labors of love."

Between the lines: The first wave of anxiety about AI focused on ways the technology might steal jobs from humans. But the experience in publishing suggests job-market turmoil may be just one of many complex waves of change.

"We spent significant time examining the impact of AI before making this investment," KKR's Ted Oberwager told Axios' Dan Primack, referring to the private equity firm's deal to buy publisher Simon & Schuster.
"While we believe there will be opportunities to empower editors and creators with technology over time, just like the typewriter or computer did, we don't see the author at the center of the business model changing. Authors, and the human connection and experience, will not be replaced."

The big picture: The pollution of virtual bookshelves is obviously bad news for book lovers, but it also poses a threat to the AI world.

It's going to be very hard to weed out spurious AI-generated tomes from the real stuff the AI needs to train on.
If too much of the training data is itself AI-generated, studies are already warning, the large language models at the heart of AI systems will deteriorate or "collapse."
To use a gross but apt biological metaphor: They're not going to find nutrition in their own waste.

2. OpenAI touts GPT-4 for content moderation

Photo illustration: Omar Marques/SOPA Images/LightRocket via Getty Images

OpenAI, the maker of ChatGPT, says its engine can do the work of human content moderators with much of the accuracy, more consistency and none of the emotional toll that people face when forced to view violent and abusive content for hours.

Why it matters: This is the latest example of tech companies touting AI as the key to tackling problems created — or exacerbated — by AI.

Details: OpenAI says it has been using the content moderation system it developed, which is based on its latest GPT-4 model, and has found it to be better than a moderator with modest training, though not as effective as the most skilled human moderators.

This system is designed to work on a range of steps in the process of identifying and removing problematic content: from the development of moderation policy through to implementing that policy.
"You don't need to hire tens of thousands of moderators," OpenAI head of safety systems Lilian Weng told Axios. Instead, Weng said, people can act as advisers who ensure the AI-based system is working properly and adjudicate borderline cases.

The big picture: Content moderation was a huge challenge even before the arrival of generative AI. The new technology threatens to make it worse by making it even easier to generate misinformation and other unwanted content.

However, AI is also seen by some technologists as the only likely answer to the expected rise in misinformation because of its ability to scale.
Social media companies are already heavily reliant on earlier AI technologies, such as machine learning, to scan for rule-breaking content.

3. Training Data

Officials in Iowa are trying to use ChatGPT to determine which books to pull off school library shelves. (Popular Science)
Meanwhile, a San Francisco venture firm is trying to use AI to predict which startups are likely to become unicorns. (Business Insider)
Google is adding more experimental generative AI features, including one for its Chrome browser that can summarize long web pages.
Intel will terminate a $5.4 billion deal to acquire Israeli chip manufacturer Tower Semiconductor after China failed to sign off on the deal. (Associated Press)
Eric Braverman is leaving as CEO of Schmidt Futures, the philanthropic initiative co-founded by Eric and Wendy Schmidt, to pursue his own charitable venture. (Axios)
Testing suggests X (formerly Twitter) was throttling links to some, but not all, external websites. (Washington Post)

4. + This

This is an excellent way to decide which of several co-authors' names should come first. Although I may have Harvey play Ryan at Super Smash Bros. to settle any AI+ byline disputes.

Thanks to Alison Snyder for editing and Bryan McBournie for copy editing this newsletter.