Aug 16, 2023 - Technology

AI-generated books are infiltrating online bookstores

Illustration: Sarah Grillo/Axios

The book market is beginning to show cracks under new pressures from generative AI as the technology cuts a swath across creative industries.

What's happening: AI-generated pseudo-books are spamming Amazon and other online bookstores, sometimes borrowing real authors' names to squat on their virtual real estate. Meanwhile, the use of books in the training of AI firms' large language models remains hotly contested, as authors seek to prevent unauthorized, uncompensated use of their work.

Driving the news: Searches on Amazon — estimated to control at least half of all U.S. book sales, and an even bigger share of the growing e-book market — are increasingly turning up mediocre AI-generated titles filled with unreliable information and soggy prose.

  • Travel guides have become a key niche for this flood-the-zone-with-crud tactic, The New York Times reported last week.
  • But AI-generated titles have also begun to infiltrate categories like "cooking, programming, gardening, business, crafts, medicine, religion and mathematics, as well as self-help books and novels," per the Times.
  • It's not always easy for buyers to distinguish these ersatz titles from human-written products (which of course contain their share of duds, too).
  • Reader reviews can help — but the review columns are also beginning to fill up with AI-generated posts intended to skew the ratings.

Publishing expert Jane Friedman made headlines last week with a blog post chronicling her discovery of fake books, likely generated by AI, that were listed for sale under her name.

  • Amazon responded to her initial protests, she said, by asking whether she had trademarked her own name.
  • The company later took the book listings down after Friedman's posts received attention. As this fraudulent practice spreads, though, Friedman wonders, "What will authors with smaller profiles do when this happens to them?"

What they're saying: Amazon spokesperson Lindsay Hamilton said in a statement to Axios, "We have clear content guidelines governing which books can be listed for sale and promptly investigate any book when a concern is raised."

  • Amazon doesn't prohibit AI-generated book content, but works created using AI sometimes violate its guidelines, for instance by infringing intellectual property rights or by misleading and "disappointing" customers.
  • "Amazon is constantly evaluating emerging technologies," the spokesperson's statement said, adding that the company has "zero tolerance for fake reviews."

The AI boom has given existing stockpiles of book-related data new value to AI developers — and made them a target of suspicion for creators.

  • Prosecraft, a project by Benji Smith, the developer of an authoring tool called Shaxpir, provided statistical analysis of published books (according to characteristics like "vividness" and uses of the passive voice) that also included snippets of books' texts.
  • The database has been around since 2017, but recently authors began to fear that its contents might end up as fodder for AI training.
  • Smith took the site down last week, apologizing to "the community of authors" and writing that he'd intended the project to put powerful "lexicographical tools" in the hands of everyday authors.
  • "The arrival of AI on the scene has been tainted by early use-cases that allow anyone to create zero-effort impersonations of artists, cutting those creators out of their own creative process," Smith wrote, adding that both Prosecraft and Shaxpir were "labors of love."

Between the lines: The first wave of anxiety about AI focused on ways the technology might steal jobs from humans. But the experience in publishing suggests job-market turmoil may be just one of many complex waves of change.

  • "We spent significant time examining the impact of AI before making this investment," KKR's Ted Oberwager told Axios' Dan Primack, referring to the firm's deal to buy publisher Simon & Schuster.
  • "While we believe there will be opportunities to empower editors and creators with technology over time, just like the typewriter or computer did, we don't see the author at the center of the business model changing. Authors, and the human connection and experience, will not be replaced."

The big picture: The pollution of virtual bookshelves is obviously bad news for book lovers, but it also poses a threat to the AI world.

  • It's going to be very hard to weed out spurious AI-generated tomes from the real stuff the AI needs to train on.
  • If too much of the training data is itself AI-generated, studies are already warning, the large language models at the heart of AI systems will deteriorate or "collapse."