Axios AI+


August 17, 2023

Hi, it's Ryan. Today's AI+ is 1,047 words, a 4-minute read.

1 big thing: Companies struggling to deploy AI

Illustration of an unimpressed emoji at the center of a bullseye made of binary code.

Illustration: Brendan Lynch/Axios

Behind the hype of generative AI, large companies are struggling to deploy the new technology — hitting cost and data management hurdles that are leaving many of their generative AI projects stuck in pilot phase.

Why it matters: Companies remain optimistic overall about the boost in productivity promised by generative AI — but achieving the technology's potential is taking longer and costing more than many initially expected.

Driving the news: Deloitte and NVIDIA announced Wednesday they will supplement an existing AI partnership by establishing an "Ambassador AI program" to help struggling companies move to full-scale deployment of AI.

  • More than half of AI decision-makers in top companies are facing cost barriers to deploying the latest AI tools, according to S&P Global's 2023 Global Trends in AI report, which includes a survey of 1,500 AI decision-makers in companies with more than 250 employees.

What's happening: Nearly 70% of respondents to the S&P Global survey said they have at least one AI project in production.

  • But 31% of respondents said their projects are still in pilot or proof-of-concept stage, outnumbering those who said they've reached enterprise scale with an AI project (28%).

Details: Many companies are finding their data isn't organized for the AI revolution — saved in different formats, in disparate datasets, and sometimes still on paper — "forcing a complete rethink of how data is stored, managed and processed," said Nick Patience, senior research analyst at S&P Global Market Intelligence.

  • Data management (cited by 32%), security (26%) and accessing sufficient computing resources (20%) are the top challenges for respondents to the S&P Global survey.
  • Around half of the surveyed IT leaders said their organizations aren't ready to implement AI — and suggested it may take five years or more to fully build AI into their company workflows.
  • Other knock-on effects of greater AI use include its climate footprint: 68% of respondents said their internal targets for energy use are now under strain because of how much computing power AI requires.

What they're saying: "We believe data represents the strongest long-term competitive moat in the AI arms race," Fred Havemeyer, a senior enterprise software analyst at Macquarie, wrote in a July 20 client note, citing database software that supports AI workloads as his "picks for the AI gold rush."

  • Outdated data infrastructure is having "a direct, negative impact" on the ability to "achieve enterprise-scale AI deployment and to use AI sustainably," said Liran Zvibel, cofounder and CEO at WEKA, which commissioned the S&P Global report.

The big picture: Leaders in many large companies still have reservations about AI, aside from the hurdles to implementing it.

  • Jon Stross, co-founder of HR software provider Greenhouse, told Axios that while he's working to find ways to use AI, he is "super nervous" about any situation where AI could amplify bias in hiring, especially since AI models often cannot explain how they arrive at a decision — and explaining decisions is a basic step in any hiring process.
  • Lani Phillips, vice president of Channel Sales at Microsoft, said she believes AI can be a time-saver and generate useful customer insights for even the most senior salesperson, but "there is no replacement for human connections with your customers."

2. Not sure which AI to use? Help's on the way

A screenshot of Arthur Bench evaluating various generative AI systems. Image: Arthur

OpenAI remains the leader in generative AI, but its rivals have narrowed the gap and may be a better option for certain uses, Ina reports.

  • An open-source testing platform released Thursday by startup Arthur aims to help businesses figure out which large language model is best suited to their needs.

Why it matters: Given the open-ended nature of generative AI results — and the fact that answers can differ over time — it is often hard to quantify performance or even to decide which generative system is best for a particular task. Arthur's tool appears among the first to make such recommendations.

Driving the news: Arthur, a New York-based startup, is using its open-source tool, Arthur Bench, to launch the Generative Assessment Project, an ongoing effort to measure the effectiveness of different generative AI tools.

  • Its first assessment found that OpenAI's GPT-4 leads among the major systems.
  • But it also observed that, in certain circumstances, Anthropic's Claude 2 proved less likely to "hallucinate," or make up false information.
  • Cohere's Command, meanwhile, was less likely to reject a query as beyond its capabilities, though it was rarely able to correctly answer the complex questions Arthur put forth in its tests.

How it works: Arthur tested OpenAI's GPT-3.5 and GPT-4 along with models from Cohere, Anthropic and Meta, running each query three times to take into account the fact that a single model can provide different answers even when presented with the same question.

  • The testing spanned three categories: combinatorial mathematics, U.S. presidents, and Moroccan political leaders. Each question required multiple steps of reasoning.
  • Each answer was categorized as having either answered the query correctly, made up an incorrect answer, or avoided answering the question.
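The scoring scheme described above can be sketched in a few lines of Python. This is a hypothetical illustration of the protocol — repeated runs per question, each response labeled as correct, incorrect, or refused — not Arthur Bench's actual API; the function name and category labels are assumptions.

```python
from collections import Counter

def score_runs(labels):
    """Return the fraction of runs falling in each response category.

    `labels` is a list of per-run verdicts for one question, e.g. the
    three repeated runs Arthur describes. Categories: "correct",
    "incorrect" (a made-up answer), and "refused" (avoided answering).
    """
    counts = Counter(labels)
    total = len(labels)
    return {cat: counts.get(cat, 0) / total
            for cat in ("correct", "incorrect", "refused")}

# One question asked three times, as in Arthur's protocol:
print(score_runs(["correct", "incorrect", "correct"]))
```

Running the same question several times matters because, as the article notes, a single model can return different answers to an identical prompt, so a one-shot verdict can be misleading.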

What they're saying: "OpenAI continues to perform very well, but there were some areas where some of the competitors have closed the gap," Arthur CEO Adam Wenchel told Axios.

  • "There were definitely some surprises and nuances that were pretty interesting," Wenchel said.

Of note: Arthur says it deliberately used challenging questions that today's large language models often can't answer on their own. In practice, data scientists often fetch extra context, such as Wikipedia pages, in order to answer factual questions.
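The pattern mentioned here — fetching extra context rather than relying on a model's memory — can be sketched as follows. Everything in this snippet is illustrative: `KNOWLEDGE_BASE` stands in for a real retrieval source such as a Wikipedia search, and the function names are not from any specific library.

```python
# A toy stand-in for a retrieval source such as Wikipedia.
KNOWLEDGE_BASE = {
    "Morocco": "Aziz Akhannouch has served as Prime Minister of Morocco "
               "since October 2021.",
}

def fetch_context(topic):
    """Look up background text for a topic (placeholder for real retrieval)."""
    return KNOWLEDGE_BASE.get(topic, "")

def build_prompt(question, topic):
    """Prepend retrieved context so the model answers from evidence,
    not from its (possibly outdated) training data."""
    context = fetch_context(topic)
    return f"Context: {context}\n\nQuestion: {question}\nAnswer:"

prompt = build_prompt("Who is Morocco's prime minister?", "Morocco")
print(prompt)
```

The assembled prompt, not the bare question, is what gets sent to the model — which is why Arthur's choice to test questions without such context makes its benchmark deliberately hard.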

Between the lines: Given the nature of generative AI responses, deciding which system is best depends on what one is looking for. One system may be more concise, and another more comprehensive, for example.

3. Training data

Thanks to Alison Snyder for editing and Bryan McBournie for copy editing this newsletter.