Axios AI+

November 04, 2024
My Election Week Lego-building fest is in full effect, with the current build — a Polaroid camera — about three-quarters done.
Today's newsletter is 1,316 words, a 5-minute read.
1 big thing: What AI knows about you
Most AI builders don't say where they get the data they use to train their bots and models — but legally they're required to say what they're doing with their customers' data.
The big picture: These data-use disclosures open a window onto the otherwise opaque world of Big Tech's AI brain-food fight.
- In this new Axios series, What AI knows about you, we'll tell you, company by company, what all the key players are saying and doing with your personal information and content.
Why it matters: You might be just fine knowing that picture you posted on Instagram is helping train the next generative AI art engine. But you might not — or you might just want to be choosier about what you share.
Zoom out: AI makers need an incomprehensibly gigantic amount of raw data to train their large language and image models.
- The industry's hunger has led to a data land grab: Companies are vying to teach their baby AIs using information sucked in from many different sources — sometimes with the owner's permission, often without it — before new laws and court rulings make that harder.
Zoom in: Each tech giant is building generative AI models, and many of them are using their customer data, in part, to train them.
- In some cases it's opt-in, meaning your data won't be used unless you agree to it. In other cases it is opt-out, meaning your information will get used unless you explicitly say no.
- These rules can vary by region, thanks to legal differences. For instance, Meta's Facebook and Instagram are "opt-out" — but you can only opt out if you live in Europe or Brazil.
- In the U.S., California's data privacy law is among those responsible for requiring firms to say what they do with user data. In the EU, it's the GDPR.
Between the lines: AI makers' data-use practices vary based on whether a firm operates in the consumer market or the enterprise business.
- On the consumer side, especially with free services, opt-out options are often more limited, while businesses and organizations generally expect their data won't be used.
- Adobe ignited a firestorm with changes to its terms of service that left the impression it was using business customers' data to train its generative AI systems. In response, the company put its pledge not to do so in writing.
Where companies get the data they use to train their models — essentially, the "teaching" phase — is separate from, but related to, what they do with the customer data that's shared with an AI service once training is done and customers are using it.
Apple, for example, is making extensive use of personal data for Apple Intelligence.
- But the company has committed to a new architecture that it says will ensure the data remains private.
- Personal information will be processed on-device (that is, on your own phone) — or, if it needs to be sent to a cloud data center, Apple says it will ensure that no one other than the user, not even Apple, will have access.
Microsoft, meanwhile, has several times delayed the Recall feature of its Copilot+ PCs because of data-privacy questions.
- Although the processing happens on-device, the data Recall captures initially was stored in a way that other software could easily access.
- Microsoft's approach also preserves tons of screenshots, which can include an array of sensitive information — although the company has settings to turn off the feature for specific apps and websites.
OpenAI has an array of different policies and options that vary based on the type of customer and whether they are using free or paid services.
What's next: Over the coming weeks, this series will look company by company at what customer data is being used to train AI and what happens to your data when you use an AI service.
- We will talk to tech giants and big AI companies as well as other consumer and enterprise software companies whose policies or practices have garnered attention.
- We'll dig into their policies and the options available to customers who don't want their data used for AI training.
The bottom line: In tech's social media era, the industry built vast global networks that transmuted our posts and clicks into rivers of profit by monetizing users' personal information.
- AI is giving that information new value and giving us new reasons to provide it — but at least this time around, we should know what we're getting into.
Next up in this series: What Meta's AI knows about you
2. Tests check whether health care AI really works
Health systems are under increasing pressure to embrace new artificial intelligence tools without a formal system for evaluating how well they work.
Why it matters: Even AI developers can struggle to explain why a model makes a particular prediction or recommendation.
- That has big implications in clinical settings, where algorithm errors or bias can result in patient harm.
Driving the news: The Coalition for Health AI, made up of more than 3,000 health systems, tech companies and patient advocates, is creating a network of "assurance labs" with the talent and bandwidth to validate systems and evaluate ongoing performance.
- The idea is to create an ecosystem of labs that are trustworthy and "don't have commercial entanglements with the vendor that they're validating the model for," CEO and co-founder Brian Anderson told Axios.
- The goal is to use "ingredient and nutrition labels" to standardize how AI models are evaluated and ensure they're tested on data representative of a range of patients in a particular region.
Between the lines: While health systems generally rely on the FDA to vet the tools they use, AI algorithms present different challenges because they change over time, and because the data they ingest can be highly variable.
- That presents a unique challenge for regulators, hospitals and clinics.
- An algorithm trained with data from patients in Boston may not work when applied to patients at a hospital in Santa Fe, New Mexico.
- "If you want to know your AI is actually doing what you thought it was doing, you actually need to validate it in the situation in which it's being used," FDA Commissioner Robert Califf said recently while speaking at the HLTH conference in Las Vegas.
- In an article last month in JAMA Network, Califf and co-authors wrote about the need for ongoing post-market monitoring of AI to help prevent the risk of algorithm failure and model bias.
- "I don't know of a single health system in the U.S. which is capable of doing the validation that's needed," Califf said.
Zoom in: Health systems are inundated with pitches for AI technology, David Newman, chief medical officer of virtual care for Sanford Health System, which operates 48 medical centers and more than 200 clinics in the Midwest, told Axios.
- "I looked at my inbox yesterday and I had 22 emails from AI companies," Newman said. "I don't know if they've been validated or not. I don't know if they're solving a problem at all. But it's really hard to wade through that to see what actually is useful for patients and our providers."
- At Sanford, any new AI products are vetted by a governance committee and then internally validated by a data analytics team before they can be deployed.
What to watch: CHAI is soliciting feedback from different users and AI developers, and aims to release a final version of its plan early next year.
3. Training data
- Nvidia has replaced Intel as one of 30 blue-chip stocks in the Dow Jones Industrial Average, a sign of how much generative AI now dominates the tech sector. (Axios)
- A struggling state-funded Polish radio station used AI to interview dead celebrities. Mass outrage forced it to stop. (New York Times)
- Musk's robotaxis depend on AI advances that experts say are a long way off. (Wall Street Journal)
4. + This
What would it mean if an LLM could meditate? And what would it mean for Anthropic's server bills if it became a habit?
Thanks to Megan Morrone and Scott Rosenberg for editing this newsletter and to Caitlin Wolper for copy editing it.