What OpenAI knows about you

Ina Fried

Illustration: Maura Losch/Axios

OpenAI uses some customer information to power ChatGPT and other services, but like other AI providers it relies heavily on "publicly available" information scraped from the internet to train its generative models.

The big picture: The company behind ChatGPT — originally a nonprofit, now gradually transforming itself into a more traditional startup — has been relatively clear about how it uses customer data. Like most of its competitors, however, it doesn't tell the world exactly what data its models have been trained on.

In our ongoing series on What AI Knows About You, Axios is looking company by company at the ways tech giants are and aren't using their customers' information to develop and improve their products, and how users can opt out.

Today's AI developers don't face any requirement to divulge the exact sources for their training data — but under various privacy laws, they do have to reveal what customer data they collect and how they use it.

Zoom in: OpenAI, like most AI providers, makes a strong distinction between business customers and general consumers.

By default, ChatGPT Enterprise, ChatGPT Team and ChatGPT Edu customer data is not used to train models.
The same goes for those using OpenAI's services via an application programming interface (API). API customers can choose to share data with OpenAI to improve and train future models.
Consumers, both free and paid, can easily control in their settings whether their conversations are used to improve and train future models. (OpenAI has more details here.)
"Temporary chats" in ChatGPT are not used to train OpenAI models and are automatically deleted after 30 days.
For GPTs (custom versions of ChatGPT that developers can build for others to use), there is an opt-out option for the builder of the custom GPT, allowing them to decide whether their proprietary data can be used by OpenAI for model training.

Between the lines: Apple also has an arrangement with OpenAI to access ChatGPT through Apple Intelligence, coming with iOS 18.2 and updated versions of the Mac and iPad OS.

For those who don't log in to their ChatGPT Plus account, these Apple Intelligence requests are not stored by OpenAI, and users' IP addresses are obscured.
For those who do link a paid ChatGPT Plus account with Apple Intelligence, OpenAI's standard privacy policies apply.

Microsoft also makes heavy use of OpenAI services to power its many Copilots, with its privacy policies applying, as we laid out previously in this series.

The big picture: OpenAI — which faces an array of legal action from authors, newspapers and other publishers who say the company made illegal use of their content — says it offers a number of options for creators to control how their data is used.

First, OpenAI says it has an opt-out process that lets web publishers block its GPTBot crawler from accessing their sites for future training of its generative AI foundation models.
It has a separate bot (OAI-SearchBot) that is used for ChatGPT's search function. That crawler is used to help ChatGPT link to and surface websites in search results, but it is not used to train OpenAI's foundation models.
OpenAI says it tries to remove certain information that it doesn't want its model to learn from or output, such as sites that primarily aggregate personal information.
It also takes "a number of privacy-protective steps to reduce the processing of any incidental personal information."
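
For publishers who want to see how the two crawlers can be treated differently, here is a minimal sketch. It assumes the opt-out is expressed through a site's robots.txt file, which is how OpenAI's crawler documentation describes it. The robots.txt contents and the example.com URL are hypothetical; only the crawler names GPTBot and OAI-SearchBot come from the reporting above. The check uses Python's standard-library robots.txt parser, not any OpenAI tool.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: block the training crawler,
# allow the search crawler. A real site's file will differ.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def crawler_access(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, for each user agent, whether the rules permit fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# example.com is a placeholder domain.
access = crawler_access(
    ROBOTS_TXT,
    "https://example.com/article",
    ["GPTBot", "OAI-SearchBot"],
)
print(access)
```

Under these assumed rules, the check reports GPTBot as blocked and OAI-SearchBot as allowed. A robots.txt file is a request that crawlers are expected to honor, not a technical barrier.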
