Axios Login


April 24, 2023

Ina Fried

I am too mature to use this intro to gloat to my dad that the Edmonton Oilers beat the LA Kings in overtime last night. Far too mature. Today's Login is 1,292 words, a 5-minute read.

1 big thing: How we all became AI's brain donors

Illustration: Annelise Capossela/Axios

The AI boom is built on data, the data comes from the internet, and the internet came from us, Axios' Scott Rosenberg writes.

Driving the news: A Washington Post analysis of one public data set widely used for training AIs shows how broadly today's AI industry has sampled the 30-year treasury of web publishing to tutor its neural networks.

Why it matters: Ever written a blog? Built a web page? Participated in a Reddit thread? Chances are your words have contributed to the education of AI chatbots everywhere.

The big picture: While this massive verbal repurposing is triggering an important legal brawl over whether it should be treated as fair use or theft, it's also inspiring a personal reckoning for many of the millions whose postings built today's online world.

We thought we were sharing our hearts and minds, and of course we were.

But without realizing it we were also creating a database, incomplete but rich, of human expression.
That database makes the uncannily adept sentence-completion gymnastics of ChatGPT and its competitors possible.

Because visual AI tools like Dall-E, Midjourney and Stable Diffusion got popular before verbal chatbots like ChatGPT took off, visual creators (photographers, illustrators and fine artists) were the first to grapple with this realization.

Musicians face the same kind of epiphany, as they encounter multiplying AI-conjured facsimiles of their works — like last week's (never-happened) collaboration between Drake and the Weeknd, "Heart on My Sleeve."

But far more of us have typed a few words on the internet than have ever recorded songs or drawn pictures.

The Washington Post project lets you enter any internet domain name to see whether and how much it contributed to one AI training database. (This isn't the same one OpenAI used for ChatGPT or its other projects; OpenAI has not disclosed its training-data sources.)
"The data set contained more than half a million personal blogs, representing 3.8 percent" of the total "tokens," or discrete language chunks, in the data, the Post team found. (Postings on proprietary social media platforms like Facebook, Instagram and Twitter don't show up — those companies have kept access to their data to themselves.)

Of note: These training databases are enormous but hardly representative. Some cultures, groups and subjects are oversampled; many others are unfairly neglected. And all the biases, limitations and toxic aspects of internet culture show up in the AI training data.

My thought bubble: The personal blog I wrote fairly consistently for 15 years is well represented in the Post data set — along, it seems, with most of the other writing I contributed for ten years to the web magazine I helped create.

If you have any kind of online history, the self-lookup opportunity the Post's research provides is irresistible, like Googling your own name. (There's a similar lookup tool called "Have I Been Trained?" for visuals.)
When you do find your work listed, you're probably going to ask yourself, as I did, "Is this what I wanted?" and "Why wasn't I consulted?" and "What if I'd known this was coming?"

Be smart: AI's hunger for training data casts the entire 30-year history of the popular internet in a new light.

Today's AI breakthroughs couldn't happen without the availability of the digital stockpiles and landfills of info, ideas and feelings that the internet prompted people to produce.
But we produced all that stuff for one another, not for AI.

From this vantage, the existence of these vast "corpuses" of data was a profoundly important unintended consequence of the rise of the web itself.

Today, this unintended consequence is front and center in our online experience — reminding us that everything we're doing right now with, and to, AI will in turn shape the future in ways we can't foresee.

For instance: If we unleash a flood of simulacra on our public networks, we risk discouraging people from continuing to share, or even make, their own original work.
That might leave future AI models stuck forever with the frozen output of humanity circa 2000-2020, with nothing newer to learn from.

2. "Verified" becomes a badge of dishonor

Tweets by Stephen King and Elon Musk seen on a cellphone on Friday. Photo illustration: Christopher Furlong/Getty Images

Twitter users are pushing back against Elon Musk's new pay-for-verification policy, with many journalists and celebrities opting to cancel their subscriptions instead of keeping their blue checks, Axios' Sara Fischer, Rebecca Falconer and I report.

Why it matters: Verification used to be a badge of honor and, more importantly, a way to confirm that high-profile accounts belonged to the people they purported to be. Now that it's available to anyone willing to buy it, it's become a signal of desperation.

Driving the news: Twitter last Thursday began removing blue check marks from hundreds of thousands of accounts belonging to celebrities, journalists and other public figures who were verified by the platform before Musk changed the rules.

The Twitter CEO later announced he's personally paying for some high-profile users to remain verified on Twitter, even when they'd indicated they didn't want this status under his new subscription system.
Then, it emerged over the weekend that blue check marks had returned to the Twitter profiles of many accounts with more than 1 million followers.

Be smart: The new system could create more chaos and confusion for everyday users who relied on verification to know whether to trust accounts claiming to belong to notable figures.

"This account is verified because they are subscribed to Twitter Blue and verified their phone number," Twitter said, even on accounts where the company had added back the verified status.
That could be a legal issue for Twitter, as the company has implied that celebrities are paying for a subscription when they are not, which could be construed as a falsely claimed endorsement.
The company included the same message on the re-verified accounts of people who are no longer alive, including Washington Post journalist Jamal Khashoggi, who was killed in 2018; chef Anthony Bourdain, who died the same year; and NBA star Kobe Bryant and actor Chadwick Boseman, who both died in 2020.

The big picture: More social media networks, including Meta, are forcing users to pay for features that were once free, including verification.

What they're saying: "The only thing that is really 'verified' today is Elon Musk has no clue what he's doing with Twitter," former CBS News anchor Dan Rather tweeted.

3. Apple throws VR spaghetti against the wall

Mark Gurman, Bloomberg's well-sourced Apple reporter, says that the company will pitch a wide range of capabilities for its upcoming mixed reality headset, in part to make up for the lack of a killer feature.

Why it matters: Apple's forthcoming device is expected to be quite expensive, likely several thousand dollars, making it far from an impulse buy.

Driving the news: The Apple headset is expected to support a whole host of features from gaming to fitness to reading and productivity apps when it launches later this year, Gurman reports.

The big picture: Even with a solid lineup of games, Meta has struggled to make a compelling case to would-be buyers for its high-end Meta Quest Pro device.

The company initially downplayed gaming, touting the device as a new way to tackle work tasks.

4. Take note

On Tap

The RSA cybersecurity conference takes place this week in San Francisco.

Trading Places

Comcast parted ways with NBCUniversal CEO Jeff Shell over "inappropriate conduct," the company said Sunday. In a joint statement, Shell admitted to having an "inappropriate relationship with a woman in the company."

ICYMI

Arm, which typically designs chip cores and leaves the layout and manufacturing to its partners, is said to be creating prototype semiconductors to highlight the company's latest capabilities. (Financial Times)

5. After you Login

Check out these amazing photos that Prathyusha Kamasani took on her plane ride home from Microsoft's MVP Summit last week.

Thanks to Scott Rosenberg for editing and Bryan McBournie for copy editing this newsletter.