Inside the Internet Archive's race to save federal webpages

The Internet Archive's headquarters features an interactive sculpture that intermittently flashes pages from the 500,000 sites it stored in its first year in service. Photo: Shawna Chen/Axios
In an era when government information can disappear with a click, the Internet Archive is racing to preserve a digital paper trail.
Why it matters: The San Francisco-based Internet Archive is celebrating its 30th birthday as a digital library that gained newfound prominence last year when the Trump administration began taking down and changing federal websites en masse.

State of play: Government webpages on USAID, DEI and gender, among others, simply "got wiped out," Internet Archive founder Brewster Kahle told visitors during a recent tour of the organization's headquarters in the Richmond District.
- Information lost ranged from HIV prevention and transgender care to climate change and civil rights pioneers (like the Navajo Code Talkers).
What they're saying: "Never before have we had large numbers of entire U.S. government websites just go offline," said Mark Graham, director of the Internet Archive's Wayback Machine, which creates digital archives of webpages.
- "This time around, the scale of material that has been removed — and/or just changed — dwarfs all of the other deletions and changes since, well, the beginning of the web," he told Axios.

How it works: Hundreds of the group's automated web crawlers capture snapshots of public webpages, over a billion URLs a day, and store the timestamped versions in the Wayback Machine, allowing users to see how pages have been revised over time.
- Users can paste a public URL into the Wayback Machine and then see what was on that page at various dates in the past.
- Because there's no "version control system" or list of every published government webpage, Graham and his team have to be proactive about tracking changes or deletions.
- "We don't really have the capability to differentiate and ... check every time a webpage has been changed or added, so we work hard to archive as much of the public web as we can," Graham said.
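For readers who want to try the lookup described above, here is a minimal sketch in Python. It relies on two public Wayback Machine conventions: snapshot addresses of the form web.archive.org/web/<timestamp>/<page>, which resolve to the capture closest to the 14-digit timestamp, and the availability endpoint at archive.org/wayback/available. The function names and the example.gov address are hypothetical, chosen only for illustration, and the code builds the addresses without making any network requests.

```python
from datetime import datetime
from urllib.parse import urlencode

# Public Wayback Machine address patterns.
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def wayback_timestamp(moment: datetime) -> str:
    """Format a datetime as the 14-digit YYYYMMDDhhmmss stamp the archive uses."""
    return moment.strftime("%Y%m%d%H%M%S")


def snapshot_url(page_url: str, moment: datetime) -> str:
    """Build the address of the capture closest to `moment` for `page_url`."""
    return f"{WAYBACK_PREFIX}/{wayback_timestamp(moment)}/{page_url}"


def availability_query(page_url: str, moment: datetime) -> str:
    """Build a query asking whether a capture exists near `moment`."""
    params = urlencode({"url": page_url, "timestamp": wayback_timestamp(moment)})
    return f"{AVAILABILITY_ENDPOINT}?{params}"


if __name__ == "__main__":
    # Hypothetical page and date, for illustration only.
    when = datetime(2025, 1, 20)
    print(snapshot_url("https://www.example.gov/", when))
    print(availability_query("example.gov", when))
```

Opening the first address in a browser should land on whichever stored capture is nearest the requested date, if one exists; the second returns a short JSON description of that capture.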

The big picture: The Wayback Machine has enabled over 1,000 news stories on government webpage deletions in the past year, according to Graham.
- Though the rise of AI has brought new challenges, the work remains as important as ever, he said.
- "What's changed ... is this heightened sense of responsibility and new understanding that we really can't take anything for granted," he told Axios.
Case in point: The Trump administration announced earlier this month that the U.S. State Department is removing all posts published on X before President Trump's second term.
- So much communication from the government happens on social media these days, so if those posts disappear, the public is left with gaps in understanding, Graham noted.
- The team had done a "pretty good job" archiving posts from at least 250 accounts maintained by embassies around the world, Graham said, but "hadn't done quite as good a job" archiving videos and PDF files associated with the accounts.
- "So for the last week or so, we've been kind of going back in ... and identifying some of the areas where we could go deeper and try to produce a more complete record," he said.

What's next: The Internet Archive is hosting an information stewardship forum with other repositories, journalists, researchers and philanthropists in March to take stock of what they learned over the past year and how they can more effectively preserve materials.
- You can check out the group's work through free drop-in tours at 300 Funston Ave., hosted Fridays at 1 p.m.
Editor's note: This story has been corrected to reflect that Graham referred to the lack of a "version control system" (not a "master chain control system").
