
ancianita

(43,306 posts)
Fri Mar 27, 2026, 06:41 AM 8 hrs ago

Techdirt -- Blocking The Internet Archive Won't Stop AI, But It Will Erase The Web's Historical Record

Imagine a newspaper publisher announcing it will no longer allow libraries to keep copies of its paper.

That’s effectively what’s begun happening online in the last few months. The Internet Archive—the world’s largest digital library—has preserved newspapers since it went online in the mid-1990s. The Archive’s mission is to preserve the web and make it accessible to the public. To that end, the organization operates the Wayback Machine, which now contains more than one trillion archived web pages and is used daily by journalists, researchers, and courts...

Whatever the outcome of those lawsuits, blocking nonprofit archivists is the wrong response. Organizations like the Internet Archive are not building commercial AI systems. They are preserving a record of our history. Turning off that preservation in an effort to control AI access could essentially torch decades of historical documentation over a fight that libraries like the Archive didn’t start, and didn’t ask for.
If publishers shut the Archive out, they aren’t just limiting bots. They’re erasing the historical record.

Archiving and Search Are Legal
Making material searchable is a well-established fair use. Courts have long recognized it’s often impossible to build a searchable index without making copies of the underlying material. That’s why when Google copied entire books in order to make a searchable database, courts rightly recognized it as a clear fair use. The copying served a transformative purpose: enabling discovery, research, and new insights about creative works.

The Internet Archive operates on the same principle. Just as physical libraries preserve newspapers for future readers, the Archive preserves the web’s historical record. Researchers and journalists rely on it every day. According to Archive staff, Wikipedia alone links to more than 2.6 million news articles preserved at the Archive, spanning 249 languages. And that’s only one example. Countless bloggers, researchers, and reporters depend on the Archive as a stable, authoritative record of what was published online.

The same legal principles that protect search engines must also protect archives and libraries. Even if courts place limits on AI training, the law protecting search and web archiving is already well established.

The Internet Archive has preserved the web’s historical record for nearly thirty years. If major publishers begin blocking that mission, future researchers may find that huge portions of that historical record have simply vanished. There are real disputes over AI training that must be resolved in courts. But sacrificing the public record to fight those battles would be a profound, and possibly irreversible, mistake.

https://www.techdirt.com/2026/03/26/blocking-the-internet-archive-wont-stop-ai-but-it-will-erase-the-webs-historical-record/

Note: Techdirt always provides most of its content, including articles and podcasts, to anyone for free.

Sorry, I wasn't able to save this link to either the Wayback Machine (without a donation/signup) or archive.ph.




Norrrm

(4,978 posts)
1. The internet itself (not archiving) is a poor historical record.
Fri Mar 27, 2026, 06:54 AM
8 hrs ago

The internet itself (not archiving) is a poor historical record. I have seen online newspaper articles that showed exactly what was in print. Then, a few days later, they were slightly different. Certain facts may have been taken out or changed without acknowledgment.

ancianita

(43,306 posts)
2. Agree! It's subject to all kinds of revisionist shenanigans. Which is exactly why archiving is crucial to
Fri Mar 27, 2026, 07:17 AM
8 hrs ago

preserving facts and historical context.

progree

(12,953 posts)
3. I often use archive.org to find federal government web pages as they were before Trump II
Fri Mar 27, 2026, 08:39 AM
6 hrs ago

and have posted many such links here.

I also occasionally archive a page there.

I'm not aware of any requirement to sign up, have an account, donate, or anything like that to either search for or archive a web page. (I have donated in the past, but I don't think it "knows" that: the page has a "Sign up | Log in" at the top right, yet I can still search and archive a web page.) To archive a web page, click on "Web" at the top, near the left end.

ancianita

(43,306 posts)
4. Me, too, just not with the Wayback Machine, which did have "Donate," "Sign up Log in" on the Home page
Fri Mar 27, 2026, 12:45 PM
2 hrs ago

When I tried to copy/paste the URL I needed archived (or to access it, if it was already archived), it sent me to a sign-up page. I decided not to do that for now. But I'm increasingly aware that important local journalism could disappear, so I'll do it soon, even though mainstream publishers archive their own reportage and investigations.

progree

(12,953 posts)
5. That's odd, I have no problem. I used InPrivate mode (Edge browser) to presumably make sure it doesn't "know" me
Fri Mar 27, 2026, 02:33 PM
1 hr ago

and went to archive.org and accessed an article that had been archived.

As for saving a web page, after clicking on "Web" at the top, near the left end,

I got this:

Sign in to use extra features:
* Save screenshot
* Save also in my web archive
* Email me the results
* Email me a WACZ file with the results


I just ignored that (since it's not a screenshot and I don't need the other features) and entered my URL and saved it.
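
For anyone who would rather script this than click through the page, here is a minimal sketch. It assumes the Wayback Machine's public availability endpoint (`https://archive.org/wayback/available`) and the `https://web.archive.org/save/` URL prefix behave the way I understand them to. The function names are made up for illustration, and the sketch only builds the URLs and reads a response body, so you would still fetch them yourself with a browser or HTTP client.

```python
import json
from urllib.parse import urlencode

# Assumed public endpoints (not stated anywhere in this thread).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_PREFIX = "https://web.archive.org/save/"


def availability_query(page_url, timestamp=None):
    """Build the URL that asks whether page_url has an archived snapshot.

    timestamp is an optional YYYYMMDD string; the service is expected to
    return the snapshot closest to that date.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def save_request_url(page_url):
    """Build the URL that, when visited, asks the Archive to save page_url."""
    return SAVE_PREFIX + page_url


def closest_snapshot(response_text):
    """Return the snapshot URL from an availability response, or None.

    Assumes a JSON body shaped like
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}.
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

Opening the result of `save_request_url(...)` in a browser should do the same thing as typing the URL into the save box, with no sign-in involved, if the assumptions above hold.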