Techdirt -- Blocking The Internet Archive Won't Stop AI, But It Will Erase The Web's Historical Record
Imagine a newspaper publisher announcing it will no longer allow libraries to keep copies of its paper.
That's effectively what's begun happening online in the last few months. The Internet Archive, the world's largest digital library, has preserved newspapers since it went online in the mid-1990s. The Archive's mission is to preserve the web and make it accessible to the public. To that end, the organization operates the Wayback Machine, which now contains more than one trillion archived web pages and is used daily by journalists, researchers, and courts...
Whatever the outcome of those lawsuits, blocking nonprofit archivists is the wrong response. Organizations like the Internet Archive are not building commercial AI systems. They are preserving a record of our history. Turning off that preservation in an effort to control AI access could essentially torch decades of historical documentation over a fight that libraries like the Archive didn't start, and didn't ask for.
If publishers shut the Archive out, they aren't just limiting bots. They're erasing the historical record.
Archiving and Search Are Legal
Making material searchable is a well-established fair use. Courts have long recognized it's often impossible to build a searchable index without making copies of the underlying material. That's why when Google copied entire books in order to make a searchable database, courts rightly recognized it as a clear fair use. The copying served a transformative purpose: enabling discovery, research, and new insights about creative works.
The Internet Archive operates on the same principle. Just as physical libraries preserve newspapers for future readers, the Archive preserves the web's historical record. Researchers and journalists rely on it every day. According to Archive staff, Wikipedia alone links to more than 2.6 million news articles preserved at the Archive, spanning 249 languages. And that's only one example. Countless bloggers, researchers, and reporters depend on the Archive as a stable, authoritative record of what was published online.
The same legal principles that protect search engines must also protect archives and libraries. Even if courts place limits on AI training, the law protecting search and web archiving is already well established.
The Internet Archive has preserved the web's historical record for nearly thirty years. If major publishers begin blocking that mission, future researchers may find that huge portions of that historical record have simply vanished. There are real disputes over AI training that must be resolved in courts. But sacrificing the public record to fight those battles would be a profound, and possibly irreversible, mistake.
https://www.techdirt.com/2026/03/26/blocking-the-internet-archive-wont-stop-ai-but-it-will-erase-the-webs-historical-record/
Note: Techdirt always provides most of its content, including articles and podcasts, to anyone for free.
Sorry I wasn't able to save this link to either the Wayback Machine (without a donation/signup) or archive.ph.
Norrrm
The internet itself (not archiving) is a poor historical record. I have seen a newspaper's online article match exactly what was in print, and then a few days later it was slightly different. Certain facts may have been removed or changed without acknowledgement.
ancianita
preserving facts and historical context.
progree
and have posted many such links here.
I also occasionally archive a page there.
I'm not aware of any requirement to sign up, have an account, donate, or anything like that to either search or archive a web page. (I have donated in the past, but I don't think it "knows" that; the page has a "Sign up | Log in" at the top right, yet I can search and archive a web page.) To archive a web page, click on "Web" at the top near the left end.
ancianita
When I tried to copy/paste the URL I needed archived (or access to it, if it was already archived), it sent me to a sign-up page. I decided not to do that for now. But I'm increasingly aware that important local journalism could disappear, so I'll do it soon, even though mainstream publishers archive their own reporting and investigations.
progree
and went to archive.org and accessed an article that had been archived.
As for saving a web page, after clicking on "Web" at the top, near the left end, I got this:
* Save screenshot
* Save also in my web archive
* Email me the results
* Email me a WACZ file with the results
I just ignored that (since it's not a screenshot and I don't need the other features) and entered my URL and saved it.