Find Lost Web Pages
There’s nothing more frustrating than searching for a page, finding what looks like a promising result, and then clicking through only to discover that the page is gone. Unfortunately, it happens all the time. Servers get jammed, pages are removed, sites move, and some servers are simply no longer maintained. So what do you do when you want to find a page that’s vanished?
The answer depends partly on why the page isn’t showing up.
Dealing With the Slashdot Effect
Some sites, particularly smaller independent publishers and bloggers, can’t handle the traffic influx from having a link show up on Slashdot or Digg. The sites simply stop responding as their servers become overwhelmed. However, you might still be able to see a cached version of the content using Coral Cache.
Coral Cache
Coral Cache is a free service that uses distributed computing to lessen the so-called “Slashdot effect.” Coral Cache was developed to provide a distributed mirror of the original page that can handle the high traffic volume.
You don’t need any special software. Just append .nyud.net to the hostname of a regular URL (so http://wired.com/ becomes http://wired.com.nyud.net/) and you’ll reach the page through Coral Cache rather than connecting directly.
It won’t be quite as fast as you may be used to (compare wired.com directly with the Coral Cache version), but it could help you get to content that’s currently being choked due to an exceedingly large number of direct connections.
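The hostname rewrite described above can be sketched in a few lines of Python. This is only an illustration, not part of Coral Cache itself: the function name coralize is made up for this example, and the sketch deliberately rejects URLs with an explicit port instead of guessing how the service would treat them.

```python
from urllib.parse import urlsplit, urlunsplit

CORAL_SUFFIX = ".nyud.net"  # suffix named in the article


def coralize(url: str) -> str:
    """Return a copy of `url` whose hostname ends in .nyud.net.

    Hypothetical helper for illustration only. It handles plain
    scheme://host/path URLs and refuses anything with an explicit port.
    """
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError("expected an absolute URL such as http://example.com/")
    if parts.port is not None:
        # Assumption: port handling is out of scope for this sketch.
        raise ValueError("URLs with an explicit port are not handled here")
    host = parts.hostname
    if host.endswith(CORAL_SUFFIX):
        return url  # already points at the cache
    return urlunsplit(
        (parts.scheme, host + CORAL_SUFFIX, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(coralize("http://wired.com/"))
```

Appending the suffix to the hostname, rather than to the very end of the URL string, keeps the path and query string intact.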
Finding Content That’s Been Removed
If a web page has been deleted or removed by its publisher, you can often still find it using one of the web’s longer-term caching services.
Google Cache
As search engines crawl the web, they cache fresh versions of pages as they go. To access a page in Google’s cache, just search for the original page. If it’s still in Google’s cache, you’ll see a small “Cached” link next to the result, leading to the page as it looked the last time Google indexed it.
In some cases, this will lead you straight to the content you want. However, sometimes the method doesn’t work. The page owner may have replaced the original page with new content, and if Google’s indexing spiders have been back to the page since that change, you won’t see the old content.
In such cases, you may be out of luck, but there is one other method you can try.
The Wayback Machine
The Internet Archive is a nonprofit organization founded with the goal of building an Internet library that could offer permanent access to web pages for researchers, historians and scholars.
The Internet Archive’s ambitious goal of indexing every page of content that has ever been on the public web is not yet a reality, but its coverage is broad. It just might have the page you seek.
The Wayback Machine is the Internet Archive’s search engine. It takes a URL and looks for copies of the pages published at that URL over time. Using the Wayback Machine, you can often find pages that were removed from the live web years ago.
In some cases, the pages may appear a bit mangled and won’t necessarily have all the original formatting — images, stylesheets and scripts may not be referenced properly anymore — but you can at least get at the actual text content.
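Because the Wayback Machine is keyed on the original URL, a lookup address can be assembled by hand. The sketch below assumes the web.archive.org/web/ address layout, in which a date stamp or a * wildcard is followed by the original URL. The helper name wayback_url is made up for this example, and the code only builds the address; it does not check whether a snapshot actually exists.

```python
from datetime import date
from typing import Optional

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(url: str, when: Optional[date] = None) -> str:
    """Build a Wayback Machine address for `url`.

    Hypothetical helper for illustration only. With no date, it returns
    the address that lists every capture of the URL. With a date, it
    returns an address asking for a snapshot from around that day.
    """
    if when is None:
        return f"{WAYBACK_PREFIX}/*/{url}"
    # Date stamps in the address use the YYYYMMDD form.
    return f"{WAYBACK_PREFIX}/{when:%Y%m%d}/{url}"


if __name__ == "__main__":
    print(wayback_url("http://wired.com/"))
    print(wayback_url("http://wired.com/", date(2008, 3, 1)))
```

Pasting either address into a browser is the quickest way to see whether the archive holds a copy of the page.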
As of March 2008, the Internet Archive boasts 85 billion web pages. It also recently started archiving other content like movies, audio files and live music, though its indexes for multimedia content are not as extensive as the web page offerings.
Prevent Pages From Disappearing In the First Place
Many of today’s popular web-based bookmark services offer page caching as a feature. Ma.gnolia, for instance, takes a snapshot of a page when you bookmark it and caches the contents. This is helpful for ensuring that your favorite bookmarked pages don’t disappear on you. If they do, just head to ma.gnolia and click through to the cached version.