Wayback Machine
Internet Archive (Archive.org) founders launched the Wayback Machine in 2001. The service lets users view archived versions of web pages as they changed over time. The Wayback Machine lives at http://web.archive.org, so it is also unofficially called the Web Archive. Its crawler has been indexing publicly available Internet resources since 1996, taking new copies of the Web every year.
All copies are static: server-side scripts were not saved, so only their cached output is available. Some client-side scripts are saved and adapted to work in the Web Archive environment, but many of them break. However, the text, most graphics and linked files are saved and can be viewed.
WebOne can open archived copies of pages behind dead links. If this feature is enabled in the configuration file, then on a 404 File Not Found error, or when the remote server's domain name cannot be resolved, WebOne searches the Web Archive database for the requested URL. This is done via the Wayback CDX Server API. If an archived copy is available, the proxy server either redirects the client to that copy or returns the copy itself (depending on proxy settings). Otherwise, the client sees the original error. When the Web Archive has multiple saved versions of a page, WebOne prefers the latest one that is not a redirect (HTTP 200), or simply the latest one if no version with HTTP 200 is found.
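The snapshot selection rule above can be sketched in Python as follows. This is not WebOne's own implementation; it is a minimal illustration that assumes the public CDX endpoint https://web.archive.org/cdx/search/cdx, its output=json mode (where the first row holds the field names) and its fl field-list parameter. The helper names cdx_query_url, pick_snapshot and archived_url are hypothetical.

```python
from typing import List, Optional
from urllib.parse import urlencode

# Assumed public CDX endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(url: str) -> str:
    """Hypothetical helper: build a CDX query listing every capture of `url` as JSON."""
    params = {"url": url, "output": "json", "fl": "timestamp,original,statuscode"}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def pick_snapshot(rows: List[List[str]]) -> Optional[str]:
    """Return the timestamp of the newest HTTP 200 capture, or the newest capture
    of any status if there is no HTTP 200 one. `rows` is the parsed JSON reply,
    header row first. Returns None when there are no captures."""
    if len(rows) < 2:
        return None
    header, captures = rows[0], rows[1:]
    ts = header.index("timestamp")
    status = header.index("statuscode")
    # Timestamps are fixed-width YYYYMMDDhhmmss strings, so string order is time order.
    captures = sorted(captures, key=lambda row: row[ts])
    ok = [row for row in captures if row[status] == "200"]
    return (ok or captures)[-1][ts]


def archived_url(original: str, timestamp: str) -> str:
    """Address of one archived copy, in the fixed Web Archive URL format."""
    return f"https://web.archive.org/web/{timestamp}/{original}"
```

A proxy following this rule would fetch cdx_query_url(requested_url), parse the JSON, call pick_snapshot, and then either redirect to archived_url(requested_url, stamp) or fetch and return that copy.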
The default installation of WebOne is configured to open old archived copies of some websites, even ones that are still alive. The list includes Microsoft.com, the online services of Windows XP, Windows Media Player 6/7/8/9, IE4 Active Channels (*.cdf files) and Netscape.com online services.
All copies have URLs in a fixed format: https://web.archive.org/web/YYYYMMDDHHMMSS/URL. The timestamp can be shortened by dropping trailing digits; the nearest copy is then used, and the Web Archive responds with a 302 redirect to the URL with the full timestamp. Even the current year alone may be used to get the latest available copy (including cases where the latest copy is from 2005).
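The URL format and timestamp shortening described above can be illustrated with two small Python helpers. The names wayback_url and split_wayback_url are hypothetical, and the sketch only handles addresses without a view-mode suffix.

```python
import re
from typing import Optional, Tuple

WAYBACK_PREFIX = "https://web.archive.org/web/"
# A full stamp is YYYYMMDDHHMMSS (14 digits); any shorter prefix is also accepted.
STAMP_RE = re.compile(r"\d{1,14}")
# Splits an archived address into (timestamp, original URL); no suffix support.
PARSE_RE = re.compile(r"https?://web\.archive\.org/web/(\d{1,14})/(.+)")


def wayback_url(url: str, timestamp: str) -> str:
    """Build an archive address from a full or shortened timestamp."""
    if not STAMP_RE.fullmatch(timestamp):
        raise ValueError("timestamp must be 1 to 14 digits, e.g. '2005' or '20130806040521'")
    return f"{WAYBACK_PREFIX}{timestamp}/{url}"


def split_wayback_url(archived: str) -> Optional[Tuple[str, str]]:
    """Return (timestamp, original URL), or None if `archived` is not in that format."""
    m = PARSE_RE.fullmatch(archived)
    return (m.group(1), m.group(2)) if m else None
```

For example, wayback_url("http://microsoft.com/", "2005") asks for the copy nearest to 2005; the Location header of the 302 reply can then be passed to split_wayback_url to read the full 14-digit timestamp of the copy actually served.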
Web Archive addresses can contain wildcards to get a list of archived content.
To get a list of all archived copies of a page, replace the timestamp with an asterisk (*). This is somewhat more powerful than the Timeline Bar at the top of every archived page, and it also works in old browsers that cannot display the Timeline Bar. Example: https://web.archive.org/web/*/http://google.com
Blue links indicate successfully captured copies. Orange indicates that the URL was not found at crawl time. Green indicates a redirect at crawl time.
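A minimal Python sketch of the listing address and of the colour legend above. Treating the colours as HTTP status classes (2xx blue, 3xx green, 4xx orange) is an assumption made for illustration, as are the helper names copies_list_url and legend_color; the list page itself is rendered by the Wayback Machine.

```python
def copies_list_url(url: str) -> str:
    """Address of the list of all archived copies of `url` ("*" instead of a timestamp)."""
    return f"https://web.archive.org/web/*/{url}"


def legend_color(statuscode: str) -> str:
    """Assumed mapping of a capture's HTTP status to the list-page colours."""
    if statuscode.startswith("2"):
        return "blue"    # copy captured successfully
    if statuscode.startswith("3"):
        return "green"   # redirect at crawl time
    if statuscode.startswith("4"):
        return "orange"  # URL not found at crawl time
    return "unknown"     # anything else (e.g. "-" or 5xx) is not covered by the legend
```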
To get a list of all saved files from a server directory, put a wildcard in both the timestamp and the address part. Example: https://web.archive.org/web/*/http://web.ukonline.co.uk/cliff.lawson/*
This can be useful when searching for binary files or for URLs with arguments (like http://example.com/index.php?page=index&captcha=12345). Note that the date of the last copy does not mean that the file was removed shortly after that copy was made. It may simply mean that the Web Archive robot did not download the file again because of its file type or for other reasons.
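The directory wildcard above has a machine-readable counterpart in the CDX API, which is handy when hunting for binary files. This Python sketch assumes the CDX parameters matchType=prefix, output=json, fl and collapse=urlkey; the helper names and the default extension list are hypothetical, so adjust them to the files you are looking for.

```python
import posixpath
from typing import Iterable, List
from urllib.parse import urlencode, urlsplit


def directory_listing_url(prefix: str) -> str:
    """Wildcard in both the timestamp and the address part."""
    return f"https://web.archive.org/web/*/{prefix.rstrip('*')}*"


def cdx_prefix_query(prefix: str) -> str:
    """Assumed CDX equivalent: one row per unique URL captured under `prefix`."""
    params = {"url": prefix, "matchType": "prefix", "output": "json",
              "fl": "original,timestamp,mimetype", "collapse": "urlkey"}
    return "https://web.archive.org/cdx/search/cdx?" + urlencode(params)


def binary_files(rows: List[List[str]], extensions: Iterable[str] = (".zip", ".exe")) -> List[str]:
    """Pick URLs whose path ends in one of `extensions` (case-insensitive)
    from parsed CDX JSON rows (header row first)."""
    if not rows:
        return []
    exts = {e.lower() for e in extensions}
    idx = rows[0].index("original")
    found = []
    for row in rows[1:]:
        path = urlsplit(row[idx]).path.lower()  # query string is ignored
        if posixpath.splitext(path)[1] in exts:
            found.append(row[idx])
    return found
```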
By default, the Wayback Machine returns a user-friendly version of the content: the Timeline Bar at the top and a modified version of the page below it. The modified version has corrected links, so all links point to the Web Archive instead of the original files. Sometimes, however, you need the original copy of a file or want to hide the Timeline Bar.
The URL can contain a suffix after the timestamp, as in this example: https://web.archive.org/web/20130806040521if_/http://faq.web.archive.org/page-without-wayback-code/
- No suffix - full Wayback Machine page with Timeline Bar, optimized for modern browsers.
- id_ - Identity: the original file as it was archived. Most links will be broken.
- js_ - JavaScript: returns the document marked up as JavaScript.
- cs_ - CSS: returns the document marked up as CSS.
- im_ - Image: returns the document as an image.
- if_ or fw_ - In-frame: a modified version with proper links to archived images, styles, etc., and a JavaScript patch inside.
For old browsers it is better to use the fw_ version of pages, as it contains the fewest modifications. However, all hyperlinks on it still lead to the regular version. This can be overridden with a WebOne edit set. :)
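The suffixes above can be switched with a simple text substitution. This Python sketch only illustrates what such a rewrite does; it is not WebOne edit-set syntax, and the names with_mode and rewrite_links are hypothetical. It assumes archived links look like https://web.archive.org/web/STAMP[suffix]/URL or the root-relative /web/STAMP[suffix]/URL, and it may also match unrelated "/web/<digits>/" paths.

```python
import re

# Optional host, "/web/", 1-14 digit stamp, optional two-letter mode + "_", then "/".
WAYBACK_RE = re.compile(r"((?:https?://web\.archive\.org)?/web/)(\d{1,14})([a-z]{2}_)?/")
MODES = {"", "id_", "js_", "cs_", "im_", "if_", "fw_"}


def _set_mode(text: str, mode: str, count: int) -> str:
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}")
    # Keep prefix and timestamp, replace (or add/remove) the suffix.
    return WAYBACK_RE.sub(lambda m: f"{m.group(1)}{m.group(2)}{mode}/", text, count=count)


def with_mode(url: str, mode: str) -> str:
    """Return the same archived URL with its view-mode suffix replaced."""
    return _set_mode(url, mode, count=1)


def rewrite_links(html: str, mode: str = "fw_") -> str:
    """Point every archived link in a page at the given mode, e.g. fw_ for old browsers."""
    return _set_mode(html, mode, count=0)  # count=0 means replace all matches
```

For instance, with_mode("https://web.archive.org/web/20130806040521if_/http://faq.web.archive.org/page-without-wayback-code/", "id_") yields the same address with id_ in place of if_.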
See also: https://en.wikipedia.org/wiki/Help:Using_the_Wayback_Machine