Significant characteristics of websites

Archivematica has no formal default policy for websites. This page summarizes research done for clients that may inform future policy development.

Overview

The goal of website archiving is to capture, preserve and render complete websites. An end user should be able to navigate the preserved website in the same way as the original and, as far as possible, should see the same content and experience the same functionality. Website preservation involves a number of steps, each requiring its own tools and procedures: capturing ("crawling" or "harvesting") a website, storing it in an archival format, applying preservation planning over time, rendering it, indexing it and providing keyword search capabilities for all of the archived content.

A number of institutions have undertaken website archiving on a large scale. Probably the best known of these is the Internet Archive, founded by Brewster Kahle in 1996. The Internet Archive gathers and makes available a vast number of websites at no charge; it also offers a third-party web archiving service called Archive-It, available for a fee. The Library of Congress has been preserving websites since 2000, acquiring government and private websites based on selected themes, events and subject areas. California Digital Library collects a wide variety of websites and makes them available online; like the Internet Archive, it offers a third-party web archiving service for a fee. Numerous national archives and libraries also have web archiving projects, including the British Library (via the UK Web Archive), Library and Archives Canada, the National Library of New Zealand and the National Archives of Australia. (Lists of major website archiving initiatives are maintained at http://netpreserve.org/about/archiveList.php and http://en.wikipedia.org/wiki/List_of_Web_archiving_initiatives.)

International efforts to develop web archiving tools, standards and practices are managed by the International Internet Preservation Consortium (IIPC), established in 2003 by the Library of Congress, the Internet Archive and the national libraries of Australia, Canada, Denmark, Finland, France, Iceland, Italy, Norway, Sweden and the UK. (A complete list of current members is at http://netpreserve.org/about/memberList.php.)
