# web archiving

> process of data preservation done by collecting and saving web content

**Wikidata**: [Q2062069](https://www.wikidata.org/wiki/Q2062069)  
**Wikipedia**: [English](https://en.wikipedia.org/wiki/Web_archiving)  
**Source**: https://4ort.xyz/entity/web-archiving

## Summary
Web archiving is the process of preserving data by collecting and saving content from the World Wide Web. As a form of digital preservation, its primary goal is to ensure that this online information remains accessible, trustworthy, and usable for the future, preventing it from being lost due to updates or deletions.

## Key Facts
- **Classification:** Web archiving is a subclass of both digital preservation and data archiving.
- **Primary Function:** The process involves collecting and saving web content, with the primary object of preservation being the web page.
- **Notable User:** The process is used by the Wayback Machine.
- **Associated Company:** Webrecorder, a U.S. company founded in 2020, develops open-source tools for digital preservation.
- **Related Tools:** Specialized tools for web archiving include Browsertrix, Fossilo, and the Library of Alexandria application suite.
- **Formal Recognition:** The topic is formally recognized by multiple authorities, including the Library of Congress (ID: sh2007000528).
- **International Scope:** The subject has a global presence, with Wikipedia articles in at least 10 languages, including Arabic, Chinese, English, French, and German.

## FAQs
### Q: What is the main purpose of web archiving?
A: The main purpose of web archiving is to preserve web content to ensure that digital information of continuing value remains accessible, trustworthy, and usable for the future. It is a formal process for preventing the loss of online data.

### Q: What kind of content is collected during web archiving?
A: Web archiving primarily focuses on collecting and saving web pages. The process is designed to capture and preserve the digital content published on the internet.

### Q: What are some tools used for web archiving?
A: Various specialized tools exist for web archiving. Examples include Browsertrix, a cloud application for automated archiving; Fossilo, a commercial preservation software service; and open-source tools developed by the company Webrecorder.

## Why It Matters
Web archiving addresses the inherent impermanence of the internet. Websites are constantly updated, restructured, or taken offline, so vast amounts of cultural, historical, and scientific knowledge published online are at risk of being lost. Web archiving counters this by capturing a snapshot of the web at a specific point in time.

By systematically collecting and saving web content, web archiving creates a historical record. This is vital for researchers, historians, journalists, and the general public, as it allows them to access information that is no longer available on the live web. Services like the Wayback Machine rely on this process to provide a browsable history of the internet. In doing so, they keep digital information of value accessible and usable for future generations and preserve our collective digital heritage.

## Notable For
- **Preserving Ephemeral Media:** Its primary focus is on capturing and preserving web pages, a dynamic and constantly changing form of digital information that would otherwise be lost.
- **Enabling Historical Access:** The process is the foundation for major public archives like the Wayback Machine, which makes historical versions of websites accessible to the public.
- **Formal Discipline:** It is formally classified as a subclass of both data archiving and digital preservation, establishing its distinct role within information science.
- **Specialized Tooling:** Web archiving has driven the development of a dedicated ecosystem of software, including open-source tools from Webrecorder and commercial services like Fossilo.

## Body
### Definition and Classification
Web archiving is defined as the process of data preservation accomplished by collecting and saving web content. The primary object of this collection and preservation is the web page. It is formally categorized as a subclass of two broader fields:
*   **Digital Preservation:** A formal endeavor to ensure that digital information of continuing value remains accessible, trustworthy, and usable.
*   **Data Archiving:** The process of collecting and preserving data.
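
The definition above (collect, save, and keep the result trustworthy) can be illustrated with a minimal Python sketch. It is not how any particular archive works: the function names `make_snapshot` and `is_intact`, the dictionary layout, and the example URL are all assumptions made for illustration. Production archives typically store captures in dedicated container formats such as WARC, which this sketch does not attempt to reproduce.

```python
import hashlib
from datetime import datetime, timezone


def make_snapshot(url: str, body: bytes, captured_at: datetime) -> dict:
    """Bundle captured page bytes with the metadata needed to trust them later.

    Hypothetical record layout, chosen for illustration only.
    """
    return {
        "url": url,
        # UTC timestamp records which point in time this capture represents.
        "captured_at": captured_at.astimezone(timezone.utc).isoformat(),
        # SHA-256 digest lets a later reader check the bytes are unchanged.
        "sha256": hashlib.sha256(body).hexdigest(),
        "body": body,
    }


def is_intact(snapshot: dict) -> bool:
    """Recompute the digest and compare it with the one stored at capture time."""
    return hashlib.sha256(snapshot["body"]).hexdigest() == snapshot["sha256"]


# Hypothetical capture; in practice the bytes would come from an HTTP fetch.
page = b"<html><body>Hello, archive</body></html>"
snap = make_snapshot(
    "https://example.org/",
    page,
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
```

Storing a capture time alongside the content is what makes a snapshot a record of the page "at a specific point in time", and the stored digest gives a simple fixity check: if `is_intact(snap)` ever returns `False`, the saved bytes no longer match what was collected.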

### Tools, Services, and Initiatives
A number of specialized tools and organizations are associated with web archiving:
*   **Webrecorder:** A U.S.-based company founded on February 5, 2020, that develops open-source tools for digital preservation.
*   **Browsertrix:** A cloud application developed by Webrecorder that allows users to archive websites using automated browsers.
*   **Fossilo:** A commercial software service dedicated to preservation.
*   **Library of Alexandria:** A free and open-source application suite that downloads, indexes, and allows searching of document files from the internet.
*   **Megalodon:** A citation tool for web pages that was established in Japan in 2006.

The main category for topics related to this field is "Category:Web archiving initiatives".

### Recognition and Identifiers
Web archiving is a recognized concept in numerous international databases and authority files.
*   **Library of Congress Authority ID:** sh2007000528
*   **FAST ID:** 1742386
*   **Freebase ID:** /m/0fkp43
*   **NDL Authority ID (Japan):** 00981807

## Schema Markup
```json
{
  "@context": "https://schema.org",
  "@type": "Thing",
  "name": "web archiving",
  "description": "A process of data preservation done by collecting and saving web content to ensure it remains accessible and usable over time.",
  "sameAs": [
    "https://en.wikipedia.org/wiki/Web_archiving"
  ],
  "additionalType": "http://purl.org/ontology/bibo/Collection"
}
```
