# Internet Archive Reference Inventory

> tool capable of fetching, extracting, transforming and storing reference information from Wikipedia articles as structured data and making them accessible via an API

**Wikidata**: [Q117023013](https://www.wikidata.org/wiki/Q117023013)  
**Source**: https://4ort.xyz/entity/internet-archive-reference-inventory

## Summary
The Internet Archive Reference Inventory (WARI or IARI) is an open-source software tool that fetches, extracts, transforms, and stores reference information from Wikipedia articles as structured data and exposes it through an API. Developed primarily by Swedish software developer Nizo Priskorn, it is a server-side web API written in Python. It is based on the wcdimportbot ETL framework and is associated with WikiCite, structured data, digitization, data quality, and efforts against fake news.

## Key Facts
- Instance of: open-source software, application programming interface, server-side web API
- Wikidata description: tool capable of fetching, extracting, transforming and storing reference information from Wikipedia articles as structured data and making them accessible via an API
- Aliases: WARI, IARI
- Programming language: Python (applies to back end and server-side web API)
- Website: https://github.com/internetarchive/wari
- Based on: wcdimportbot (ETL-framework for creating and updating reference items in a Wikibase with all citations/references in Wikipedia in near real-time; inception 2021-11-22 per commit criterion)
- Part of: WikiCite
- Facet of: WikiCite, digitization, structured data, fake news, data quality
- Has goal: structured data
- Operator: Turn All References Blue
- Developer: Turn All References Blue; Nizo Priskorn (roles: designer, programmer, operator, software tester; start time 2022-03-09; reference commit 85bbecb9ca6ee3619045ee8333621e56da7fe80a on 2022-11-18)
- Maintained by: Turn All References Blue
- Use: statistics, supporting document
- Has parts of the class: endpoint interface (5 total)
- Copyright status: copyrighted
- Uses: marshmallow, mwparserfromhell, pydantic, Requests, PlantUML, Flask, flask-restful, Gunicorn, MediaWiki REST API, MediaWiki Javascript API, blake3 (references via pyproject.toml on 2022-11-17 and diagrams folder)
- Developer Nizo Priskorn: Swedish software developer; occupations include programmer (from 2018), bike mechanic (from 2015), sewing machine mechanic (2014-06 to 2022), IT consultant (from 2022-03), self-employed (from 2014-06); citizenship Kingdom of Denmark, Sweden
- Contributors: James Hare (adviser from 2022-03), Sawood Alam (adviser from 2022-03), Christian Clauss (code contributor)
- Related: open-source software (sitelink_count: 71), Python (inception 1991-02-20; sitelink_count: 158), wcdimportbot
- Main subjects: Wikipedia reference, MediaWiki wikitext, structured data, Wikipedia

## FAQs
**What does the Internet Archive Reference Inventory do?**  
It fetches reference information from Wikipedia articles, extracts and transforms it into structured data, stores it, and exposes it through an API for access. This enables uses like statistics and supporting documents in contexts such as WikiCite.
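
The extract and transform steps described above can be sketched roughly as follows. This is a minimal illustration, not IARI's actual code: the tool itself lists mwparserfromhell among its dependencies for wikitext parsing, whereas this sketch substitutes a simplified regular expression from the standard library. The names `Reference` and `extract_references` are hypothetical.

```python
import json
import re
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Simplified patterns (assumption): real wikitext has many more edge cases,
# such as nested templates, which is why a dedicated parser is preferable.
REF_PATTERN = re.compile(
    r"<ref\b(?P<attrs>[^>/]*)>(?P<body>.*?)</ref>", re.DOTALL | re.IGNORECASE
)
NAME_PATTERN = re.compile(r'name\s*=\s*"?([^"\s>]+)"?')
URL_PATTERN = re.compile(r"https?://[^\s|\]}<]+")


@dataclass
class Reference:
    """Hypothetical structured form of a single reference."""

    raw: str
    name: Optional[str] = None
    urls: List[str] = field(default_factory=list)


def extract_references(wikitext: str) -> List[Reference]:
    """Extract <ref>...</ref> bodies and turn them into Reference records."""
    references = []
    for match in REF_PATTERN.finditer(wikitext):
        body = match.group("body").strip()
        name_match = NAME_PATTERN.search(match.group("attrs"))
        references.append(
            Reference(
                raw=body,
                name=name_match.group(1) if name_match else None,
                urls=URL_PATTERN.findall(body),
            )
        )
    return references


if __name__ == "__main__":
    sample = (
        'Fact.<ref name="smith">{{cite web |url=https://example.org/a |title=A}}</ref> '
        "More.<ref>[https://example.com/b B]</ref>"
    )
    # Serialize the structured records as JSON, as an API response might.
    print(json.dumps([asdict(r) for r in extract_references(sample)], indent=2))
```

Self-closing reuse tags such as `<ref name="smith" />` are deliberately skipped here, since they carry no reference body of their own.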

**Who created and runs it?**  
Nizo Priskorn, a Swedish software developer, leads development and has served as designer, programmer, operator, and software tester since March 9, 2022. Turn All References Blue is listed as the developer, operator, and maintainer. James Hare and Sawood Alam have been advisers since March 2022, and Christian Clauss has contributed code.

**What technologies power it?**  
Built in Python for its back end and API, it relies on libraries like marshmallow, mwparserfromhell, pydantic, Requests, Flask, flask-restful, and Gunicorn, plus PlantUML for diagrams, MediaWiki REST API, MediaWiki Javascript API, and blake3.

**How is it connected to other projects?**  
It is part of WikiCite and is based on wcdimportbot (inception November 22, 2021). It is related to open-source software and Python, and its facets include digitization, structured data, fake news, and data quality.

**What are its key components and goals?**  
It has 5 endpoint interfaces, and its stated goal is structured data. Its main subjects are Wikipedia references, MediaWiki wikitext, structured data, and Wikipedia. It is open-source software whose copyright status is listed as copyrighted.

## Why It Matters
The Internet Archive Reference Inventory converts the references embedded in Wikipedia article wikitext into structured data that can be queried through an API. This makes citations easier to analyze, verify, and reuse, which supports the data quality and fake news facets listed for the project. As part of WikiCite, and based on wcdimportbot (an ETL framework for creating and updating reference items in a Wikibase in near real time), it supports statistics generation, supporting documents, and broader digitization work. The project is developed, operated, and maintained by Turn All References Blue.

## Notable For
- Transforming Wikipedia's wikitext references into API-accessible structured data as a Python-based server-side web API.
- Being based on wcdimportbot, an ETL framework for near-real-time creation and updating of Wikibase reference items from Wikipedia citations.
- Featuring 5 dedicated endpoint interfaces for structured reference access.
- Development by Nizo Priskorn with multifaceted roles (designer, programmer, operator, tester) since March 9, 2022, under Turn All References Blue.
- Integration of specialized libraries like mwparserfromhell for wikitext parsing and blake3 for hashing in a Flask-Gunicorn stack.
- Contributor ecosystem including advisers James Hare and Sawood Alam from March 2022, plus code from Christian Clauss.
- Being known by the aliases WARI and IARI, with its repository hosted under the Internet Archive's GitHub organization.
- Being a facet of WikiCite, digitization, structured data, fake news, and data quality.

## Body
### Overview and Classification
The Internet Archive Reference Inventory is an instance of open-source software, an application programming interface, and a server-side web API. Its core function is fetching, extracting, transforming, and storing reference information from Wikipedia articles as structured data and making that data accessible via an API. It is also known as WARI and IARI, and its copyright status is listed as copyrighted. Its main subjects are Wikipedia reference, MediaWiki wikitext, structured data, and Wikipedia itself. It is used for statistics and supporting documents, and its goal is structured data.
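
As an illustration of the statistics use, structured reference data makes aggregate questions such as "which web domains are cited most often?" simple to answer. The following is a minimal sketch that assumes references have already been reduced to a list of URLs; the function name `count_domains` is hypothetical and not part of IARI.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_domains(urls):
    """Count how many reference URLs point at each host (hypothetical helper)."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname  # None when the string has no network location
        if not host:
            continue
        if host.startswith("www."):
            host = host[4:]  # treat www.example.org and example.org as one domain
        counts[host] += 1
    return counts


if __name__ == "__main__":
    reference_urls = [
        "https://www.example.org/a",
        "http://example.org/b",
        "https://archive.org/details/x",
        "not a url",
    ]
    for domain, total in count_domains(reference_urls).most_common():
        print(domain, total)
```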

### Development and Team
Turn All References Blue acts as operator, developer, and maintainer. Nizo Priskorn serves as the primary developer, holding roles of designer, programmer, operator, and software tester starting March 9, 2022 (referenced by GitHub commit 85bbecb9ca6ee3619045ee8333621e56da7fe80a on November 18, 2022). Priskorn, a Swedish software developer, began programming in 2018; prior and concurrent occupations include bike mechanic from 2015, sewing machine mechanic from June 2014 to 2022, IT consultant from March 2022, and self-employed status from June 2014, with citizenship in the Kingdom of Denmark and Sweden. Contributors to the creative work include James Hare as adviser from March 2022, Sawood Alam as adviser from March 2022, and Christian Clauss as code contributor.

### Technical Stack and Components
The back end and server-side web API are written in Python (Python inception February 20, 1991). The tool uses marshmallow, mwparserfromhell, pydantic, Requests, Flask, flask-restful, and Gunicorn (referenced via pyproject.toml on November 17, 2022), PlantUML (referenced via the diagrams folder), and also the MediaWiki REST API, the MediaWiki Javascript API, and blake3. It has 5 parts of the class endpoint interface.
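
To give a rough idea of what a server-side endpoint returning structured reference data can look like, here is a minimal sketch written against the standard WSGI interface, which is the interface Flask applications expose and Gunicorn serves. It deliberately avoids third-party packages, so it is not how IARI itself is implemented. The `/references` route, the `title` query parameter, and the in-memory `STORE` are all hypothetical and do not correspond to IARI's actual 5 endpoints.

```python
import json
from urllib.parse import parse_qs

# Hypothetical in-memory store standing in for previously extracted references.
STORE = {
    "Example": [{"name": "smith", "urls": ["https://example.org/a"]}],
}


def app(environ, start_response):
    """WSGI callable: GET /references?title=<article> returns JSON."""
    path = environ.get("PATH_INFO", "")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    title = query.get("title", [""])[0]

    if path == "/references" and title in STORE:
        status = "200 OK"
        payload = {"title": title, "references": STORE[title]}
    else:
        status = "404 Not Found"
        payload = {"error": "not found"}

    body = json.dumps(payload).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


if __name__ == "__main__":
    # Local test server from the standard library; not intended for production.
    from wsgiref.simple_server import make_server

    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

Assuming the file is saved as `sketch.py`, a WSGI server such as Gunicorn could serve the same callable with `gunicorn sketch:app`.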

### Project Relationships and Ecosystem
It is part of WikiCite and is based on wcdimportbot, an ETL framework for near-real-time creation and updating of reference items in a Wikibase using all Wikipedia citations. The inception of wcdimportbot is dated November 22, 2021, with reference to commit a2eb82edd6f1c543d010efe94922f377876e3008 (reference dated November 18, 2022). Facets include WikiCite, digitization, structured data, fake news, and data quality. Related entities include open-source software (71 sitelinks) and Python (158 sitelinks).

### Hosting and Access
The project resides at https://github.com/internetarchive/wari, aligning with Internet Archive's open-source efforts.