# wcdimportbot

> ETL-framework for creating and updating reference items in a Wikibase with all citations/references in Wikipedia in near real-time

**Wikidata**: [Q115252313](https://www.wikidata.org/wiki/Q115252313)  
**Source**: https://4ort.xyz/entity/wcdimportbot

## Summary
**wcdimportbot** is an open-source ETL (extract, transform, load) framework designed to create and update reference items in a Wikibase using citations and references from Wikipedia in near real-time. Developed primarily in Python, it automates the process of structuring Wikipedia citations into machine-readable data for Wikibase, supporting projects like WikiCite and improving data quality in linked open data ecosystems.

## Key Facts
- **Type**: Open-source ETL framework for MediaWiki wikitext, specifically for Wikipedia citations.
- **Created**: November 22, 2021 (first commit).
- **Developers**: James Hare (American, 2000–) and Nizo Priskorn (Swedish/Danish, programmer since 2018).
- **Programming Language**: Python (back-end/server-side), with JavaScript for MediaWiki userscripts (front-end).
- **Operating System**: Linux.
- **License**: Open-source license ([Q27016754](https://www.wikidata.org/wiki/Q27016754) on Wikidata).
- **Latest Stable Version**: 4.1.2 (released July 18, 2023).
- **Key Dependencies**: Wikibase Integrator (v2.1.0), Pywikibot, Requests, Flask, and others.
- **Part of**: WikiCite initiative, aimed at structuring Wikipedia references as linked open data.
- **Maintained by**: Turn All References Blue (a project focused on improving citation quality).

## FAQs
### Q: What does wcdimportbot do?
A: It extracts citations from Wikipedia articles, transforms them into structured data, and loads them into a Wikibase instance in near real-time. This helps create machine-readable reference databases for projects like WikiCite.

### Q: Who created wcdimportbot?
A: The framework was developed by James Hare (until November 2021) and Nizo Priskorn (since 2022), with contributions from advisers like Sawood Alam and code contributor Christian Clauss.

### Q: What technologies does wcdimportbot use?
A: It is built in Python and relies on libraries such as Pywikibot, Wikibase Integrator, mwparserfromhell, marshmallow, and Flask. It runs on Linux and parses the citation templates used on Wikipedia.

### Q: Is wcdimportbot still actively maintained?
A: The latest stable release on record is 4.1.2 (July 18, 2023), and the project is listed as maintained by the Turn All References Blue initiative. Check the GitHub repository for its current activity.

### Q: How does wcdimportbot relate to Wikipedia?
A: It processes Wikipedia’s wikitext citations, converting them into structured data for Wikibase, which helps improve data quality, combat fake news, and enable linked open data applications.

## Why It Matters
wcdimportbot addresses a gap in the Wikipedia ecosystem: citations live in unstructured wikitext rather than as machine-readable data. By automating the extraction and transformation of references from Wikipedia articles into Wikibase, it lets researchers, fact-checkers, and developers analyze citation networks, verify sources, and build tools on top of linked data. This matters for countering misinformation, because structured citations make it possible to track which sources are used where and how reliable they are.

The framework also supports the WikiCite initiative, which works toward open, structured bibliographic data about the sources cited in Wikimedia projects, fostering transparency and reproducibility in knowledge representation. For the Wikimedia movement, tools like wcdimportbot help move citation data from unstructured wikitext to a semantic-web-compatible data model, opening up new possibilities for data integration and analysis.

## Notable For
- **Real-time citation processing**: Extracts and structures Wikipedia references in near real-time.
- **WikiCite integration**: Part of the WikiCite initiative, which works toward open, structured bibliographic data for the sources cited on Wikipedia.
- **Open-source and modular**: Built with widely used Python libraries (e.g., Pywikibot, Flask) and designed for extensibility.
- **Cross-disciplinary impact**: Supports efforts in digitization, structured data, and combating fake news by improving citation transparency.
- **Community-driven development**: Maintained by Turn All References Blue, a collaborative initiative focused on reference quality.

## Body
### Overview and Purpose
wcdimportbot is an **ETL (extract, transform, load) framework** specialized for processing Wikipedia citations. Its primary function is to parse wikitext from Wikipedia articles, extract references (the content of `<ref>` tags, typically citation templates such as `{{cite web}}`), and convert them into structured data compatible with Wikibase, a platform for linked open data. This enables the creation of reference items that can be queried, analyzed, and reused across applications.
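The extraction step can be sketched in a few lines. This is a minimal, standard-library-only illustration, not wcdimportbot's code: the real framework uses mwparserfromhell, whereas this sketch swaps in regular expressions that only handle flat, non-nested templates. The function names `extract_citations` and `parse_template` are hypothetical.

```python
import re
from typing import Dict, List

# Sketch only: regexes stand in for mwparserfromhell and handle flat templates.
# Matches <ref>...</ref> and <ref name="...">...</ref>, but not self-closing <ref ... />.
REF_RE = re.compile(r"<ref(?:\s[^>/]*)?>(.*?)</ref>", re.DOTALL | re.IGNORECASE)
TEMPLATE_RE = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)


def parse_template(body: str) -> Dict[str, str]:
    """Split 'cite web |url=... |title=...' into a template name and named parameters."""
    parts = [p.strip() for p in body.split("|")]
    result = {"template": parts[0].lower()}
    for part in parts[1:]:
        if "=" in part:
            # partition on the first '=' so URLs with query strings stay intact
            key, _, value = part.partition("=")
            result[key.strip()] = value.strip()
    return result


def extract_citations(wikitext: str) -> List[Dict[str, str]]:
    """Return one dict per citation template found inside <ref>...</ref> tags."""
    citations = []
    for ref_body in REF_RE.findall(wikitext):
        for template_body in TEMPLATE_RE.findall(ref_body):
            citations.append(parse_template(template_body))
    return citations


sample = (
    "Paris is the capital of France.<ref name=\"a\">{{cite web |url=https://example.org/paris "
    "|title=Paris facts |access-date=2023-01-01}}</ref> "
    "It is large.<ref>{{Cite book |title=A City |year=1999}}</ref>"
)
for citation in extract_citations(sample):
    print(citation)
```

A production parser has to cope with nested templates, comments, and `|` characters inside values, which is why a dedicated wikitext parser is the safer choice.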

### Technical Architecture
- **Programming Language**: Primarily Python, with JavaScript used for front-end MediaWiki userscripts.
- **Dependencies**:
  - **Wikibase Integrator** (v2.1.0): Facilitates interactions with Wikibase.
  - **Pywikibot**: Automates Wikipedia/Wikimedia site interactions.
  - **mwparserfromhell**: Parses MediaWiki wikitext.
  - **Flask**: Provides a web API for the framework.
  - **pika/aiohttp**: pika is a RabbitMQ (AMQP) client used for message queuing; aiohttp handles asynchronous HTTP requests.
  - **PlantUML**: Used for generating system diagrams.
- **Operating System**: Linux (required for deployment).
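The pika dependency suggests a message-queue design in which a producer enqueues work and workers run the extract, transform, and load steps. The sketch below assumes that design. Python's `queue.Queue` stands in for a RabbitMQ queue consumed through pika, and the `worker` function, its callbacks, `toy_transform`, and the `STOP` sentinel are hypothetical names chosen for illustration.

```python
import queue
import threading
from typing import Callable, Dict, List, Optional

# Hypothetical sketch: queue.Queue stands in for a RabbitMQ queue consumed via pika.
STOP: Optional[str] = None  # sentinel that tells the worker to shut down


def worker(
    jobs: "queue.Queue[Optional[str]]",
    fetch: Callable[[str], str],
    transform: Callable[[str], List[Dict[str, str]]],
    load: Callable[[str, Dict[str, str]], None],
) -> None:
    """Consume article titles and run extract -> transform -> load for each one."""
    while True:
        title = jobs.get()
        try:
            if title is STOP:
                return
            wikitext = fetch(title)             # extract: get the article's wikitext
            for record in transform(wikitext):  # transform: wikitext -> records
                load(title, record)             # load: write each record to the store
        finally:
            jobs.task_done()


def toy_transform(wikitext: str) -> List[Dict[str, str]]:
    """Placeholder transform: one record per opening <ref> tag."""
    return [{"ref_index": str(i)} for i in range(wikitext.count("<ref>"))]


# Usage with in-memory fakes instead of Wikipedia and Wikibase.
pages = {"Paris": "A<ref>x</ref> B<ref>y</ref>", "Oslo": "No references."}
loaded: List[tuple] = []
jobs: "queue.Queue[Optional[str]]" = queue.Queue()
thread = threading.Thread(
    target=worker,
    args=(jobs, pages.__getitem__, toy_transform, lambda t, r: loaded.append((t, r))),
)
thread.start()
for title in ["Paris", "Oslo"]:
    jobs.put(title)
jobs.put(STOP)
thread.join()
print(loaded)
```

Injecting `fetch`, `transform`, and `load` as callables keeps the worker independent of any particular MediaWiki or Wikibase client, so each stage can be tested with fakes as in the usage above.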

### Development History
- **Inception**: November 22, 2021 (first GitHub commit by James Hare).
- **Key Milestones**:
  - **Version 1.0.0**: Released March 9, 2022.
  - **Version 4.1.2**: Latest stable release (July 18, 2023), marked as the preferred version.
- **Developers**:
  - **James Hare**: Initial designer and programmer (active until November 22, 2021).
  - **Nizo Priskorn**: Lead developer since March 9, 2022, handling design, programming, testing, and operations.
  - **Advisers**: Sawood Alam and James Hare (post-2022).

### Use Cases and Applications
- **WikiCite**: Powers the Internet Archive Reference Inventory, a tool that fetches and structures Wikipedia references.
- **Fake News Mitigation**: By structuring citations, it enables tools to trace source reliability and detect misinformation patterns.
- **Data Quality**: Improves the accuracy and reusability of Wikipedia references in semantic web applications.
- **Compatibility**: Supports over 30 Wikimedia citation templates (as of 2022).
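Supporting many citation templates, and *updating* rather than duplicating items, both require deciding when two citations describe the same source. One plausible approach, sketched here under stated assumptions, is to normalize the template name, check it against a supported set, and hash a stable identifier (a DOI, else a URL). The template subset, the `reference_key` function, and the DOI-then-URL rule are hypothetical; wcdimportbot's actual matching logic is not described in this article.

```python
import hashlib
from typing import Dict, Optional

# Hypothetical subset of supported templates; the real list (30+ templates) is not given here.
SUPPORTED_TEMPLATES = {"cite web", "cite news", "cite journal", "cite book"}


def normalize_template_name(name: str) -> str:
    """Treat underscores as spaces, collapse whitespace, and case-fold.

    Full case-folding is a simplification: on most wikis MediaWiki only
    ignores the case of the first letter of a template name.
    """
    return " ".join(name.replace("_", " ").split()).lower()


def reference_key(citation: Dict[str, str]) -> Optional[str]:
    """Return a stable hash identifying the cited source, or None.

    None means the template is unsupported or the citation has neither a
    DOI nor a URL, so there is nothing reliable to match on.
    """
    if normalize_template_name(citation.get("template", "")) not in SUPPORTED_TEMPLATES:
        return None
    doi = citation.get("doi", "").strip().lower()  # DOIs are case-insensitive
    url = citation.get("url", "").strip()           # URL paths may be case-sensitive
    if doi:
        identifier = "doi:" + doi
    elif url:
        identifier = "url:" + url
    else:
        return None
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


first = {"template": "Cite_Web", "url": "https://example.org/paris", "title": "Paris facts"}
second = {"template": "cite web", "url": " https://example.org/paris ", "title": "Paris (updated)"}
print(reference_key(first) == reference_key(second))  # True: same URL, so update the same item
print(reference_key({"template": "citation needed"}))  # None: not a supported citation template
```

Keying on an identifier rather than on the whole citation means a corrected title or access date updates the existing item instead of creating a second one.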

### Related Projects and Ecosystem
- **Internet Archive Reference Inventory**: A tool that leverages wcdimportbot to provide an API for structured Wikipedia references.
- **Turn All References Blue**: The maintaining organization, focused on improving Wikipedia’s citation ecosystem.
- **Wikibase**: The MediaWiki-based software for structured linked data that powers Wikidata; wcdimportbot stores its structured reference items in a Wikibase instance.
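To make the Wikibase side concrete, the following builds the JSON body of a new reference item in the Wikibase JSON data model (labels plus claims whose `mainsnak` carries the value). Property IDs on a self-hosted Wikibase are assigned per instance, so `P1` and `P2` are placeholders, and `reference_item` and `string_statement` are hypothetical helpers; in practice a client library such as WikibaseIntegrator would typically build and submit this structure.

```python
import json
from typing import Any, Dict, List, Tuple

# Placeholder property IDs and datatypes: on a self-hosted Wikibase these are
# assigned per instance, so P1/P2 are NOT real wcdimportbot properties.
PROPERTY_MAP: Dict[str, Tuple[str, str]] = {
    "url": ("P1", "url"),
    "title": ("P2", "string"),
}


def string_statement(prop: str, datatype: str, value: str) -> Dict[str, Any]:
    """One statement whose value is a plain string (used by both 'url' and 'string' datatypes)."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {"value": value, "type": "string"},
            "datatype": datatype,
        },
        "type": "statement",
        "rank": "normal",
    }


def reference_item(citation: Dict[str, str], lang: str = "en") -> Dict[str, Any]:
    """Build the JSON body of a new reference item from parsed citation fields."""
    claims: Dict[str, List[Dict[str, Any]]] = {}
    for field, (prop, datatype) in PROPERTY_MAP.items():
        value = citation.get(field)
        if value:
            claims.setdefault(prop, []).append(string_statement(prop, datatype, value))
    # Fall back to the URL as a label when the citation has no title.
    label = citation.get("title") or citation.get("url", "")
    return {"labels": {lang: {"language": lang, "value": label}}, "claims": claims}


item = reference_item({"template": "cite web", "url": "https://example.org/paris", "title": "Paris facts"})
print(json.dumps(item, indent=2))
```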

### Licensing and Accessibility
- **License**: Open-source license ([Q27016754](https://www.wikidata.org/wiki/Q27016754) on Wikidata), allowing free use and redistribution.
- **Source Code**: Hosted on GitHub ([internetarchive/wcdimportbot](https://github.com/internetarchive/wcdimportbot)).
- **Issue Tracker**: [GitHub Issues](https://github.com/internetarchive/wcdimportbot/issues).

## Schema Markup
```json
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "wcdimportbot",
  "description": "ETL-framework for creating and updating reference items in a Wikibase with all citations/references in Wikipedia in near real-time.",
  "url": "https://github.com/internetarchive/wcdimportbot",
  "sameAs": [
    "https://www.wikidata.org/wiki/Q115799500",
    "https://github.com/internetarchive/wcdimportbot"
  ],
  "operatingSystem": "Linux",
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "name": "Python"
  },
  "license": "https://www.wikidata.org/wiki/Q27016754",
  "softwareVersion": "4.1.2",
  "dateCreated": "2021-11-22",
  "developer": [
    {
      "@type": "Person",
      "name": "James Hare"
    },
    {
      "@type": "Person",
      "name": "Nizo Priskorn"
    }
  ],
  "applicationCategory": "ETL Framework",
  "softwareRequirements": [
    "Wikibase Integrator 2.1.0",
    "Pywikibot",
    "Flask",
    "Python 3.x"
  ]
}
```