# eScriptorium

> Web platform for manual transcription and automated text recognition of prints (optical character recognition) and manuscripts (handwriting recognition)

**Wikidata**: [Q111218645](https://www.wikidata.org/wiki/Q111218645)  
**Wikipedia**: [English](https://en.wikipedia.org/wiki/EScriptorium)  
**Source**: https://4ort.xyz/entity/escriptorium

## Summary  
eScriptorium is a web‑based platform that lets users manually transcribe printed and handwritten documents and apply automated text recognition to them: optical character recognition (OCR) for prints and handwriting recognition for manuscripts. Its source code is released under the MIT License, and Wikidata lists macOS as its operating system.

## Key Facts  
- **Type:** Web application; classified on Wikidata as source‑available software.  
- **Primary functions:** Manual transcription, optical character recognition, and handwriting recognition.  
- **License:** MIT License (source code repository → https://gitlab.com/scripta/escriptorium).  
- **Current stable release:** **v1.0.0** (the first stable release, with a new UI and Kraken 6 support), released 30 January 2026.  
- **Previous release:** **v0.14.0**, released 24 October 2023.  
- **Programming languages:** Python (core logic) and HTML (front‑end).  
- **Operating system:** macOS (as listed on Wikidata).  
- **Key dependencies:** Kraken (OCR and handwriting recognition engine); Wikidata also lists the Oodle SDK.  
- **Supported readable file formats:** ALTO‑XML, PAGE‑XML, PDF.  
- **Supported writable file formats:** ALTO‑XML, PAGE‑XML, TEI/XML, plain‑text files.  
- **Repository & community:** Hosted on GitLab under the *scripta* namespace (https://gitlab.com/scripta/escriptorium); Open Hub ID = escriptorium.  

## FAQs  
### Q: What does eScriptorium do?  
A: eScriptorium provides a browser‑based interface for scholars to manually transcribe historical texts and automatically generate machine‑readable text using OCR for printed material and handwriting recognition for manuscript material.  

### Q: Is eScriptorium free to use?  
A: Yes. The software is released under the permissive MIT License, so it can be downloaded, modified, and deployed free of charge; the license only requires that the copyright and license notice be kept with copies of the code.  

### Q: Which operating systems can run eScriptorium?  
A: Wikidata lists macOS as eScriptorium's operating system. As a web application, it is used through a web browser once it has been deployed on a server.  

### Q: Which technologies power eScriptorium’s recognition capabilities?  
A: Recognition is handled by the Kraken engine, which performs page segmentation and text recognition for both printed and handwritten material; eScriptorium drives it from its Python backend.  

### Q: What file formats can I import and export with eScriptorium?  
A: It can read ALTO‑XML, PAGE‑XML, and PDF files, and it can write ALTO‑XML, PAGE‑XML, TEI/XML, and plain‑text files.  
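To illustrate what the two XML import formats contain, here is a minimal Python sketch that extracts line text from either an ALTO or a PAGE document using only the standard library. It assumes the ALTO v4 namespace and the 2019‑07‑15 PAGE namespace (files using other schema versions need their own URIs). `read_lines` is a hypothetical helper, and the samples are hand‑written, trimmed, non‑schema‑valid documents, not actual eScriptorium exports.

```python
import xml.etree.ElementTree as ET

# Assumed schema versions: ALTO v4 and PAGE 2019-07-15.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"


def read_lines(xml_text: str) -> list[str]:
    """Return the text of each TextLine in an ALTO v4 or PAGE 2019 document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{ALTO_NS}}}alto":
        ns = {"a": ALTO_NS}
        # ALTO stores words as <String CONTENT="..."> children of each TextLine.
        return [
            " ".join(s.get("CONTENT", "") for s in line.iterfind("a:String", ns))
            for line in root.iterfind(".//a:TextLine", ns)
        ]
    if root.tag == f"{{{PAGE_NS}}}PcGts":
        ns = {"p": PAGE_NS}
        lines = []
        for line in root.iterfind(".//p:TextLine", ns):
            # PAGE stores line-level text in TextEquiv/Unicode directly under the line.
            uni = line.find("p:TextEquiv/p:Unicode", ns)
            lines.append((uni.text or "") if uni is not None else "")
        return lines
    raise ValueError(f"unsupported root element: {root.tag}")


# Hand-written, trimmed samples (not schema-valid, not eScriptorium output).
ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page ID="p1"><PrintSpace><TextBlock ID="b1">
    <TextLine ID="l1"><String CONTENT="In"/><SP/><String CONTENT="principio"/></TextLine>
    <TextLine ID="l2"><String CONTENT="erat"/><SP/><String CONTENT="verbum"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

PAGE_SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="f001.jpg" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <TextLine id="r1l1"><TextEquiv><Unicode>In principio</Unicode></TextEquiv></TextLine>
      <TextLine id="r1l2"><TextEquiv><Unicode>erat verbum</Unicode></TextEquiv></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

print(read_lines(ALTO_SAMPLE))  # ['In principio', 'erat verbum']
print(read_lines(PAGE_SAMPLE))  # ['In principio', 'erat verbum']
```

ALTO keeps each word as a separate `String` element, so the sketch rejoins them with spaces; PAGE can carry text at the line level in `TextEquiv/Unicode`, which is read directly.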

## Why It Matters  
Digitizing historical prints and manuscripts is essential for preserving cultural heritage and enabling large‑scale textual analysis. eScriptorium streamlines this process by combining manual transcription, which remains crucial for ambiguous or low‑quality sources, with automated OCR and handwriting recognition, reducing the time scholars spend converting physical documents into searchable, machine‑readable formats. Its open‑source MIT licensing encourages community contributions and customization, while its support for widely used scholarly standards (ALTO‑XML, PAGE‑XML, TEI/XML) ensures interoperability with other digital humanities tools. Because it runs as a web application, eScriptorium lowers the barrier to entry: researchers can work collaboratively through a browser without installing heavyweight desktop software. As a result, it speeds up the creation of digital corpora, supports reproducible research, and expands access to primary sources for educators, archivists, and the public.  

## Notable For  
- **Single web platform** that unifies manual transcription with both OCR and handwriting recognition in one interface.  
- **MIT‑licensed open‑source code**, enabling free use, modification, and redistribution.  
- **Integration with Kraken 6** (introduced in v1.0.0) for layout analysis and text recognition of historical prints and manuscripts.  
- **Support for scholarly XML standards** (ALTO‑XML, PAGE‑XML, TEI/XML), facilitating seamless data exchange with other digital‑humanities infrastructures.  
- **First stable release, v1.0.0** (January 2026), which introduced a new UI and Kraken 6 support, following v0.14.0 (October 2023).  

## Body  

### Overview  
eScriptorium is a browser‑based transcription environment designed for scholars working with printed books and handwritten manuscripts. Users can upload page images or PDF files, import existing layout and transcription data in ALTO‑XML or PAGE‑XML, and then either type transcriptions manually or run automated recognition.

### Architecture  
- **Backend:** Python application handling file ingestion, OCR/handwriting pipelines, and data storage.  
- **Frontend:** HTML interface rendered in any modern web browser.  
- **Dependencies:**  
  - **Kraken:** Provides deep‑learning‑based layout analysis and text recognition for both printed and handwritten material.  
  - **Oodle SDK:** Listed as a dependency on Wikidata.  
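The split between automated recognition and manual correction can be pictured with a small data model. The sketch below is hypothetical: the `Line` class, the `run_recognition` helper, and the layer names `"manual"` and `"kraken"` are illustrative assumptions, not eScriptorium's actual schema or Kraken's API. Machine output is written into its own layer, so re‑running a model never overwrites a scholar's corrections.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Line:
    """A text line on a page image (hypothetical model, not eScriptorium's schema)."""
    baseline: list[tuple[int, int]]
    # Transcription layers: layer name -> text, e.g. {"kraken": ..., "manual": ...}.
    layers: dict[str, str] = field(default_factory=dict)

    def best_text(self, preference: tuple[str, ...] = ("manual", "kraken")) -> str:
        """Return the first non-empty layer in preference order (manual wins by default)."""
        for name in preference:
            text = self.layers.get(name)
            if text:
                return text
        return ""


def run_recognition(lines: list[Line], engine: Callable[[Line], str], layer: str = "kraken") -> None:
    """Write machine output into its own layer, leaving any manual layer untouched."""
    for line in lines:
        line.layers[layer] = engine(line)


lines = [Line([(100, 150), (900, 150)]), Line([(100, 230), (900, 230)])]
run_recognition(lines, engine=lambda line: "machine guess")  # stand-in for a real model
lines[0].layers["manual"] = "In principio"                   # a scholar corrects line 1
print([line.best_text() for line in lines])                  # ['In principio', 'machine guess']
```

Keeping layers separate rather than editing the machine output in place means a newer model can be applied later without discarding human work.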

### Features  
- **Manual transcription:** Rich text editor with support for line‑by‑line alignment to source images.  
- **Automated OCR:** Generates machine‑readable text from printed pages using Kraken recognition models.  
- **Handwriting recognition:** Leverages Kraken to produce initial transcriptions for manuscript images, which can be corrected manually.  
- **Export options:** Users can download results in ALTO‑XML, PAGE‑XML, TEI/XML, or plain‑text formats.  
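As an illustration of the TEI/XML export target, the following Python sketch wraps a list of transcribed lines in a minimal TEI document, marking the start of each line with `<lb/>`. The `lines_to_tei` helper and this document layout are assumptions for illustration only; eScriptorium's own TEI export may be structured differently.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements with a default xmlns instead of an "ns0:" prefix.
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Qualify a tag name with the TEI namespace (ElementTree's {uri}tag notation)."""
    return f"{{{TEI_NS}}}{tag}"


def lines_to_tei(title: str, lines: list[str]) -> str:
    """Build a minimal TEI document with one <lb/> per transcribed line (hypothetical layout)."""
    root = ET.Element(tei("TEI"))
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    ET.SubElement(ET.SubElement(file_desc, tei("titleStmt")), tei("title")).text = title
    ET.SubElement(ET.SubElement(file_desc, tei("publicationStmt")), tei("p")).text = "Unpublished transcription."
    ET.SubElement(ET.SubElement(file_desc, tei("sourceDesc")), tei("p")).text = "Transcribed from page images."
    para = ET.SubElement(ET.SubElement(ET.SubElement(root, tei("text")), tei("body")), tei("p"))
    for number, text in enumerate(lines, start=1):
        # <lb/> marks where a line begins; the line's text follows as the element's tail.
        lb = ET.SubElement(para, tei("lb"), n=str(number))
        lb.tail = text
    return ET.tostring(root, encoding="unicode")


print(lines_to_tei("Folio 1r", ["In principio", "erat verbum"]))
```

Using `<lb/>` milestones keeps the original line breaks of the source recoverable while the running text stays in a single paragraph.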

### Technical Stack  
- **Programming languages:** Python (core logic), HTML (UI).  
- **Operating system:** macOS (as listed on Wikidata).  
- **Versioning:**  
  - *v0.14.0* – release of 24 Oct 2023.  
  - *v1.0.0* – first stable release, 30 Jan 2026, introducing a new UI and Kraken 6 support.  

### Licensing & Distribution  
The codebase is hosted on GitLab (`https://gitlab.com/scripta/escriptorium`) under the MIT License, which permits academic and commercial use provided the copyright and license notice are retained.  

### Community & Adoption  
eScriptorium is listed on Open Hub (ID = escriptorium) and has Wikipedia entries in English, German, and French, as well as a Wikimedia Commons category. Its logo and sample screenshots are publicly available on Wikimedia Commons.  

## Schema Markup  
```json
{
  "@context": "https://schema.org",
  "@type": "WebApplication",
  "name": "eScriptorium",
  "description": "Web platform for manual transcription and automated text recognition of prints (optical character recognition) and manuscripts (handwriting recognition).",
  "url": "https://gitlab.com/scripta/escriptorium",
  "sameAs": [
    "https://www.wikidata.org/wiki/Q111218645",
    "https://en.wikipedia.org/wiki/EScriptorium",
    "https://commons.wikimedia.org/wiki/Category:EScriptorium"
  ],
  "license": "https://opensource.org/licenses/MIT"
}
```

## References

1. [eScriptorium LICENSE file (MIT License)](https://gitlab.com/scripta/escriptorium/-/blob/develop/LICENSE)
2. [eScriptorium tag v0.14.0](https://gitlab.com/scripta/escriptorium/-/tags/v0.14.0)
3. [Release eScriptorium v1.0.0: first stable release featuring the new UI, Kraken 6 support and other features](https://gitlab.com/scripta/escriptorium/-/releases/v1.0.0)