# Nougat

> image Transformer encoder-decoder model for OCR of scientific papers

**Wikidata**: [Q128801360](https://www.wikidata.org/wiki/Q128801360)  
**Source**: https://4ort.xyz/entity/nougat

## Summary
Nougat (Neural Optical Understanding for Academic Documents) is a Transformer-based model that performs optical character recognition (OCR) on scientific papers. It pairs an image Transformer encoder with an autoregressive text decoder to extract text and structure from images of academic documents, and it is described in the paper "Nougat: Neural Optical Understanding for Academic Documents."

## Key Facts
- Nougat is a Transformer model for OCR of scientific papers, combining an image encoder with a text decoder.
- Instance of: transformer (machine-learning model architecture).
- Use: optical character recognition, specifically of scientific literature.
- Nougat is based on the Document Understanding Transformer (Donut), an OCR-free end-to-end Transformer model.
- Described by the paper titled "Nougat: Neural Optical Understanding for Academic Documents."
- Official project website: https://facebookresearch.github.io/nougat/ (referenced 2024-08-11).
- Model documentation page: https://huggingface.co/docs/transformers/en/model_doc/nougat.
- Source code repository: https://github.com/facebookresearch/nougat (Wikidata qualifiers: version control system, P8423: Git, Q186055; web interface software, P10627: GitHub, Q364).
- License: MIT License (reference entry dated 2024-08-11).
- Product or material produced: Markdown (Q1193600).

## FAQs
### Q: What is Nougat used for?
A: Nougat is used for optical character recognition (OCR) of scientific literature and academic documents. It takes images of paper pages as input and extracts their text and document structure.

### Q: Is Nougat open source?
A: Yes. The Nougat source code is available at https://github.com/facebookresearch/nougat and the project is licensed under the MIT License (reference date 2024-08-11).

### Q: On what architecture is Nougat based?
A: Nougat is based on the Document Understanding Transformer, which is described as an OCR-free end-to-end Transformer model.

### Q: Where can I find documentation for Nougat?
A: Model documentation is available on Hugging Face at https://huggingface.co/docs/transformers/en/model_doc/nougat and the project website is https://facebookresearch.github.io/nougat/.
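
As a rough illustration of what the Hugging Face documentation covers, the sketch below transcribes one page image through the `transformers` integration. It assumes `transformers`, `torch`, and `Pillow` are installed and that the `facebook/nougat-base` checkpoint is the one wanted; the post-processing step may additionally need `nltk` and `python-Levenshtein`. The helper names `generation_kwargs` and `transcribe_page` and the default token limit are illustrative choices, not part of the library.

```python
from typing import Any, Dict

CHECKPOINT = "facebook/nougat-base"  # assumed checkpoint name on the Hugging Face Hub


def generation_kwargs(max_new_tokens: int = 1024) -> Dict[str, Any]:
    """Illustrative decoding settings; raise max_new_tokens for long pages."""
    if max_new_tokens < 1:
        raise ValueError("max_new_tokens must be positive")
    return {"min_length": 1, "max_new_tokens": max_new_tokens}


def transcribe_page(image_path: str, checkpoint: str = CHECKPOINT) -> str:
    """Return markup text for one page image (downloads weights on first use)."""
    # Heavy dependencies are imported lazily so this module loads without them.
    from PIL import Image
    from transformers import NougatProcessor, VisionEncoderDecoderModel

    processor = NougatProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

    # The processor resizes and normalizes the page into a pixel tensor.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # Ban the unknown token so the decoder never emits it.
    outputs = model.generate(
        pixel_values,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        **generation_kwargs(),
    )
    text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
    return processor.post_process_generation(text, fix_markdown=False)
```

A call such as `transcribe_page("page.png")`, where `page.png` is a hypothetical rasterized PDF page, would return the decoded markup as a string. The model works on page images, so a PDF has to be rendered to images first.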

## Why It Matters
Scientific papers combine dense layouts, figures, formulas, and domain-specific formatting that challenge general-purpose OCR systems, and Nougat targets exactly this problem of extracting text and structure from them. Its lineage from the Document Understanding Transformer ties it to OCR-free, end-to-end approaches, which read the page image directly rather than chaining separate detection and recognition stages, and so reduce pipeline complexity. The code is available under the MIT License, with documentation on both the project website and Hugging Face, which makes Nougat accessible to researchers and engineers who need OCR tuned for scientific literature. In digital libraries, literature mining, and automated extraction of research content, it serves as a purpose-built alternative to generic OCR tools.

## Notable For
- Being a Transformer model built explicitly for OCR of scientific and academic documents.
- Its architectural lineage: based on the Document Understanding Transformer (an OCR-free end-to-end Transformer model).
- Public availability of source code at https://github.com/facebookresearch/nougat under an MIT License (reference 2024-08-11).
- Formal description in the paper "Nougat: Neural Optical Understanding for Academic Documents" and documentation hosted on Hugging Face.

## Body

### Overview
- Nougat is a Transformer model that reads page images and outputs text.
- The primary application is optical character recognition on scientific literature.
- The model is documented in the paper "Nougat: Neural Optical Understanding for Academic Documents."

### Architecture and Lineage
- Nougat is an instance of the Transformer model class.
- It is based on the Document Understanding Transformer (Donut).
- Donut is an OCR-free end-to-end Transformer model: it maps a document image directly to text without a separate OCR engine.

### Use and Applications
- Use: optical character recognition.
- Scope of use: scientific literature.
- Targeted application domains include academic papers and documents with complex layouts.
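
For batch use on whole PDFs, the repository's command-line tool can be driven from a script. The sketch below is a minimal wrapper that assumes the `nougat-ocr` package is installed and exposes a `nougat` executable accepting `nougat <pdf> -o <output_dir>`, and that the result is written as `<pdf stem>.mmd` in the output directory. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_nougat_command(pdf_path: str, out_dir: str) -> List[str]:
    """Build the argv list for: nougat <pdf> -o <out_dir>."""
    return ["nougat", str(Path(pdf_path)), "-o", str(Path(out_dir))]


def run_nougat(pdf_path: str, out_dir: str) -> Path:
    """Run the CLI on one PDF and return the expected output path."""
    if shutil.which("nougat") is None:
        raise RuntimeError("nougat CLI not found; is nougat-ocr installed?")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(build_nougat_command(pdf_path, out_dir), check=True)
    # Assumed naming convention: same stem as the PDF, .mmd extension.
    return Path(out_dir) / (Path(pdf_path).stem + ".mmd")
```

Passing the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.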

### Licensing and Distribution
- License: MIT License (license reference dated 2024-08-11).
- Source code repository: https://github.com/facebookresearch/nougat.
  - Wikidata qualifiers on the repository: version control system (P8423): Git (Q186055); web interface software (P10627): GitHub (Q364).
- Project website: https://facebookresearch.github.io/nougat/ (referenced 2024-08-11).
- Model documentation hosted at: https://huggingface.co/docs/transformers/en/model_doc/nougat.
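
Because the license entry carries a reference date, it can be worth re-checking against the live repository. The sketch below reads the license identifier from the GitHub REST API, assuming the response includes a `license` object with an `spdx_id` field and that unauthenticated access is within rate limits. The function names and the `User-Agent` string are illustrative.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

REPO_API_URL = "https://api.github.com/repos/facebookresearch/nougat"


def license_spdx_id(repo: Dict[str, Any]) -> Optional[str]:
    """Extract the SPDX license identifier from repository metadata, if any."""
    lic = repo.get("license") or {}  # the API reports null when no license is detected
    return lic.get("spdx_id")


def fetch_repo_metadata(url: str = REPO_API_URL) -> Dict[str, Any]:
    """Fetch repository metadata as a dict (performs a network request)."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "nougat-license-check",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `license_spdx_id(fetch_repo_metadata())` returns whatever identifier GitHub currently reports, which can then be compared with the MIT entry recorded here.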

### References and Identifiers
- Described-by source (paper): "Nougat: Neural Optical Understanding for Academic Documents."
- Described-at URL (documentation): https://huggingface.co/docs/transformers/en/model_doc/nougat.
- Product or material produced: Markdown (Q1193600).

## References

1. [Source](https://api.github.com/repos/facebookresearch/nougat)