# Arantza Díaz de Ilarraza Sánchez

> Basque computer scientist

**Wikidata**: [Q12253843](https://www.wikidata.org/wiki/Q12253843)  
**Wikipedia**: [English](https://en.wikipedia.org/wiki/Arantza_Díaz_de_Ilarraza_Sánchez)  
**Source**: https://4ort.xyz/entity/arantza-diaz-de-ilarraza-sanchez

## Summary  
Arantza Díaz de Ilarraza Sánchez is a Basque computer scientist, university professor, and researcher born in 1957 in San Sebastián, Spain. She is best known for leading the development of key Basque‑language natural‑language‑processing tools such as the Xuxen spelling checker and the Matxin machine‑translation system, and for her long‑standing role at the University of the Basque Country.

## Biography  
- **Born:** 18 April 1957, San Sebastián, Spain  
- **Nationality:** Spanish (Basque)  
- **Education:** University of the Basque Country, Faculty of Informatics of San Sebastián (doctoral advisor – Felisa Verdejo)  
- **Known for:** Founding and directing major Basque NLP projects (Xuxen, Matxin, EDBL, EPEC treebank) and fostering the Ixa research group.  
- **Employer:** University of the Basque Country, Faculty of Informatics of San Sebastián  
- **Member of:** Udako Euskal Unibertsitatea, Ixa Group, Spanish Society for Natural Language Processing, Erasmus Mundus programme  
- **Field(s):** Natural language processing, computational linguistics, open‑source software for the Basque language  

## Contributions  
Arantza Díaz de Ilarraza has been a pivotal figure in Basque language technology. She co‑created **Xuxen** (1994), an open‑source Basque spelling checker that became the de facto standard for Basque word processing. She later led the development of **Matxin**, a rule‑based (transfer‑based) machine‑translation system for the Spanish‑Basque language pair, which expanded digital access to Basque texts. As a core member of the **Ixa Group**, she coordinated the production of the **EPEC treebank**, a syntactically annotated corpus that underpins Basque syntactic research. She also contributed to the **EDBL** lexical database and the **elia.eus** portal, both of which provide linguistic resources for researchers and developers. In 2019 she helped establish **HiTZ Zentroa**, a research centre that consolidates Basque language technology efforts. Her work has resulted in dozens of peer‑reviewed publications, a suite of widely used open‑source tools, and the training of a generation of Basque NLP scholars, many of whom now lead research of their own.

## FAQs  
### Q: What is Arantza Díaz de Ilarraza best known for?  
A: She is best known for creating foundational Basque language tools such as the Xuxen spelling checker and the Matxin machine‑translation system, and for leading the Ixa research group.  

### Q: Which university does she work for?  
A: She is a professor and researcher at the University of the Basque Country, Faculty of Informatics of San Sebastián.  

### Q: Has she received any notable awards?  
A: Yes, she is a recipient of the **Abadia award**, recognizing her contributions to Basque language technology.  

### Q: Who were some of her doctoral students?  
A: Her doctoral students include Eneko Agirre, Xabier Arregi Iparragirre, Bertol Arrieta, Montse Maritxalar Anglada, Aingeru Mayor, Gorka Labaka, and Maite Oronoz Antxordoki, among others.  

### Q: What languages does she work with?  
A: She works primarily with Basque, but also uses Spanish, English, and French in her research and collaborations.  

## Why They Matter  
Arantza Díaz de Ilarraza’s work transformed the digital landscape of the Basque language. Before her contributions, Basque lacked robust computational tools, limiting its presence on the web and in modern software. By delivering open‑source resources such as Xuxen and Matxin, she enabled everyday users, educators, and developers to write, proofread, and translate Basque content far more easily. The linguistic corpora and lexical databases she helped build underpin current research in syntax, semantics, and machine translation, influencing both academic studies and commercial applications. Her mentorship has produced a cadre of researchers who continue to expand Basque NLP, helping to sustain the language’s vitality in the digital age. Her early and sustained efforts are a major reason Basque has a place in computational linguistics and is supported by many multilingual technologies.  

## Notable For  
- Co‑creation of **Xuxen** (1994), the first widely adopted Basque spelling checker.  
- Leadership of the **Matxin** machine‑translation project, enabling Basque‑Spanish translation at scale.  
- Founding member of the **Ixa Group**, responsible for the EPEC treebank and numerous Basque NLP resources.  
- Establishment of **HiTZ Zentroa** (2019), a hub for Basque language technology research.  
- Recipient of the **Abadia award** for outstanding contributions to Basque computational linguistics.  

## Body  

### Early Life and Education  
Arantza Díaz de Ilarraza Sánchez was born on 18 April 1957 in San Sebastián, Spain. She pursued her higher education at the University of the Basque Country, Faculty of Informatics of San Sebastián, where she completed her doctorate under the supervision of Felisa Verdejo, a noted Spanish computer scientist.  

### Academic Career  
Since completing her doctorate, Díaz de Ilarraza has remained at the University of the Basque Country, progressing from researcher to full professor in the Faculty of Informatics, where she holds a permanent teaching position. She has been a driving force behind the **Ixa Group** and is a member of the **Spanish Society for Natural Language Processing**.  

### Major Projects  

- **Xuxen (1994)** – An open‑source Basque spelling checker that integrates with major word processors and is distributed under a free software license.  
- **Matxin** – A rule‑based (transfer‑based) machine‑translation system for the Spanish‑Basque language pair, facilitating cross‑lingual communication and content creation.  
- **EPEC Treebank** – A syntactically annotated corpus of Basque sentences, essential for training and evaluating parsing algorithms.  
- **EDBL** – A lexical database that supplies morphological and semantic information for Basque words.  
- **HiTZ Zentroa (2019)** – A research centre co‑founded by Díaz de Ilarraza to centralize Basque language technology efforts, fostering interdisciplinary projects and industry partnerships.  

### Publications and Mentorship  
Her scholarly output includes dozens of peer‑reviewed articles in computational linguistics and natural language processing journals. She has supervised a notable list of doctoral students, many of whom now hold prominent research positions in Europe and beyond.  

### Awards and Recognition  
In recognition of her impact on Basque language technology, Díaz de Ilarraza received the **Abadia award**, highlighting her role in advancing open‑source software and linguistic resources for minority languages.  

### Professional Memberships  
- **Udako Euskal Unibertsitatea** (Basque Summer University)  
- **Ixa Group** (research group at the University of the Basque Country)  
- **Spanish Society for Natural Language Processing**  
- **Erasmus Mundus programme** (European academic mobility initiative)  

## Schema Markup  
```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Arantza Díaz de Ilarraza Sánchez",
  "jobTitle": "Computer scientist, university teacher, researcher",
  "worksFor": {
    "@type": "Organization",
    "name": "University of the Basque Country, Faculty of Informatics of San Sebastián"
  },
  "nationality": {
    "@type": "Country",
    "name": "Spain"
  },
  "birthDate": "1957-04-18",
  "birthPlace": "San Sebastián, Spain",
  "alumniOf": [
    {
      "@type": "EducationalOrganization",
      "name": "University of the Basque Country, Faculty of Informatics of San Sebastián"
    }
  ],
  "knowsAbout": [
    "Natural language processing",
    "Computational linguistics",
    "Open-source software"
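  ],
  "sameAs": [
    "https://www.wikidata.org/wiki/Q12253843",
    "https://en.wikipedia.org/wiki/Arantza_Díaz_de_Ilarraza_Sánchez"
  ],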
  "description": "Basque computer scientist known for creating the Xuxen spelling checker and leading the Matxin machine‑translation system."
}
```
