# evalplus

> Rigourous evaluation of LLM-synthesized code - NeurIPS 2023

**Wikidata**: [Q127474976](https://www.wikidata.org/wiki/Q127474976)  
**Source**: https://4ort.xyz/entity/evalplus

## Summary
EvalPlus is a software framework designed for the rigorous evaluation of code synthesized by Large Language Models (LLMs). It was featured at NeurIPS 2023 and is associated with the research paper "Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation." The project is open-source and distributed under the Apache Software License 2.0.

## Key Facts
- **License:** EvalPlus is distributed under the Apache Software License 2.0.
- **Latest Version:** As of late 2024, the stable version is 0.3.1, released on October 20, 2024.
- **Classifications:** It is classified as free software.
- **Academic Association:** The tool accompanies the paper "Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation," which was presented at NeurIPS 2023.
- **Repository:** The source code is hosted publicly at https://github.com/evalplus/evalplus.
- **Website:** The project maintains an official website at https://evalplus.github.io/.
- **Release History:** The project began stable releases in May 2023 (v0.1.1) and has released multiple updates, including v0.2.0 in November 2023 and v0.2.1 in April 2024.

## FAQs
### Q: What is the primary purpose of EvalPlus?
A: EvalPlus is built to rigorously evaluate code generated by Large Language Models (LLMs). It addresses the specific question of whether code synthesized by AI models like ChatGPT is functionally correct.

### Q: Is EvalPlus free to use?
A: Yes, EvalPlus is classified as free software and is released under the Apache Software License 2.0, allowing users to run, study, change, and distribute it.

### Q: Where can the official source code for EvalPlus be found?
A: The source code repository is hosted on GitHub at https://github.com/evalplus/evalplus.

## Why It Matters
EvalPlus addresses a critical gap in the development and deployment of Large Language Models (LLMs): verifying that generated code is actually correct. Popular code-generation benchmarks such as HumanEval judge a solution by running it against a small number of hand-written tests, so subtly wrong programs can still pass. EvalPlus makes this assessment more rigorous by extending such benchmarks with many automatically generated test inputs (HumanEval+, for example, extends HumanEval), so that functional errors missed by the original tests are detected.
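To see why sparse tests are a problem, consider a hypothetical task (return the distinct elements of a list in ascending order) and a plausible but wrong solution. EvalPlus checks a candidate by comparing its outputs against a ground-truth implementation on many extra inputs, a form of differential testing. The sketch below mimics that idea with hand-picked inputs; all function and variable names are hypothetical, and this is not EvalPlus's actual harness.

```python
import copy
from typing import Any, Callable


def reference_solution(xs: list[int]) -> list[int]:
    """Ground truth: distinct elements in ascending order."""
    return sorted(set(xs))


def candidate_solution(xs: list[int]) -> list[int]:
    """Plausible 'generated' code: drops only adjacent duplicates and never sorts."""
    out: list[int] = []
    for x in xs:
        if not out or out[-1] != x:
            out.append(x)
    return out


def differential_test(candidate: Callable[..., Any],
                      reference: Callable[..., Any],
                      inputs: list[tuple]) -> list[tuple]:
    """Return every argument tuple on which the two functions disagree."""
    failures = []
    for args in inputs:
        # Deep-copy the arguments so a mutating function cannot affect the other call.
        if candidate(*copy.deepcopy(args)) != reference(*copy.deepcopy(args)):
            failures.append(args)
    return failures


# A small hand-written suite (hypothetical) that the buggy candidate happens to pass.
BASE_INPUTS = [([1, 1, 2, 3],), ([],), ([5],)]
# The same suite plus extra inputs, standing in for automatically generated tests.
AUGMENTED_INPUTS = BASE_INPUTS + [([3, 1, 2],), ([1, 2, 1],), ([2, 2, 1, 1],)]

print(differential_test(candidate_solution, reference_solution, BASE_INPUTS))
# → []
print(differential_test(candidate_solution, reference_solution, AUGMENTED_INPUTS))
# → [([3, 1, 2],), ([1, 2, 1],), ([2, 2, 1, 1],)]
```

The candidate passes every base test yet fails all three added inputs, which is exactly the kind of error a larger test suite is meant to expose.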

The project's significance to the AI research community is reflected in its presentation at NeurIPS 2023, a premier machine learning conference. By providing a standardized, open-source tool under the Apache 2.0 license, EvalPlus enables researchers and developers to benchmark model performance more reliably. This supports the safety and reliability of AI-assisted software development by checking that generated code is not merely plausible but actually runs and behaves correctly across a broad set of inputs.
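Results from HumanEval-style harnesses, EvalPlus included, are commonly reported as pass@k: the probability that at least one of k sampled solutions for a task passes all tests. The sketch below implements the standard unbiased estimator introduced with the original HumanEval benchmark, computed from n samples per task of which c pass. It is written independently for illustration, not copied from the EvalPlus codebase, and the function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: 1 - C(n-c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the n samples contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: dict[str, tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; results maps task_id -> (n, c)."""
    return sum(pass_at_k(n, c, k) for n, c in results.values()) / len(results)


# Hypothetical per-task counts: 4 samples each, with 1 and 4 passing respectively.
print(mean_pass_at_k({"task/a": (4, 1), "task/b": (4, 4)}, k=2))
# → 0.75
```

With k = 1 the estimator reduces to c / n, the plain fraction of passing samples, which is why pass@1 is often reported as a simple success rate.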

## Notable For
- **Academic Recognition:** Being featured at NeurIPS 2023, a top-tier academic conference.
- **Research Integration:** Direct implementation of findings from the paper "Is Your Code Generated by ChatGPT Really Correct?"
- **Open Source Accessibility:** Availability as free software under the permissive Apache 2.0 license.
- **Active Development:** Releases spanning May 2023 to October 2024 (versions 0.1.1 through 0.3.1).

## Body
### Overview and Purpose
EvalPlus is free software designed to rigorously evaluate the correctness of code synthesized by Large Language Models (LLMs). It functions as a benchmarking framework; the accompanying paper uses ChatGPT as its motivating example, as its title indicates. The project's own tagline is "Rigourous evaluation of LLM-synthesized code - NeurIPS 2023."

### Version History and Development
The software has maintained an active release schedule since its initial stable release in early May 2023.
- **Initial Release:** Version 0.1.1 was released on May 7, 2023, followed closely by patches 0.1.2 (May 7, 2023) and 0.1.3 (May 9, 2023).
- **Mid-2023 Updates:** Version 0.1.5 was released on June 2, 2023, and version 0.1.6 followed on June 26, 2023.
- **Major Updates:** Version 0.2.0 was released on November 24, 2023, followed by version 0.2.1 in April 2024.
- **Current State:** The most recent stable version listed is 0.3.1, released on October 20, 2024.

### Technical Specifications and Access
EvalPlus is fully open-source.
- **Source Code:** The codebase is available at `https://github.com/evalplus/evalplus`.
- **License:** It uses the Apache Software License 2.0.
- **Documentation:** Further information is available on the project website at `https://evalplus.github.io/`.
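One way to use EvalPlus is to evaluate model outputs you have already generated. To our understanding, its documentation describes a JSON Lines samples file in which each line holds a `task_id` and a `solution` (the complete generated code); treat that format as an assumption and confirm it against the project website for the version you use. The stdlib-only sketch below writes such a file; the helper names and the solution string are hypothetical.

```python
import json


def to_jsonl(solutions: dict[str, str]) -> str:
    """Serialize {task_id: code} as JSON Lines with 'task_id' and 'solution' keys (assumed format)."""
    return "".join(
        json.dumps({"task_id": task_id, "solution": code}) + "\n"
        for task_id, code in solutions.items()
    )


def from_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def write_samples(path: str, solutions: dict[str, str]) -> None:
    """Write the samples file to disk as UTF-8 JSON Lines."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(solutions))


# Hypothetical model output for one HumanEval-style task id.
generated = {
    "HumanEval/0": "def has_close_elements(numbers, threshold):\n    return False\n",
}
```

Calling `write_samples("samples.jsonl", generated)` produces a one-line file; the evaluation step itself should then follow the project documentation rather than this sketch.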

### Academic Context
The tool is a direct product of research presented at the Neural Information Processing Systems (NeurIPS) conference in 2023. It serves as the practical implementation of the methodologies described in the paper "Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation."
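The paper describes generating extra test inputs with both LLM-based and mutation-based strategies, where mutations respect the type of each input. The sketch below illustrates only a simplified type-aware mutation step: `mutate` and `grow_inputs` are hypothetical names, the mutation operators are illustrative choices, and the real EvalPlus generator includes steps not shown here.

```python
import random
from typing import Any


def mutate(value: Any, rng: random.Random) -> Any:
    """Return a mutated copy of value, choosing the mutation based on its type."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return not value
    if isinstance(value, int):
        return value + rng.choice([-1, 1])
    if isinstance(value, float):
        return value + rng.uniform(-1.0, 1.0)
    if isinstance(value, str):
        if value and rng.random() < 0.5:
            i = rng.randrange(len(value))
            return value[:i] + value[i + 1:]  # delete one character
        return value + rng.choice("abc")  # append one character
    if isinstance(value, list):
        if not value:
            return []  # nothing to derive a new element from
        out = list(value)
        i = rng.randrange(len(out))
        op = rng.choice(["mutate", "duplicate", "remove"])
        if op == "mutate":
            out[i] = mutate(out[i], rng)  # recurse into the element's own type
        elif op == "duplicate":
            out.insert(i, out[i])
        else:
            del out[i]
        return out
    return value  # unsupported types are left unchanged


def grow_inputs(seeds: list[tuple], target: int, rng: random.Random,
                max_tries: int = 10_000) -> list[tuple]:
    """Expand seed argument tuples by mutating one argument of a random pool member."""
    pool = list(seeds)
    seen = {repr(s) for s in pool}
    tries = 0
    while len(pool) < target and tries < max_tries:
        tries += 1
        parent = rng.choice(pool)
        j = rng.randrange(len(parent))
        child = parent[:j] + (mutate(parent[j], rng),) + parent[j + 1:]
        if repr(child) not in seen:  # keep only inputs not seen before
            seen.add(repr(child))
            pool.append(child)
    return pool


rng = random.Random(0)
pool = grow_inputs([([1, 2, 3], 2)], target=20, rng=rng)
```

Each new input keeps the argument types of its seed (a list of ints and an int here), which is the point of making mutations type-aware: generated inputs stay valid for the function under test more often than random bytes would.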
