jszheng21 / RACE
RACE is a multi-dimensional benchmark for code generation that focuses on Readability, mAintainability, Correctness, and Efficiency.
☆10Updated 5 months ago
Alternatives and similar repositories for RACE:
Users who are interested in RACE are comparing it to the repositories listed below.
- 🩺 A collection of ChatGPT evaluation reports on various benchmarks.☆48Updated 2 years ago
- Code of "Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model"☆22Updated 9 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆46Updated 3 months ago
- 🍼 Official implementation of Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts☆38Updated 6 months ago
- Code for our EMNLP-2023 paper: "Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks"☆24Updated last year
- [NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective☆30Updated last year
- Towards Systematic Measurement for Long Text Quality☆34Updated 6 months ago
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning☆24Updated last year
- Code for embedding and retrieval research.☆16Updated last year
- ☆15Updated last year
- ☆29Updated 3 months ago
- Code for "RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing"☆17Updated 2 weeks ago
- Resources for our ACL 2023 paper: Distilling Script Knowledge from Large Language Models for Constrained Language Planning☆36Updated last year
- Visual and Embodied Concepts evaluation benchmark☆21Updated last year
- Repo for outstanding paper@ACL 2023 "Do PLMs Know and Understand Ontological Knowledge?"☆31Updated last year
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning☆84Updated last month
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24)☆22Updated 4 months ago
- ☆16Updated last month
- ☆31Updated last year
- The implementation for our paper, "Improving Simultaneous Machine Translation with Monolingual Data," accepted to AAAI 2023. 🎉☆13Updated last year
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism☆28Updated 8 months ago
- [COLM'24] "How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?"☆21Updated 5 months ago
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study☆43Updated 2 years ago
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"☆47Updated 5 months ago
- [Findings of EMNLP22] From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models☆19Updated 2 years ago
- ☆16Updated last year
- [ACL 2024] Code for the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling"☆25Updated last year
- Complexity Based Prompting for Multi-Step Reasoning☆17Updated 2 years ago
- The official repository for the paper "From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning".☆64Updated last year
- Official code repository for PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation, EMNLP 2023☆12Updated last year