alphadl / OOP-eval
The first Object-Oriented Programming (OOP) Evaluation Benchmark for LLMs
☆24 Updated 3 months ago
Alternatives and similar repositories for OOP-eval:
Users who are interested in OOP-eval are comparing it to the repositories listed below:
- FusionBench: A Comprehensive Benchmark/Toolkit of Deep Model Fusion ☆121 Updated this week
- [ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases" ☆35 Updated 9 months ago
- [ICLR 2022] Official repository for "Knowledge Removal in Sampling-based Bayesian Inference" ☆17 Updated 3 years ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆53 Updated last year
- ☆16 Updated 6 months ago
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning ☆24 Updated last year
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning ☆94 Updated last week
- [WMT 2022 champion system] Vega-MT model and inference scripts ☆41 Updated 2 years ago
- Training and Benchmarking LLMs for Code Preference. ☆33 Updated 5 months ago
- Codebase for Context-aware Meta-learned Loss Scaling (CaMeLS). https://arxiv.org/abs/2305.15076. ☆25 Updated last year
- [EMNLP-2022 Findings] Code for paper “ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback”. ☆26 Updated 2 years ago
- ☆27 Updated 9 months ago
- Code for "Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal" (ACL 2024) ☆13 Updated 6 months ago
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective" ☆32 Updated 11 months ago
- ☆25 Updated 2 years ago
- Self-Supervised Alignment with Mutual Information ☆17 Updated 11 months ago
- Evaluate the Quality of Critique ☆34 Updated 10 months ago
- [EMNLP 2023, Findings] GRACE: Discriminator-Guided Chain-of-Thought Reasoning ☆47 Updated 6 months ago
- Tasks for describing differences between text distributions. ☆16 Updated 8 months ago
- Codebase for Inference-Time Policy Adapters ☆23 Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆45 Updated last year
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw ☆59 Updated 6 months ago
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆47 Updated last year
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆80 Updated 8 months ago
- ☆33 Updated last year
- 🎁[ChatGPT4NLU] A Comparative Study on ChatGPT and Fine-tuned BERT ☆194 Updated 2 years ago
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073 ☆28 Updated 9 months ago
- [ICML'24] TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks ☆26 Updated 7 months ago
- ☆35 Updated last month
- ☆40 Updated 5 months ago