alphadl / OOP-eval
The first Object-Oriented Programming (OOP) Evaluation Benchmark for LLMs
☆24 · Updated 9 months ago
Alternatives and similar repositories for OOP-eval
Users who are interested in OOP-eval compare it to the repositories listed below.
- FusionBench: A Comprehensive Benchmark/Toolkit of Deep Model Fusion ☆172 · Updated this week
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning ☆27 · Updated last year
- [ICLR 2022] Official repository for "Knowledge Removal in Sampling-based Bayesian Inference" ☆18 · Updated 3 years ago
- [ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases" ☆39 · Updated last year
- Code for "Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal" (ACL 2024) ☆15 · Updated last year
- 🚀 Enhanced GRPO with more verifiable rewards and real-time evaluators ☆37 · Updated 4 months ago
- ☆14 · Updated 3 months ago
- ☆25 · Updated last year
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆51 · Updated 4 months ago
- Codebase for Hyperdecoders https://arxiv.org/abs/2203.08304 ☆13 · Updated 3 years ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆85 · Updated 5 months ago
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment ☆57 · Updated last year
- ☆18 · Updated last year
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆49 · Updated last year
- ☆103 · Updated last year
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆63 · Updated last year
- ☆25 · Updated 7 months ago
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners" ☆111 · Updated 2 years ago
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se… ☆62 · Updated 2 years ago
- ☆17 · Updated 7 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆62 · Updated 10 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆123 · Updated last year
- ☆14 · Updated last year
- [NeurIPS 2025] Scaling Language-centric Omnimodal Representation Learning ☆22 · Updated this week
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw