alphadl / OOP-evalLinks
The first Object-Oriented Programming (OOP) Evaluaion Benchmark for LLMs
☆24Updated 6 months ago
Alternatives and similar repositories for OOP-eval
Users that are interested in OOP-eval are comparing it to the libraries listed below
Sorting:
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning☆27Updated last year
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw☆62Updated 10 months ago
- FusionBench: A Comprehensive Benchmark/Toolkit of Deep Model Fusion☆149Updated 2 weeks ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location.☆81Updated last year
- [EMNLP-2022 Findings] Code for paper “ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback”.☆27Updated 2 years ago
- Self-Supervised Alignment with Mutual Information☆21Updated last year
- [ICLR 2022] Official repository for "Knowledge Removal in Sampling-based Bayesian Inference"☆17Updated 3 years ago
- 🤖💡 LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context☆15Updated 3 months ago
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor☆29Updated last year
- Evaluate the Quality of Critique☆36Updated last year
- ☆30Updated 7 months ago
- Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication☆21Updated last year
- [ICML 2024] Self-Infilling Code Generation☆18Updated last year
- Official implementation of AAAI 2025 paper "Augmenting Math Word Problems via Iterative Question Composing"(https://arxiv.org/abs/2401.09…☆20Updated 8 months ago
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M…☆26Updated last year
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation☆48Updated last year
- Directional Preference Alignment☆59Updated 10 months ago
- [ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"☆37Updated last year
- Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization"☆41Updated 10 months ago
- The repository for ACL 2024 paper "TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models"☆31Updated last year
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award …☆42Updated 9 months ago
- [ICML2025] Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference"☆21Updated 5 months ago
- [ACL 2024 Findings] The official repo for "ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large …☆24Updated last year
- Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding☆27Updated last year
- 🚀enhanced GRPO with more verifiable rewards and real-time evaluators☆37Updated 2 months ago
- ☆125Updated last year
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards☆44Updated 3 months ago
- [ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?"☆77Updated 8 months ago
- PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging"☆37Updated last year
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners"☆108Updated 2 years ago