alphadl / OOP-eval
The first Object-Oriented Programming (OOP) Evaluation Benchmark for LLMs
☆24Updated 4 months ago
Alternatives and similar repositories for OOP-eval
Users who are interested in OOP-eval are comparing it to the repositories listed below
- [ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"☆35Updated 10 months ago
- FusionBench: A Comprehensive Benchmark/Toolkit of Deep Model Fusion☆128Updated last week
- ☆22Updated 2 months ago
- ☆27Updated 8 months ago
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning☆24Updated last year
- [arXiv] EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees☆18Updated 2 months ago
- Mosaic IT: Enhancing Instruction Tuning with Data Mosaics☆18Updated 3 months ago
- ☆14Updated last year
- [ICML 2024] Adaptive decoding balances the diversity and coherence of open-ended text generation.☆16Updated 11 months ago
- Code for "Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal" (ACL 2024)☆13Updated 6 months ago
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]☆66Updated 6 months ago
- The source code for running LLMs on the AAAR-1.0 benchmark.☆16Updated last month
- [ICLR 2022] Official repository for "Knowledge Removal in Sampling-based Bayesian Inference"☆17Updated 3 years ago
- [NAACL 2024] A Synthetic, Scalable and Systematic Evaluation Suite for Large Language Models☆32Updated 11 months ago
- Supporting code for ReCEval paper☆28Updated 8 months ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting☆32Updated last year
- ☆36Updated 2 months ago
- ☆29Updated last year
- [ICML 2024] Self-Infilling Code Generation☆19Updated last year
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location.☆81Updated 9 months ago
- [NeurIPS 2024] Code and Data Repo for Paper "Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning"☆26Updated 11 months ago
- ☆24Updated 6 months ago
- Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL2024]☆29Updated 11 months ago
- Codebase for Inference-Time Policy Adapters☆23Updated last year
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.☆61Updated 10 months ago
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".☆55Updated 2 months ago
- Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023).☆25Updated 8 months ago
- The official repository of the paper "AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization"☆10Updated 2 weeks ago
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw☆60Updated 7 months ago
- ☆28Updated 10 months ago