The first Object-Oriented Programming (OOP) Evaluation Benchmark for LLMs
☆27Jan 15, 2025Updated last year
Alternatives and similar repositories for OOP-eval
Users that are interested in OOP-eval are comparing it to the libraries listed below
- [ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"☆38Jul 12, 2024Updated last year
- ☆115Sep 12, 2024Updated last year
- ☆114Sep 12, 2024Updated last year
- Code and data for automatic paraphrase dataset augmentation.☆11Mar 8, 2021Updated 4 years ago
- FusionBench: A Comprehensive Benchmark/Toolkit of Deep Model Fusion☆205Feb 6, 2026Updated 3 weeks ago
- ☆18Apr 15, 2024Updated last year
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation☆49Dec 22, 2023Updated 2 years ago
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025)☆37Feb 22, 2025Updated last year
- My daily arxiv reading note☆30Nov 10, 2021Updated 4 years ago
- An Intellij Plugin that generates unit test methods with meaningful names based on described behaviours with @should tags in methods ja…☆10Dec 14, 2025Updated 2 months ago
- CodeMind is a generic framework for evaluating inductive code reasoning of LLMs. It is equipped with a static analysis component that ena…☆42Feb 18, 2026Updated last week
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference …☆14Dec 12, 2024Updated last year
- [MQM-APE] Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators.☆11Sep 24, 2024Updated last year
- [CVPR2024] Learning from Synthetic Human Group Activities☆14Feb 24, 2025Updated last year
- Code for the paper "A Boolean Task Algebra For Reinforcement Learning"☆11Dec 8, 2022Updated 3 years ago
- ☆11Jul 20, 2021Updated 4 years ago
- ☆12Jan 15, 2015Updated 11 years ago
- [EMNLP 2023] Question Answering as Programming for Solving Time-Sensitive Questions☆12Dec 18, 2023Updated 2 years ago
- A Swedish Natural Language Understanding Benchmark☆11Dec 12, 2025Updated 2 months ago
- A framework for few-shot evaluation of autoregressive language models.☆12Jul 14, 2025Updated 7 months ago
- DPO algorithm implementation☆51Jun 12, 2024Updated last year
- Objective-C/C/C++ for Quartz Composer (on the fly)☆17Dec 6, 2012Updated 13 years ago
- Binary code size profiler for WebAssembly☆13Aug 11, 2022Updated 3 years ago
- Git history navigation for dedicated methods, across all kinds of changes incl. complex refactorings.☆43Feb 1, 2024Updated 2 years ago
- Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models☆12Jun 21, 2024Updated last year
- Code for paper "Towards Efficient Pareto Set Approximation via Weight-Ensembling Mixture of Experts"☆11Sep 13, 2024Updated last year
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"☆11Oct 11, 2024Updated last year
- ☆14Dec 2, 2021Updated 4 years ago
- Align, a general text alignment function☆15Dec 7, 2023Updated 2 years ago
- ☆12Nov 5, 2024Updated last year
- Website for release of TellMeWhy dataset for why question answering☆14Nov 11, 2022Updated 3 years ago
- benchmarks for evaluating MT models☆11Jun 26, 2024Updated last year
- On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation (Findings of EMNLP 2021)☆13Nov 21, 2021Updated 4 years ago
- The Python solutions of leetcode☆13Apr 26, 2020Updated 5 years ago
- ☆14Aug 21, 2020Updated 5 years ago
- Code for paper "Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion"☆14Mar 28, 2024Updated last year
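- Chinese financial LLM evaluation benchmark: six categories, twenty-five tasks, tiered grading; domestic Chinese models achieve an A grade☆10May 6, 2024Updated last year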
- This project provides several implementations for commit untangling and proposes a new representation of git patches by projecting the pa…☆11Jul 28, 2025Updated 7 months ago
- Artifact repository for the paper "Perfect Is the Enemy of Test Oracle", In Proceedings of The 30th ACM Joint European Software Engineeri…☆11May 4, 2023Updated 2 years ago