alphadl / OOP-eval
The first Object-Oriented Programming (OOP) Evaluation Benchmark for LLMs
☆24 Updated last month
Alternatives and similar repositories for OOP-eval:
Users who are interested in OOP-eval are comparing it to the libraries listed below.
- FusionBench: A Comprehensive Benchmark/Toolkit of Deep Model Fusion ☆113 Updated this week
- [ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases" ☆34 Updated 7 months ago
- ☆68 Updated last week
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning ☆23 Updated 11 months ago
- Evaluate the Quality of Critique ☆35 Updated 9 months ago
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective" ☆32 Updated 9 months ago
- Augmenting Statistical Models with Natural Language Parameters ☆22 Updated 5 months ago
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆25 Updated last week
- MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension ☆39 Updated 3 months ago
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024] ☆59 Updated 3 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆44 Updated 3 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆89 Updated 9 months ago
- ☆26 Updated 7 months ago
- ☆44 Updated 6 months ago
- This repository contains the official code for the paper: "Prompt Injection: Parameterization of Fixed Inputs" ☆32 Updated 5 months ago
- PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging" ☆38 Updated last year
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model ☆66 Updated 2 years ago
- Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs. ☆46 Updated last year
- AbstainQA, ACL 2024 ☆25 Updated 4 months ago
- The LM Contamination Index is a manually created database of contamination evidences for LMs. ☆77 Updated 10 months ago
- LoFiT: Localized Fine-tuning on LLM Representations ☆33 Updated last month
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433 ☆22 Updated 2 months ago
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment ☆50 Updated 8 months ago
- ☆20 Updated 7 months ago
- code for "Natural Language to Code Translation with Execution" ☆40 Updated 2 years ago
- ☆30 Updated 10 months ago
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following ☆121 Updated 7 months ago
- Codebase for Inference-Time Policy Adapters ☆23 Updated last year
- ☆41 Updated last year