alphadl / OOP-eval

The first Object-Oriented Programming (OOP) Evaluation Benchmark for LLMs

☆24

Alternatives and similar repositories for OOP-eval

Users who are interested in OOP-eval are comparing it to the repositories listed below.

fshp971 / mcmc-unlearning
[ICLR 2022] Official repository for "Knowledge Removal in Sampling-based Bayesian Inference"
☆17 Updated 3 years ago
kanxueli / llm_benchmarks
☆28 Updated 10 months ago
ZiyiZhang27 / tdpo
[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"
☆35 Updated 11 months ago
CausalLearning / llm_benchmarks
☆184 Updated last week
tanganke / fusion_bench
FusionBench: A Comprehensive Benchmark/Toolkit of Deep Model Fusion
☆143 Updated last week
WildVision-AI / LMM-Engines
☆16 Updated 8 months ago
QizhiPei / MathFusion
MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025)
☆25 Updated last month
haozheji / exact-optimization
ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment
☆57 Updated last year
alphadl / R1
🚀enhanced GRPO with more verifiable rewards and real-time evaluators
☆35 Updated 2 weeks ago
test-time-interaction / TTI
☆40 Updated 2 weeks ago
qiuzh20 / gated_attention
The official implementation for Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
☆44 Updated last month
JayZhang42 / SLED
SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433
☆26 Updated 6 months ago
sail-sg / ActivePRM
☆15 Updated 2 months ago
sjelassi / transformers_ssm_copy
☆32 Updated last year
tianyi-lab / DEBATunE
[ACL'24] Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements
☆23 Updated 9 months ago
TianduoWang / DPO-ST
[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
☆45 Updated 10 months ago
qcznlp / uncertainty_attack
☆19 Updated 9 months ago
wzq016 / PINE
Official repo of the paper "Eliminating Position Bias of Language Models: A Mechanistic Approach"
☆14 Updated last week
wzhouad / WPO
Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization"
☆40 Updated 9 months ago
CriticBench / CriticBench
[ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
☆25 Updated last year
wenzhe-li / Self-MoA
☆17 Updated 4 months ago
RLHFlow / RAFT
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…
☆32 Updated 9 months ago
GAIR-NLP / benbench
Benchmarking Benchmark Leakage in Large Language Models
☆52 Updated last year
casmlab / NPHardEval
Repository for NPHardEval, a quantified-dynamic benchmark of LLMs
☆54 Updated last year
zzwjames / FailureLLMUnlearning
An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025)
☆27 Updated 4 months ago
OS-Copilot / ScienceBoard
Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows"
☆71 Updated this week
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆81 Updated 10 months ago
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆45 Updated 3 months ago
HKUNLP / ProGen
[EMNLP-2022 Findings] Code for paper “ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback”.
☆26 Updated 2 years ago
thu-ml / Noise-Contrastive-Alignment
Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" (NeurIPS 2024)
☆54 Updated 7 months ago