LivingFutureLab/DeltaBench


☆45

Alternatives and similar repositories for DeltaBench

Users who are interested in DeltaBench are comparing it to the repositories listed below.

LivingFutureLab / ChineseSimpleQA
☆79 · Jan 24, 2025 · Updated last year
HenryZhen97 / Reconsidering-Overthinking
Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning
☆23 · Jun 23, 2026 · Updated 3 weeks ago
s-ball-10 / jailbreak_dynamics
☆25 · Jun 13, 2024 · Updated 2 years ago
ssmisya / PRMBench
[ACL' 25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models.
☆93 · Feb 15, 2025 · Updated last year
DISL-Lab / BalanceMix
☆15 · Dec 12, 2023 · Updated 2 years ago
open-compass / RePro
[ICLR 2026] Rectifying LLM Thought From Lens of Optimization
☆15 · Dec 5, 2025 · Updated 7 months ago
DripNowhy / Sherlock
[NeurIPS 2025] Official Implementation of paper "Sherlock: Self-Correcting Reasoning in Vision-Language Models"
☆31 · Jun 4, 2026 · Updated last month
Small-Model-Gap / Small-Model-Learnability-Gap
☆23 · Oct 10, 2025 · Updated 9 months ago
xwjim / SIEF
PyTorch implementation for NAACL 2022 paper: "Document-Level Relation Extraction with Sentences Importance Estimation and Focusing"
☆17 · Apr 29, 2022 · Updated 4 years ago
sail-sg / ActivePRM
☆21 · Apr 16, 2025 · Updated last year
marketdesignresearch / NOMU
NOMU: Neural Optimization-based Model Uncertainty
☆10 · Feb 17, 2023 · Updated 3 years ago
sastpg / CoVo
Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning
☆25 · Jun 25, 2025 · Updated last year
MasterVito / SwS
Official Repo for SwS: A Weakness-driven Problem Synthesis Framework in RL for LLM Reasoning
☆41 · Nov 11, 2025 · Updated 8 months ago
RyannDaGreat / rp
This is a python library. Install with "python3 -m pip install rp" then run with "python3 -m rp" or just "rp". Requires python≥3.5
☆13 · Jul 13, 2026 · Updated last week
tianyi-lab / MiP-Overthinking
[COLM'25] Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
☆39 · Jun 5, 2025 · Updated last year
thu-coai / JailbreakDefense_GoalPriority
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆29 · Jul 9, 2024 · Updated 2 years ago
Guinan-Su / auto-merge-llm
An official repository for GPTailor
☆18 · Jun 29, 2025 · Updated last year
scaleapi / plansearch
e
☆42 · Apr 23, 2025 · Updated last year
baixianghuang / survey-authorship
Paper list for the paper "Authorship Attribution in the Era of Large Language Models: Problems, Methodologies, and Challenges (SIGKDD Exp…
☆19 · May 25, 2026 · Updated last month
yale-nlp / refdpo
☆16 · Jul 23, 2024 · Updated last year
Ayame1006 / LLMtoGraph
☆10 · Aug 24, 2023 · Updated 2 years ago
Bollegala / DARep
Cross-domain word representation learning
☆10 · May 23, 2015 · Updated 11 years ago
NuoJohnChen / JudgeLRM
JudgeLRM: Large Reasoning Models as a Judge
☆42 · May 6, 2026 · Updated 2 months ago
qizhangli / Gradient-based-Jailbreak-Attacks
Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs
☆12 · Nov 7, 2024 · Updated last year
zongqianwu / ST-COT
(ICML 2025) Rethinking Chain-of-Thought from the Perspective of Self-Training
☆13 · Feb 15, 2025 · Updated last year
Zhenwen-NLP / MathChat
Official code and data repository of MathChat: MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Inte…
☆22 · Jun 3, 2024 · Updated 2 years ago
d223302 / TRACT
☆24 · Mar 21, 2025 · Updated last year
DISL-Lab / FineSurE-ACL24
The official repo of FineSure (ACL-2024)
☆36 · Jul 8, 2024 · Updated 2 years ago
microsoft / GUI-Agent-RL
☆42 · Jul 2, 2026 · Updated 2 weeks ago
byronBBL / Context-DPO
Official repository of paper "Context-DPO: Aligning Language Models for Context-Faithfulness"
☆23 · Feb 17, 2025 · Updated last year
declare-lab / safety-arithmetic
☆13 · Jan 14, 2025 · Updated last year
yaolu-zjut / DDInterpreter
☆15 · May 28, 2024 · Updated 2 years ago
multimodal-art-projection / CodeCriticBench
☆16 · Nov 1, 2025 · Updated 8 months ago
SeanLeng1 / Reward-Calibration
☆21 · Dec 14, 2024 · Updated last year
FrankYang-17 / Mavors
☆16 · May 30, 2025 · Updated last year
HJYao00 / MMReason
[ICCV 2025] MMReason, MLLMs, step by step, reasoning benchmark, AGI
☆15 · Apr 25, 2026 · Updated 2 months ago
DanielSc4 / Dynamic-Activation-Composition
Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition"
☆14 · Nov 22, 2024 · Updated last year
THUDM / Self-Contrast
Extensive Self-Contrast Enables Feedback-Free Language Model Alignment
☆20 · Apr 2, 2024 · Updated 2 years ago
ByebyeMonica / Reasoning-Agentic-RAG
Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges
☆30 · May 14, 2025 · Updated last year