open-compass / Ada-LEvalLinks

The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"

☆54

Alternatives and similar repositories for Ada-LEval

Users that are interested in Ada-LEval are comparing it to the libraries listed below

Sorting:

clinicalml / co-llm
Co-LLM: Learning to Decode Collaboratively with Multiple Language Models
☆122Updated last year
DAMO-NLP-SG / LongPO
[ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
☆41Updated 7 months ago
MingLiiii / Layer_Gradient
[ACL'25 Oral] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
☆75Updated 4 months ago
Quehry / HelloBench
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
☆52Updated 11 months ago
DRSY / EasyKV
Easy control for Key-Value Constrained Generative LLM Inference(https://arxiv.org/abs/2402.06262)
☆63Updated last year
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆114Updated 5 months ago
TIGER-AI-Lab / LongICLBench
Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR2025]
☆108Updated 8 months ago
GAIR-NLP / AIME-Preview
☆73Updated 7 months ago
open-compass / CriticEval
[NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs
☆47Updated 10 months ago
hewei2001 / ReachQA
[EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs
☆56Updated 2 months ago
TIGER-AI-Lab / General-Reasoner
General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]
☆186Updated 4 months ago
TIGER-AI-Lab / MAmmoTH2
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
☆148Updated 11 months ago
OFA-Sys / DiverseEvol
Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning
☆85Updated last year
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆82Updated 7 months ago
hkust-nlp / llm-compression-intelligence
Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
☆142Updated last year
SihengLi99 / SEALONG
Large Language Models Can Self-Improve in Long-context Reasoning
☆73Updated 11 months ago
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆177Updated 3 months ago
GAIR-NLP / ReAlign
Reformatted Alignment
☆112Updated last year
dvlab-research / Q-LLM
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
☆55Updated last year
voidism / Lookback-Lens
Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps"
☆135Updated last week
TIGER-AI-Lab / CritiqueFineTuning
Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025]
☆178Updated 3 months ago
hkust-nlp / PreSelect
[ICML 2025] Predictive Data Selection: The Data That Predicts Is the Data That Teaches
☆56Updated 7 months ago
RM-R1-UIUC / RM-R1
RM-R1: Unleashing the Reasoning Potential of Reward Models
☆145Updated 4 months ago
ReasoningTransfer / Transferability-of-LLM-Reasoning
☆99Updated 2 weeks ago
Leooyii / LCEG
Long Context Extension and Generalization in LLMs
☆62Updated last year
nick7nlp / FastCuRL
FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient LLM Reasoning
☆53Updated 2 weeks ago
THUDM / T1
RL Scaling and Test-Time Scaling (ICML'25)
☆111Updated 9 months ago
GAIR-NLP / OlympicArena
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
☆105Updated 7 months ago
open-compass / CompassVerifier
[EMNLP 2025] CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
☆49Updated 2 months ago
ConiferLM / Conifer
Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models
☆88Updated last year