TIGER-AI-Lab / LongICLBenchLinks

Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR2025]

☆105

Alternatives and similar repositories for LongICLBench

Users that are interested in LongICLBench are comparing it to the libraries listed below

Sorting:

chujiezheng / LLM-Extrapolation
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆75Updated 2 months ago
princeton-nlp / LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
☆127Updated last year
allenai / WildBench
Benchmarking LLMs with Challenging Tasks from Real Users
☆233Updated 8 months ago
princeton-nlp / ProLong
Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"
☆218Updated 4 months ago
HKUNLP / STRING
[ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?"
☆77Updated 8 months ago
TIGER-AI-Lab / MAmmoTH2
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
☆146Updated 9 months ago
hkust-nlp / llm-compression-intelligence
Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
☆139Updated 10 months ago
Leooyii / LCEG
Long Context Extension and Generalization in LLMs
☆58Updated 10 months ago
dwzhu-pku / LongEmbed
LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)
☆140Updated 8 months ago
QingruZhang / PASTA
PASTA: Post-hoc Attention Steering for LLMs
☆122Updated 8 months ago
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆105Updated 2 months ago
QwenLM / ProcessBench
Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"
☆166Updated 2 months ago
hughbzhang / o1_inference_scaling_laws
Replicating O1 inference-time scaling laws
☆89Updated 8 months ago
alisawuffles / proxy-tuning
Code associated with Tuning Language Models by Proxy (Liu et al., 2024)
☆113Updated last year
IBM / SALMON
Self-Alignment with Principle-Following Reward Models
☆162Updated 2 months ago
princeton-nlp / HELMET
The HELMET Benchmark
☆161Updated 3 months ago
princeton-nlp / QuRating
[ICML 2024] Selecting High-Quality Data for Training Language Models
☆183Updated last year
nightdessert / Retrieval_Head
open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality
☆205Updated last year
hkust-nlp / dart-math
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆110Updated 7 months ago
yifanzhang-pro / AutoMathText
Official implementation of ACL 2025 Findings paper "Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Text…
☆84Updated last week
princeton-nlp / AutoCompressors
[EMNLP 2023] Adapting Language Models to Compress Long Contexts
☆309Updated 10 months ago
princeton-nlp / CEPE
[ACL 2024] Long-Context Language Modeling with Parallel Encodings
☆157Updated last year
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆159Updated last week
zhiyuanhubj / LongRecipe
LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
☆76Updated 9 months ago
bigai-nlco / LooGLE
ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models
☆184Updated 9 months ago
da03 / implicit_chain_of_thought
☆135Updated 8 months ago
GAIR-NLP / AIME-Preview
☆71Updated 4 months ago
DAMO-NLP-SG / CLEX
[ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models
☆78Updated last year
GAIR-NLP / ReAlign
Reformatted Alignment
☆113Updated 10 months ago
yyDing1 / ScaleQuest
[ACL-25] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.
☆63Updated 9 months ago