Rachum-thu / LongPiBench
The repository for the paper "Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs"
☆12 Updated 5 months ago
Alternatives and similar repositories for LongPiBench
Users who are interested in LongPiBench are comparing it to the repositories listed below.
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆89 Updated last week
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆91 Updated 2 months ago
- A testbed for agents and environments that can automatically improve models through data generation. ☆24 Updated 3 months ago
- Reinforcing General Reasoning without Verifiers ☆51 Updated last week
- ☆116 Updated last month
- How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training ☆35 Updated last month
- The code implementation of Symbolic-MoE ☆31 Updated 2 months ago
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 ☆33 Updated 8 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆114 Updated last year
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models ☆45 Updated 6 months ago
- ☆30 Updated 3 weeks ago
- ☆49 Updated 3 weeks ago
- This is the repository for NAACL'25 paper "TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning" ☆53 Updated last month
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆12 Updated 7 months ago
- The official implementation of Preference Data Reward-Augmentation. ☆17 Updated last month
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts? ☆54 Updated 3 months ago
- ☆19 Updated this week
- The official implementation of Self-Exploring Language Models (SELM) ☆64 Updated last year
- The first dense retrieval model that can be prompted like an LM ☆73 Updated 3 weeks ago
- Improving Your Model Ranking on Chatbot Arena by Vote Rigging (ICML 2025) ☆21 Updated 3 months ago
- The official repo for the code and data of paper SMART ☆26 Updated 3 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆67 Updated 2 months ago
- ☆13 Updated 5 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆34 Updated last year
- ☆65 Updated 2 months ago
- ☆42 Updated 2 months ago
- Code for this paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork" ☆33 Updated last year
- The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆27 Updated last week
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆105 Updated 7 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆82 Updated 2 weeks ago