alonj / Same-Task-More-Tokens
The code for the paper: "Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models"
☆54 · Updated 6 months ago
Alternatives and similar repositories for Same-Task-More-Tokens:
Users who are interested in Same-Task-More-Tokens are comparing it to the repositories listed below:
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" · ☆98 · Updated 6 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] · ☆131 · Updated 3 months ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' · ☆76 · Updated last year
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" · ☆71 · Updated 7 months ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. · ☆77 · Updated 5 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators · ☆42 · Updated 11 months ago
- Reproducible, flexible LLM evaluations · ☆129 · Updated last month
- Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?" · ☆69 · Updated 2 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users · ☆210 · Updated 2 months ago
- ☆64 · Updated 11 months ago
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI" · ☆91 · Updated last month
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" · ☆150 · Updated last month
- The official repo for "TheoremQA: A Theorem-driven Question Answering dataset" (EMNLP 2023) · ☆24 · Updated 8 months ago
- PyTorch building blocks for OLMo · ☆49 · Updated this week
- The HELMET Benchmark · ☆109 · Updated last week
- ☆59 · Updated 9 months ago
- Reformatted Alignment · ☆113 · Updated 4 months ago
- Codebase accompanying the Summary of a Haystack paper. · ☆74 · Updated 4 months ago
- ☆114 · Updated 2 months ago
- Official implementation for "Law of the Weakest Link: Cross capabilities of Large Language Models" · ☆41 · Updated 3 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search · ☆69 · Updated last month
- ☆116 · Updated 3 months ago
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval · ☆68 · Updated last month
- ☆53 · Updated 3 months ago
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't… · ☆106 · Updated 6 months ago
- [ICLR 2025] InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales · ☆66 · Updated 2 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models · ☆76 · Updated 10 months ago
- ☆129 · Updated last month
- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions · ☆40 · Updated 6 months ago