google-deepmind / loft
LOFT: A 1 Million+ Token Long-Context Benchmark
☆207 Updated last month
Alternatives and similar repositories for loft
Users who are interested in loft are comparing it to the libraries listed below.
- The HELMET Benchmark ☆162 Updated 3 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆233 Updated 9 months ago
- Reproducible, flexible LLM evaluations ☆227 Updated 3 weeks ago
- A simple unified framework for evaluating LLMs ☆235 Updated 3 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆146 Updated 9 months ago
- ☆91 Updated 9 months ago
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR2025] ☆105 Updated 5 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI ☆112 Updated 2 weeks ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆207 Updated 2 months ago