google-deepmind / loft
LOFT: A 1 Million+ Token Long-Context Benchmark
☆172 · Updated 3 months ago
Alternatives and similar repositories for loft:
Users who are interested in loft are comparing it to the libraries listed below.
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆135 · Updated 3 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆215 · Updated 3 months ago
- A simple unified framework for evaluating LLMs ☆197 · Updated 2 weeks ago
- ☆130 · Updated 2 months ago
- Reproducible, flexible LLM evaluations ☆162 · Updated 2 months ago
- Self-Alignment with Principle-Following Reward Models ☆154 · Updated 11 months ago
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models ☆175 · Updated 4 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆154 · Updated 2 months ago
- ☆117 · Updated 4 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆128 · Updated 3 months ago
- [EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA ☆111 · Updated 3 months ago
- The HELMET Benchmark ☆115 · Updated this week
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore" ☆158 · Updated this week
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Length (ICLR 2024) ☆204 · Updated 9 months ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. ☆194 · Updated last week
- The official evaluation suite and dynamic data release for MixEval. ☆231 · Updated 3 months ago
- ☆66 · Updated last year
- ☆150 · Updated last year
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆145 · Updated 2 months ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆130 · Updated 5 months ago
- Comprehensive benchmark for RAG ☆116 · Updated 3 months ago
- This project studies the performance and robustness of language models and task-adaptation methods. ☆144 · Updated 9 months ago
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆146 · Updated 2 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI ☆103 · Updated last week
- ☆149 · Updated 2 weeks ago
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" ☆100 · Updated 7 months ago
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" ☆173 · Updated 6 months ago
- [EMNLP 2023] Adapting Language Models to Compress Long Contexts ☆293 · Updated 5 months ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆77 · Updated 6 months ago