booydar / babilong
BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.
☆212 · Updated last month
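As a rough illustration of the needle-in-a-haystack idea behind BABILong, here is a minimal sketch (not BABILong's actual pipeline): a known fact, the "needle", is buried at a controlled depth inside long filler text, the "haystack", and the model is scored on whether it can recall it. The `call_your_llm` placeholder stands in for whatever LLM client you use.

```python
# Minimal needle-in-a-haystack probe (illustrative sketch, not BABILong's code).
# A known fact (the "needle") is inserted at a chosen relative depth inside
# filler text (the "haystack"), then the model is asked to recall it.

def build_prompt(needle: str, filler_sentences: list[str], depth: float, question: str) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end) of the haystack."""
    pos = int(len(filler_sentences) * depth)
    sentences = filler_sentences[:pos] + [needle] + filler_sentences[pos:]
    haystack = " ".join(sentences)
    return f"{haystack}\n\nQuestion: {question}\nAnswer:"

def score(answer: str, expected: str) -> bool:
    """Loose substring match against the expected answer."""
    return expected.lower() in answer.lower()

if __name__ == "__main__":
    filler = ["The sky was a pale shade of grey that morning."] * 200
    needle = "The secret passphrase is 'blue heron'."
    prompt = build_prompt(needle, filler, depth=0.5,
                          question="What is the secret passphrase?")
    # model_answer = call_your_llm(prompt)   # placeholder: any LLM client goes here
    # print(score(model_answer, "blue heron"))
    print(f"Prompt length: {len(prompt)} characters")
```

Varying the haystack length and the needle depth gives a grid of recall scores, which is how needle-in-a-haystack benchmarks typically characterize long-context behaviour.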
Alternatives and similar repositories for babilong
Users interested in babilong are comparing it to the libraries listed below.
- The HELMET Benchmark ☆172 · Updated last month
- A simple unified framework for evaluating LLMs ☆248 · Updated 5 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆274 · Updated last year
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024) ☆206 · Updated last year
- LOFT: A 1 Million+ Token Long-Context Benchmark ☆212 · Updated 3 months ago
- The official evaluation suite and dynamic data release for MixEval ☆249 · Updated 10 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore" ☆216 · Updated 2 months ago
- ☆192 · Updated 5 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆241 · Updated 11 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)