terryyz / llm-benchmark
A list of LLM benchmark frameworks.
☆57 Updated 7 months ago
Related projects:
- Expert Specialized Fine-Tuning ☆129 Updated last month
- Codebase accompanying the Summary of a Haystack paper. ☆65 Updated 2 months ago
- The scripts for MMLU-Pro ☆84 Updated this week
- Benchmarking LLMs with Challenging Tasks from Real Users ☆182 Updated last month
- [ICLR 2024] Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding ☆138 Updated 6 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI ☆55 Updated last week
- ☆109 Updated last month
- Benchmark baseline for retrieval qa applications ☆90 Updated 5 months ago
- ☆170 Updated last month
- Small and Efficient Mathematical Reasoning LLMs ☆69 Updated 7 months ago
- RepoQA: Evaluating Long-Context Code Understanding ☆96 Updated this week
- Data preparation code for Amber 7B LLM ☆76 Updated 4 months ago
- Evaluating LLMs with fewer examples ☆131 Updated 5 months ago
- Code accompanying "How I learned to start worrying about prompt formatting". ☆82 Updated last month
- Official code for "MAmmoTH2: Scaling Instructions from the Web" ☆106 Updated last week
- Reformatted Alignment ☆111 Updated 4 months ago
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆99 Updated last month
- ☆111 Updated 3 months ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024) ☆195 Updated 4 months ago
- awesome llm plaza: daily tracking all sorts of awesome topics of llm, e.g. llm for coding, robotics, reasoning, multimod etc. ☆125 Updated this week
- [NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`. ☆133 Updated 10 months ago
- Official implementation for the paper "LongEmbed: Extending Embedding Models for Long Context Retrieval" ☆108 Updated 4 months ago
- LOFT: A 1 Million+ Token Long-Context Benchmark ☆127 Updated 3 weeks ago
- Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs" ☆155 Updated 2 months ago
- ☆90 Updated last month
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' ☆72 Updated 8 months ago
- ☆73 Updated 8 months ago
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆86 Updated 3 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆104 Updated 3 months ago
- Evaluation and analysis code for LLM360 ☆75 Updated 3 months ago