felipemaiapolo / tinyBenchmarks
Evaluating LLMs with fewer examples
☆134 · Updated 7 months ago
Related projects
Alternatives and complementary repositories for tinyBenchmarks
- Code accompanying "How I learned to start worrying about prompt formatting". ☆95 · Updated last month
- ☆102 · Updated last month
- Benchmarking LLMs with Challenging Tasks from Real Users ☆195 · Updated 2 weeks ago
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024] ☆103 · Updated last month
- Codebase accompanying the Summary of a Haystack paper. ☆72 · Updated 2 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆115 · Updated last week
- ☆49 · Updated 6 months ago
- A simple unified framework for evaluating LLMs ☆145 · Updated last week
- 🚢 Data Toolkit for Sailor Language Models ☆82 · Updated 4 months ago
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆123 · Updated last month
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆62 · Updated last year
- ☆112 · Updated last month
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" ☆91 · Updated 4 months ago
- Code for training & evaluating Contextual Document Embedding models ☆117 · Updated this week
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆78 · Updated 8 months ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' ☆74 · Updated 10 months ago
- Code repository for the c-BTM paper ☆105 · Updated last year
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆124 · Updated 3 weeks ago
- LOFT: A 1 Million+ Token Long-Context Benchmark ☆146 · Updated 3 weeks ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024) ☆199 · Updated 6 months ago
- ☆46 · Updated 2 weeks ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆122 · Updated 8 months ago
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ☆144 · Updated last month
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ☆139 · Updated this week
- A Survey on Data Selection for Language Models ☆182 · Updated last month
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆61 · Updated 4 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆128 · Updated 3 weeks ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆41 · Updated 9 months ago
- ☆68 · Updated 3 months ago
- A pipeline for LLM knowledge distillation ☆78 · Updated 3 months ago