felipemaiapolo / tinyBenchmarks
Evaluating LLMs with fewer examples
☆147 Updated 11 months ago
Alternatives and similar repositories for tinyBenchmarks:
Users who are interested in tinyBenchmarks are comparing it to the repositories listed below.
- ☆156 Updated 2 weeks ago
- Code accompanying "How I learned to start worrying about prompt formatting". ☆102 Updated 5 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆218 Updated 4 months ago
- A simple unified framework for evaluating LLMs ☆206 Updated 2 weeks ago
- Functional Benchmarks and the Reasoning Gap ☆84 Updated 5 months ago
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆73 Updated last year
- ☆119 Updated 5 months ago
- Evaluating LLMs with CommonGen-Lite ☆89 Updated last year
- ☆65 Updated 4 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆168 Updated 2 months ago
- ☆60 Updated 10 months ago
- Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps" ☆119 Updated 7 months ago
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ☆167 Updated last month
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆131 Updated 4 months ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆133 Updated last month
- Reproducible, flexible LLM evaluations ☆176 Updated 3 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆136 Updated 4 months ago
- ☆49 Updated 2 weeks ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆85 Updated this week
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆82 Updated last year
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024) ☆206 Updated 10 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆104 Updated 6 months ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆186 Updated 3 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆233 Updated 4 months ago
- ☆96 Updated 8 months ago
- ☆122 Updated 4 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆116 Updated 9 months ago
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆71 Updated 7 months ago
- Replicating O1 inference-time scaling laws ☆83 Updated 3 months ago
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆126 Updated last year