felipemaiapolo / tinyBenchmarks
Evaluating LLMs with fewer examples
☆131 · Updated 5 months ago
Related projects:
- Benchmarking LLMs with Challenging Tasks from Real Users ☆182 · Updated last month
- Code accompanying "How I learned to start worrying about prompt formatting". ☆82 · Updated last month
- Official code for "MAmmoTH2: Scaling Instructions from the Web" ☆106 · Updated this week
- A simple unified framework for evaluating LLMs ☆121 · Updated this week
- Codebase accompanying the Summary of a Haystack paper. ☆65 · Updated 2 months ago
- Scripts for generating synthetic finetuning data for reducing sycophancy. ☆105 · Updated last year
- ☆77 · Updated 3 weeks ago
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆59 · Updated 10 months ago
- ☆105 · Updated this week
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆118 · Updated 6 months ago
- Official implementation for the paper "LongEmbed: Extending Embedding Models for Long Context Retrieval" ☆108 · Updated 4 months ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024) ☆195 · Updated 3 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆107 · Updated 2 weeks ago
- Functional Benchmarks and the Reasoning Gap ☆74 · Updated last month
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆109 · Updated 11 months ago
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ☆130 · Updated 2 months ago
- 🚢 Data Toolkit for Sailor Language Models ☆74 · Updated 2 months ago
- Evaluating LLMs with CommonGen-Lite ☆83 · Updated 5 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆39 · Updated 2 weeks ago
- LOFT: A 1 Million+ Token Long-Context Benchmark ☆127 · Updated 2 weeks ago
- [NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`. ☆133 · Updated 10 months ago
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832). ☆73 · Updated 6 months ago
- [EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning ☆201 · Updated 10 months ago
- ☆118 · Updated 5 months ago
- Self-Alignment with Principle-Following Reward Models ☆144 · Updated 6 months ago
- Code repository for the c-BTM paper ☆105 · Updated 11 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆39 · Updated 7 months ago
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆218 · Updated 5 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆217 · Updated 2 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆104 · Updated 3 months ago