cohere-ai / magikarp
☆121 Updated last month
Related projects:
- Benchmarking LLMs with Challenging Tasks from Real Users ☆182 Updated last month
- ☆38 Updated 5 months ago
- Code for Zero-Shot Tokenizer Transfer ☆109 Updated 2 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆107 Updated 2 weeks ago
- ☆77 Updated last month
- A simple unified framework for evaluating LLMs ☆121 Updated this week
- Code repository for the c-BTM paper ☆105 Updated 11 months ago
- Language models scale reliably with over-training and on downstream tasks ☆91 Updated 5 months ago
- Evaluating LLMs with fewer examples ☆131 Updated 5 months ago
- ☆105 Updated this week
- Manage scalable open LLM inference endpoints in Slurm clusters ☆217 Updated 2 months ago
- ☆91 Updated last month
- Official code for "MAmmoTH2: Scaling Instructions from the Web" ☆106 Updated last week
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆76 Updated 3 weeks ago
- This project studies the performance and robustness of language models and task-adaptation methods. ☆141 Updated 4 months ago
- ☆87 Updated 2 months ago
- Self-Alignment with Principle-Following Reward Models ☆144 Updated 6 months ago
- RuLES: a benchmark for evaluating rule-following in language models ☆209 Updated this week
- Scalable Meta-Evaluation of LLMs as Evaluators ☆39 Updated 7 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆81 Updated last month
- Steering vectors for transformer language models in Pytorch / Huggingface ☆52 Updated 2 months ago
- Official implementation for the paper "LongEmbed: Extending Embedding Models for Long Context Retrieval" ☆108 Updated 4 months ago
- Discovering Data-driven Hypotheses in the Wild ☆31 Updated 3 weeks ago
- Scripts for generating synthetic finetuning data for reducing sycophancy. ☆105 Updated last year
- minimal pytorch implementation of bm25 (with sparse tensors) ☆82 Updated 6 months ago
- PASTA: Post-hoc Attention Steering for LLMs ☆96 Updated last week
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆109 Updated 11 months ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆170 Updated last month
- Code accompanying "How I learned to start worrying about prompt formatting". ☆82 Updated last month
- ☆118 Updated 5 months ago