cohere-ai / magikarp
Code for the paper "Fishing for Magikarp"
☆155 Updated 3 weeks ago
Alternatives and similar repositories for magikarp
Users who are interested in magikarp are comparing it to the libraries listed below.
- Evaluating LLMs with fewer examples ☆155 Updated last year
- Reproducible, flexible LLM evaluations ☆204 Updated 3 weeks ago
- Improving Alignment and Robustness with Circuit Breakers ☆208 Updated 8 months ago
- Replicating O1 inference-time scaling laws ☆87 Updated 6 months ago
- RuLES: a benchmark for evaluating rule-following in language models ☆224 Updated 3 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆103 Updated 3 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆223 Updated 7 months ago
- Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps" ☆126 Updated 9 months ago
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". ☆76 Updated last year
- ☆131 Updated 2 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 Updated 4 months ago
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆225 Updated 8 months ago
- A simple unified framework for evaluating LLMs ☆215 Updated last month
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆181 Updated this week
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆92 Updated this week
- PyTorch library for Active Fine-Tuning ☆79 Updated 3 months ago
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR2025] ☆106 Updated 3 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆242 Updated 6 months ago
- Code for Zero-Shot Tokenizer Transfer ☆128 Updated 4 months ago
- ☆97 Updated 11 months ago
- The HELMET Benchmark ☆149 Updated last month
- ☆114 Updated 3 months ago
- ☆174 Updated last month
- ☆120 Updated 8 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆201 Updated last month
- Self-Alignment with Principle-Following Reward Models ☆161 Updated 3 weeks ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆75 Updated 6 months ago
- ☆81 Updated 7 months ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆193 Updated 6 months ago
- LLM-Merging: Building LLMs Efficiently through Merging ☆197 Updated 8 months ago