cohere-ai / magikarp
Code for the paper "Fishing for Magikarp"
☆157 · Updated last month
Alternatives and similar repositories for magikarp
Users interested in magikarp are comparing it to the libraries listed below.
- Evaluating LLMs with fewer examples ☆158 · Updated last year
- Improving Alignment and Robustness with Circuit Breakers ☆214 · Updated 9 months ago
- ☆97 · Updated last year
- Replicating O1 inference-time scaling laws ☆87 · Updated 6 months ago
- Functional Benchmarks and the Reasoning Gap ☆87 · Updated 8 months ago
- Reproducible, flexible LLM evaluations ☆214 · Updated last month
- ☆115 · Updated 4 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆185 · Updated last week
- Code for Zero-Shot Tokenizer Transfer ☆133 · Updated 5 months ago
- ☆86 · Updated 7 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆137 · Updated 7 months ago
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆233 · Updated 2 weeks ago
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR2025] ☆105 · Updated 4 months ago
- ☆181 · Updated 2 months ago
- Language models scale reliably with over-training and on downstream tasks ☆97 · Updated last year
- Steering vectors for transformer language models in PyTorch / Hugging Face ☆108 · Updated 4 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆95 · Updated 3 weeks ago
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". ☆78 · Updated last year
- ☆134 · Updated 2 months ago
- PASTA: Post-hoc Attention Steering for LLMs ☆118 · Updated 7 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆242 · Updated 7 months ago
- ☆27 · Updated last week
- Experiments on speculative sampling with Llama models ☆128 · Updated 2 years ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆226 · Updated 7 months ago
- The HELMET Benchmark ☆154 · Updated 2 months ago
- ☆48 · Updated last month
- Manage scalable open LLM inference endpoints in Slurm clusters ☆261 · Updated 11 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 · Updated 5 months ago
- PyTorch library for Active Fine-Tuning ☆80 · Updated 4 months ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆89 · Updated 7 months ago