cohere-ai / magikarp
Code for the paper "Fishing for Magikarp"
☆165 Updated 4 months ago
Alternatives and similar repositories for magikarp
Users who are interested in magikarp are comparing it to the libraries listed below.
- Evaluating LLMs with fewer examples ☆161 Updated last year
- RuLES: a benchmark for evaluating rule-following in language models ☆232 Updated 6 months ago
- Code for Zero-Shot Tokenizer Transfer ☆137 Updated 8 months ago
- Functional Benchmarks and the Reasoning Gap ☆88 Updated 11 months ago
- ☆100 Updated last year
- Replicating O1 inference-time scaling laws ☆90 Updated 9 months ago
- ☆142 Updated last week
- The official evaluation suite and dynamic data release for MixEval. ☆245 Updated 10 months ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆92 Updated 10 months ago
- ☆72 Updated last year
- Manage scalable open LLM inference endpoints in Slurm clusters ☆271 Updated last year
- Benchmarking LLMs with Challenging Tasks from Real Users ☆241 Updated 10 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆128 Updated 2 months ago
- Language models scale reliably with over-training and on downstream tasks ☆99 Updated last year
- Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps" ☆130 Updated last year
- Extract full next-token probabilities via language model APIs ☆248 Updated last year
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆309 Updated last year
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆75 Updated last year
- ☆81 Updated 2 weeks ago
- ☆75 Updated last year
- Repository for the paper Stream of Search: Learning to Search in Language ☆150 Updated 7 months ago
- Code repository for the c-BTM paper ☆107 Updated last year
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆229 Updated 2 months ago
- ☆127 Updated 11 months ago
- PyTorch library for Active Fine-Tuning ☆91 Updated last week
- A simple unified framework for evaluating LLMs ☆245 Updated 5 months ago
- 🚢 Data Toolkit for Sailor Language Models ☆94 Updated 6 months ago
- BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach. ☆211 Updated 2 weeks ago
- ☆190 Updated 5 months ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆201 Updated last year