kilian-group / phantom-wiki
Python package for generating datasets to evaluate reasoning and retrieval of large language models
☆19 · Updated this week
Alternatives and similar repositories for phantom-wiki
Users interested in phantom-wiki are comparing it to the libraries listed below.
- Aioli: A unified optimization framework for language model data mixing ☆27 · Updated 6 months ago
- Code for the paper: CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models ☆24 · Updated 4 months ago
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following" ☆25 · Updated 2 months ago
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆53 · Updated last month
- Verifiers for LLM Reinforcement Learning ☆69 · Updated 3 months ago
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval ☆49 · Updated 9 months ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆44 · Updated last year
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆32 · Updated 3 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆60 · Updated 11 months ago
- A repository for research on medium-sized language models. ☆78 · Updated last year
- ☆53 · Updated 9 months ago
- ☆25 · Updated 2 months ago
- Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆75 · Updated 11 months ago
- ☆46 · Updated last year
- Minimum Description Length probing for neural network representations ☆18 · Updated 6 months ago
- ☆29 · Updated this week
- Understanding the correlation between different LLM benchmarks ☆29 · Updated last year
- ☆119 · Updated 5 months ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 · Updated last year
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆92 · Updated 2 months ago
- ☆45 · Updated 4 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆38 · Updated 2 months ago
- CodeUltraFeedback: aligning large language models to coding preferences ☆71 · Updated last year
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 · Updated last year
- ☆64 · Updated last month
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. ☆23 · Updated 4 months ago
- ☆64 · Updated 2 weeks ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 · Updated last year
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆81 · Updated 9 months ago
- Lottery Ticket Adaptation ☆39 · Updated 8 months ago