wschella / llm-reliability
Code for the paper "Larger and more instructable language models become less reliable"
☆31 · Updated last year
Alternatives and similar repositories for llm-reliability
Users who are interested in llm-reliability are comparing it to the repositories listed below
- Discovering Data-driven Hypotheses in the Wild ☆124 · Updated 7 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 · Updated 11 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 · Updated 2 years ago
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆48 · Updated 2 months ago
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval ☆51 · Updated last year
- Understanding the correlation between different LLM benchmarks ☆29 · Updated 2 years ago
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award … ☆42 · Updated last year
- A testbed for agents and environments that can automatically improve models through data generation. ☆27 · Updated 10 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆38 · Updated last year
- Code release for "CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning", ICLR 2025 ☆28 · Updated 8 months ago
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆20 · Updated 2 years ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆34 · Updated 8 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆61 · Updated last year
- Official implementation of "BERTs are Generative In-Context Learners" ☆32 · Updated 9 months ago
- Tree prompting: easy-to-use scikit-learn interface for improved prompting. ☆41 · Updated 2 years ago
- List of papers on Self-Correction of LLMs. ☆80 · Updated last year
- PyTorch implementation for MRL ☆20 · Updated last year
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆48 · Updated last year
- Lottery Ticket Adaptation ☆40 · Updated last year
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆90 · Updated last year
- Analysis code for NeurIPS 2025 paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks" ☆55 · Updated 5 months ago
- [TACL, EMNLP 2025 Oral] Code, datasets, and checkpoints for the paper "CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Thr… ☆32 · Updated last month
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆118 · Updated 4 months ago