wschella / llm-reliability
Code for the paper "Larger and more instructable language models become less reliable"
☆31 · Updated last year
Alternatives and similar repositories for llm-reliability
Users who are interested in llm-reliability are also comparing it to the repositories listed below.
- Understanding the correlation between different LLM benchmarks ☆29 · Updated last year
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 · Updated 2 years ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 · Updated 9 months ago
- Discovering Data-driven Hypotheses in the Wild ☆115 · Updated 5 months ago
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval ☆51 · Updated last year
- ☆29 · Updated 3 months ago
- [ACL 2024] "Large Language Models for Automated Open-domain Scientific Hypotheses Discovery". It has also received the best poster award … ☆42 · Updated last year
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆44 · Updated 7 months ago
- Python package for generating datasets to evaluate reasoning and retrieval of large language models ☆19 · Updated last month
- Optimize Any User-defined Compound AI Systems ☆61 · Updated 2 months ago
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆48 · Updated this week
- Source code for GreaTer ICLR 2025 - Gradient Over Reasoning makes Smaller Language Models Strong Prompt Optimizers ☆32 · Updated 6 months ago
- ☆82 · Updated 2 weeks ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆37 · Updated last year
- A testbed for agents and environments that can automatically improve models through data generation. ☆27 · Updated 8 months ago
- ☆80 · Updated this week
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability ☆41 · Updated 4 months ago
- ☆73 · Updated last month
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆48 · Updated last year
- ☆48 · Updated 7 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆103 · Updated 6 months ago
- PyTorch implementation for MRL ☆19 · Updated last year
- Pretraining Code for METAGENE-1 ☆68 · Updated 10 months ago
- ☆33 · Updated 10 months ago
- Official implementation of "BERTs are Generative In-Context Learners" ☆32 · Updated 7 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆100 · Updated last week
- Code release for "CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning", ICLR 2025 ☆28 · Updated 6 months ago
- [TACL, EMNLP 2025 Oral] Code, datasets, and checkpoints for the paper "CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Thr… ☆32 · Updated last year
- ☆55 · Updated last year
- Lottery Ticket Adaptation ☆40 · Updated 11 months ago