wschella / llm-reliability
Code for the paper "Larger and more instructable language models become less reliable"
☆31 · Updated last year
Alternatives and similar repositories for llm-reliability
Users interested in llm-reliability are comparing it to the repositories listed below.
- Discovering Data-driven Hypotheses in the Wild (☆122, updated 6 months ago)
- Optimize Any User-defined Compound AI Systems (☆63, updated 4 months ago)
- Understanding the correlation between different LLM benchmarks (☆29, updated last year)
- [ACL 2024] "Large Language Models for Automated Open-domain Scientific Hypotheses Discovery". It has also received the best poster award … (☆42, updated last year)
- SCREWS: A Modular Framework for Reasoning with Revisions (☆27, updated 2 years ago)
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. (☆48, updated last month)
- [ICLR 2025] ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning https://arxiv.org/abs/2501.06590 (☆78, updated 4 months ago)
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… (☆38, updated last year)
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval (☆51, updated last year)
- Code release for "CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning", ICLR 2025 (☆28, updated 7 months ago)
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery (☆115, updated 3 months ago)
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models (☆123, updated last year)
- ReBase: Training Task Experts through Retrieval Based Distillation (☆29, updated 10 months ago)
- Framework enabling modular interchange of language agents, environments, and optimizers (☆117, updated last week)
- [NeurIPS'24 LanGame workshop] On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability (☆41, updated 5 months ago)
- A framework for pitting LLMs against each other in an evolving library of games ⚔ (☆34, updated 8 months ago)
- Implementation of Dualformer (☆24, updated 9 months ago)
- Official Implementation of the Baby-AIGS system (☆24, updated last year)
- Pretraining Code for METAGENE-1 (☆68, updated 11 months ago)
- Official implementation of "BERTs are Generative In-Context Learners" (☆32, updated 9 months ago)
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) (☆48, updated 11 months ago)
- List of papers on Self-Correction of LLMs. (☆81, updated 11 months ago)
- Open Source Replication of Anthropic's Alignment Faking Paper (☆52, updated 8 months ago)
- Lottery Ticket Adaptation (☆40, updated last year)