wschella / llm-reliability
Code for the paper "Larger and more instructable language models become less reliable"
☆30 Updated 10 months ago
Alternatives and similar repositories for llm-reliability
Users who are interested in llm-reliability are comparing it to the repositories listed below.
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award … ☆42 Updated 10 months ago
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆48 Updated 10 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 Updated last year
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability ☆40 Updated last month
- Discovering Data-driven Hypotheses in the Wild ☆104 Updated 2 months ago
- ☆28 Updated 6 months ago
- Implementation of Dualformer ☆20 Updated 6 months ago
- [ICLR 2025] ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning https://arxiv.org/abs/2501.06590 ☆68 Updated last month
- Official implementation of "BERTs are Generative In-Context Learners" ☆32 Updated 5 months ago
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆20 Updated 2 years ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 Updated 6 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆36 Updated last year
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆98 Updated this week
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆40 Updated 5 months ago
- A repository for research on medium sized language models. ☆78 Updated last year
- ☆30 Updated 6 months ago
- ☆38 Updated this week
- Understanding the correlation between different LLM benchmarks ☆29 Updated last year
- Code release for "CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning", ICLR 2025 ☆26 Updated 4 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆96 Updated 9 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆118 Updated last year
- Framework enabling modular interchange of language agents, environments, and optimizers ☆104 Updated last week
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆48 Updated last year
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers ☆20 Updated 6 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆103 Updated 4 months ago
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval ☆49 Updated 10 months ago
- A platform for Interactive AI-assisted Hypothesis Generation [ACL 2025] ☆21 Updated 2 weeks ago
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience" ☆61 Updated 4 months ago
- ☆24 Updated 3 months ago
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning ☆61 Updated last month