wschella / llm-reliability
Code for the paper "Larger and more instructable language models become less reliable"
☆31 Updated 11 months ago
Alternatives and similar repositories for llm-reliability
Users that are interested in llm-reliability are comparing it to the libraries listed below
- Understanding the correlation between different LLM benchmarks ☆29 Updated last year
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 Updated 7 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 Updated last year
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆42 Updated 6 months ago
- Discovering Data-driven Hypotheses in the Wild ☆110 Updated 3 months ago
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆48 Updated 10 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆36 Updated last year
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval ☆50 Updated 10 months ago
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award … ☆42 Updated 10 months ago
- ☆20 Updated 6 months ago
- Implementation of Dualformer ☆20 Updated 6 months ago
- [TACL] Code, datasets, and checkpoints for the paper "CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retri… ☆31 Updated last year
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆88 Updated last year
- Official implementation of the ACL 2024 paper: Scientific Inspiration Machines Optimized for Novelty ☆85 Updated last year
- ☆23 Updated last month
- A repository for research on medium sized language models. ☆77 Updated last year
- UQ: Assessing Language Models on Unsolved Questions ☆27 Updated 3 weeks ago
- Pre-trained Language Model for Scientific Text ☆46 Updated last year
- [ICLR 2025] ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning https://arxiv.org/abs/2501.06590 ☆68 Updated last month
- ☆29 Updated last month
- ☆30 Updated 7 months ago
- ☆35 Updated 4 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆96 Updated 9 months ago
- ☆54 Updated 10 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆103 Updated 5 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆75 Updated 9 months ago
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning ☆60 Updated 2 months ago
- ☆25 Updated 3 months ago
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆101 Updated 3 weeks ago
- Verifiers for LLM Reinforcement Learning ☆74 Updated 5 months ago