wschella / llm-reliability

Code for the paper "Larger and more instructable language models become less reliable"

☆30

Alternatives and similar repositories for llm-reliability

Users who are interested in llm-reliability are comparing it to the repositories listed below.

kumar-shridhar / Screws
SCREWS: A Modular Framework for Reasoning with Revisions
☆27 Updated last year
bethgelab / CiteME
CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts.
☆48 Updated 9 months ago
facebookresearch / collaborative-reasoner
Source code for the collaborative reasoner research project at Meta FAIR.
☆99 Updated 3 months ago
allenai / infinigram-api
☆73 Updated 3 weeks ago
allenai / discoverybench
Discovering Data-driven Hypotheses in the Wild
☆104 Updated 2 months ago
ZonglinY / MOOSE
[ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award …
☆42 Updated 9 months ago
ctlllll / understanding_llm_benchmarks
Understanding the correlation between different LLM benchmarks
☆29 Updated last year
google / curie
Code release for "CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning", ICLR 2025
☆26 Updated 3 months ago
austrian-code-wizard / c3po
☆29 Updated last week
allenai / SciRIFF
Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding.
☆40 Updated 4 months ago
dinobby / MAGDi
The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…
☆36 Updated last year
para-lost / ReBase
ReBase: Training Task Experts through Retrieval Based Distillation
☆29 Updated 6 months ago
katzurik / Knowledge_Navigator
☆20 Updated 5 months ago
safety-research / open-source-alignment-faking
Open Source Replication of Anthropic's Alignment Faking Paper
☆47 Updated 4 months ago
facebookresearch / dualformer
Implementation of dualformer
☆18 Updated 5 months ago
cvenhoff / steering-thinking-llms
☆23 Updated last month
benediktstroebl / agent-evals
☆23 Updated 2 months ago
facebookresearch / matrix
Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…
☆81 Updated last week
da03 / WildVisualizer
☆22 Updated last month
HazyResearch / aioli
Aioli: A unified optimization framework for language model data mixing
☆27 Updated 6 months ago
yale-nlp / SciArena
Analysis code for paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks"
☆45 Updated this week
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆68 Updated 3 months ago
ZeroSumEval / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆32 Updated 3 months ago
katiekang1998 / reasoning_generalization
☆34 Updated 7 months ago
krypticmouse / matryoshka-representation-learning
PyTorch implementation for MRL
☆19 Updated last year
zou-group / sirius
SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning
☆61 Updated 3 weeks ago
VITA-Group / o1-planning
On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability
☆39 Updated last month
data-for-agents / insta
Official Repo for InSTA: Towards Internet-Scale Training For Agents
☆52 Updated 3 weeks ago
benpry / why-think-step-by-step
Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience"
☆61 Updated 4 months ago
amair-lab / PiFlow
[preprint] PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration
☆23 Updated 3 weeks ago