wschella / llm-reliability
Code for the paper "Larger and more instructable language models become less reliable"
☆29 Updated 5 months ago
Alternatives and similar repositories for llm-reliability:
Users who are interested in llm-reliability are comparing it to the libraries listed below.
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆37 Updated 2 weeks ago
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family. ☆29 Updated 2 months ago
- Aioli: A unified optimization framework for language model data mixing ☆22 Updated 2 months ago
- ☆33 Updated 3 weeks ago
- ☆19 Updated 3 weeks ago
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆43 Updated 5 months ago
- ☆31 Updated 2 months ago
- Official Code Release for "Training a Generally Curious Agent" ☆19 Updated 3 weeks ago
- Understanding the correlation between different LLM benchmarks ☆29 Updated last year
- We study toy models of skill learning. ☆24 Updated 2 months ago
- ☆15 Updated 6 months ago
- Code, datasets, and checkpoints for the paper "CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval an… ☆27 Updated 6 months ago
- Exploration of automated dataset selection approaches at large scales. ☆34 Updated 3 weeks ago
- PyTorch implementation for MRL ☆18 Updated last year
- ☆21 Updated 5 months ago
- A testbed for agents and environments that can automatically improve models through data generation. ☆21 Updated 3 weeks ago
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 Updated 2 weeks ago
- ☆48 Updated 4 months ago
- Official Repository of "Are Your LLMs Capable of Stable Reasoning?" ☆23 Updated 2 weeks ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆28 Updated last month
- Interesting Scientific Idea Generation Using Knowledge Graphs and LLMs: Evaluations with 100 Research Group Leaders ☆16 Updated last month
- Lottery Ticket Adaptation ☆39 Updated 4 months ago
- ☆12 Updated 4 months ago
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval ☆32 Updated 5 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 Updated last year
- Python package for generating datasets to evaluate reasoning and retrieval of large language models ☆16 Updated this week
- Implementation of Dualformer ☆13 Updated last month
- ☆25 Updated last year
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆77 Updated 4 months ago
- Discovering Data-driven Hypotheses in the Wild ☆65 Updated 4 months ago