siegelz / core-bench
☆18 · Updated last week
Related projects
Alternatives and complementary repositories for core-bench
- ☆37 · Updated last year
- PyTorch implementation for MRL · ☆18 · Updated 9 months ago
- Large language models (LLMs) made easy, EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… · ☆58 · Updated 3 months ago
- Codebase accompanying the Summary of a Haystack paper. · ☆72 · Updated 2 months ago
- ☆47 · Updated 9 months ago
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts? · ☆35 · Updated last month
- ☆33 · Updated last month
- ☆28 · Updated 8 months ago
- Critique-out-Loud Reward Models · ☆38 · Updated last month
- Scalable Meta-Evaluation of LLMs as Evaluators · ☆41 · Updated 9 months ago
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval · ☆24 · Updated 3 weeks ago
- ReBase: Training Task Experts through Retrieval Based Distillation · ☆27 · Updated 4 months ago
- ☆45 · Updated 2 months ago
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; arXiv preprint arXiv:2403.… · ☆37 · Updated 4 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment · ☆46 · Updated 2 months ago
- ☆36 · Updated 3 weeks ago
- ☆41 · Updated 2 weeks ago
- ☆19 · Updated last month
- PyTorch code for System-1.x: Learning to Balance Fast and Slow Planning with Language Models · ☆20 · Updated 4 months ago
- Functional Benchmarks and the Reasoning Gap · ☆78 · Updated last month
- ☆40 · Updated 6 months ago
- Advanced Reasoning Benchmark Dataset for LLMs · ☆45 · Updated last year
- SCREWS: A Modular Framework for Reasoning with Revisions · ☆26 · Updated last year
- This repository contains the application code for a generative AI analytics platform. · ☆22 · Updated 3 weeks ago
- ☆112 · Updated last month
- ☆42 · Updated 4 months ago
- ☆9 · Updated 4 months ago
- ☆22 · Updated 2 months ago
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages · ☆37 · Updated last month
- Textbook on reinforcement learning from human feedback · ☆76 · Updated 3 weeks ago