This is the repo for the paper Shepherd -- A Critic for Language Model Generation
☆224Aug 10, 2023Updated 2 years ago
Alternatives and similar repositories for Shepherd
Users that are interested in Shepherd are comparing it to the libraries listed below.
- Evaluate the Quality of Critique☆37Jun 1, 2024Updated 2 years ago
- Official codebase for "SelFee: Iterative Self-Revising LLM Empowered by Self-Feedback Generation"☆226Jun 6, 2023Updated 3 years ago
- Scratchpad/Chain-of-Thought Prompts☆12Jun 6, 2022Updated 4 years ago
- A large-scale, fine-grained, diverse preference dataset (and models).☆368Dec 29, 2023Updated 2 years ago
- [ICLR 2024 & NeurIPS 2023 WS] An Evaluator LM that is open-source, offers reproducible evaluation, and inexpensive to use. Specifically d…☆321Nov 11, 2023Updated 2 years ago
- Scripts for generating synthetic finetuning data for reducing sycophancy.☆122Aug 16, 2023Updated 2 years ago
- Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment☆1,038May 31, 2024Updated 2 years ago
- Generative Judge for Evaluating Alignment☆250Jan 18, 2024Updated 2 years ago
- ☆44Jun 2, 2024Updated 2 years ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments"☆58Feb 29, 2024Updated 2 years ago
- Pytorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models☆28Mar 22, 2024Updated 2 years ago
- 800,000 step-level correctness labels on LLM solutions to MATH problems☆2,143Jun 1, 2023Updated 3 years ago
- Salesforce open-source LLMs with 8k sequence length.☆727Jun 2, 2026Updated last week
- ☆284Jan 6, 2025Updated last year
- Code for the paper <SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning>☆47Aug 1, 2023Updated 2 years ago
- Dromedary: towards helpful, ethical and reliable LLMs.☆1,138Sep 18, 2025Updated 8 months ago
- In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning☆34Aug 9, 2023Updated 2 years ago
- Radiantloom Email Assist 7B is an email-assistant large language model fine-tuned from Zephyr-7B-Beta, over a custom-curated dataset of 1…☆14Jan 19, 2024Updated 2 years ago
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models"☆108Sep 23, 2023Updated 2 years ago
- FeedbackQA: Improving Question Answering Post-Deployment with Interactive Feedback☆12Jul 13, 2022Updated 3 years ago
- Code for arXiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback☆208May 24, 2023Updated 3 years ago
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).☆906Sep 30, 2025Updated 8 months ago
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models☆269Sep 12, 2024Updated last year
- Self-Alignment with Principle-Following Reward Models☆170Sep 18, 2025Updated 8 months ago
- FacTool: Factuality Detection in Generative AI☆933Aug 19, 2024Updated last year
- AllenAI's post-training codebase☆3,746Jun 8, 2026Updated last week
- This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.☆553Mar 10, 2024Updated 2 years ago
- Dataset Reset Policy Optimization☆31Apr 12, 2024Updated 2 years ago
- O1 Replication Journey☆2,000Jan 14, 2025Updated last year
- Secrets of RLHF in Large Language Models Part I: PPO☆1,426Mar 3, 2024Updated 2 years ago
- [ACL2023] We introduce LLM-Blender, an innovative ensembling framework to attain consistently superior performance by leveraging the dive…☆983Oct 22, 2024Updated last year
- ☆42May 22, 2025Updated last year
- ☆23Aug 7, 2023Updated 2 years ago
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)☆3,492Feb 8, 2026Updated 4 months ago
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"☆322Dec 20, 2023Updated 2 years ago
- Code for the paper "Fishing for Magikarp"☆191Updated this week
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"☆1,839Jun 17, 2025Updated 11 months ago
- A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.☆844Jul 1, 2024Updated last year
- [NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs☆49Nov 29, 2024Updated last year