A library for constrained RLHF.
☆13 · Feb 19, 2024 · Updated 2 years ago
Alternatives and similar repositories for ConstrainedRL4LMs
Users that are interested in ConstrainedRL4LMs are comparing it to the libraries listed below.
- ☆18 · May 15, 2025 · Updated last year
- Unofficial implementation of Chain of Hindsight (https://arxiv.org/abs/2302.02676) using pytorch and huggingface Trainers. ☆11 · Apr 5, 2023 · Updated 3 years ago
- [TMLR] Process Reward Models That Think ☆89 · Nov 29, 2025 · Updated 5 months ago
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment ☆56 · Jun 16, 2024 · Updated last year
- Learning from Indirect Observations ☆11 · Jul 16, 2021 · Updated 4 years ago
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models) ☆202 · Dec 16, 2023 · Updated 2 years ago
- Collection of gym environments with support for domain randomization ☆10 · Dec 11, 2024 · Updated last year
- ☆12 · Oct 14, 2022 · Updated 3 years ago
- This repository contains the dataset and code for our ACL'23 publication: "MatSci-NLP: Evaluating Scientific Language Models on Materials… ☆17 · Nov 21, 2023 · Updated 2 years ago
- Test-Time Label-Shift Adaptation ☆13 · May 24, 2023 · Updated 3 years ago
- Official Implementation of "GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution" ☆20 · Apr 3, 2024 · Updated 2 years ago
- ☆17 · Aug 10, 2022 · Updated 3 years ago
- Single Episode Policy Transfer in Reinforcement Learning ☆17 · Jun 13, 2022 · Updated 3 years ago
- ☆37 · Jan 23, 2017 · Updated 9 years ago
- Python script to launch a container with X11 graphics support. ☆19 · Mar 22, 2023 · Updated 3 years ago
- 🦙🦙.🦀 ☆28 · Sep 24, 2023 · Updated 2 years ago
- ☆22 · Sep 19, 2023 · Updated 2 years ago
- Augmenting Statistical Models with Natural Language Parameters ☆28 · Sep 17, 2024 · Updated last year
- The AI that helps you achieve your goals ☆11 · Feb 4, 2024 · Updated 2 years ago
- [ICLR 2022] Towards Continual Knowledge Learning of Language Models ☆91 · Oct 11, 2022 · Updated 3 years ago
- ☆14 · Jan 21, 2025 · Updated last year
- Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code ☆10 · Aug 29, 2023 · Updated 2 years ago
- Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons ☆13 · Feb 13, 2023 · Updated 3 years ago
- Automated terminal emulator benchmarks ☆23 · May 1, 2026 · Updated 3 weeks ago
- Reinforcement Learning for Uplift Modeling ☆13 · Mar 13, 2021 · Updated 5 years ago
- Sparsify transformers with cross-layer transcoders ☆23 · Nov 14, 2025 · Updated 6 months ago
- This project collects methods that enhance the comparison between AMR graphs. ☆18 · Jun 15, 2023 · Updated 2 years ago
- Hypercorn is an ASGI and WSGI Server based on Hyper libraries and inspired by Gunicorn. ☆18 · Jan 12, 2026 · Updated 4 months ago
- An analog touch screen joystick that pretends to be a bevy gamepad ☆13 · Jul 13, 2024 · Updated last year
- ☆13 · Updated this week
- Flight Recorder allows to record client program execution and examine it later ☆11 · Sep 18, 2020 · Updated 5 years ago
- Code for co-training large language models (e.g. T0) with smaller ones (e.g. BERT) to boost few-shot performance ☆17 · Sep 23, 2022 · Updated 3 years ago
- [NeurIPS 2024] Introspective Planning: Aligning Robots’ Uncertainty with Inherent Task Ambiguity ☆23 · Nov 6, 2025 · Updated 6 months ago
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" ☆13 · Jul 18, 2024 · Updated last year
- Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data) ☆29 · Dec 19, 2023 · Updated 2 years ago
- A small game demonstrating a grid distortion effect ☆15 · Oct 5, 2021 · Updated 4 years ago
- A library for training crosscoders ☆17 · May 28, 2025 · Updated 11 months ago
- Development repository for the Triton language and compiler ☆24 · Sep 17, 2025 · Updated 8 months ago
- ☆19 · Nov 4, 2018 · Updated 7 years ago