aadityasingh / HARP
☆14 · Updated last month
Alternatives and similar repositories for HARP:
Users interested in HARP are comparing it to the repositories listed below.
- Efficient scaling laws and collaborative pretraining. ☆13 · Updated this week
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆25 · Updated 9 months ago
- ICML 2024 - Official repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment ☆49 · Updated 7 months ago
- ☆24 · Updated 3 weeks ago
- The official implementation of Self-Exploring Language Models (SELM) ☆61 · Updated 7 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆53 · Updated 5 months ago
- ☆70 · Updated 5 months ago
- ☆59 · Updated 9 months ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. ☆101 · Updated this week
- ☆19 · Updated 3 months ago
- Code and configs for "Asynchronous RLHF: Faster and More Efficient RL for Language Models" ☆27 · Updated last month
- ☆21 · Updated 5 months ago
- Code for reproducing the paper "Not All Language Model Features Are Linear" ☆66 · Updated 2 months ago
- Code for Adaptive Data Optimization ☆20 · Updated last month
- ☆76 · Updated 6 months ago
- NeurIPS 2024 tutorial on LLM Inference ☆38 · Updated last month
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆46 · Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆68 · Updated 3 weeks ago
- ☆43 · Updated 5 months ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆57 · Updated 8 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu, … ☆42 · Updated 6 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆23 · Updated 4 months ago
- Reference implementation for "Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model" ☆42 · Updated last year
- Language models scale reliably with over-training and on downstream tasks ☆96 · Updated 9 months ago
- ☆30 · Updated 11 months ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆40 · Updated 7 months ago
- Minimal implementation of the "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" paper (arXiv:2401.01335) ☆29 · Updated 10 months ago
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆40 · Updated 2 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆112 · Updated 4 months ago