TheDuckAI / prm
☆12 · Updated 6 months ago
Alternatives and similar repositories for prm
Users who are interested in prm are comparing it to the repositories listed below.
- [ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus) · ☆19 · Updated 5 months ago
- Scaling scaling laws with board games. · ☆50 · Updated 2 years ago
- Unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" · ☆77 · Updated 3 years ago
- ☆23 · Updated 9 months ago
- Sparse and discrete interpretability tool for neural networks · ☆63 · Updated last year
- Learn online intrinsic rewards from LLM feedback · ☆41 · Updated 7 months ago
- [ICML 2024] Official code release accompanying the paper "diff History for Neural Language Agents" (Piterbarg, Pinto, Fergus) · ☆20 · Updated 11 months ago
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients · ☆26 · Updated 10 months ago
- ☆34 · Updated 2 years ago
- Transformer with Mu-Parameterization, implemented in JAX/Flax. Supports FSDP on TPU pods. · ☆31 · Updated last month
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… · ☆28 · Updated last year
- Code Release for "Broken Neural Scaling Laws" (BNSL) paper · ☆59 · Updated last year
- Reinforcement Learning via Regressing Relative Rewards · ☆34 · Updated 7 months ago
- Minimal but scalable implementation of large language models in JAX · ☆35 · Updated this week
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models · ☆60 · Updated 4 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" · ☆77 · Updated 8 months ago
- [ICLR 2025] Code for the paper "Implicit Search via Discrete Diffusion: A Study on Chess" · ☆29 · Updated 4 months ago
- ☆83 · Updated last year
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… · ☆29 · Updated last year
- Repo to reproduce the First-Explore paper results · ☆37 · Updated 6 months ago
- Simple and efficient PyTorch-native transformer training and inference (batched) · ☆77 · Updated last year
- ☆19 · Updated 2 years ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX · ☆84 · Updated last year
- Implementation of Direct Preference Optimization · ☆16 · Updated 2 years ago
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models · ☆59 · Updated 2 months ago
- A library to create and manage configuration files, especially for machine learning projects. · ☆78 · Updated 3 years ago
- Code for minimum-entropy coupling. · ☆32 · Updated last year
- Efficient Dictionary Learning with Switch Sparse Autoencoders (SAEs) · ☆25 · Updated 7 months ago
- Automatically take good care of your preemptible TPUs · ☆36 · Updated 2 years ago
- Language models scale reliably with over-training and on downstream tasks · ☆97 · Updated last year