GFNOrg / gfn-lm-tuning
☆168 Updated last year
Alternatives and similar repositories for gfn-lm-tuning:
Users who are interested in gfn-lm-tuning are comparing it to the libraries listed below:
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆114 Updated 2 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆112 Updated 4 months ago
- ☆76 Updated 6 months ago
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality ☆172 Updated 5 months ago
- ☆86 Updated last year
- Function Vectors in Large Language Models (ICLR 2024) ☆135 Updated 3 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆122 Updated 5 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆183 Updated 8 months ago
- [NeurIPS'24 Spotlight] Observational Scaling Laws ☆49 Updated 3 months ago
- ☆127 Updated 2 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆102 Updated last month
- Language models scale reliably with over-training and on downstream tasks ☆96 Updated 9 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆125 Updated 9 months ago
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging ☆98 Updated last year
- ☆94 Updated 7 months ago
- ☆78 Updated 10 months ago
- ☆83 Updated 11 months ago
- Understand and test language model architectures on synthetic tasks. ☆177 Updated last week
- ☆138 Updated this week
- ☆51 Updated 8 months ago
- ☆109 Updated 5 months ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. ☆98 Updated this week
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆67 Updated 2 months ago
- Code for the paper "Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning" ☆29 Updated 3 months ago
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆95 Updated 10 months ago
- ☆74 Updated last year
- ☆53 Updated 2 months ago
- Repo of paper "Free Process Rewards without Process Labels" ☆110 Updated last week
- Self-Alignment with Principle-Following Reward Models ☆152 Updated 11 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆81 Updated 2 months ago