Rethinking the Trust Region in LLM Reinforcement Learning
☆39 · Feb 25, 2026 · Updated this week
Alternatives and similar repositories for Stable-RL
Users who are interested in Stable-RL are comparing it to the repositories listed below.
- ThinkGen: Generalized Thinking for Visual Generation ☆51 · Dec 30, 2025 · Updated 2 months ago
- A collection of various LLM pruning implementations, training code for GPUs & TPUs, and evaluation scripts. ☆61 · Feb 18, 2026 · Updated last week
- MB-X.01 · Logical Origin Node (L.O.N.) - TruthΩ → Co⁺ → Score⁺. Verifiable demo and spec. https://massimiliano.neocities.org/ ☆59 · Feb 3, 2026 · Updated last month
- [ICLR2026] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models" ☆30 · Feb 4, 2026 · Updated 3 weeks ago
- The official PyTorch code for "Relation-aware Instance Refinement for Weakly Supervised Visual Grounding" accepted by CVPR2021 ☆27 · Oct 9, 2021 · Updated 4 years ago
- Dr. MAS is an end-to-end RL training framework for multi-agent LLM systems, supporting the co-training of multiple (heterogeneous) LLMs. ☆89 · Feb 11, 2026 · Updated 2 weeks ago
- Open Set Semantic Segmentation ☆10 · Dec 23, 2020 · Updated 5 years ago
- Integrating neurosymbolic representations into LLMs for interpretability, steering, and running symbolic algorithms ☆14 · Feb 2, 2026 · Updated last month
- [CVPR 2026] Official repo for "VideoSSR: Video Self-Supervised Reinforcement Learning" ☆32 · Nov 11, 2025 · Updated 3 months ago
- ☆29 · Jan 15, 2026 · Updated last month
- ☆16 · Feb 22, 2025 · Updated last year
- Advanced Formal Language Theory (263-5352-00L; Spring 2023) ☆10 · Feb 21, 2023 · Updated 3 years ago
- Deepseek-CoT ☆10 · Oct 6, 2024 · Updated last year
- [ACL 2023] Are Pre-trained Language Models Useful for Model Ensemble in Chinese Grammatical Error Correction? ☆10 · Dec 15, 2025 · Updated 2 months ago
- [COLING 2025 Industry] LoRA Soups ☆18 · Nov 29, 2024 · Updated last year
- Langchain + Docker + Neo4j ☆10 · Oct 29, 2024 · Updated last year
- [ICLR2025] Are Large Vision Language Models Good Game Players? ☆12 · Mar 3, 2025 · Updated last year
- [EMNLP 2024 Tutorial] Language Agents: Foundations, Prospects, and Risks ☆10 · Nov 27, 2024 · Updated last year
- Prompt templates for language models ☆10 · Feb 22, 2026 · Updated last week
- ☆15 · Apr 26, 2025 · Updated 10 months ago
- A processor for KyotoCorpus, KWDLC, and AnnotatedFKCCorpus ☆10 · Jun 26, 2024 · Updated last year
- Internal utility libraries for Pkl ☆15 · Updated this week
- Official PyTorch implementation for Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability [Neur… ☆13 · Jul 7, 2025 · Updated 7 months ago
- ☆10 · Jun 28, 2025 · Updated 8 months ago
- ACM MM 2022 - PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding ☆11 · Aug 12, 2022 · Updated 3 years ago
- Official Implementation of ACL2023: Don't Parse, Choose Spans! Continuous and Discontinuous Constituency Parsing via Autoregressive Span … ☆14 · Aug 25, 2023 · Updated 2 years ago
- Code for "What really matters in matrix-whitening optimizers?" ☆21 · Oct 31, 2025 · Updated 4 months ago
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism" ☆14 · Apr 30, 2025 · Updated 10 months ago
- Source code for the NAACL2022 main conference paper "Dynamic Programming in Rank Space: Scaling Structured Inference with Low-Rank HMMs and PCFGs" ☆10 · Sep 26, 2022 · Updated 3 years ago
- ☆20 · Dec 3, 2025 · Updated 3 months ago
- An implementation of JSON Lines for Rust ☆16 · Apr 11, 2021 · Updated 4 years ago
- Using DTensor on Google Cloud ☆18 · Sep 18, 2022 · Updated 3 years ago
- Code for Research Project TLDR ☆25 · Jul 28, 2025 · Updated 7 months ago
- This project estimates heart rate (HR) during exercise in real time using wrist-type PPG signals amidst intense motion artifa… ☆12 · Jul 20, 2020 · Updated 5 years ago
- Command helper for the Slurm system. Act as if you are on a compute node. ☆15 · Feb 1, 2025 · Updated last year
- Source code for the paper "Are Human-generated Demonstrations Necessary for In-context Learning" ☆12 · Jan 21, 2024 · Updated 2 years ago
- ☆12 · Dec 30, 2025 · Updated 2 months ago
- ☆13 · May 12, 2025 · Updated 9 months ago
- ☆28 · Jan 11, 2026 · Updated last month