dinobby / Symbolic-MoE
The code implementation of Symbolic-MoE
☆43 Updated last month
Alternatives and similar repositories for Symbolic-MoE
Users who are interested in Symbolic-MoE are comparing it to the repositories listed below.
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework ☆71 Updated 4 months ago
- ☆49 Updated 8 months ago
- SSRL: Self-Search Reinforcement Learning ☆145 Updated last month
- SIFT: Grounding LLM Reasoning in Contexts via Stickers ☆58 Updated 7 months ago
- ☆38 Updated this week
- The official implementation of Self-Exploring Language Models (SELM) ☆64 Updated last year
- JudgeLRM: Large Reasoning Models as a Judge ☆39 Updated 3 weeks ago
- Tree Search for LLM Agent Reinforcement Learning ☆127 Updated last week
- ☆35 Updated 4 months ago
- [ACL 2025] An inference-time decoding strategy with adaptive foresight sampling ☆104 Updated 4 months ago
- Process Reward Models That Think ☆55 Updated 3 months ago
- ☆218 Updated 7 months ago
- ☆86 Updated last week
- Official implementation of the NeurIPS 2025 paper "Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space" ☆246 Updated last month
- MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning ☆101 Updated 2 months ago
- Large Language Models Can Self-Improve in Long-context Reasoning ☆73 Updated 10 months ago
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" ☆42 Updated last week
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆36 Updated 8 months ago
- Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning ☆90 Updated 7 months ago
- Geometric-Mean Policy Optimization ☆83 Updated last week
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆82 Updated 6 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆85 Updated 4 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆114 Updated 5 months ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆106 Updated 4 months ago
- [ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆155 Updated 2 weeks ago
- Discriminative Constrained Optimization for Reinforcing Large Reasoning Models ☆37 Updated 3 weeks ago
- ☆133 Updated last month
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression ☆104 Updated 6 months ago
- ☆98 Updated last month
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25 ☆68 Updated 3 months ago