dinobby / Symbolic-MoE
The code implementation of Symbolic-MoE
☆31 Updated 2 months ago
Alternatives and similar repositories for Symbolic-MoE
Users who are interested in Symbolic-MoE are comparing it to the libraries listed below.
- ☆45 Updated 3 months ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting ☆32 Updated last year
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆89 Updated last week
- The official implementation of Self-Exploring Language Models (SELM) ☆64 Updated last year
- Code for Heima ☆44 Updated last month
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆82 Updated 2 weeks ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆48 Updated 6 months ago
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆36 Updated 2 weeks ago
- Repo for "Z1: Efficient Test-time Scaling with Code" ☆59 Updated last month
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework ☆60 Updated this week
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆36 Updated last week
- ☆18 Updated 2 weeks ago
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆35 Updated 4 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆104 Updated 2 months ago
- Official Repository of LatentSeek ☆30 Updated last week
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆59 Updated 4 months ago
- ZeroGUI: Automating Online GUI Learning at Zero Human Cost ☆43 Updated this week
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs ☆27 Updated 2 weeks ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆98 Updated last month
- FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models ☆49 Updated last week
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆91 Updated 3 months ago
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆132 Updated 4 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆70 Updated 2 months ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆15 Updated 3 weeks ago
- ☆27 Updated last month
- Process Reward Models That Think ☆38 Updated last week
- ☆40 Updated 3 weeks ago
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆64 Updated 3 months ago
- A testbed for agents and environments that can automatically improve models through data generation. ☆24 Updated 3 months ago
- ☆93 Updated 8 months ago