Tencent / llm.hunyuan.T1
☆68 Updated this week
Alternatives and similar repositories for llm.hunyuan.T1:
Users who are interested in llm.hunyuan.T1 are comparing it to the repositories listed below:
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆130 Updated 9 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆161 Updated last week
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆78 Updated 2 weeks ago
- ☆143 Updated 2 weeks ago
- From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation ☆83 Updated 2 weeks ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆97 Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆162 Updated 2 weeks ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆149 Updated last week
- A Comprehensive Survey on Long Context Language Modeling ☆113 Updated this week
- ☆125 Updated 3 weeks ago
- A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond ☆42 Updated this week
- ☆94 Updated 3 months ago
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems ☆67 Updated last week
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆64 Updated last week
- Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning ☆32 Updated last month
- Open-Pandora: On-the-fly Control Video Generation ☆32 Updated 4 months ago
- ☆29 Updated 4 months ago
- ☆60 Updated 4 months ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆151 Updated 2 weeks ago
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆140 Updated this week
- Knowledge-Reasoning Synergy Reinforcement Learning. ☆34 Updated last month
- MMR1: Advancing the Frontiers of Multimodal Reasoning ☆148 Updated 2 weeks ago
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆103 Updated 2 weeks ago
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking ☆38 Updated 2 months ago
- official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆33 Updated 2 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆131 Updated last month
- ☆72 Updated last week
- ☆171 Updated last month
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models ☆66 Updated this week
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆148 Updated 2 weeks ago