caiyuchen-ustc / Alpha-RL
On Predictability of Reinforcement Learning Dynamics for Large Language Models
☆51Updated last month
Alternatives and similar repositories for Alpha-RL
Users that are interested in Alpha-RL are comparing it to the libraries listed below
- Marco Search Agent for Realistic and Challenging Agentic Search☆240Updated 2 months ago
- ☆357Updated 6 months ago
- "LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?"☆582Updated 2 months ago
- This repo collects research papers that use AI tools and are in the field of scientific research (including computer science, agronomy, c…☆98Updated 9 months ago
- We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of Multimodal foundation models (MFM…☆312Updated last month
- [BIRD-INTERACT] Re-imagines Text-to-SQL evaluation via lens of dynamic interactions.☆455Updated 3 weeks ago
- [TMLR'25] The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning☆53Updated 9 months ago
- MTLA: Multi-head Temporal Latent Attention☆761Updated 3 months ago
- Repo-level benchmark for real-world Code Agents: from repo understanding → env setup → incremental dev/bug-fixing → task delivery, with c…☆243Updated 3 months ago
- DPO-Shift: Shifting the Distribution of Direct Preference Optimization☆59Updated 10 months ago
- Group Expectation Policy Optimization for Heterogeneous Reinforcement Learning☆164Updated last month
- A live benchmark and evaluation framework for open-ended deep research in the wild.☆102Updated 2 months ago
- [MM 2024] Official code for VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness☆52Updated last year
- Tokenize The Virtual Agents Onchain☆243Updated 7 months ago
- [AAAI 2026 Oral] Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution☆357Updated last month
- [AAAI 2026 Oral] Official repository for InfiGUI-G1. We introduce Adaptive Exploration Policy Optimization (AEPO) to overcome semantic al…☆123Updated last month
- ☆205Updated 3 weeks ago
- ☆73Updated last month
- ☆33Updated 2 months ago
- ☆246Updated last year
- Fat-Cat: A document-centric context management Agent. Making context as simple as reading chat history.☆281Updated last week
- INFTY Engine: An Optimization Toolkit to Support Continual AI☆566Updated 4 months ago
- ☆104Updated 7 months ago
- Siray ComfyUI Nodes☆85Updated last month
- ☆293Updated 6 months ago
- Science-Star: A Platform for Building, Extending, and Experimenting with Scientific Agents.☆738Updated 3 months ago
- We introduce the Audio Logical Reasoning (ALR) dataset, consisting of 6,446 text-audio annotated samples specifically designed for comple…☆1,103Updated last month
- UR2: Unify RAG and Reasoning through Reinforcement Learning☆126Updated last month
- ☆516Updated 10 months ago
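- 超能文献 | AI-powered document translation and academic search service. Supports high-quality translation of PDF, DOCX, PPTX, and other document formats (11 languages supported), with translation of mathematical formulas specially optimized. Also provides intelligent search of PubMed academic literature. More at: https://suppr.wilddata.cn☆246Updated 2 months ago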