[ICLR 2026] Tina: Tiny Reasoning Models via LoRA
★323 · Sep 23, 2025 · Updated 5 months ago
Alternatives and similar repositories for Tina
Users who are interested in Tina are comparing it to the repositories listed below
- DPO, but faster ★48 · Dec 6, 2024 · Updated last year
- ★15 · Apr 26, 2025 · Updated 10 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective ★1,222 · Aug 27, 2025 · Updated 6 months ago
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example ★412 · Nov 21, 2025 · Updated 3 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25] ★223 · Nov 27, 2025 · Updated 3 months ago
- [ICLR 2025] Weighted-Reward Preference Optimization for Implicit Model Fusion ★14 · Mar 17, 2025 · Updated 11 months ago
- [NeurIPS 2025] TTRL: Test-Time Reinforcement Learning ★1,002 · Feb 23, 2026 · Updated 2 weeks ago
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning & ReCall: Learning to Reason with Tool Call for LLMs via Rei… ★1,338 · May 16, 2025 · Updated 9 months ago
- Reinforcing General Reasoning without Verifiers ★96 · Jun 24, 2025 · Updated 8 months ago
- A series of technical report on Slow Thinking with LLM ★761 · Aug 13, 2025 · Updated 6 months ago
- ★527 · Feb 4, 2026 · Updated last month
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ★637 · Jan 29, 2026 · Updated last month
- ★14 · Apr 14, 2025 · Updated 10 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ★52 · Jul 15, 2025 · Updated 7 months ago
- LLM-I: Transform LLMs into natural interleaved multimodal creators! ✨ Tool-use framework supporting image search, generation, code ex… ★41 · Oct 20, 2025 · Updated 4 months ago
- ★16 · Jul 23, 2024 · Updated last year
- Official Repo for Open-Reasoner-Zero ★2,084 · Jun 2, 2025 · Updated 9 months ago
- GRadient-INformed MoE ★263 · Sep 25, 2024 · Updated last year
- Train your own SOTA deductive reasoning model ★107 · Mar 6, 2025 · Updated last year
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ★448 · Oct 16, 2024 · Updated last year
- Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL ★4,135 · Nov 13, 2025 · Updated 3 months ago
- ★67 · May 23, 2025 · Updated 9 months ago
- AllenAI's post-training codebase ★3,614 · Updated this week
- An Open-source RL System from ByteDance Seed and Tsinghua AIR ★1,750 · May 11, 2025 · Updated 9 months ago
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ★33 · May 1, 2025 · Updated 10 months ago
- Simple RL training for reasoning ★3,830 · Dec 23, 2025 · Updated 2 months ago
- Async RL Training at Scale ★1,107 · Updated this week
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation ★71 · Oct 17, 2025 · Updated 4 months ago
- [AAAI 2026] - Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ★274 · Feb 20, 2026 · Updated 2 weeks ago
- ★19 · Jan 3, 2025 · Updated last year
- Fine-tuning Quantized Neural Networks with Zeroth-order Optimization ★16 · Sep 17, 2025 · Updated 5 months ago
- [ICML'25] "Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding" by Jiajun Zhu, Peihao Wang, Ruisi… ★14 · Jun 6, 2025 · Updated 9 months ago
- Official implementation of Self-Taught Agentic Long Context Understanding (ACL 2025). ★12 · Sep 22, 2025 · Updated 5 months ago
- ★43 · Apr 22, 2025 · Updated 10 months ago
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models ★142 · Dec 17, 2025 · Updated 2 months ago
- The code for paper: Hierarchical Document Refinement for Long-context Retrieval-augmented Generation [ACL2025 Oral] ★43 · Aug 25, 2025 · Updated 6 months ago
- Deep Reasoning Translation (DRT) Project ★240 · Sep 1, 2025 · Updated 6 months ago
- [NeurIPS 2025] Thinkless: LLM Learns When to Think ★254 · Sep 26, 2025 · Updated 5 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ★59 · Oct 18, 2025 · Updated 4 months ago