shangshang-wang / Tina
Tina: Tiny Reasoning Models via LoRA
☆55 Updated this week
Alternatives and similar repositories for Tina:
Users who are interested in Tina are comparing it to the libraries listed below.
- ☆62 Updated 3 weeks ago
- ☆24 Updated 7 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆29 Updated last month
- NeurIPS 2024 tutorial on LLM Inference ☆42 Updated 4 months ago
- ☆47 Updated 7 months ago
- Codebase for Instruction Following without Instruction Tuning ☆34 Updated 7 months ago
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆86 Updated 2 weeks ago
- ☆46 Updated last month
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆86 Updated last month
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability ☆38 Updated 3 months ago
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning ☆94 Updated last week
- Exploration of automated dataset selection approaches at large scales. ☆39 Updated last month
- o1 Chain of Thought Examples ☆33 Updated 6 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆26 Updated 7 months ago
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization ☆34 Updated last month
- Repo for "Z1: Efficient Test-time Scaling with Code" ☆55 Updated 2 weeks ago
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper ☆32 Updated last month
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 ☆32 Updated 6 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains ☆60 Updated last week
- Train, tune, and infer Bamba model ☆88 Updated this week
- official implementation of paper "Process Reward Model with Q-value Rankings" ☆56 Updated 2 months ago
- ☆107 Updated 3 months ago
- ☆31 Updated 3 months ago
- The repository contains code for Adaptive Data Optimization ☆23 Updated 4 months ago
- ☆43 Updated 2 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆101 Updated 3 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆93 Updated 6 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 Updated 7 months ago
- ☆91 Updated 2 months ago
- Code for "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ☆61 Updated 6 months ago