HarleyCoops / smolThinker-.5B
A Qwen .5B reasoning model trained on OpenR1-Math-220k
☆12 · Updated last month
Alternatives and similar repositories for smolThinker-.5B:
Users interested in smolThinker-.5B are comparing it to the repositories listed below.
- ☆11 · Updated 8 months ago
- ☆48 · Updated 4 months ago
- Collection of autoregressive model implementations ☆83 · Updated last month
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆76 · Updated 3 weeks ago
- Simple GRPO scripts and configurations ☆59 · Updated last month
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆91 · Updated 3 weeks ago
- Optimizing causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆39 · Updated last month
- Testing PaliGemma 2 fine-tuning on a reasoning dataset ☆18 · Updated 3 months ago
- Train your own SOTA deductive reasoning model ☆81 · Updated 3 weeks ago
- My fork of Allen AI's OLMo for educational purposes ☆30 · Updated 3 months ago
- A repository for research on medium-sized language models ☆76 · Updated 10 months ago
- NanoGPT (124M) quality in 2.67B tokens ☆28 · Updated last month
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆91 · Updated this week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 · Updated 7 months ago
- ☆74 · Updated 7 months ago
- ☆83 · Updated last month
- ☆111 · Updated last month
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆56 · Updated 2 weeks ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆26 · Updated 3 weeks ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated 10 months ago
- ☆24 · Updated 6 months ago
- ☆31 · Updated 2 months ago
- A working implementation of DeepSeek MLA ☆39 · Updated 2 months ago
- Lego for GRPO ☆25 · Updated 2 weeks ago
- My implementation of Q-Sparse: All Large Language Models Can Be Fully Sparsely-Activated ☆31 · Updated 7 months ago
- Entropy Based Sampling and Parallel CoT Decoding ☆17 · Updated 5 months ago
- entropix-style sampling + GUI ☆25 · Updated 5 months ago
- ☆16 · Updated last month
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆28 · Updated last week
- ☆47 · Updated last week