ByungKwanLee / DeepSick-R1
Reproduction of DeepSeek-R1
☆227 Updated 3 weeks ago
Alternatives and similar repositories for DeepSick-R1
Users who are interested in DeepSick-R1 compare it to the libraries listed below.
- minimal GRPO implementation from scratch ☆88 Updated 2 months ago
- Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training ☆271 Updated 2 weeks ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆294 Updated last week
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆445 Updated last month
- An extension of the nanoGPT repository for training small MOE models. ☆140 Updated 2 months ago
- Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" ☆127 Updated last week
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆323 Updated 5 months ago
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability. ☆91 Updated 4 months ago
- Python Library to evaluate VLM models' robustness across diverse benchmarks ☆205 Updated this week
- ☆184 Updated 3 months ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆216 Updated last week
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆162 Updated last week
- TTRL: Test-Time Reinforcement Learning ☆452 Updated 2 weeks ago
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ☆168 Updated last month
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆154 Updated last month
- ☆176 Updated 5 months ago
- A curated list of awesome Multimodal studies. ☆189 Updated 2 weeks ago
- LLM-Merging: Building LLMs Efficiently through Merging ☆197 Updated 7 months ago
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" ☆123 Updated last year
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆164 Updated last week
- Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts. ☆115 Updated 8 months ago
- ☆287 Updated last month
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ☆166 Updated 4 months ago
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆222 Updated last month
- Tina: Tiny Reasoning Models via LoRA ☆192 Updated 3 weeks ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆198 Updated last week
- [NeurIPS 2024] Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to im… ☆114 Updated 11 months ago
- A brief and partial summary of RLHF algorithms. ☆128 Updated 2 months ago
- OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ☆116 Updated this week
- [Arxiv 2025] Efficient Reasoning Models: A Survey ☆146 Updated last week