facebookresearch / MobileLLM-R1
MobileLLM-R1
☆75 Updated 4 months ago
Alternatives and similar repositories for MobileLLM-R1
Users who are interested in MobileLLM-R1 are comparing it to the repositories listed below.
- The official repo of continuous speculative decoding ☆31 Updated 10 months ago
- This is a PyTorch implementation of the paper "ViP: A Differentially Private Foundation Model for Computer Vision" ☆36 Updated 2 years ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆17 Updated 10 months ago
- ☆169 Updated 4 months ago
- Official repository for ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters" ☆22 Updated 3 months ago
- NeuMeta transforms neural networks by allowing a single model to adapt on the fly to different sizes, generating the right weights when n… ☆44 Updated last year
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ☆31 Updated 9 months ago
- ☆42 Updated 4 months ago
- Research work on multimodal cognitive AI ☆68 Updated last month
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… ☆59 Updated 10 months ago
- Official PyTorch Implementation for Paper "No More Adam: Learning Rate Scaling at Initialization is All You Need" ☆55 Updated last year
- [ICLR 2026] Geometric-Mean Policy Optimization ☆99 Updated last week
- Train vector quantized CLIP models using PyTorch Lightning ☆20 Updated last year
- This repository is the implementation of the paper "Training Free Pretrained Model Merging" (CVPR 2024). ☆32 Updated last year
- [ICLR 2025] Official PyTorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆29 Updated 6 months ago
- This is a simple torch implementation of high-performance Multi-Query Attention ☆16 Updated 2 years ago
- ☆73 Updated 6 months ago
- [ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at… ☆104 Updated last year
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 Updated last year
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod… ☆119 Updated 3 weeks ago
- Unofficial Implementation of Selective Attention Transformer ☆20 Updated last year
- VidKV: Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models ☆24 Updated 10 months ago
- ☆19 Updated last year
- Description and applications of OpenAI's paper about DALL-E (2021) and implementation of other (CLIP-guided) zero-shot text-to-image gene… ☆33 Updated 3 years ago
- Lottery Ticket Adaptation ☆39 Updated last year
- [Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with … ☆63 Updated last year
- 😊 TPTT: Transforming Pretrained Transformers into Titans ☆57 Updated 2 months ago
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ☆137 Updated last month
- Measuring the Signal to Noise Ratio in Language Model Evaluation ☆28 Updated 5 months ago
- [NeurIPS 2025 Oral] Official Code for Exploring Diffusion Transformer Designs via Grafting ☆70 Updated 3 weeks ago