kyegomez / Paper-Implementation-TemplateLinks
A simple reproducible template to implement AI research papers
☆24Updated 10 months ago
Alternatives and similar repositories for Paper-Implementation-Template
Users that are interested in Paper-Implementation-Template are comparing it to the libraries listed below
Sorting:
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best…☆49Updated 4 months ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆99Updated last week
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"☆98Updated 9 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)☆159Updated 3 months ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models☆223Updated 2 months ago
- The open source implementation of the multi grouped query attention by the paper "GQA: Training Generalized Multi-Query Transformer Model…☆16Updated last year
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆206Updated 6 months ago
- Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze…☆107Updated 3 months ago
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind☆127Updated 10 months ago
- A repository for DenseSSMs☆87Updated last year
- Pytorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States"☆25Updated 2 weeks ago
- Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"☆26Updated 5 months ago
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆35Updated last year
- Implementation of the premier Text to Video model from OpenAI☆57Updated 8 months ago
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"☆124Updated last year
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”☆122Updated 6 months ago
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation☆49Updated last year
- [NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models☆171Updated 6 months ago
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla…☆60Updated 9 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks☆145Updated 9 months ago
- Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters☆90Updated 2 years ago
- Plug in & Play Pytorch Implementation of the paper: "Evolutionary Optimization of Model Merging Recipes" by Sakana AI☆30Updated 8 months ago
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks☆16Updated 8 months ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM☆55Updated last year
- ☆77Updated 9 months ago
- ☆97Updated 9 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models☆159Updated 6 months ago
- Code for paper "Patch-Level Training for Large Language Models"☆85Updated 8 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024☆60Updated 4 months ago
- a family of highly capabale yet efficient large multimodal models☆185Updated 10 months ago