kyegomez / Paper-Implementation-Template

A simple reproducible template to implement AI research papers

☆24

Alternatives and similar repositories for Paper-Implementation-Template

Users who are interested in Paper-Implementation-Template are comparing it to the libraries listed below.

RWKV / RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best…
☆49Updated 4 months ago
kyegomez / Mixture-of-Depths
Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆99Updated last week
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆98Updated 9 months ago
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆159Updated 3 months ago
jxiw / MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
☆223Updated 2 months ago
kyegomez / MGQA
The open source implementation of the multi grouped query attention by the paper "GQA: Training Generalized Multi-Query Transformer Model…
☆16Updated last year
FreedomIntelligence / LongLLaVA
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆206Updated 6 months ago
kyegomez / MoE-Mamba
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze…
☆107Updated 3 months ago
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆127Updated 10 months ago
WailordHe / DenseSSM
A repository for DenseSSMs
☆87Updated last year
kyegomez / TTL
Pytorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States"
☆25Updated 2 weeks ago
kyegomez / Mirasol
Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"
☆26Updated 5 months ago
sramshetty / mixture-of-depths
An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆35Updated last year
kyegomez / Sora
Implementation of the premier Text to Video model from OpenAI
☆57Updated 8 months ago
nbasyl / DoRA
Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
☆124Updated last year
YuchuanTian / RethinkTinyLM
[ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”
☆122Updated 6 months ago
LiqunMa / FBI-LLM
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
☆49Updated last year
NVlabs / MaskLLM
[NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models
☆171Updated 6 months ago
ByungKwanLee / Phantom
[Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla…
☆60Updated 9 months ago
wuhy68 / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
☆145Updated 9 months ago
ZrrSkywalker / LLaMA-Adapter
Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
☆90Updated 2 years ago
kyegomez / EvoVLM-JP
Plug in & Play Pytorch Implementation of the paper: "Evolutionary Optimization of Model Merging Recipes" by Sakana AI
☆30Updated 8 months ago
kyegomez / TinyGPTV
Simple Implementation of TinyGPTV in super simple Zeta lego blocks
☆16Updated 8 months ago
tanaymeh / mamba-train
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
☆55Updated last year
huggingface / fineVideo
☆77Updated 9 months ago
ScalingIntelligence / large_language_monkeys
☆97Updated 9 months ago
yfzhang114 / SliME
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
☆159Updated 6 months ago
shaochenze / PatchTrain
Code for paper "Patch-Level Training for Large Language Models"
☆85Updated 8 months ago
SHI-Labs / OLA-VLM
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024
☆60Updated 4 months ago
MILVLG / imp
A family of highly capable yet efficient large multimodal models
☆185Updated 10 months ago