josehoras / Knowledge-Distillation
☆11 · Updated 5 years ago
Alternatives and similar repositories for Knowledge-Distillation
Users who are interested in Knowledge-Distillation are comparing it to the libraries listed below.
- First-principle implementations of groundbreaking AI algorithms using a wide range of deep learning frameworks, accompanied by supporting… ☆178 · Updated 2 months ago
- Notes on the Mamba and S4 models (Mamba: Linear-Time Sequence Modeling with Selective State Spaces) ☆170 · Updated last year
- Distributed training (multi-node) of a Transformer model ☆83 · Updated last year
- LoRA and DoRA from Scratch Implementations ☆212 · Updated last year
- Conference schedule, top papers, and analysis of the data for NeurIPS 2023! ☆120 · Updated last year
- From-scratch implementation of a vision language model in pure PyTorch ☆239 · Updated last year
- LoRA: Low-Rank Adaptation of Large Language Models, implemented in PyTorch ☆116 · Updated 2 years ago
- Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch. ☆43 · Updated last year
- Notes on quantization in neural networks ☆99 · Updated last year
- A minimal implementation of a LLaVA-style VLM with interleaved image, text, and video processing ability. ☆96 · Updated 9 months ago
- ☆43 · Updated 2 months ago
- Implementation of xLSTM in PyTorch from the paper "xLSTM: Extended Long Short-Term Memory" ☆119 · Updated 2 weeks ago
- Holds code for our CVPR'23 tutorial: All Things ViTs: Understanding and Interpreting Attention in Vision. ☆195 · Updated 2 years ago
- PyTorch implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model" ☆187 · Updated 2 weeks ago
- ☆65 · Updated last year
- ☆134 · Updated last year
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling ☆206 · Updated last week
- Implementation of the paper "Denoising Diffusion Probabilistic Models" in PyTorch ☆64 · Updated 2 years ago
- Unofficial NoisyNN implementation ☆50 · Updated last year
- Trying out the Mamba architecture on small examples (CIFAR-10, character-level Shakespeare, etc.) ☆47 · Updated last year
- ☆48 · Updated 7 months ago
- Implementation of Griffin from the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆56 · Updated 2 weeks ago
- Reproduction of DeepSeek-R1 ☆238 · Updated 5 months ago
- Awesome list of papers that extend Mamba to various applications. ☆137 · Updated 3 months ago
- Simple, minimal implementation of the Mamba SSM in one PyTorch file, using logcumsumexp (Heisen sequence). ☆122 · Updated 11 months ago
- PyTorch (Lightning) implementation of the Mamba model ☆29 · Updated 5 months ago
- PyTorch implementation of the xLSTM model by Beck et al. (2024) ☆173 · Updated last year
- Complete implementation of Llama 2 with/without KV cache & inference 🚀 ☆48 · Updated last year
- Quick exploration into fine-tuning Florence-2 ☆330 · Updated last year
- PyTorch implementation of the Differential-Transformer architecture for sequence modeling, specifically tailored as a decoder-only model … ☆74 · Updated 10 months ago