kyegomez / Paper-Implementation-Template
A simple reproducible template to implement AI research papers
☆23 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for Paper-Implementation-Template
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆175 · Updated 3 weeks ago
- Simple implementation of TinyGPTV in super simple Zeta lego blocks ☆15 · Updated this week
- Implementation of "PaLM2-VAdapter" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron…" ☆16 · Updated this week
- PyTorch implementation of the paper "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆23 · Updated this week
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆91 · Updated last month
- Official repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆48 · Updated last month
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆67 · Updated this week
- PyTorch implementation of the model from "Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities" ☆24 · Updated this week
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆37 · Updated 6 months ago
- Implementation of the MC-ViT model from the paper "Memory Consolidation Enables Long-Context Video Understanding" ☆16 · Updated this week
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆62 · Updated 2 weeks ago
- Official repo for StableLLAVA ☆90 · Updated 10 months ago
- Public code repo for the paper "A Single Transformer for Scalable Vision-Language Modeling" ☆113 · Updated last month
- A big_vision-inspired repo that implements a generic auto-encoder class capable of representation learning and generative modeling ☆29 · Updated 4 months ago
- A context window 32 times longer than vanilla Transformers and up to 4 times longer than memory-efficient Transformers ☆42 · Updated last year
- The open-source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters" ☆25 · Updated this week
- ✨✨ Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆137 · Updated this week
- Implementation of the premier text-to-video model from OpenAI ☆57 · Updated this week
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆120 · Updated 2 weeks ago
- Implementation of the proposed MaskBit from ByteDance AI ☆58 · Updated 2 weeks ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆38 · Updated 4 months ago
- Implementation of MoE-Mamba from the paper "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in PyTorch and Ze… ☆83 · Updated this week
- ☆72 · Updated 8 months ago
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆65 · Updated 5 months ago
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆33 · Updated 5 months ago
- ☆65 · Updated last year
- Implementation of PALI3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger" ☆144 · Updated this week
- ☆37 · Updated last year
- ☆103 · Updated 3 months ago
- Code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" ☆59 · Updated this week